Overview
CogView is THUDM’s text-to-image system, built on a transformer that models images as discrete tokens. It handles both Chinese and English prompts well and can render legible Chinese text. Later versions add captioning and simple edits alongside image generation.
Description
CogView represents each image as a sequence of codebook tokens produced by a vector-quantized autoencoder, then trains a large autoregressive transformer to predict those tokens from the text prompt. This lets the system compose complex scenes, handle bilingual prompts, and render on-brand Chinese typography more reliably than many early peers. The pipeline supports text-to-image generation from scratch, image-to-image refinement through partial token replacement, and captioning by running the mapping in reverse (image tokens to text). Later releases improve speed and resolution through staged decoding and better tokenizers, so drafts arrive quickly and upscale cleanly for delivery. In practice, teams use CogView for concept art, posters with Chinese copy, product visuals, and multilingual content where prompt fidelity and typography matter.
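The two token-level ideas in the description, vector quantization and partial token replacement, can be illustrated with a toy sketch. This is not CogView's code: the function names `quantize` and `replace_tokens`, the 2-D "patch embeddings", the three-entry codebook, and the `sample_next` callback (a stand-in for the trained transformer) are all assumptions made for illustration, and the real tokenizer and sampler are learned neural networks.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def quantize(patches: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each patch embedding to the index of its nearest codebook vector."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(patch, codebook[k]))
        for patch in patches
    ]


def replace_tokens(
    tokens: List[int],
    mask: List[bool],
    sample_next: Callable[[List[int], int], int],
) -> List[int]:
    """Regenerate only the masked positions, left to right, keeping the rest.

    `sample_next(prefix, position)` is a hypothetical stand-in for the
    transformer: it returns a token id given the tokens before `position`.
    """
    out = list(tokens)
    for i, masked in enumerate(mask):
        if masked:
            out[i] = sample_next(out[:i], i)
    return out


# Toy data: a 3-entry codebook and four 2-D patch embeddings (assumed values).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
patches = [(0.1, 0.1), (0.9, 0.8), (0.2, 0.9), (1.1, 1.0)]

tokens = quantize(patches, codebook)
print(tokens)  # [0, 1, 2, 1]

# Deterministic stub in place of a real model, so the example is repeatable.
def stub_sampler(prefix: List[int], position: int) -> int:
    return (sum(prefix) + position) % len(codebook)

edited = replace_tokens(tokens, [False, True, False, True], stub_sampler)
print(edited)  # [0, 1, 2, 0]
```

Unmasked tokens pass through unchanged, which is why an edit of this kind can alter one region of an image while leaving the rest intact. A real system would decode the edited token sequence back to pixels with the autoencoder's decoder.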
About Microsoft
Microsoft is a technology company that offers a wide range of software, cloud computing services, hardware, and artificial intelligence solutions.
Industry: Software Development
Company Size: 10001+
Location: Redmond, Washington, US