Overview
Latent Diffusion by CompVis is a text-to-image method that runs the diffusion process in a compressed latent space for speed and quality. A VAE encodes and decodes images, a U-Net denoiser operates on the latents, and a text encoder guides generation. This enables fast synthesis, image-to-image translation, and inpainting on modest hardware.
Description
Latent Diffusion compresses an image into a latent representation with a variational autoencoder, then performs denoising in that space while conditioning on text embeddings through cross-attention. After the iterative denoising, the decoder reconstructs the final image. Working on latents cuts memory and compute while preserving detail, so higher resolutions and larger batches are practical even on consumer GPUs. The same architecture supports conditioning on prompts, masks, and reference images, which unifies tasks such as image-to-image translation, style transfer, inpainting, and super-resolution. The approach became the backbone of Stable Diffusion and many open tools, since it is easy to fine-tune, adapt with lightweight modules, and integrate into creative pipelines without heavy infrastructure.
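The pipeline described above can be sketched as a toy loop in NumPy. This is a minimal illustration of the idea, not the CompVis implementation: the `encode`, `decode`, and `denoiser` functions are hypothetical stand-ins for the VAE encoder, VAE decoder, and text-conditioned U-Net, and the sampler is a simple Euler-style update rather than a real diffusion schedule.

```python
import numpy as np

rng = np.random.default_rng(0)
SCALE = 0.18215  # latent scaling factor used by Stable Diffusion's VAE

def encode(image):
    # Stand-in for the VAE encoder: project an image to a tiny 4-dim latent.
    return image.reshape(-1)[:4] * SCALE

def decode(latent):
    # Stand-in for the VAE decoder: inverse of the toy encoder above.
    return latent / SCALE

def denoiser(z, t, text_emb):
    # Stand-in for the U-Net: predict the noise to remove at step t.
    # A real model attends to text_emb via cross-attention; here we
    # simply treat the offset from the conditioning vector as "noise".
    return z - text_emb

def sample(text_emb, steps=50):
    # Start from pure Gaussian noise in latent space, then iteratively
    # denoise: each step subtracts a fraction of the predicted noise.
    z = rng.standard_normal(4)
    for t in range(steps, 0, -1):
        eps = denoiser(z, t, text_emb)
        z = z - (1.0 / steps) * eps
    return decode(z)

# "Prompt" stands in for a text embedding; the loop pulls the latent
# toward it, mimicking how conditioning steers generation.
target = np.array([0.5, -0.2, 0.1, 0.3])
out = sample(target * SCALE)
print(np.round(out, 2))
```

The key point the sketch preserves is structural: all iterative work happens on the small latent `z`, and the (expensive) decode back to pixel space runs only once at the end.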
About CompVis
CompVis is a research group focusing on computer vision and deep learning.
Industry:
Artificial Intelligence
Company Size:
N/A
Location:
Heidelberg, DE