Latent Diffusion

By CompVis
Released: April 1, 2022

Overview

Latent Diffusion by CompVis is a text-to-image method that runs the diffusion process in a compressed latent space to gain speed without sacrificing quality. A VAE encodes and decodes images, a U-Net denoiser operates on the latents, and a text encoder guides generation. It enables fast synthesis, image-to-image translation, and inpainting on modest hardware.
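
As a concrete illustration, the sketch below generates an image with the Hugging Face diffusers library. It assumes the CompVis/stable-diffusion-v1-4 checkpoint, which builds on the Latent Diffusion architecture; the model ID, prompt, and parameter values are examples, not details taken from this listing.

# Minimal text-to-image sketch (Python; assumes `pip install diffusers transformers torch`)
import torch
from diffusers import StableDiffusionPipeline

# Load a checkpoint built on the Latent Diffusion architecture (assumed model ID).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # fits on a single consumer GPU

image = pipe(
    "a watercolor painting of a lighthouse at dawn",
    num_inference_steps=50,  # iterative denoising steps, run in latent space
    guidance_scale=7.5,      # how strongly the text conditioning steers generation
).images[0]
image.save("lighthouse.png")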

Description

Latent Diffusion compresses an image into a latent representation with a variational autoencoder, then performs denoising in that space while conditioning on text embeddings through cross-attention. After the iterative denoising, the decoder reconstructs the final image. Working on latents cuts memory and compute while preserving detail, so higher resolutions and larger batches are practical even on consumer GPUs. The same architecture supports conditioning on prompts, masks, and reference images, which unifies tasks such as image-to-image translation, style transfer, inpainting, and super-resolution. The approach became the backbone of Stable Diffusion and many open tools, since it is easy to fine-tune, adapt with lightweight modules, and integrate into creative pipelines without heavy infrastructure.
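
To make the latent compression concrete, the sketch below encodes a 512x512 image with the VAE from the same public checkpoint and prints the shape reduction; the model ID and the factor-8 downsampling are assumptions based on the public Stable Diffusion release rather than details from this page.

# Sketch: inspect the latent space that the denoiser operates on (Python).
import torch
from diffusers import AutoencoderKL

# VAE published with an assumed Latent Diffusion-based checkpoint.
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")

x = torch.randn(1, 3, 512, 512)  # stand-in for a 512x512 RGB image
with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()  # compress to latent space
    recon = vae.decode(latents).sample            # reconstruct back to pixels

print(x.shape)        # torch.Size([1, 3, 512, 512])
print(latents.shape)  # torch.Size([1, 4, 64, 64]) -- 8x smaller per spatial side
print(recon.shape)    # torch.Size([1, 3, 512, 512])

Denoising runs on the 4x64x64 latent tensor rather than the full-resolution image, which is where the memory and compute savings come from.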

About CompVis

CompVis is a research group focusing on computer vision and deep learning.

Industry: Artificial Intelligence
Company Size: N/A
Location: Heidelberg, DE
Website: compvis


Last updated: October 15, 2025