Sana

Sana is a highly efficient text-to-image system that combines a 32× deep compression autoencoder, a linear-attention Diffusion Transformer (DiT), and a small decoder-only LLM as its text encoder. It generates images at up to 4096×4096 with strong text alignment and competitive FID and GenEval scores. The 0.6B model runs on a 16 GB laptop GPU and produces a 1024×1024 image in under one second. Flow-DPM-Solver reduces the number of sampling steps, and smart caption selection lowers training cost.
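To give a feel for why a 32× autoencoder matters, the short sketch below counts how many tokens a transformer would have to process for a square image. Only the 32× factor comes from the description above. The 8× autoencoder with patch size 2 used as a baseline, the patch size of 1 on the 32× side, and the helper name latent_tokens are assumptions chosen for illustration.

```python
def latent_tokens(image_size: int, compression: int, patch_size: int = 1) -> int:
    """Count the tokens a transformer sees for a square image.

    compression: spatial downsampling factor of the autoencoder.
    patch_size:  side of the latent patch folded into one token (assumed).
    """
    stride = compression * patch_size
    if image_size % stride != 0:
        raise ValueError("image_size must be divisible by compression * patch_size")
    side = image_size // stride
    return side * side


for size in (1024, 4096):
    # Assumed baseline: 8x autoencoder with 2x2 patches (not from the listing).
    baseline = latent_tokens(size, compression=8, patch_size=2)
    # 32x compression as stated in the listing; patch size 1 is an assumption.
    compressed = latent_tokens(size, compression=32, patch_size=1)
    print(f"{size}x{size}: baseline={baseline}, 32x={compressed}, "
          f"ratio={baseline // compressed}")
```

Under these assumptions the 32× setup processes a quarter of the tokens at both resolutions, which shrinks every per-token cost in the transformer by the same factor.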

Overview

Sana is a text-to-image framework from NVIDIA and MIT. It pairs a linear Diffusion Transformer with a deep compression autoencoder to generate images at up to 4K resolution efficiently, with quality competitive with much larger models, while running on a laptop GPU.
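The "linear" in linear Diffusion Transformer refers to attention whose cost grows linearly with the number of tokens rather than quadratically. The sketch below shows the general technique in plain Python, not Sana's actual implementation. The ReLU feature map, the eps constant, and the function names are assumptions made for illustration. A quadratic reference version is included only to show that both orderings give the same result.

```python
def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def linear_attention(q, k, v, eps=1e-6):
    """Kernel attention computed in O(n * d * dv) time.

    q, k: lists of n vectors of length d; v: list of n vectors of length dv.
    The ReLU feature map is an assumed choice for this sketch.
    """
    n, d, dv = len(k), len(k[0]), len(v[0])
    phi_q = [[relu(x) for x in row] for row in q]
    phi_k = [[relu(x) for x in row] for row in k]
    # Summarize all keys and values once; these do not depend on the query.
    kv = [[sum(phi_k[j][a] * v[j][b] for j in range(n)) for b in range(dv)]
          for a in range(d)]
    k_sum = [sum(phi_k[j][a] for j in range(n)) for a in range(d)]
    out = []
    for row in phi_q:
        denom = sum(row[a] * k_sum[a] for a in range(d)) + eps
        out.append([sum(row[a] * kv[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out


def quadratic_reference(q, k, v, eps=1e-6):
    """Same attention, but builds every query-key weight: O(n^2) time."""
    out = []
    for qi in q:
        w = [sum(relu(a) * relu(b) for a, b in zip(qi, kj)) for kj in k]
        denom = sum(w) + eps
        out.append([sum(w[j] * v[j][b] for j in range(len(k))) / denom
                    for b in range(len(v[0]))])
    return out


q = [[1.0, 0.5], [0.2, -0.3]]
k = [[0.5, 1.0], [1.0, -1.0], [0.3, 0.3]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fast = linear_attention(q, k, v)
slow = quadratic_reference(q, k, v)
print(all(abs(a - b) < 1e-9 for fr, sr in zip(fast, slow) for a, b in zip(fr, sr)))
```

The saving comes from reordering the matrix products: the key-value summary is built once and reused for every query, so no n×n weight matrix is ever formed. That matters most at high resolutions, where the token count is large.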

🖼️ Image generation

About NVIDIA

Industry: Computer Hardware Manufacturing

Company Size: 36000

Location: Santa Clara, California, US

Website: nvidia.com

Tools using Sana

No tools found for this model yet.

Last updated: February 25, 2026
