TAAFT
Free mode
100% free
Freemium
Free Trial
Deals

Manzano

By Apple
Manzano couples a hybrid image tokenizer with a unified autoregressive LLM and diffusion decoder so the same model can both understand and generate images. A single vision encoder feeds two light adapters that output continuous embeddings for image-to-text tasks and discrete tokens for image generation in a shared semantic space. Trained on both understanding and generation data, Manzano reduces the usual trade off between these abilities and achieves state-of-the-art results among unified models, competitive with specialist systems, especially on text rich benchmarks.
New Multimodal Gen 3
Released: January 21, 2026

Overview

Manzano is Apple’s unified multimodal model that shares a hybrid vision tokenizer for both image understanding and text-to-image generation, using one autoregressive LLM plus a diffusion decoder to reach state-of-the-art unified performance.

About Apple

Industry: Technology, Information and Media
Company Size: 12000
Location: Cupertino, California, US
Website: apple.com
View Company Profile

Tools using Manzano

No tools found for this model yet.

Last updated: January 21, 2026
0 AIs selected
Clear selection
#
Name
Task