
MM1.5

By Apple
Released: September 30, 2024

Overview

MM1.5 is Apple's refinement of the MM1 multimodal LLM recipe. It keeps MM1's architecture (an image encoder, a vision-language connector, and a decoder-only language model) but upgrades data curation, image resolution, and multi-image/document training. The result is stronger OCR, layout understanding, and chart/diagram reasoning, along with more grounded visual answers.

Description

MM1.5 builds directly on Apple's MM1 blueprint. A vision encoder produces image features, and a vision-language connector passes them into a decoder-only language model. Rather than changing that architecture, the team concentrates its gains on data and training strategy. It tightens data curation, increases image resolution, and emphasizes interleaved sequences in which text and multiple images appear together. That mix, combined with document-centric training and careful cropping/tiling of high-resolution inputs, teaches the model to read fine print, follow complex layouts, and keep track of references across several images without losing context.
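The cropping/tiling idea can be illustrated with a minimal sketch. It assumes a generic high-resolution "image splitting" scheme: the full image is kept as a low-resolution overview, and the model also sees a grid of sub-image crops chosen to match the image's aspect ratio. The function names, the `max_tiles=4` budget, and the grid-selection rule are assumptions for illustration, not MM1.5's published settings. A real pipeline would also resize each crop to the vision encoder's fixed input size before encoding, which this sketch omits.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Crop:
    """Pixel box (left, top, right, bottom); right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def choose_grid(width: int, height: int, max_tiles: int = 4) -> Tuple[int, int]:
    """Hypothetical rule: pick (cols, rows) with cols * rows <= max_tiles whose
    aspect ratio is closest to the image's; ties go to the grid with more tiles."""
    target = width / height
    best, best_err = (1, 1), float("inf")
    for rows in range(1, max_tiles + 1):
        for cols in range(1, max_tiles // rows + 1):
            err = abs(cols / rows - target)
            more_tiles = cols * rows > best[0] * best[1]
            if err < best_err or (err == best_err and more_tiles):
                best, best_err = (cols, rows), err
    return best


def split_image(width: int, height: int, max_tiles: int = 4) -> List[Crop]:
    """Return a full-image overview crop followed by grid crops in row-major order."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    cols, rows = choose_grid(width, height, max_tiles)
    crops = [Crop(0, 0, width, height)]  # global overview (would be downscaled later)
    for r in range(rows):
        for c in range(cols):
            # Integer boundaries so the tiles cover the image exactly, with no gaps.
            crops.append(Crop(c * width // cols, r * height // rows,
                              (c + 1) * width // cols, (r + 1) * height // rows))
    return crops
```

For a 2000×1000 screenshot, `split_image(2000, 1000)` returns the overview plus a 2×1 grid of two 1000×1000 crops, so each half of the page reaches the encoder at a higher effective resolution than the single overview would.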

With instruction tuning, MM1.5 follows multimodal prompts more reliably and can ground its explanations in specific regions of a page or frame. It gives clearer step-by-step reasoning over tables, charts, and diagrams and handles common screenshot and UI patterns more robustly. The accompanying write-up documents the recipe in detail, with the aim of showing which ingredients actually improve practical performance. The result is a clearer path to assistants and tools that can “look, read, and reason” over documents, dashboards, and everyday visual questions.
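Region grounding in multimodal LLMs is often handled by writing bounding boxes as plain-text coordinates that the language model can both read in a prompt and emit in an answer. The sketch below assumes one such convention: pixel boxes quantized to integers in [0, 999] relative to the image size. This page does not state MM1.5's actual coordinate format, so the 1000-bin scheme and the helper names `quantize`, `box_to_text`, and `text_to_box` are hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def quantize(value: int, size: int, bins: int = 1000) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, bins - 1]."""
    return min(bins - 1, value * bins // size)


def box_to_text(box: Box, width: int, height: int, bins: int = 1000) -> str:
    """Hypothetical encoding: render a pixel box as "[x1, y1, x2, y2]" in bin units."""
    x1, y1, x2, y2 = box
    if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
        raise ValueError(f"invalid box {box} for a {width}x{height} image")
    coords = (quantize(x1, width, bins), quantize(y1, height, bins),
              quantize(x2, width, bins), quantize(y2, height, bins))
    return "[" + ", ".join(str(c) for c in coords) + "]"


def text_to_box(text: str, width: int, height: int, bins: int = 1000) -> Box:
    """Inverse mapping: parse "[a, b, c, d]" back to approximate pixel coordinates
    (the left/top edge of each bin)."""
    a, b, c, d = (int(t) for t in text.strip().strip("[]").split(","))
    return (a * width // bins, b * height // bins, c * width // bins, d * height // bins)
```

For a 1000×500 page, `box_to_text((100, 50, 300, 150), 1000, 500)` yields `"[100, 100, 300, 300]"`, and `text_to_box` maps that string back to the original box. Integer arithmetic keeps the mapping deterministic, but decoding is only approximate in general because each bin spans a range of pixels.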


Last updated: October 3, 2025