Overview
MM1.5 is Apple Research’s refinement of the MM1 multimodal recipe. It keeps essentially the same architecture (a vision encoder, a vision-language connector, and a decoder-only language model) but upgrades data curation, image resolution, and multi-image/document training. The result is stronger OCR, layout understanding, and chart/diagram reasoning, along with more grounded visual answers.
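MM1-style models usually raise input resolution by splitting a large image into several encoder-sized crops plus a downscaled overview, so each crop still fits the vision encoder's fixed input size. Below is a minimal sketch of one such "dynamic tiling" scheme. It assumes a 378-pixel square encoder input, a cap of four crops, and a grid chosen to match the image's aspect ratio. `choose_grid`, `tile_boxes`, and `plan_tiles` are hypothetical names, and the selection rule is an illustrative assumption rather than Apple's published procedure.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in resized-image pixels

# Assumption: tile size and crop cap are illustrative, not confirmed MM1.5 settings.
TILE = 378
MAX_TILES = 4


def choose_grid(width: int, height: int, max_tiles: int = MAX_TILES) -> Tuple[int, int]:
    """Pick (rows, cols) with rows * cols <= max_tiles whose aspect ratio best matches the image."""
    target = math.log(width / height)
    best, best_key = (1, 1), None
    for rows in range(1, max_tiles + 1):
        for cols in range(1, max_tiles // rows + 1):
            # Closest aspect ratio wins; on ties, prefer more tiles (more resolution).
            key = (abs(math.log(cols / rows) - target), -(rows * cols))
            if best_key is None or key < best_key:
                best, best_key = (rows, cols), key
    return best


def tile_boxes(rows: int, cols: int, tile: int = TILE) -> List[Box]:
    """Crop boxes, row by row, after resizing the image to (cols * tile, rows * tile)."""
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]


def plan_tiles(width: int, height: int, tile: int = TILE, max_tiles: int = MAX_TILES) -> Dict:
    """Describe how one image would be fed to a fixed-size vision encoder."""
    rows, cols = choose_grid(width, height, max_tiles)
    return {
        "grid": (rows, cols),
        "resize_to": (cols * tile, rows * tile),
        "crops": tile_boxes(rows, cols, tile),
        # One low-resolution overview image plus every high-resolution crop.
        "encoder_inputs": 1 + rows * cols,
    }


if __name__ == "__main__":
    print(plan_tiles(1000, 500))
```

For a 1000x500 screenshot, `plan_tiles(1000, 500)` picks a 1x2 grid, resizes the image to 756x378, and yields two crops plus the overview, for three encoder inputs in total. Preferring more tiles on aspect-ratio ties spends extra compute to keep fine print legible.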
Description
MM1.5 builds directly on Apple’s MM1 blueprint. In that design, a strong vision encoder produces image features, and a vision-language connector maps them into a decoder-only language model. Instead of a wholesale architectural change, the gains come from data and training strategy. The team tightens curation, increases image resolution, and emphasizes interleaved sequences in which text and multiple images appear together. Combined with document-centric training and careful cropping/tiling, that mix teaches the model to read fine print, follow complex layouts, and keep track of references across several images without losing context.
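In MM1-style models, images become extra positions in one token stream. A connector maps the vision encoder's features into the language model's embedding space, and the decoder-only LLM attends over text and image positions together. That is what lets it refer back to an earlier image in an interleaved prompt. Below is a minimal sketch of that sequence layout. It assumes whitespace tokenization and a fixed, hypothetical budget of image tokens per encoder input. `Image`, `assemble`, and `tokens_per_tile` are illustrative names, not part of any Apple release.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Image:
    name: str
    tiles: int = 1  # encoder inputs for this image (e.g. overview + crops)


Segment = Union[str, Image]
Span = Tuple[str, int, int]  # (image name, start index, end index) in the sequence


def assemble(segments: List[Segment], tokens_per_tile: int = 144) -> Tuple[List[str], List[Span]]:
    """Flatten an interleaved text/image prompt into one decoder-only input sequence.

    Text is split on whitespace (a stand-in for a real tokenizer). Each image
    reserves tokens_per_tile placeholder slots per encoder input; in a real
    model, the connector's output embeddings would fill those slots.
    The default of 144 is a hypothetical budget, not a confirmed MM1.5 value.
    """
    sequence: List[str] = []
    image_spans: List[Span] = []
    for seg in segments:
        if isinstance(seg, str):
            sequence.extend(seg.split())
        else:
            start = len(sequence)
            sequence.extend(["<img>"] * (seg.tiles * tokens_per_tile))
            image_spans.append((seg.name, start, len(sequence)))
    return sequence, image_spans


if __name__ == "__main__":
    prompt = ["Compare", Image("q1.png", tiles=3), "with", Image("q2.png"), "revenue?"]
    seq, spans = assemble(prompt, tokens_per_tile=4)
    print(len(seq), spans)
```

With four slots per encoder input, the example prompt flattens to 19 positions. `q1.png` occupies indices 1 to 13, since its three encoder inputs take 12 slots, and `q2.png` occupies 14 to 18. Recording the spans makes it easy to check which positions a later answer should attend to.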
After supervised fine-tuning on a curated instruction mixture, MM1.5 follows multimodal prompts more reliably and grounds its explanations in specific regions of a page or frame. It gives clearer step-by-step reasoning over tables, charts, and diagrams and handles screenshots and UI layouts more robustly. It also remains a transparent recipe meant to show which ingredients actually improve practical performance. The result is a cleaner path to assistants and tools that can genuinely “look, read, and reason” over documents, dashboards, and everyday visual QA.
About Apple
Website: podcasts.apple.com
Last updated: October 3, 2025