MM1.5

Text Gen 3
Released: September 30, 2024

Overview

MM1.5 is Apple Research’s refinement of the MM1 multimodal recipe. It keeps the MM1 architecture, a vision encoder feeding a decoder-only language model through a vision-language connector, but upgrades data curation, image resolution, and multi-image/document training. The result is stronger OCR, layout understanding, and chart/diagram reasoning, along with more grounded visual answers.

Description

MM1.5 builds directly on Apple’s MM1 blueprint: a strong vision encoder connected to a language model through a vision-language connector. The gains come from data and training strategy rather than a wholesale architectural change. The team tightens curation, increases image resolution, and emphasizes interleaved sequences where text and multiple images appear together. That mix, combined with document-centric training and careful cropping/tiling, teaches the model to read fine print, follow complex layouts, and maintain references across several images without losing context.
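
The description above mentions cropping/tiling of high-resolution images but does not say how MM1.5 does it. As a rough illustration only, the sketch below computes crop boxes that split a large page into a bounded grid of tiles. The function name `tile_boxes`, the 448-pixel tile size, and the 9-tile budget are hypothetical values chosen for the example, not details of Apple’s pipeline.

```python
import math
from typing import List, Tuple

# (left, top, right, bottom) in pixels
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile: int = 448, max_tiles: int = 9) -> List[Box]:
    """Return crop boxes that cover a width x height image with a grid of tiles.

    Hypothetical helper: the tile size and tile budget are illustrative
    assumptions, not values taken from MM1.5.
    """
    if width <= 0 or height <= 0 or tile <= 0 or max_tiles <= 0:
        raise ValueError("width, height, tile and max_tiles must be positive")

    # Start with enough columns and rows that no tile exceeds `tile` pixels.
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)

    # Shrink the larger grid dimension until the tile budget is met.
    # Tiles may then be larger than `tile` and would need downscaling.
    while cols * rows > max_tiles:
        if cols >= rows:
            cols -= 1
        else:
            rows -= 1

    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            # Integer division keeps adjacent tiles gap-free and non-overlapping.
            left = c * width // cols
            top = r * height // rows
            right = (c + 1) * width // cols
            bottom = (r + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    # A wide 896x448 screenshot splits into two side-by-side tiles.
    for box in tile_boxes(896, 448):
        print(box)
```

Each box uses the (left, top, right, bottom) convention, so it could be passed to an image library’s crop call. A real pipeline would typically also keep a downscaled copy of the whole page so the model sees global layout alongside the fine-print tiles.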

With a light layer of instruction tuning, MM1.5 follows multimodal prompts more reliably and grounds explanations in specific regions of a page or frame. It gives clearer step-by-step reasoning for tables, charts, and diagrams, and it is more robust on screenshots of user interfaces. It remains a transparent, reproducible recipe aimed at showing which ingredients actually improve practical performance. The result is a cleaner path to assistants and tools that can “look, read, and reason” over documents, dashboards, and everyday visual QA.

About Apple Inc.

Industry: Technology, Information and Media
Company Size: 12000
Location: Cupertino, California, United States
Website: apple.com
Last updated: October 15, 2025