FastVLM
Overview
FastVLM is Apple’s lightweight vision-language model built for real-time multimodal applications. It takes images together with text prompts and returns grounded answers with low latency on tasks such as OCR, chart and diagram understanding, screenshot understanding, and general visual question answering. It also supports long context, tool/function calling, and structured JSON outputs.
