TAAFT
Free mode
100% free
Freemium
Free Trial
Deals

Aya Vision

By Cohere
New Text Gen 3
Released: March 4, 2025

Overview

Aya Vision is the multimodal sibling of the Aya family. It processes images alongside text prompts and produces grounded text answers, designed for tasks like document OCR, chart/diagram analysis, UI/screenshot reasoning, and visual Q&A across multiple languages.

Description

Aya Vision extends Aya Expanse into the visual domain by combining a strong image encoder with Aya’s multilingual language backbone. The model can “see and read” documents, tables, dashboards, and photos, and then generate accurate, step-by-step explanations or structured outputs such as JSON. It’s tuned for OCR, layout understanding, small-text recognition, and cross-image referencing, which makes it useful for document automation, chart interpretation, screenshot/UI analysis, and multimodal retrieval-augmented generation.

In production, Aya Vision integrates neatly with tool/function calling and schema-based outputs so it can act as part of an agent stack or RAG pipeline. Long-context support helps it handle multi-page documents and image sets, while multilingual training ensures broad coverage. Teams typically deploy it for enterprise copilots that must interpret both text and visuals, analytics over charts and dashboards, accessibility features that describe images, and developer assistants that reason directly from screenshots. If you need multimodal reasoning with the same reliable instruction-following style as Aya Expanse, Aya Vision is the natural choice.

About Cohere

Visually guide customers over phone or live chat with instant, no-download cobrowsing.

Industry: Software Development
Company Size: 11-50
Location: New York, US
Website: cohere.io
View Company Profile

Related Models

Last updated: October 14, 2025