Voyage multimodal 3
The model embeds screenshots of PDFs, slide decks, tables, charts, and figures directly, which removes the need for heuristic-based parsing or OCR pipelines. In evaluations across 20 multimodal retrieval datasets, it outperforms OpenAI CLIP large and Cohere multimodal v3 by over 40% on table/figure retrieval and by over 25% on document screenshot retrieval, while matching voyage-3 on pure text retrieval. It is accessible via API, with "query" and "document" input types that produce retrieval-optimized embeddings.
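In a typical retrieval setup, the search query is embedded with input type "query" and the corpus (for example, PDF page screenshots) with input type "document", and documents are then ranked by vector similarity. The sketch below shows only that ranking step. It assumes cosine similarity as the scoring function and uses toy 3-dimensional vectors in place of real API output; cosine and rank_documents are hypothetical helpers, not part of any Voyage SDK.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Sequence[float], doc_vecs: list[Sequence[float]]) -> list[int]:
    """Hypothetical helper: return document indices, most similar to the query first."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for embeddings returned by the API (assumption).
query = [0.9, 0.1, 0.0]          # embedded with input type "query"
docs = [
    [0.0, 1.0, 0.0],             # e.g. a slide screenshot
    [0.8, 0.2, 0.1],             # e.g. a table screenshot
    [0.1, 0.0, 1.0],             # e.g. a chart
]

print(rank_documents(query, docs))  # prints [1, 0, 2]
```

Real embeddings have far more dimensions, and a production system would normally delegate this step to a vector index, but the ordering logic is the same.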
Overview
Voyage multimodal 3 is a multimodal embedding model that vectorizes interleaved text and images through a single unified transformer encoder. It handles screenshots of PDFs, slides, tables, and figures without complex document parsing. Unlike CLIP-based models, which encode text and images with separate encoders, it eliminates the modality gap, enabling accurate retrieval over mixed text and visual content.
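Because the encoder accepts interleaved sequences, a single input can mix text and images in order, such as a caption followed by a slide screenshot. The sketch below builds a JSON request body for one such input. The field names ("inputs", "content", "type", "text", "image_url", "model", "input_type") reflect our reading of Voyage's multimodal embeddings REST API and should be treated as assumptions to check against the official API reference; ImageURL and build_request are hypothetical helpers, and no network call is made.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ImageURL:
    """Hypothetical marker type for an image referenced by URL."""
    url: str


Segment = Union[str, ImageURL]


def build_request(inputs: list[list[Segment]], input_type: str) -> dict:
    """Hypothetical helper: build a request body from interleaved text/image inputs.

    Assumed schema: each input is {"content": [...]}, where every item is either
    {"type": "text", "text": ...} or {"type": "image_url", "image_url": ...}.
    """
    if input_type not in ("query", "document"):
        raise ValueError("input_type must be 'query' or 'document'")
    body_inputs = []
    for segments in inputs:
        content = []
        for seg in segments:  # preserve the original text/image order
            if isinstance(seg, ImageURL):
                content.append({"type": "image_url", "image_url": seg.url})
            else:
                content.append({"type": "text", "text": seg})
        body_inputs.append({"content": content})
    return {"inputs": body_inputs, "model": "voyage-multimodal-3", "input_type": input_type}


# One document: a caption followed by a slide screenshot (placeholder URL).
doc = ["Q3 revenue by region", ImageURL("https://example.com/slide-12.png")]
body = build_request([doc], "document")
print(json.dumps(body, indent=2))
```

Keeping text and image segments in a single ordered list mirrors the interleaved input the encoder consumes, so a caption and the figure it describes contribute to one embedding rather than two separate ones.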
About Voyage AI
Voyage AI provides best-in-class embedding models and rerankers for search and retrieval over unstructured data, used to power retrieval-augmented generation (RAG) and AI applications. It offers general-purpose models, domain-specific models (finance, legal, code), and company-specific fine-tuned models. Founded in 2023 and based in Palo Alto, the company was acquired by MongoDB, Inc. in February 2025 and now operates as a MongoDB subsidiary.