olmOCR

By Ai2
olmOCR provides both a toolkit and LLM-sized OCR models for high-fidelity document linearization. It turns messy PDFs and scanned pages into structured text suitable for downstream LLMs, with attention to layout, tables, math, and handwritten content. The project ships several 7B and 8B image-to-text models plus a pipeline that outputs markdown, which makes it a practical foundation for building RAG systems or datasets from large scientific and business corpora.
Released: October 25, 2025

Overview

olmOCR is Ai2's open-source document recognition pipeline and model family. It converts PDFs and images into clean text while preserving reading order, tables, equations, and handwriting.

About Ai2

We are a Seattle-based non-profit AI research institute founded in 2014 by the late Paul Allen. We develop foundational AI research and innovation to deliver real-world impact through large-scale open models, data, robotics, conservation, and beyond.

Website: allenai.org

Last updated: February 12, 2026