olmOCR

By Ai2
olmOCR provides a toolkit and a set of 7B and 8B image-to-text models for high-fidelity document linearization. It turns messy PDFs and scanned pages into structured text suitable for downstream LLMs, handling layout, tables, math, and handwritten content. The pipeline outputs markdown, which makes it a practical base for building RAG systems or datasets from large scientific and business corpora.
Released: October 25, 2025

Overview

olmOCR is Ai2’s open-source document recognition pipeline and model family. It converts PDFs and images into clean text while preserving reading order, tables, equations, and handwriting.
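
Because the pipeline emits markdown, a downstream consumer such as a RAG indexer usually needs to split that text into retrievable chunks. The sketch below is one possible approach, not part of olmOCR itself: it uses only the Python standard library, and the function name `split_markdown_by_heading` and the sample text are hypothetical. It assumes headings use the `#` ATX style and does not special-case `#` lines inside fenced code.

```python
import re
from typing import List, Tuple

# ATX-style markdown heading: one to six '#' characters, a space, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_by_heading(markdown: str) -> List[Tuple[str, str]]:
    """Split markdown into (heading, body) chunks.

    Text that appears before the first heading is returned with an empty
    heading string. Tables and equations stay inside the body of the
    section they belong to, so each chunk remains self-contained.
    """
    chunks: List[Tuple[str, str]] = []
    heading = ""
    body: List[str] = []

    def flush() -> None:
        # Emit the current section unless it is completely empty.
        text = "\n".join(body).strip()
        if heading or text:
            chunks.append((heading, text))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading = match.group(2)
            body = []
        else:
            body.append(line)
    flush()
    return chunks


# Hypothetical linearized output, used only to illustrate the chunking.
sample = "Intro text\n# Methods\nWe scan pages.\n\n| a | b |\n|---|---|\n## Results\nDone."
for title, text in split_markdown_by_heading(sample):
    print(repr(title), len(text))
```

Splitting on headings keeps each table with the section that introduces it, which tends to matter more for retrieval quality than a fixed character window would.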

About Ai2

Ai2 is a 501(c)(3) non-profit AI research institute founded in 2014 by the late Paul Allen, co-founder of Microsoft. It conducts high-impact, open AI research and engineering for the common good. Its work includes open language models (OLMo), scientific AI tools (Semantic Scholar, Asta), environmental AI platforms, and embodied robotics research.

Industry: Artificial Intelligence
Company Size: 201-500
Location: Seattle, Washington, US
Website: allenai.org

Last updated: February 25, 2026