OCR
taaft.com/ocr
6,598 subscribers
There is 1 GPT for OCR.
Specialized tools (1)
Transforming handwritten notes into digital formats.
Models (24)
- By Baidu: PP-OCRv6 is PaddlePaddle/Baidu’s lightweight, universal OCR system for multilingual text detection and recognition across edge, mobile, desktop, and server deployments. (New · Multimodal · Released 1d ago)
- By Liquid AI: LFM2.5-VL-1.6B-Extract is Liquid AI’s larger vision-language extraction model for image-to-JSON structured field extraction. (New · Multimodal · Released 8d ago)
- By Zyphra AI: Zamba2-VL-7B is Zyphra’s open 7B-class vision-language model for single-image and multi-image understanding, visual grounding, OCR, charts, documents, and on-device multimodal applications. (New · Multimodal · Released 14d ago)
- By OpenBMB: MiniCPM-V-4.6 is OpenBMB’s open-source, lightweight multimodal model for efficient image, multi-image, and video understanding on mobile and edge devices. (New · Multimodal · Released 1mo ago)
- By Liquid AI: LFM2.5-VL-1.6B-Extract is Liquid AI’s 1.6B vision-language extraction model for image-to-JSON structured field extraction. (New · Multimodal · Released 2mo ago)
- By Liquid AI: LFM2.5-VL-450M is Liquid AI’s compact vision-language model for structured visual intelligence from edge to cloud. It is built to turn image streams into grounded, actionable outputs in real time, and it adds object grounding, better instruction following, multilingual image understanding, and function-calling support while staying efficient enough for edge hardware. (New · Image · Released 2mo ago)
- By Datalab: Chandra is an OCR model for difficult document-extraction tasks. According to its GitHub description, it handles complex tables, forms, and handwriting while preserving the full layout structure, which makes it more focused on document understanding than plain-text OCR. (New · Multimodal · Released 2mo ago)
- By LlamaIndex: LiteParse is an open-source document parser focused on fast, lightweight parsing of PDFs into structured outputs. (New · Text · Released 2mo ago)
- By Tencent: Penguin-VL-2B is a compact vision-language model that uses an LLM-based vision encoder to push the efficiency limits of multimodal reasoning. (Multimodal · Released 3mo ago)
- By Baidu: Qianfan-OCR is a 4B end-to-end document-intelligence vision-language model that converts images directly to Markdown and supports prompt-driven document tasks such as table extraction, chart understanding, document QA, and key-information extraction. (Multimodal · Released 3mo ago)
- By LightOn: LightOnOCR-1B is a compact vision-language model for OCR that converts document images into clean text and is designed for fast, large-scale document processing. (Text · Released 3mo ago)
- By DeepSeek: A second-generation DeepSeek OCR model, built around “Visual Causal Flow,” that aims at more human-like visual encoding, with dynamic-resolution support and strong document-to-Markdown, layout-aware OCR for images and PDFs. (Text · Released 4mo ago)
- By NuMind: NuMarkdown-8B-Thinking is a reasoning OCR vision-language model fine-tuned from Qwen2.5-VL to convert complex document images into clean Markdown. It uses intermediate “thinking” tokens to infer layout and tables before generating the final text. (Text · Released 5mo ago)
- By Liquid AI: LFM2-VL-3B is a 3B vision-language model that reads images with text and answers in natural language or structured JSON. It handles OCR, charts, tables, and screenshots with long context and low-latency streaming, making it practical for multimodal RAG and assistants. (Text · Released 7mo ago)
- By DeepSeek: An LLM-centric OCR model that uses “Contexts Optical Compression” to explore visual-text compression and provides fast streaming and batch OCR for images and PDFs through the vLLM and Transformers runtimes. (Text · Released 7mo ago)
- By Baidu: Qianfan-VL 70B is Baidu’s large vision-language model on the Qianfan platform. It ingests images (documents, charts, screenshots, photos) with text and produces grounded answers, featuring strong OCR and layout understanding, long context, tool/function calling, streaming, and reliable JSON outputs for multimodal RAG and enterprise apps. (Text · Released 8mo ago)
- By Moondream: Moondream 3 Preview is a compact, frontier-oriented vision-language model built for fast visual reasoning, grounding, OCR, object detection, pointing, and structured output. It uses a 9B-parameter MoE architecture with 2B active parameters and extends the context length to 32K, aiming to deliver strong real-world vision performance while remaining efficient and inexpensive to run. (Multimodal · Released 8mo ago)
- By Caldera Labs: Command A Vision is Cohere’s multimodal instruction model that pairs text and image understanding. It accepts images plus text prompts and outputs structured, step-by-step text answers. It is tuned for enterprise workflows such as document OCR, chart/diagram reasoning, screenshot/UI analysis, and tool or function calling. (Text · Released 10mo ago)
- By Kakao: Kanana-1.5-v-3B is a 3B-parameter vision-language model in Kakao’s Kanana line. It can process both images and text prompts, outputting grounded answers in natural language or structured JSON. It is optimized for lightweight multimodal assistants and enterprise applications that need efficient visual reasoning. (Text · Released 10mo ago)
- By Apple: FastVLM is Apple’s lightweight vision-language model built for real-time multimodal apps. It ingests images alongside text and quickly returns grounded answers for OCR, charts/diagrams, screenshots, and general visual QA, while supporting long context, tool/function calling, and structured JSON outputs. (Text · Released 10mo ago)
- By Caldera Labs: Aya Vision is the multimodal sibling of the Aya family. It processes images alongside text prompts and produces grounded text answers, and it is designed for tasks such as document OCR, chart/diagram analysis, UI/screenshot reasoning, and visual Q&A across multiple languages. (Text · Released 1y ago)
- By Mistral AI: Pixtral Large is Mistral’s flagship vision-language model. It takes images plus text and returns grounded, step-by-step answers suited to document OCR, charts/diagrams, UI screenshots, and general visual QA, with long-context support, tool/function calling, and reliable JSON outputs. (Text · Released 1y ago)
- PaliGemma is Google’s open vision-language model that accepts images plus text and outputs text for captioning, visual question answering, OCR-style tasks, and detection. (Text · Released 2y ago)
Discussion (2)
twinkle c · 4mo ago
@DeepRead.Tech
We are a team of founders who previously worked in data extraction, where every document had to be reviewed manually for accuracy and errors. We are excited about AI and the new LLMs: we built DeepRead on them, it reaches 95% accuracy, and it flags uncertain fields so no one has to manually review entire documents (whew), only the exceptions! We hope you try it out and share your reviews, comments, and feedback. Looking forward to hearing from users!
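The exception-only review described in the DeepRead comment above (flagging uncertain fields instead of re-checking whole documents) can be implemented by comparing a per-field confidence score against a threshold. The sketch below is a minimal illustration under stated assumptions: the `ExtractedField` type, the `split_for_review` helper, the 0.9 threshold, and the sample invoice values are all hypothetical and do not represent DeepRead's actual API or method.

```python
from dataclasses import dataclass


# Hypothetical record for one extracted field; real OCR/extraction tools
# may represent confidence differently or not expose it at all.
@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def split_for_review(
    fields: list[ExtractedField], threshold: float = 0.9
) -> tuple[list[ExtractedField], list[ExtractedField]]:
    """Partition fields into auto-accepted and flagged-for-review lists.

    Fields with confidence below `threshold` are treated as exceptions
    for a human to check; all others are accepted without review.
    """
    accepted = [f for f in fields if f.confidence >= threshold]
    flagged = [f for f in fields if f.confidence < threshold]
    return accepted, flagged


if __name__ == "__main__":
    # Hypothetical invoice fields; "due_date" mixes the letter O with zeros.
    invoice = [
        ExtractedField("invoice_number", "INV-1042", 0.99),
        ExtractedField("total", "1,250.00", 0.97),
        ExtractedField("due_date", "2O24-05-01", 0.62),
        ExtractedField("vendor", "Acme Corp", 0.95),
    ]
    accepted, flagged = split_for_review(invoice)
    print("auto-accepted:", [f.name for f in accepted])
    print("needs review: ", [f.name for f in flagged])
```

With these sample values only `due_date` is routed to a person, so the reviewer checks one field rather than the whole document. In practice the threshold trades review effort against the risk of accepting a wrong value, and model confidence scores are not always well calibrated.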
GNN Murthy · 1y ago
Which languages are supported for handwriting transcription?
