
PaliGemma 2

By Google
Model family: Gemma
PaliGemma 2 pairs an upgraded vision encoder with a compact Gemma decoder to “look, read, and reason.” It ingests one or more images alongside a prompt and produces grounded, step-by-step text—captions, answers, summaries, or structured outputs (Markdown/JSON). Compared with the original PaliGemma, it improves layout-aware OCR, table/chart interpretation, and screenshot/UI analysis, and handles higher-resolution inputs via tiling/cropping strategies for dense documents.
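
The sketch below illustrates this image-plus-prompt loop using the Hugging Face Transformers PaliGemma classes. It is a minimal example only: the checkpoint id, input file name, and prompt wording are assumptions, and prompt conventions (task prefixes such as "answer en", the explicit <image> token) vary by checkpoint and library version.

import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

# Example checkpoint id -- verify the exact name on Hugging Face before use.
model_id = "google/paligemma2-3b-mix-448"

processor = PaliGemmaProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fits on a single modern GPU
    device_map="auto",
)

image = Image.open("invoice.png")  # hypothetical document screenshot
prompt = "<image> answer en What is the total amount due?"

# Cast floating-point inputs to the model dtype and move everything to its device.
inputs = processor(text=prompt, images=image, return_tensors="pt").to(torch.bfloat16).to(model.device)
input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(processor.decode(generation[0][input_len:], skip_special_tokens=True))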
For builders, it’s instruction-tuned for reliable formatting, supports function/tool calling for agent workflows (e.g., crop → read → reason), and integrates cleanly with RAG so answers can cite or reference specific regions. It’s lightweight enough to run on a single modern GPU, with 8/4-bit quantization and LoRA/full fine-tuning options to adapt to domains (invoices, forms, dashboards, manuals). Typical uses include enterprise document automation and extraction, analytics over charts/dashboards, accessibility (image descriptions), and developer assistants that reason directly from screenshots—bringing practical, efficient visual understanding to the Gemma ecosystem.
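
A sketch of the quantization and LoRA options mentioned above, assuming the bitsandbytes and peft libraries are installed; the 4-bit settings, LoRA rank, and target modules are illustrative defaults rather than tuned values, and the checkpoint id is again an example.

import torch
from transformers import BitsAndBytesConfig, PaliGemmaForConditionalGeneration
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "google/paligemma2-3b-pt-448"  # example base checkpoint for fine-tuning

# Load weights in 4-bit to fit the model on a small GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for training, then attach LoRA adapters
# on the attention projections; only the adapters are trainable.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

From here, a standard training loop over (image, prompt, target) pairs updates only the adapter weights, which keeps domain adaptation (invoices, forms, dashboards) feasible on a single GPU.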
Released: December 5, 2024

Overview

PaliGemma 2 is Google’s next-gen open-weight vision-language model in the Gemma family. It takes images (docs, charts, screenshots, photos) plus text and answers in text—with stronger OCR, grounded visual reasoning, multi-image understanding, and easy fine-tuning for real apps on a single GPU or edge devices.

About Google

At Google, we think that AI can meaningfully improve people's lives and that the biggest impact will come when everyone can access it.

Industry: Research
Company Size: 182,000-190,000
Location: Mountain View, CA, US
Website: ai.google

Tools using PaliGemma 2

Last updated: February 18, 2026
#  Name  Task