PaliGemma 2
For builders, PaliGemma 2 is instruction-tuned for reliable output formatting, can be driven from tool-calling agent workflows (e.g., crop → read → reason), and integrates cleanly with retrieval-augmented generation (RAG), so answers can cite or reference specific image regions. It is lightweight enough to run on a single modern GPU, supports 8-bit and 4-bit quantization, and can be adapted to domains such as invoices, forms, dashboards, and manuals through either LoRA or full fine-tuning. Typical uses include enterprise document automation and data extraction, analytics over charts and dashboards, accessibility (image descriptions), and developer assistants that reason directly from screenshots, bringing practical, efficient visual understanding to the Gemma ecosystem.
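To make the single-GPU, quantization, and LoRA points concrete, here is a rough sketch that estimates weight memory at 16-, 8-, and 4-bit precision and counts the trainable parameters LoRA adds to one weight matrix. It assumes nominal sizes of 3B, 10B, and 28B parameters (the PaliGemma 2 checkpoint sizes, rounded), counts weights only (no activations, KV cache, or quantization scales), and uses a hypothetical 2048x2048 projection with rank 8 for the LoRA part; treat the output as estimates, not measurements.

```python
"""Back-of-the-envelope weight-memory and LoRA-size estimates.

Assumptions (illustrative only): nominal parameter counts of 3B, 10B and 28B,
weights-only memory (no activations, KV cache, or quantization scales), and a
hypothetical 2048x2048 projection matrix for the LoRA example.
"""

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in), so it adds
    rank * (d_out + d_in) parameters while the original weights stay frozen.
    """
    return rank * (d_out + d_in)


if __name__ == "__main__":
    for name, n in [("3B", 3e9), ("10B", 10e9), ("28B", 28e9)]:
        row = ", ".join(
            f"{p}: {weight_memory_gb(n, p):.1f} GB" for p in BYTES_PER_PARAM
        )
        print(f"{name} -> {row}")

    full = 2048 * 2048
    lora = lora_params(2048, 2048, rank=8)
    print(f"LoRA r=8 on 2048x2048: {lora} trainable vs {full} frozen ({lora / full:.2%})")
```

Under these assumptions, 4-bit weights for the 28B checkpoint come to roughly 14 GB, and rank-8 LoRA trains well under 1% of the matrix it adapts.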
Overview
PaliGemma 2 is Google’s next-generation open-weight vision-language model in the Gemma family. It takes images (documents, charts, screenshots, photos) plus text as input and produces text output, with stronger OCR than the original PaliGemma, grounded visual reasoning, multi-image understanding, and straightforward fine-tuning for real applications on a single GPU or on edge devices.
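Grounded answers are most useful when region references can be turned back into pixel coordinates, for example to crop a region before reading it. The sketch below assumes the detection output convention described for PaliGemma models: four `<locNNNN>` tokens per object giving y_min, x_min, y_max, x_max as bins in 0..1023, followed by a label, with objects separated by ` ; `. Dividing each bin by 1024 before scaling to the image size is also an assumption, and `parse_detections` and `Box` are hypothetical helpers, not part of any library.

```python
import re
from dataclasses import dataclass

# Assumed PaliGemma-style detection output: four <locNNNN> tokens
# (y_min, x_min, y_max, x_max, each binned to 0..1023) then a label;
# multiple detections are separated by " ; ".
_DETECTION = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]*)")
_LOC = re.compile(r"<loc(\d{4})>")


@dataclass
class Box:
    """A labeled bounding box in pixel coordinates (hypothetical helper type)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_detections(text: str, width: int, height: int) -> list[Box]:
    """Convert location-token output into pixel-space boxes for one image."""
    boxes = []
    for locs, label in _DETECTION.findall(text):
        # Token order is y before x; normalize each bin to [0, 1).
        y0, x0, y1, x1 = (int(v) / 1024 for v in _LOC.findall(locs))
        boxes.append(
            Box(label.strip(), x0 * width, y0 * height, x1 * width, y1 * height)
        )
    return boxes


if __name__ == "__main__":
    output = "<loc0256><loc0128><loc0768><loc0896> invoice total ; <loc0000><loc0000><loc0512><loc0512> logo"
    for box in parse_detections(output, width=1024, height=2048):
        print(box)
```

Pillow's `Image.crop` takes a `(left, upper, right, lower)` tuple, which corresponds to `(x_min, y_min, x_max, y_max)` here after rounding to integers.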
About Google
At Google, we think that AI can meaningfully improve people's lives and that the biggest impact will come when everyone can access it.
