PaliGemma

Model family: Gemma
PaliGemma is a lightweight open vision-language model inspired by PaLI-3 and built from open components such as SigLIP and Gemma. It takes images and text prompts as input and outputs text, supporting tasks such as image captioning, image and short-video understanding, object detection, visual question answering, and reading text embedded in images.
Released: May 14, 2024

Overview

PaliGemma is Google’s open vision-language model. It accepts an image plus a text prompt and outputs text, covering captioning, visual question answering, OCR-style text reading, and object detection.
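
Because the model only outputs text, detection results have to be decoded from the generated string. The sketch below illustrates one way to do that. It assumes, which this page does not state, that each detected object is written as four `<locNNNN>` tokens in the order y_min, x_min, y_max, x_max, with values binned from 0 to 1023, followed by a label, and that objects are separated by `;`. The names `Box` and `parse_detections` are hypothetical helpers, not part of any PaliGemma library.

```python
import re
from typing import List, NamedTuple


class Box(NamedTuple):
    """One detected object, with coordinates in pixels."""
    label: str
    y_min: float
    x_min: float
    y_max: float
    x_max: float


# Assumed output format: four <locNNNN> tokens, then a label.
# Objects are assumed to be separated by ';'.
_DETECTION = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]+)")
_LOC = re.compile(r"<loc(\d{4})>")


def parse_detections(text: str, width: int, height: int) -> List[Box]:
    """Convert assumed PaliGemma-style detection text into pixel boxes."""
    boxes = []
    for locs, label in _DETECTION.findall(text):
        # Assumption: values are bins in 0..1023, ordered y_min, x_min, y_max, x_max.
        y0, x0, y1, x1 = (int(n) / 1024 for n in _LOC.findall(locs))
        boxes.append(
            Box(label.strip(), y0 * height, x0 * width, y1 * height, x1 * width)
        )
    return boxes


if __name__ == "__main__":
    sample = "<loc0256><loc0128><loc0768><loc0896> cat ; <loc0000><loc0000><loc0512><loc0512> dog"
    for box in parse_detections(sample, width=1024, height=512):
        print(box)
```

Scaling by the image width and height keeps the parser independent of the resolution the model was run at. If the actual output format differs from the assumption above, only the two regular expressions and the coordinate order need to change.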

About Google DeepMind

Company Size: 6000
Location: London, England, GB

Tools using PaliGemma

No tools found for this model yet.

Last updated: June 2, 2026