
Image recognition

taaft.com/image-recognition 28,083 subscribers

There are 2 GPTs for Image recognition.


Specialized tools 2

Visual Identifier

Identify objects in images, simply upload a pic

Released 2y ago
100% Free

1,022
13
5.0
6

What Is This

Expert at identifying objects from images, providing insightful information.

Released 2y ago
100% Free

933
10
47

Related Tasks

Image analysis 51 0

Image organization 16 0

Image recreation 15 0

Image search 15 0

Facial recognition 6 0

Image querying 4 0

Image authenticity analysis 4 0

Image reinterpretation 2 0

Image recognition game 1 0

Intent recognition 1 0

Gesture recognition 1 0

Image interpretation 1 0

Image data extraction 1 0

Models 40

Research preview · Gen 3

ACT 2

By Sunday

ACT 2 is a robotics foundation model developed by Sunday Robotics for autonomous home manipulation. Pretrained on a large, high-quality, sensorized human demonstration dataset, it generalizes learned behaviors to unseen home environments with minimal additional data. In evaluation it achieved a 99.1% zero-shot success rate folding diverse garments across unfamiliar rooms, surfaces, and starting configurations, with no per-home data collection or fine-tuning required.

🤖Robotics 🤖Task automation 🔍Image recognition 🎥Video analysis 🏠Home automation

New · Multimodal

Released 2d ago
Gen 3

SenseNova Vision 7B MoT

By SenseTime

SenseNova-Vision-7B-MoT is a unified multimodal model for computer vision. It reformulates object detection, OCR, referring localization, GUI grounding, keypoint detection, depth estimation, surface normal prediction, segmentation, and multi-view geometry as text, image, or mixed text-image generation, using natural language instructions instead of separate task-specific heads or decoders for each vision task.

👁️Computer vision assistance 🔍Image recognition 🖼️Image segmentation 🔍Image analysis 📜OCR

New · Multimodal

Released 6d ago
Gen 4 LingBot

LingBot Vision Giant

By Robbyant

LingBot-Vision-Giant is a ViT-g/16 self-supervised Vision Transformer backbone for dense visual perception. Pretrained with masked boundary modeling, a boundary-centric objective that produces spatially structured patch features while retaining strong semantic representations, intended for feature extraction and dense prediction research.

🔍Image analysis 🔍Image recognition 🖼️Image segmentation 🎥Video analysis 👁️Computer vision assistance

New · Image

Released 13d ago
Gen 4

LFM2.5 VL 450M

By Liquid AI

LFM2.5-VL-450M is Liquid AI’s compact vision-language model for structured visual intelligence from edge to cloud. It is built to turn image streams into grounded, actionable outputs in real time, adding object grounding, better instruction following, multilingual image understanding, and function calling support while staying efficient enough for edge hardware.

🔍Image interpretation 🔍Image recognition 📜OCR

Image

Released 3mo ago
Gen 1

WildDet3D

By Ai2

WildDet3D is Ai2’s open model for monocular 3D object detection from a single RGB image. It predicts full 3D bounding boxes, including position, size, and orientation in metric coordinates, and supports flexible prompts such as text queries, point clicks, and 2D boxes. It is designed for open-world spatial understanding across varied cameras and can also use optional depth signals when available.

🧩Spatial intelligence 🔍Image recognition 👁️Computer vision assistance

3D

Released 3mo ago
Gen 3

Falcon Perception

By Technology Innovation Institute

Falcon Perception is TII UAE’s 0.6B early-fusion vision-language model for open-vocabulary grounding and instance segmentation. Given an image and a natural language query, it can return zero, one, or many matching objects with pixel-accurate masks, making it suited for promptable object selection and dense visual localization tasks.

🖼️Image segmentation 🔍Image recognition 🔍Image querying 🔍Object identification

Multimodal

Released 3mo ago
Gen 3

Wholembed v3

By mixedbread ai

Wholembed v3 is Mixedbread’s unified omnimodal, multilingual late-interaction retrieval model built for state-of-the-art search across languages and modalities.

🔍Information retrieval 🌐Text translation 🔍Image recognition 🔍Image search

Multimodal

Released 4mo ago
Gen 7

OmniScient Model

By ByteDance

OmniScient Model (OSM) is an open-ended visual recognition approach that predicts free-form class labels for visual entities without requiring a predefined vocabulary at test time.

🔍Image recognition 🖼️Image segmentation 🔍Object identification 👁️Computer vision assistance

Text

Released 4mo ago
Gen 3

Carbon Robotics Large Plant Model

By Carbon Robotics

Large Plant Model (LPM) is Carbon Robotics’ vision model for agriculture, trained on 150M labeled plants to recognize crops and weeds in many climates, powering Carbon AI, LaserWeeder and AutoTractor for precise autonomous weed control.

🌿Plant identification 🔍Image recognition

Multimodal

Released 5mo ago
Gen 3 LingBot

LingBot VLA 4B Depth

By Robbyant

LingBot VLA 4B Depth is a vision-language-action foundation model for robot manipulation. Pretrained on 20,000 hours of real-world data from 9 dual-arm robot configurations, it adds a depth-distilled module for improved spatial perception, mapping camera images and language instructions directly to robot action sequences.

🔍Image recognition 🧩Spatial intelligence 🤖Robotic guidance

Multimodal

Released 5mo ago
Gen 4

VIGA

By Fugtemypt123

VIGA is a vision-as-inverse-graphics agent that rebuilds a single image as an editable 3D Blender scene, alternating generator and verifier roles with interleaved multimodal reasoning to capture objects, layout, physics and interactions.

🌍3D images 🔍Image recognition 🎥YouTube thumbnails

Image

Released 5mo ago
Gen 4

SHARP

By Apple

SHARP is Apple’s monocular view-synthesis model that regresses a 3D Gaussian scene from one photo in under a second on a standard GPU, enabling real-time, photorealistic nearby views with metric camera motion.

🖼️Image generation 🔍Image recognition

Image

Released 7mo ago
Gen 4 Earth

OlmoEarth v1 Base

By The Allen Institute for Artificial Intelligence

OlmoEarth-v1-Base is a ViT-Base (89M parameter) vision foundation model pre-trained for Earth observation tasks. It processes Sentinel-1, Sentinel-2, and Landsat satellite images and image time series, and can be fine-tuned for downstream remote sensing tasks such as segmentation and classification.

🔍Image recognition 🖼️Image segmentation 🌐Geospatial analysis

Image

Released 8mo ago
Gen 4

Precision V2

By Generative Suite

Precision V2 refines V1 with cleaner micro-texture and steadier small-text legibility at similar speed.

🔍Image upscaling 🖌️Image editing 🔍Image recognition

Image

Released 8mo ago
Gen 4 Hunyuan

HunyuanWorld Mirror

By Tencent

HunyuanWorld Mirror is a scene-reconstruction and world-modeling system. It turns photos and videos into a consistent digital twin that you can explore, edit, and render, with export to common 3D formats for simulation, virtual production, and design.

🖼️Image generation 🔍Image recognition

Image

Released 8mo ago
Gen 4

Crystal Upscaler

By ClarityAI

Crystal Upscaler is an image super-resolution and enhancement model that enlarges 2x to 8x while restoring detail, reducing noise, and fixing compression artifacts. It works on photos, renders, anime, and UI art with controllable sharpness and texture preservation.

🔍Image upscaling 🖌️Image editing 📝Social media bios 🔍Image recognition

Image

Released 9mo ago
Gen 3

FlowRVS

By xmz111

FlowRVS is a referring video object segmentation method that learns a text-conditioned continuous flow to deform a video’s spatiotemporal representation into the target object mask.

🖼️Image segmentation 🔍Image recognition 🎥Video analysis 👁️Computer vision assistance

Multimodal

Released 9mo ago
Gen 3

CameraTrapAI

By Google

Google-provided AI models for classifying wildlife species in camera trap images.

🔍Image recognition 🧝‍♀️Fantasy images

Multimodal

Released 9mo ago
Gen 4

Pixel Perfect Depth

By Xiaomi

Pixel-Perfect Depth is a monocular depth estimation model that uses pixel-space diffusion transformers to predict high-quality, flying-pixel-free depth maps for dense point clouds, accepted at NeurIPS 2025.

🖼️Image generation 🔍Image upscaling 🔍Image recognition

Image

Released 9mo ago
Gen 4 Hunyuan

HunyuanImage 3.0

By Tencent

HunyuanImage 3.0 is Tencent’s next-gen text-to-image model. It delivers sharper detail, stronger style and identity consistency, improved typography, and precise, in-place editing—built for fast iteration from concept to production-ready visuals.

🖼️Image generation 🖌️Image editing 💡Coding help 🔍Image recognition

Image

Released 9mo ago
Gen 3 Qianfan

Qianfan-VL-3B

By Baidu

Qianfan-VL-3B is Baidu’s lightweight VLM for cost-sensitive, real-time multimodal apps. It processes images plus text and returns grounded answers with basic OCR and layout understanding, long context, tool/function calling, and JSON outputs—optimized for speed and efficiency.

🏭Manufacturing 🖼️Image to text 🔍Image recognition

Text

Released 9mo ago
Gen 3 Qianfan

Qianfan VL 70B

By Baidu

Qianfan-VL 70B is Baidu’s large vision-language model on the Qianfan platform. It ingests images (docs, charts, screenshots, photos) with text and produces grounded answers, featuring strong OCR and layout understanding, long context, tool/function calling, streaming, and reliable JSON outputs for multimodal RAG and enterprise apps.

📜OCR 🖼️3D image generation 🎬Video dubbing 🔍Image recognition

Text

Released 9mo ago
Gen 3 Moondream

Moondream 3 Preview

By Moondream

Moondream 3 Preview is a compact frontier-oriented vision-language model built for fast visual reasoning, grounding, OCR, object detection, pointing, and structured output. It uses a 9B MoE architecture with 2B active parameters and extends context length to 32K, aiming to deliver strong real-world vision performance while staying efficient and inexpensive to run.

🔍Image interpretation 🔍Image recognition 🖼️Image segmentation 📜OCR

Multimodal

Released 10mo ago
Gen 4

DINOv3

By Meta Platforms

DINOv3 is a self-supervised computer vision model from Meta that produces universal image backbones without using human-labeled data. Scaled up to 7 billion parameters and trained on 1.7 billion images, it generates high-resolution dense visual features that power tasks like object detection, image segmentation, depth estimation and video object tracking without any fine-tuning.

🔍Image analysis 🔍Image recognition 🖼️Image segmentation 🎥Video analysis

Image

Released 11mo ago
Gen 3 Command

Command A Vision

By Cohere

Command A Vision is Cohere’s multimodal instruction model that pairs text and image understanding. It accepts images plus text prompts and outputs structured, step-by-step text answers. It’s tuned for enterprise workflows like document OCR, chart/diagram reasoning, screenshot/UI analysis, and tool or function calling.

📜OCR 🖼️Image to text 🔍Image recognition

Text

Released 11mo ago
Gen 3 Kanana

Kanana 1.5

By Kakao

Kanana-1.5-v-3B is a 3B-parameter vision–language model in Kakao’s Kanana line. It can process both images and text prompts, outputting grounded answers in natural language or structured JSON. It’s optimized for lightweight multimodal assistants and enterprise applications that need efficiency with visual reasoning.

📜OCR 🔍Image recognition

Text

Released 11mo ago
Gen 4 Earth

Earth 2 FourCastNet 3

By NVIDIA

Earth-2 FourCastNet 3 is a geometric ML global ensemble model that respects spherical Earth geometry to deliver fast, probabilistic medium- to subseasonal forecasts, outperforming leading numerical ensembles at much lower cost.

❤️Empathetic conversations 🔍Image recognition 💥Bdsm education

Image

Released 1y ago
Gen 3 Moondream

Moondream 2

By Moondream

Moondream 2 is a small open vision-language model designed to run efficiently across many environments. It is the previous-generation Moondream model, released under Apache 2.0, and is positioned as a lightweight image-text model for practical multimodal use where smaller size and deployability matter more than maximum frontier scale.

🖼️Image generation 🔍Code reviews 🔍Image recognition

Multimodal

Released 1y ago
Gen 4

Hunyuan HY-T1

By Tencent

Hunyuan T1 is Tencent’s deep reasoning model positioned for stronger structured reasoning and long-context analysis.

🖼️Image generation 🔍Image recognition

Image

Released 1y ago
Gen 3 Qwen

Qwen 2.5-VL-72B

By Alibaba

Qwen 2.5-VL-72B is Alibaba’s flagship open-weight vision-language model. It takes images (docs, charts, screenshots, photos) plus text and answers in text, with strong OCR, layout understanding, and multi-image reasoning. It supports long context, function/tool calling, and reliable JSON outputs—ideal for multimodal RAG, agents, and enterprise workflows.

🏭Manufacturing 🖼️Image to text 🔍Image recognition

Text

Released 1y ago
Gen 4 Moondream

Moondream 0.5B

By Moondream

Moondream 0.5B is a tiny open-source vision-language model built for edge devices and mobile platforms. With only 0.5B parameters, it is positioned as the world’s smallest VLM, designed for fast lightweight deployment on constrained hardware while still supporting practical real-world visual tasks.

🔍Image interpretation 🔍Image recognition

Image

Released 1y ago
Gen 3 Gemma

PaliGemma 2

By Google DeepMind

PaliGemma 2 is Google’s upgraded open vision-language model family based on Gemma 2, available in 3B, 10B, and 28B sizes.

🔍Image recognition

Text

Released 1y ago
Gen 4

Zen

By Freepik Company

Zen aims for minimalist, calm compositions with natural lighting and restrained color.

🖼️Image generation 🔍Image recognition 🔄Website conversion 📐Blueprints

Image

Released 1y ago
Gen 3

NV-CLIP

By NVIDIA

NV-CLIP is NVIDIA’s CLIP-style vision–language encoder that maps images and text into a shared embedding space for visual search, cross-modal retrieval, and zero-shot classification. It’s optimized for NVIDIA GPUs and easy to deploy at scale.

🎨NFT art 🔊Text to speech 🔍Image recognition

Text

Released 1y ago
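
As a rough illustration of the zero-shot classification that a CLIP-style shared embedding space makes possible, here is a minimal Python sketch. It is not NV-CLIP's API: the function names, the three-dimensional toy embeddings, and the 0.07 temperature are all assumptions made for the example. A real encoder would produce the image and text vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, label_embeddings, temperature=0.07):
    """Score an image embedding against text-prompt embeddings.

    Returns a dict mapping each label prompt to a softmax probability.
    The temperature value is an assumed, illustrative setting.
    """
    labels = list(label_embeddings)
    logits = [
        cosine(image_embedding, label_embeddings[label]) / temperature
        for label in labels
    ]
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Hypothetical embeddings standing in for real encoder outputs.
TOY_TEXT_EMBEDDINGS = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
TOY_IMAGE_EMBEDDING = [0.8, 0.2, 0.1]

probs = zero_shot_classify(TOY_IMAGE_EMBEDDING, TOY_TEXT_EMBEDDINGS)
best_label = max(probs, key=probs.get)
print(best_label)
```

Because the labels are plain text prompts embedded in the same space as the image, new classes can be added by embedding new prompts, with no retraining of a classifier head.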
Gen 6 LLaMA

Llama 3.2

By Meta Platforms

Includes lightweight text models (1B, 3B for edge/mobile, 128k context) and vision models (11B, 9...

📷Images 🔍Image recognition

Text

Released 1y ago
Gen 4

OmniParser

By Microsoft

OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.

🔍Image recognition 🖼️Image generation

Image

Released 1y ago
Gen 3 Palmyra

Palmyra Vision

By Writer Engineering

Palmyra Vision is Writer’s multimodal LLM that takes images as input and generates text output. It can extract text from images (including handwriting), interpret charts/graphs/diagrams, classify objects, and answer questions about visual content—all aimed at enterprise workflows.

🖼️Image to text 🔍Image recognition 🖼️Image descriptions

Text

Released 2y ago
Gen 4

Lightning XL

By Leonardo Interactive

Lightning XL is a speed-optimized SDXL checkpoint that produces strong images in very few steps, ideal for rapid iteration.

🖼️Image generation 🔍Image upscaling 🔍Image recognition

Image

Released 2y ago
Gen 4

Ultralytics YOLO

By Ultralytics

Ultralytics YOLO is a family of real-time computer-vision models for detection, segmentation, classification, pose, and tracking, designed to be fast, accurate, and easy to deploy across edge and cloud.

🔍Image recognition

Image

Released 3y ago
Gen 4

NSFW Model

By Infinite Red

Image classification model trained on 60+ gigs of data to detect NSFW content. Classifies images into five categories: drawings, hentai, neutral, porn, and sexy. Built on Inception V3 and MobileNet V2 architectures with 93% accuracy.

🔍Image recognition 🔍Image analysis

Image

Released 6y ago

Robots 2

Calvin-40

Humanoid · FR · Fully autonomous · 06 Jun 2025

In production

🤖Task automation

Calvin is an industrial humanoid robot designed for heavy-load handling and precision industrial tasks. It is the first humanoid robot de...

Enlight L

Manipulator · CN · Semi-autonomous · 22 Mar 2026

In production

🤖Task automation

Enlight L is a lightweight 7-DoF force-controlled adaptive robot designed for safe human-robot collaboration. It features force sensors i...

Devices 7

Echo Show 10

Smart Display · Amazon
Feb 25, 2021

Discontinued · $249.00

Rotating 10.1” smart display with Alexa, motion tracking, video calling, smart home hub, and AZ1 Neural Edge AI processor.

ZED 2i

AI Camera · Stereolabs
Dec 20, 2019

Available · $499.00

The ZED 2i is an industrial-grade AI stereo camera designed for depth sensing and spatial perception. It uses a Neural Depth Engine combi...

OrCam MyEye 3 Pro

AI Wearable · OrCam Technologies
Nov, 2023

Available · $4,490.00

The OrCam MyEye 3 Pro is a wearable AI-powered assistive device for blind and visually impaired users. A compact camera module magnetical...

Envision Glasses

Smart Glasses · Envision Technologies BV
Nov 25, 2020

Available · $2,499.00

AI-powered smart glasses built on Google Glass Enterprise Edition 2 that help blind and low-vision users independently access visual info...

Rock X

Other · Alcatraz
Nov 1, 2024

Available · N/A

The Rock X is an enterprise-grade AI-powered facial biometric authentication device for access control. Operating at the edge, it uses ma...

Bird Buddy Smart Bird Feeder Pro

AI Camera · Bird Buddy
Oct 1, 2022

Available · $239.00

The Bird Buddy Smart Bird Feeder Pro is an AI-powered bird feeder with a built-in camera that captures 5MP photos and 2K HDR video of vis...

AIY Vision Kit

AI Camera · Google
Jun 1, 2017

Discontinued · $89.99

The AIY Vision Kit from Google is a do-it-yourself intelligent camera that lets you build an image recognition device using machine learn...