Models

Browse and discover AI models from leading companies in the industry.

Gen 7

Harrier OSS v1 0.6B

By Microsoft

Harrier OSS v1 0.6B is Microsoft’s multilingual text embedding model for semantic search, retrieval, clustering, similarity, classification, bitext mining, and reranking. It uses a decoder-only architecture with last-token pooling and L2 normalization, supports 94 languages, and handles inputs of up to 32,768 tokens.
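
The pooling step named above can be illustrated with a minimal sketch. It is not Harrier's actual code: the function names (last_token_pool, l2_normalize, embed) and the toy hidden states are assumptions made for illustration, and a real model would produce the per-token vectors. The sketch only shows how last-token pooling picks the final non-padding token's vector and how L2 normalization scales it to unit length.

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last non-padding token.

    hidden_states: list of per-token vectors (hypothetical model output).
    attention_mask: list of 1/0 flags, 1 for real tokens, 0 for padding.
    Raises ValueError if the mask contains no real token.
    """
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return hidden_states[last]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def embed(hidden_states, attention_mask):
    """Last-token pooling followed by L2 normalization."""
    return l2_normalize(last_token_pool(hidden_states, attention_mask))

# Toy input: three token positions, the third one is padding.
hidden = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
vector = embed(hidden, mask)
print(vector)  # [0.6, 0.8]
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity, which is why normalized embeddings are convenient for semantic search and clustering.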

New · Text

Released 1d ago
Gen 3

SAM 3.1

By Meta Platforms

SAM 3.1 is Meta’s improved promptable segmentation model for images and video. It supports points, boxes, masks, text, and exemplar prompts, and is designed to segment and track objects more accurately than earlier SAM 3 releases, including open-vocabulary concepts across frames.

🖼️Image segmentation 🔍Image recognition 🎥Video analysis 🔍Object identification

New · Multimodal

Released 4d ago
Gen 3

Matrix Game 3.0

By Skywork AI

🌍Game worlds 🎥Interactive videos 🚀Physics simulations 🎥3D scenes

New · Multimodal

Released 4d ago
Gen 3

daVinci-LLM-3B

By Sii GAIR-NLP

daVinci-LLM-3B is a 3B base language model built to make pretraining transparent and reproducible. Its release includes not only the weights, but also training trajectories, intermediate checkpoints, data-processing decisions, and more than 200 ablation studies.

📚Non-interactive language modeling 🤖AI research assistance 🔍Data quality control 🧠Model training

New · Text

Released 4d ago
Gen 3

Chroma Context 1

By Chroma

Context-1 is Chroma’s 20B agentic search model trained as a self-editing search agent. It is designed to decompose complex queries, prune irrelevant context, and deliver high retrieval quality at lower latency and cost than much larger frontier models.

🔍Information retrieval 📂Content categorization 🔍Conceptual search

New · Text

Released 4d ago
Gen 3

Chandra OCR 2

By Datalab

Chandra is an OCR model for difficult document extraction tasks. Its GitHub description says it handles complex tables, forms, and handwriting while preserving full layout structure, making it more focused on document understanding than plain-text OCR.

📜OCR 🔍Text extraction 📄Document analysis 🔢Mathematical formula transcription 🔍Handwriting analysis

New · Multimodal

Released 4d ago
Gen 3

LongCat Next

By Meituan

LongCat Next is a multimodal LongCat model focused on compact yet capable visual and speech understanding. The official intro highlights strong performance despite a 28x compression ratio, with particular strength in text rendering, speech comprehension, low-latency voice conversation, and customizable voice cloning.

🖼️Image generation 🗣️Voice cloning 🔍Image interpretation 🔊Audio

New · Multimodal

Released 4d ago
Gen 3

Topaz Starlight Precise 2.5

By Topaz Labs

Topaz Starlight Precise 2.5 is an upgraded video upscaling model available through ComfyUI Partner Nodes. It is positioned as a direct replacement for the earlier SLP-2 model, promising sharper output, fewer artifacts, and better preserved detail at the same per-frame cost.

🎞️Video upscaling 🔍Image upscaling 🎥Video enhancement

New · Video

Released 4d ago
Gen 4

Suno 5.5

By Suno

Suno is an AI music creation platform that generates complete original songs from prompts, including vocals, lyrics, and full production. It is built for fast music generation, remixing, beat making, and sharing, and supports creation from text, images, or voice inputs.

🎵Music production assistance 🎵Songwriting 🎵Personalized songs 🎵Lyrics to music

New · Audio

Released 5d ago
Gen 3

Gemini 3.1 Flash Live Preview

By Google

Gemini 3.1 Flash Live Preview is Google’s low-latency audio-to-audio model for real-time dialogue and voice-first AI apps. It is built for fast conversational interaction, with multimodal input support for text, images, audio, and video, and outputs in text and audio. Google positions it for acoustic nuance detection, numeric precision, and multimodal awareness.

🗣️Speech to speech 🎙Voice chatting 🔊Advanced audio generation 🎤Voice agents

New · Multimodal

Released 5d ago
Gen 4

Cohere Transcribe

By Cohere

Cohere Transcribe is an open-source automatic speech recognition model for highly accurate audio transcription. Cohere says it is built for practical enterprise use, supports 14 languages, uses a 2B parameter Conformer-based encoder-decoder architecture, and currently ranks #1 on Hugging Face’s Open ASR Leaderboard for accuracy.

🎤Voice transcription 📽Video transcription 🎙️Voice recognition 🎤Voice notes transcription

New · Audio

Released 5d ago
Gen 3

Voxtral TTS

By Mistral AI

Voxtral TTS is Mistral’s new open-source text-to-speech model for building voice agents and enterprise speech applications. According to TechCrunch, it supports 9 languages, can clone a voice from under 5 seconds of audio, preserves accents and speaking style, and is optimized for real-time use on edge devices like phones, laptops, and wearables.

🔊Text to speech 🗣️Voice cloning 🎙️Voiceovers 🎤Voice agents

New · Audio

Released 5d ago
Gen 3

TRIBE v2

By Meta Platforms

TRIBE v2 is Meta’s multimodal brain-encoding research demo. It predicts whole-brain fMRI responses to naturalistic stimuli by combining video, audio, and text representations, aiming to model how the brain reacts over time across different cortical regions and people. It builds on Meta’s TRIBE line for cross-modal brain response prediction.

🧠Neuroscience 🔬Scientific research 🧠Neuroscience exploration ⚡Neurofeedback analysis

New · Multimodal

Released 5d ago
Gen 3

Photon

By Moondream

Photon is Moondream’s real-time vision-language model aimed at production video and image analysis. It is designed to deliver VLM-style visual reasoning fast enough for live use cases such as manufacturing inspection, broadcast moderation, retail monitoring, and security feeds.

🔍Image interpretation 👓Visual assistance 🔍Object identification 👁️Computer vision assistance

New · Multimodal

Released 6d ago
Gen 4

Lyria 3 Pro

By Google

Lyria 3 Pro is Google DeepMind’s music generation model. It creates songs of up to 3 minutes, offers control over musical structure (intros, verses, choruses, bridges), and is available across multiple Google products.

🔊Advanced audio generation 🎵Soundtracks 🎵Music production 🎵Songwriting

New · Audio

Released 6d ago
Gen 4

Lightning v3

By Smallest.ai

Lightning is Smallest.ai’s low-latency text-to-speech system for real-time voice agents, voiceovers, and voice cloning.

🔊Text to speech 🗣️Voice cloning 🎙️Voiceovers 🎤Voice agents

New · Audio

Released 6d ago
Gen 3

Uni 1

By Luma AI

Uni-1 is Luma’s multimodal reasoning model that can generate pixels, built to understand intent, respond to direction, and perform common-sense visual reasoning.

🖼️Image generation 🖌️Image editing 🖌️Sketch to image 📚Manga creation

New · Multimodal

Released 8d ago
Gen 3

BioReason Pro

By bowang-lab

BioReason-Pro SFT is a supervised fine-tuned checkpoint of BioReason-Pro, a multimodal reasoning LLM for protein function prediction that integrates ESM3 protein embeddings, a GO graph encoder, and biological context to generate functional annotations.

🔬Biology research assistance 🧬Biotechnology research analysis 🔬Protein engineering analysis 🧬Genome data analysis

New · Multimodal

Released 11d ago
Gen 3

Alpamayo 1.5 10B

By NVIDIA

Alpamayo 1.5-10B is NVIDIA’s open 10B vision-language-action model for autonomous driving. It is built as a steerable reasoning engine for AV research, combining multi-camera visual input, text, and egomotion history to produce both chain-of-causation reasoning and future driving trajectories.

🚗Autonomous driving 🔍Advanced reasoning 🔍Image interpretation 👁️Computer vision assistance

New · Multimodal

Released 12d ago
Gen 1

InSpatio World

By InSpatio

InSpatio-World is a video-conditioned 4D world model that turns a reference video into a dynamic scene you can explore from new viewpoints through time.

🎥3D videos 🎥Spatial image to video 🎥3D scenes

New · 3D

Released 12d ago
Gen 3

LiteParse

By LlamaIndex

LiteParse is an open-source document parser focused on fast, lightweight parsing of PDFs into structured outputs.

📄Document processing 📄Document data extraction 🔍Text extraction 📜OCR

New · Text

Released 12d ago
