Models

By Google

Gemini 3.1 Flash Live Preview is Google’s low-latency audio-to-audio model for real-time dialogue and voice-first AI apps. It is built for fast conversational interaction, accepts text, image, audio, and video input, and produces text and audio output. Google positions it for acoustic nuance detection, numeric precision, and multimodal awareness.

🗣️Speech to speech 🎙Voice chatting 🔊Advanced audio generation 🎤Voice agents

New · Multimodal

Released 1d ago

Gen 4

Cohere Transcribe

By Cohere

Cohere Transcribe is an open-source automatic speech recognition model for highly accurate audio transcription. Cohere says it is built for practical enterprise use, supports 14 languages, uses a 2B parameter Conformer-based encoder-decoder architecture, and currently ranks #1 on Hugging Face’s Open ASR Leaderboard for accuracy.

🎤Voice transcription 📽Video transcription 🎙️Voice recognition 🎤Voice notes transcription

New · Audio

Released 1d ago

Gen 3

Voxtral TTS

By Mistral AI

Voxtral TTS is Mistral’s new open-source text-to-speech model for building voice agents and enterprise speech applications. According to TechCrunch, it supports 9 languages, can clone a voice from under 5 seconds of audio, preserves accents and speaking style, and is optimized for real-time use on edge devices like phones, laptops, and wearables.

🔊Text to speech 🗣️Voice cloning 🎙️Voiceovers 🎤Voice agents

New · Audio

Released 1d ago

Gen 3

TRIBE v2

By Meta Platforms

TRIBE v2 is Meta’s multimodal brain-encoding research demo. It predicts whole-brain fMRI responses to naturalistic stimuli by combining video, audio, and text representations, aiming to model how the brain reacts over time across different cortical regions and people. It builds on Meta’s TRIBE line for cross-modal brain response prediction.

🧠Neuroscience 🔬Scientific research 🧠Neuroscience exploration ⚡Neurofeedback analysis

New · Multimodal

Released 1d ago

Gen 4

Lyria 3 Pro

By Google

Lyria 3 Pro is Google DeepMind’s music generation model. It creates longer songs, up to 3 minutes, with finer control over musical structure (intros, verses, choruses, bridges), and is available across multiple Google products.

🔊Advanced audio generation 🎵Soundtracks 🎵Music production 🎵Songwriting

New · Audio

Released 2d ago

Gen 4

Lightning v3

By Smallest.ai

Lightning is Smallest.ai’s low-latency text-to-speech system for real-time voice agents, voiceovers, and voice cloning.

🔊Text to speech 🗣️Voice cloning 🎙️Voiceovers 🎤Voice agents

New · Audio

Released 2d ago

Gen 3

Uni 1

By Luma AI

Uni-1 is Luma’s multimodal reasoning model that can generate pixels, built to understand intent, respond to direction, and perform common-sense visual reasoning.

🖼️Image generation 🖌️Image editing 🖌️Sketch to image 📚Manga creation

New · Multimodal

Released 4d ago

Gen 3

BioReason Pro

By bowang-lab

BioReason-Pro SFT is a supervised fine-tuned checkpoint of BioReason-Pro, a multimodal reasoning LLM for protein function prediction that integrates ESM3 protein embeddings, a GO graph encoder, and biological context to generate functional annotations.

🔬Biology research assistance 🧬Biotechnology research analysis 🔬Protein engineering analysis 🧬Genome data analysis

New · Multimodal

Released 7d ago

Gen 1

InSpatio World

By InSpatio

InSpatio-World is a video-conditioned 4D world model that turns a reference video into a dynamic scene you can explore from new viewpoints through time.

🎥3D videos 🎥Spatial image to video 🎥3d scenes

New · 3D

Released 8d ago

Gen 3

LiteParse

By LlamaIndex

LiteParse is an open-source document parser focused on fast, lightweight parsing of PDFs into structured outputs.

📄Document processing 📄Document data extraction 🔍Text extraction 📜OCR

New · Text

Released 8d ago

Gen 2

MAI Image 2

By Microsoft

MAI-Image-2 is Microsoft’s second-generation text-to-image model built for creative work, emphasizing photorealism, accurate in-image text, and detailed multi-object scenes.

🖼️Image generation 🎨Image text overlay 🖼Poster creation 📸Photorealistic images

New · Image

Released 8d ago

Gen 2

Composer 2

By Anysphere

Composer 2 is Cursor’s frontier coding model optimized for high intelligence per dollar, built to solve long-horizon software engineering tasks with many tool actions.

💻Coding 🔧Code refactoring 💻Vibe coding

New · Coding

Released 8d ago

Gen 3

Xiaomi MiMo V2 TTS

By Xiaomi

MiMo-V2-TTS is Xiaomi’s large-scale speech synthesis model built for expressive agent voice, aiming for natural, emotionally aware speech.

🔊Text to speech 🎤Voice changing 🎙️Voiceovers 🎤Singing

New · Audio

Released 9d ago

Gen 3

Xiaomi MiMo V2 Omni

By Xiaomi

MiMo-V2-Omni is an omni foundation model that unifies multimodal understanding with agentic capability, built to see, hear, and act.

📚Large Language Models 🎥Video summaries 📚Audio summaries 🔍Image interpretation

New · Multimodal

Released 9d ago

Gen 3

Xiaomi MiMo V2 Pro

By Xiaomi

MiMo-V2-Pro is Xiaomi’s flagship foundation model built for real-world agent workloads, designed to act as the “brain” of agent systems that orchestrate complex workflows and tool use.

📚Large Language Models 🤖Agents 🔄Workflow optimization 💻Conversational coding

New · Text

Released 9d ago

Gen 4 MiniMax

MiniMax M2.7

By MiniMax

MiniMax M2.7 is MiniMax’s new text model release positioned around “self-evolution,” aimed at higher performance and value for complex tasks.

💬Chatting 💻Software engineering guidance 💻Conversational coding 🗣️Multi-agent conversations

New · Text

Released 9d ago

Gen 7

Mamba-3

By Together AI

Mamba-3 is a new state space model (SSM) architecture designed with inference efficiency as the primary goal, improving prefill and decode latency across sequence lengths.

🔨LLM development 🎓LLM training 🔄Language model optimization 🧠AI inference

New · Text

Released 9d ago

Gen 3 GPT

GPT 5.4 Nano

By OpenAI

GPT-5.4 nano is the smallest, lowest-cost GPT-5.4-family model, optimized for speed and high-throughput tasks.

🔍Data extraction 🎯Code autocompletion 🔍Image interpretation 🔍Data classification

New · Multimodal

Released 10d ago

Gen 3 GPT

GPT 5.4 Mini

By OpenAI

GPT-5.4 mini is a fast, efficient GPT-5.4-family model optimized for high-volume coding and agent workloads, while keeping strong reasoning, multimodal understanding, and tool use.

💬Chatting 🎯Code autocompletion 🔍Image interpretation 💻Conversational coding

New · Multimodal

Released 10d ago

Gen 4

Midjourney V8

By Midjourney

New · Image

Released 10d ago

Gen 7

Nemotron Cascade 2 30B A3B

By NVIDIA

Nemotron-Cascade 2 is an open 30B Mixture-of-Experts model (3B activated) trained with Cascade RL and multi-domain on-policy distillation for strong reasoning and agentic capabilities.

💬Chatting 🔍Advanced reasoning 💻Conversational coding ⌨️Competitive programming coaching

New · Text

Released 11d ago
