Models
Browse and discover AI models from leading companies in the industry.
-
By Google
Gemini 3.1 Flash Live Preview is Google's low-latency audio-to-audio model for real-time dialogue and voice-first AI apps. It is built for fast conversational interaction, accepts text, image, audio, and video input, and outputs text and audio. Google positions it for acoustic nuance detection, numeric precision, and multimodal awareness.
New | Multimodal | Released 1d ago
-
By Cohere
Cohere Transcribe is an open-source automatic speech recognition model for highly accurate audio transcription. Cohere says it is built for practical enterprise use, supports 14 languages, uses a 2B-parameter Conformer-based encoder-decoder architecture, and currently ranks #1 on Hugging Face's Open ASR Leaderboard for accuracy.
New | Audio | Released 1d ago
-
By Mistral AI
Voxtral TTS is Mistral's new open-source text-to-speech model for building voice agents and enterprise speech applications. According to TechCrunch, it supports 9 languages, can clone a voice from under 5 seconds of audio, preserves accents and speaking style, and is optimized for real-time use on edge devices like phones, laptops, and wearables.
New | Audio | Released 1d ago
-
TRIBE v2 is Meta's multimodal brain-encoding research demo. It predicts whole-brain fMRI responses to naturalistic stimuli by combining video, audio, and text representations, aiming to model how the brain reacts over time across different cortical regions and people. It builds on Meta's TRIBE line for cross-modal brain response prediction.
New | Multimodal | Released 1d ago
-
By Google
Lyria 3 Pro is Google DeepMind's music generation model. It creates longer songs, up to 3 minutes, with better control over musical structure (intros, verses, choruses, bridges), and it is available across multiple Google products.
New | Audio | Released 2d ago
-
By Smallest Ai
Lightning is Smallest.ai's low-latency text-to-speech system for real-time voice agents, voiceovers, and voice cloning.
New | Audio | Released 2d ago
-
By Luma AI
Uni-1 is Luma's multimodal reasoning model that can generate pixels, built to understand intent, respond to direction, and perform common-sense visual reasoning.
New | Multimodal | Released 4d ago
-
By bowang-lab
BioReason-Pro SFT is a supervised fine-tuned checkpoint of BioReason-Pro, a multimodal reasoning LLM for protein function prediction that integrates ESM3 protein embeddings, a GO graph encoder, and biological context to generate functional annotations.
New | Multimodal | Released 7d ago
-
By InSpatio
InSpatio-World is a video-conditioned 4D world model that turns a reference video into a dynamic scene you can explore from new viewpoints through time.
New | 3D | Released 8d ago
-
By LlamaIndex
LiteParse is an open-source document parser focused on fast, lightweight parsing of PDFs into structured outputs.
New | Text | Released 8d ago
-
By Microsoft
MAI-Image-2 is Microsoft's second-generation text-to-image model built for creative work, emphasizing photorealism, accurate in-image text, and detailed multi-object scenes.
New | Image | Released 8d ago
-
By Anysphere
Composer 2 is Cursor's frontier coding model optimized for high intelligence per dollar, built to solve long-horizon software engineering tasks with many tool actions.
New | Coding | Released 8d ago
-
By Xiaomi
MiMo-V2-TTS is Xiaomi's large-scale speech synthesis model built for expressive agent voice, aiming for natural, emotionally aware speech.
New | Audio | Released 9d ago
-
By Xiaomi
MiMo-V2-Omni is an omni foundation model that unifies multimodal understanding with agentic capability, built to see, hear, and act.
New | Multimodal | Released 9d ago
-
By Xiaomi
MiMo-V2-Pro is Xiaomi's flagship foundation model built for real-world agent workloads, designed to act as the "brain" of agent systems that orchestrate complex workflows and tool use.
New | Text | Released 9d ago
-
By MiniMax
MiniMax M2.7 is MiniMax's new text model release positioned around "self-evolution," aimed at higher performance and value for complex tasks.
New | Text | Released 9d ago
-
By Together AI
Mamba-3 is a new state space model (SSM) architecture designed with inference efficiency as the primary goal, improving prefill and decode latency across sequence lengths.
New | Text | Released 9d ago
-
By OpenAI
GPT-5.4 nano is the smallest, lowest-cost GPT-5.4-family model, optimized for speed and high-throughput tasks.
New | Multimodal | Released 10d ago
-
By OpenAI
GPT-5.4 mini is a fast, efficient GPT-5.4-family model optimized for high-volume coding and agent workloads, while keeping strong reasoning, multimodal understanding, and tool use.
New | Multimodal | Released 10d ago
-
New | Image | Released 10d ago
-
By NVIDIA
Nemotron-Cascade 2 is an open 30B Mixture-of-Experts model (3B activated) trained with Cascade RL and multi-domain on-policy distillation for strong reasoning and agentic capabilities.
New | Text | Released 11d ago
