Models
Browse and discover AI models from leading companies in the industry.
- (New, Multimodal, Released 1d ago)
- By Sii GAIR-NLP: daVinci-LLM-3B is a 3B base language model built to make pretraining transparent and reproducible. Its release includes not only the weights, but also training trajectories, intermediate checkpoints, data-processing decisions, and more than 200 ablation studies. (New, Text, Released 1d ago)
- By Chroma: Context-1 is Chroma's 20B agentic search model trained as a self-editing search agent. It is designed to decompose complex queries, prune irrelevant context, and deliver high retrieval quality at lower latency and cost than much larger frontier models. (New, Text, Released 1d ago)
- By Datalab: Chandra is an OCR model for difficult document extraction tasks. Its GitHub description says it handles complex tables, forms, and handwriting while preserving full layout structure, making it more document-understanding focused than plain text OCR. (New, Multimodal, Released 1d ago)
- By Meituan: LongCat Next is a multimodal LongCat model focused on compact yet capable visual and speech understanding. The official intro highlights strong performance despite a 28x compression ratio, with particular strength in text rendering, speech comprehension, low-latency voice conversation, and customizable voice cloning. (New, Multimodal, Released 1d ago)
- By Topaz Labs: Topaz Starlight Precise 2.5 is an upgraded video upscaling model available through ComfyUI Partner Nodes. It is positioned as a direct replacement for the earlier SLP-2 model, promising sharper output, fewer artifacts, and better preserved detail at the same per-frame cost. (New, Video, Released 1d ago)
- By Suno: Suno is an AI music creation platform that generates complete original songs from prompts, including vocals, lyrics, and full production. It is built for fast music generation, remixing, beat making, and sharing, and supports creation from text, images, or voice inputs. (New, Audio, Released 2d ago)
- By Google: Gemini 3.1 Flash Live Preview is Google's low-latency audio-to-audio model for real-time dialogue and voice-first AI apps. It is built for fast conversational interaction, with multimodal input support for text, images, audio, and video, and outputs in text and audio. Google positions it for acoustic nuance detection, numeric precision, and multimodal awareness. (New, Multimodal, Released 2d ago)
- By Cohere: Cohere Transcribe is an open-source automatic speech recognition model for highly accurate audio transcription. Cohere says it is built for practical enterprise use, supports 14 languages, uses a 2B parameter Conformer-based encoder-decoder architecture, and currently ranks #1 on Hugging Face's Open ASR Leaderboard for accuracy. (New, Audio, Released 2d ago)
- By Mistral AI: Voxtral TTS is Mistral's new open-source text-to-speech model for building voice agents and enterprise speech applications. According to TechCrunch, it supports 9 languages, can clone a voice from under 5 seconds of audio, preserves accents and speaking style, and is optimized for real-time use on edge devices like phones, laptops, and wearables. (New, Audio, Released 2d ago)
- TRIBE v2 is Meta's multimodal brain-encoding research demo. It predicts whole-brain fMRI responses to naturalistic stimuli by combining video, audio, and text representations, aiming to model how the brain reacts over time across different cortical regions and people. It builds on Meta's TRIBE line for cross-modal brain response prediction. (New, Multimodal, Released 2d ago)
- By Moondream: Photon is Moondream's real-time vision-language model aimed at production video and image analysis. It is designed to deliver VLM-style visual reasoning fast enough for live use cases such as manufacturing inspection, broadcast moderation, retail monitoring, and security feeds. (New, Multimodal, Released 3d ago)
- By Google: Lyria 3 Pro is Google DeepMind's music generation model. It creates longer songs, up to 3 minutes, with better control over musical structure (intros, verses, choruses, bridges), and it is available across multiple Google products. (New, Audio, Released 3d ago)
- By Smallest Ai: Lightning is Smallest.ai's low-latency text-to-speech system for real-time voice agents, voiceovers, and voice cloning. (New, Audio, Released 3d ago)
- By Luma AI: Uni-1 is Luma's multimodal reasoning model that can generate pixels, built to understand intent, respond to direction, and perform common-sense visual reasoning. (New, Multimodal, Released 5d ago)
- By bowang-lab: BioReason-Pro SFT is a supervised fine-tuned checkpoint of BioReason-Pro, a multimodal reasoning LLM for protein function prediction that integrates ESM3 protein embeddings, a GO graph encoder, and biological context to generate functional annotations. (New, Multimodal, Released 8d ago)
- By NVIDIA: Alpamayo 1.5-10B is NVIDIA's open 10B vision-language-action model for autonomous driving. It is built as a steerable reasoning engine for AV research, combining multi-camera visual input, text, and egomotion history to produce both chain-of-causation reasoning and future driving trajectories. (New, Multimodal, Released 9d ago)
- By InSpatio: InSpatio-World is a video-conditioned 4D world model that turns a reference video into a dynamic scene you can explore from new viewpoints through time. (New, 3D, Released 9d ago)
- By LlamaIndex: LiteParse is an open-source document parser focused on fast, lightweight parsing of PDFs into structured outputs. (New, Text, Released 9d ago)
- By Microsoft: MAI-Image-2 is Microsoft's second-generation text-to-image model built for creative work, emphasizing photorealism, accurate in-image text, and detailed multi-object scenes. (New, Image, Released 9d ago)
- By Anysphere: Composer 2 is Cursor's frontier coding model optimized for high intelligence per dollar, built to solve long-horizon software engineering tasks with many tool actions. (New, Coding, Released 9d ago)
