AI Models Directory
Browse and discover AI models from leading companies in the industry.
- Kling Video 3.0 is Kuaishou's newest AI video model that unifies text, image, audio, and reference video in one engine, generating up to 15-second photorealistic clips with native multi-language audio and strong consistency across shots. (New · Multimodal · Released 7d ago)
- Seedance 2.0 is ByteDance's multimodal AI video model that turns text plus image, video, and audio references into high-resolution, sound-synced clips, giving creators director-level control over camera, motion, style, and multi-shot storytelling. (New · Multimodal · Released 17h ago)
- By Xmax AI: XMAX X1 is a real-time interactive video model that fuses virtual and real worlds, using the phone camera and touch gestures for millisecond-level, on-device AR-style experiences. (New · Video · Released 2d ago)
- Ultralytics YOLO is a family of real-time computer-vision models for detection, segmentation, classification, pose, and tracking, designed to be fast, accurate, and easy to deploy across edge and cloud; a minimal usage sketch appears after this list. (Image · Released 3y ago)
- By Soul AILab: SoulX-FlashTalk is a 14B audio-driven avatar model that delivers high-fidelity lip-synced digital humans in real time, with sub-second startup and 30+ FPS streaming for live content. (Audio · Released 10mo ago)
- By mbzuai-oryx: MIRA is a multimodal medical RAG framework that combines image features and a medical knowledge base with dynamic context control to improve factual accuracy in clinical reasoning; a hedged sketch of the retrieve-then-generate pattern appears after this list. (Text · Released 8mo ago)
- Jamba2 is AI21 Labs' open-source enterprise LLM family, optimized for reliability, steerability, and grounding, with compact 3B and 52B variants and long context for production workflows. (New · Text · Released 17h ago)
- By MiroMindAI: MiroThinker v1.5 is an open-source deep research agent that orchestrates tools and web search to plan, retrieve, and synthesize evidence, with variants tuned for financial prediction. (Text · Released 10mo ago)
- Apriel-1.5-15B-Thinker is a 15B multimodal reasoning model from ServiceNow, delivering frontier-level text and image reasoning using mid-training techniques at a fraction of typical scale. (Text · Released 10mo ago)
- Hermes 4.3 is Nous Research's 36B hybrid reasoning model, based on Seed-OSS-36B, offering long context (up to 512k) and very high helpfulness on RefusalBench while staying locally deployable. (New · Text · Released 2mo ago)
- LLaDA2.X is InclusionAI's diffusion language model family, scaling to 100B parameters and using parallel decoding to deliver fast, high-quality text and code generation at 500+ tokens per second. (Text · Released 7mo ago)
- OmniVinci is NVIDIA's 9B omni-modal LLM that jointly understands images, video, audio, and text, achieving strong cross-modal reasoning with only about 0.2T training tokens. (Image · Released 8mo ago)
- olmOCR is AllenAI's open-source document recognition pipeline and model family that converts PDFs and images into clean text, preserving reading order, tables, equations, and handwriting. (Image · Released 3mo ago)
- Sky-T1 is NovaSky's open reasoning model family, including a 32B preview model that matches o1-preview on key benchmarks while being trainable for under 450 USD. (Text · Released 1y ago)
- Vchitect-2.0 is a parallel-transformer text-to-video diffusion model that scales to large video datasets, improving text alignment and temporal coherence for longer, higher-quality clips. (Video · Released 1y ago)
- GameGen-X is a diffusion transformer built specifically for open-world game video, generating and interactively controlling characters, environments, and actions in long gameplay clips. (New · Video · Released 1d ago)
- Riverflow 2.0 is Sourceful's production-grade image model for brand design, focusing on photorealism, layout accuracy, font control, and 4K-ready visuals for packaging and marketing. (Image · Released 3mo ago)
- DreamDojo is NVIDIA's generalist robot world model trained on 44k hours of egocentric human video, enabling real-time, action-conditioned simulation and planning for diverse robot bodies. (New · Video · Released 6d ago)
- Yuan 3.0 Flash is a 40B MoE multimodal foundation model from YuanLab that activates about 3.7B parameters per token, targeting enterprise reasoning with lower compute per token; a back-of-envelope compute sketch appears after this list. (New · Multimodal · Released 1mo ago)
- Zonos-v0.1 is Zyphra's open-weight text-to-speech family: two 1.6B models trained on 200k+ hours of multilingual speech, offering expressive, real-time TTS and high-quality voice cloning. (New · Audio · Released 2d ago)
- Orpheus TTS is Canopy Labs' Llama-based 3B speech LLM for natural, emotionally controllable, multilingual text-to-speech with real-time streaming and voice cloning. (Audio · Released 10mo ago)
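For the Ultralytics YOLO entry above, here is a minimal usage sketch. The ultralytics package and the YOLO class are the library's real public API; the checkpoint name and image path are placeholders to substitute with your own.

```python
# Minimal Ultralytics YOLO inference sketch (pip install ultralytics).
# "yolo11n.pt" and "bus.jpg" are placeholders; swap in any published
# checkpoint and your own image.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")      # pretrained weights are downloaded on first use
results = model("bus.jpg")      # run detection on an image path or URL

for r in results:
    for box in r.boxes:         # one entry per detected object
        cls_name = model.names[int(box.cls)]
        print(cls_name, f"{float(box.conf):.2f}", box.xyxy.tolist())
```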
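The MIRA entry describes combining image features with a medical knowledge base before generation. The sketch below is not MIRA's actual code; every name in it (vision_encoder, kb, llm, and their methods) is a hypothetical stand-in used only to illustrate the general retrieve-then-generate pattern.

```python
# Hypothetical multimodal RAG loop in the spirit of the MIRA entry above.
# None of these names come from the MIRA codebase; they illustrate the
# pattern: embed the query, retrieve supporting passages, then condition
# the generator on the retrieved context.

def answer_clinical_query(image, question, vision_encoder, kb, llm, top_k=5):
    query_vec = vision_encoder.embed(image, question)  # joint image+text embedding (assumed API)
    passages = kb.search(query_vec, top_k=top_k)       # nearest-neighbor lookup (assumed API)
    context = "\n".join(p.text for p in passages)
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm.generate(prompt, images=[image])        # grounded multimodal generation (assumed API)
```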
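The Yuan 3.0 Flash entry's "lower compute per token" claim is plain MoE arithmetic: only the routed experts' weights participate in each forward pass. A back-of-envelope check, using the 40B-total / ~3.7B-active figures from the entry and the common rule of thumb that forward-pass FLOPs per token are roughly twice the parameters touched:

```python
# Back-of-envelope: per-token compute of an MoE that activates ~3.7B of its
# 40B parameters, versus a dense model that touches all 40B every token.
total_params = 40e9
active_params = 3.7e9

# Rough rule of thumb: forward-pass FLOPs per token ~= 2 * parameters touched.
dense_flops = 2 * total_params
moe_flops = 2 * active_params

print(f"active fraction:      {active_params / total_params:.1%}")  # ~9.2%
print(f"compute vs dense 40B: {moe_flops / dense_flops:.1%}")       # same ~9.2%
```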
