AI Models Directory
Browse and discover AI models from leading companies in the industry.
- Kling Video 3.0 is Kuaishou's newest AI video model that unifies text, image, audio, and reference video in one engine, generating up to 15-second photorealistic clips with native multi-language audio and strong consistency across shots. (New · Multimodal · Released 7d ago)
- Seedance 2.0 is ByteDance's multimodal AI video model that turns text plus image, video, and audio references into high-resolution, sound-synced clips, giving creators director-level control over camera, motion, style, and multi-shot storytelling. (New · Multimodal · Released 17h ago)
- By Xmax AI: XMAX X1 is a real-time interactive video model that fuses virtual and real worlds, using the phone camera and touch gestures for millisecond-level, on-device AR-style experiences. (New · Video · Released 2d ago)
- Ultralytics YOLO is a family of real-time computer-vision models for detection, segmentation, classification, pose, and tracking, designed to be fast, accurate, and easy to deploy across edge and cloud; a minimal usage sketch appears after this list. (Image · Released 3y ago)
- By Soul AILab: SoulX-FlashTalk is a 14B audio-driven avatar model that delivers high-fidelity lip-synced digital humans in real time, with sub-second startup and 30+ FPS streaming for live content. (Audio · Released 10mo ago)
- By mbzuai-oryx: MIRA is a multimodal medical RAG framework that combines image features and a medical knowledge base with dynamic context control to improve factual accuracy in clinical reasoning; a hedged sketch of the retrieve-then-generate pattern appears after this list. (Text · Released 8mo ago)
- Jamba2 is AI21 Labs' open-source enterprise LLM family, optimized for reliability, steerability, and grounding, with compact 3B and 52B variants and long context for production workflows. (New · Text · Released 17h ago)
- By MiroMindAI: MiroThinker v1.5 is an open-source deep research agent that orchestrates tools and web search to plan, retrieve, and synthesize evidence, with variants tuned for financial prediction. (Text · Released 10mo ago)
- Apriel-1.5-15B-Thinker is a 15B multimodal reasoning model from ServiceNow, delivering frontier-level text and image reasoning using mid-training techniques at a fraction of typical scale. (Text · Released 10mo ago)
- Hermes 4.3 is Nous Research's 36B hybrid reasoning model, based on Seed-OSS-36B, offering long context (up to 512k) and very high helpfulness on RefusalBench while staying locally deployable. (New · Text · Released 2mo ago)
- LLaDA2.X is InclusionAI's diffusion language model family, scaling to 100B parameters and using parallel decoding to deliver fast, high-quality text and code generation at 500+ tokens per second. (Text · Released 7mo ago)
- OmniVinci is NVIDIA's 9B omni-modal LLM that jointly understands images, video, audio, and text, achieving strong cross-modal reasoning with only about 0.2T training tokens. (Image · Released 8mo ago)
- olmOCR is AllenAI's open-source document recognition pipeline and model family that converts PDFs and images into clean text, preserving reading order, tables, equations, and handwriting. (Image · Released 3mo ago)
- Sky-T1 is NovaSky's open reasoning model family, including a 32B preview model that matches o1-preview on key benchmarks while being trainable for under 450 USD. (Text · Released 1y ago)
- Vchitect-2.0 is a parallel-transformer text-to-video diffusion model that scales to large video datasets, improving text alignment and temporal coherence for longer, higher-quality clips. (Video · Released 1y ago)
- GameGen-X is a diffusion transformer built specifically for open-world game video, generating and interactively controlling characters, environments, and actions in long gameplay clips. (New · Video · Released 1d ago)
- Riverflow 2.0 is Sourceful's production-grade image model for brand design, focusing on photorealism, layout accuracy, font control, and 4K-ready visuals for packaging and marketing. (Image · Released 3mo ago)
- DreamDojo is NVIDIA's generalist robot world model trained on 44k hours of egocentric human video, enabling real-time, action-conditioned simulation and planning for diverse robot bodies. (New · Video · Released 6d ago)
- Yuan 3.0 Flash is a 40B MoE multimodal foundation model from YuanLab that activates about 3.7B parameters per token, targeting enterprise reasoning with lower compute per token; a back-of-envelope compute sketch appears after this list. (New · Multimodal · Released 1mo ago)
- Zonos-v0.1 is Zyphra's open-weight text-to-speech family: two 1.6B models trained on 200k+ hours of multilingual speech, offering expressive, real-time TTS and high-quality voice cloning. (New · Audio · Released 2d ago)
- Orpheus TTS is Canopy Labs' Llama-based 3B speech LLM for natural, emotionally controllable, multilingual text-to-speech with real-time streaming and voice cloning. (Audio · Released 10mo ago)
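For the Ultralytics YOLO entry above, here is a minimal usage sketch. The ultralytics package and the YOLO class are the library's real public API; the checkpoint name and image path are placeholders to substitute with your own.

```python
# Minimal Ultralytics YOLO inference sketch (pip install ultralytics).
# "yolo11n.pt" and "bus.jpg" are placeholders; swap in any published
# checkpoint and your own image.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")      # pretrained weights are downloaded on first use
results = model("bus.jpg")      # run detection on an image path or URL

for r in results:
    for box in r.boxes:         # one entry per detected object
        cls_name = model.names[int(box.cls)]
        print(cls_name, f"{float(box.conf):.2f}", box.xyxy.tolist())
```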
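The MIRA entry describes combining image features with a medical knowledge base before generation. The sketch below is not MIRA's actual code; every name in it (vision_encoder, kb, llm, and their methods) is a hypothetical stand-in used only to illustrate the general retrieve-then-generate pattern.

```python
# Hypothetical multimodal RAG loop in the spirit of the MIRA entry above.
# None of these names come from the MIRA codebase; they illustrate the
# pattern: embed the query, retrieve supporting passages, then condition
# the generator on the retrieved context.

def answer_clinical_query(image, question, vision_encoder, kb, llm, top_k=5):
    query_vec = vision_encoder.embed(image, question)  # joint image+text embedding (assumed API)
    passages = kb.search(query_vec, top_k=top_k)       # nearest-neighbor lookup (assumed API)
    context = "\n".join(p.text for p in passages)
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm.generate(prompt, images=[image])        # grounded multimodal generation (assumed API)
```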
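The Yuan 3.0 Flash entry's "lower compute per token" claim is plain MoE arithmetic: only the routed experts' weights participate in each forward pass. A back-of-envelope check, using the 40B-total / ~3.7B-active figures from the entry and the common rule of thumb that forward-pass FLOPs per token are roughly twice the parameters touched:

```python
# Back-of-envelope: per-token compute of an MoE that activates ~3.7B of its
# 40B parameters, versus a dense model that touches all 40B every token.
total_params = 40e9
active_params = 3.7e9

# Rough rule of thumb: forward-pass FLOPs per token ~= 2 * parameters touched.
dense_flops = 2 * total_params
moe_flops = 2 * active_params

print(f"active fraction:      {active_params / total_params:.1%}")  # ~9.2%
print(f"compute vs dense 40B: {moe_flops / dense_flops:.1%}")       # same ~9.2%
```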
