
AI Models Directory

Browse and discover AI models from leading companies in the industry.

  • Kling Video 3.0 is Kuaishou's newest AI video model that unifies text, image, audio, and reference video in one engine, generating photorealistic clips up to 15 seconds long with native multi-language audio and strong consistency across shots.
    New · Multimodal
    Released 7d ago
  • Seedance 2.0 is ByteDance's multimodal AI video model that turns text plus image, video and audio references into high-resolution, sound-synced clips, giving creators director-level control over camera, motion, style and multi-shot storytelling.
    New · Multimodal
    Released 17h ago
  • By Xmax AI
    XMAX X1 is a real-time interactive video model that fuses virtual and real worlds, using the phone camera and touch gestures for millisecond-level, on-device AR-style experiences.
    New · Video
    Released 2d ago
  • Ultralytics YOLO is a family of real-time computer-vision models for detection, segmentation, classification, pose, and tracking, designed to be fast, accurate, and easy to deploy across edge and cloud.
    Image
    Released 3y ago
  • SoulX-FlashTalk is a 14B audio-driven avatar model that delivers high-fidelity lip-synced digital humans in real time, with sub-second startup and 30+ FPS streaming for live content.
    Audio
    Released 10mo ago
  • MIRA is a multimodal medical RAG framework that combines image features and a medical knowledge base with dynamic context control to improve factual accuracy in clinical reasoning.
    Text
    Released 8mo ago
  • Jamba2 is AI21’s open-source enterprise LLM family, optimized for reliability, steerability, and grounding, with compact 3B and 52B variants and long context for production workflows.
    New · Text
    Released 17h ago
  • MiroThinker v1.5 is an open-source deep research agent that orchestrates tools and web search to plan, retrieve, and synthesize evidence, with variants tuned for financial prediction.
    Text
    Released 10mo ago
  • Apriel-1.5-15B-Thinker is a 15B multimodal reasoning model from ServiceNow, delivering frontier-level text and image reasoning using mid-training techniques at a fraction of typical scale.
    Text
    Released 10mo ago
  • Hermes 4.3 is Nous Research’s 36B hybrid reasoning model, based on Seed-OSS-36B, offering long context (up to 512k) and very high helpfulness on RefusalBench while staying locally deployable.
    New · Text
    Released 2mo ago
  • LLaDA2.X is InclusionAI’s diffusion language model family, scaling to 100B parameters and using parallel decoding to deliver fast, high-quality text and code generation at 500+ tokens per second.
    Text
    Released 7mo ago
  • By NVIDIA
    OmniVinci is NVIDIA’s 9B omni-modal LLM that jointly understands images, video, audio, and text, achieving strong cross-modal reasoning with only about 0.2T training tokens.
    Image
    Released 8mo ago
  • By Ai2
    olmOCR is AllenAI’s open-source document recognition pipeline and model family that converts PDFs and images into clean text, preserving reading order, tables, equations, and handwriting.
    Image
    Released 3mo ago
  • By NovaSky
    Sky-T1 is NovaSky’s open reasoning model family, including a 32B preview model that matches o1-preview on key benchmarks while being trainable for under 450 USD.
    Text
    Released 1y ago
  • Vchitect-2.0 is a parallel-transformer text-to-video diffusion model that scales to large video datasets, improving text alignment and temporal coherence for longer, higher-quality clips.
    Video
    Released 1y ago
  • GameGen-X is a diffusion transformer specifically built for open-world game video, generating and interactively controlling characters, environments, and actions in long gameplay clips.
    New · Video
    Released 1d ago
  • Riverflow 2.0 is Sourceful’s production-grade image model for brand design, focusing on photorealism, layout accuracy, font control, and 4K-ready visuals for packaging and marketing.
    Image
    Released 3mo ago
  • DreamDojo is NVIDIA’s generalist robot world model trained on 44k hours of egocentric human video, enabling real-time, action-conditioned simulation and planning for diverse robot bodies.
    New · Video
    Released 6d ago
  • By YuanLab
    Yuan 3.0 Flash is a 40B MoE multimodal foundation model from YuanLab that activates about 3.7B parameters per token, targeting enterprise reasoning with lower compute per token.
    New · Multimodal
    Released 1mo ago
  • Zonos-v0.1 is Zyphra’s open-weight text-to-speech family, two 1.6B models trained on 200k+ hours of multilingual speech, offering expressive, real-time TTS and high-quality voice cloning.
    New · Audio
    Released 2d ago
  • Orpheus TTS is Canopy Labs’ Llama-based 3B speech LLM for natural, emotionally controllable, multilingual text-to-speech with real-time streaming and voice cloning.
    Audio
    Released 10mo ago
