TAAFT

Papers

  • GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation
    Xiaomi / The University of Hong Kong
    Published on: 2025-12-19 1 author
  • Diffusion Forcing for Multi-Agent Interaction Sequence Modeling
    Sony Group Corporation (AIBO), Meta Platforms / University of California
    Published on: 2025-12-19 1 author
  • Sigma-MoE-Tiny Technical Report
    Microsoft / Microsoft Research
    Published on: 2025-12-19 1 author
  • Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking
    Amazon / University of Wisconsin-Madison
    Published on: 2025-12-19 1 author
  • Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
    Published on: 2025-12-19 1 author
  • Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.
    Published on: 2025-12-19
  • DVGT: Driving Visual Geometry Transformer
    Xiaomi / Tsinghua University
    Published on: 2025-12-18 1 author
  • RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing
    Tencent / The Chinese University of Hong Kong
    Published on: 2025-12-18 1 author
  • N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
    Tencent / Hong Kong University of Science and Technology
    Published on: 2025-12-18 1 author
  • Kling-Omni Technical Report
    Published on: 2025-12-18 1 author
  • EasyV2V: A High-quality Instruction-based Video Editing Framework
    Snap / King Abdullah University of Science and Technology (KAUST)
    Published on: 2025-12-18 1 author
  • FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction
    Microsoft / Fudan University
    Published on: 2025-12-18 1 author
  • GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation
    Meta Platforms / Allen Institute for AI, University of California, University of Washington
    Published on: 2025-12-18 1 author
  • Addendum to GPT-5.2 System Card: GPT-5.2-Codex
    Published on: 2025-12-18 1 author
  • Monitoring Monitorability
    Published on: 2025-12-18 1 author
  • Spatia: Video Generation with Updatable Spatial Memory
    Microsoft / The University of Sydney
    Published on: 2025-12-17 1 author
  • Prompt Repetition Improves Non-Reasoning LLMs
    Published on: 2025-12-17 1 author
  • Towards a Science of Scaling Agent Systems
    Google / Massachusetts Institute of Technology
    Published on: 2025-12-17 1 author
  • Fast and Accurate Causal Parallel Decoding using Jacobi Forcing
    Snowflake / University of California
    Published on: 2025-12-16 1 author
  • TalkVerse: Democratizing Minute-Long Audio-Driven Video Generation
    Snap / The Chinese University of Hong Kong
    Published on: 2025-12-16 1 author
  • GLM-TTS Technical Report
    Z.ai / Tsinghua University
    Published on: 2025-12-16 1 author
  • Native and Compact Structured Latents for 3D Generation
    Microsoft / Tsinghua University
    Published on: 2025-12-16 1 author
  • One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation
    Published on: 2025-12-16 1 author
  • T5Gemma 2: Seeing, Reading, and Understanding Longer
    Published on: 2025-12-16 1 author
  • Evaluating AI’s ability to perform scientific research tasks
    Published on: 2025-12-16 1 author
  • AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path
    Tencent / Australian National University
    Published on: 2025-12-15 1 author
  • Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation
    Published on: 2025-12-15 1 author
  • Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10
    Published on: 2025-12-15 1 author
  • GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training
    Tencent / Tsinghua University
    Published on: 2025-12-15 1 author
  • KlingAvatar 2.0 Technical Report
    Published on: 2025-12-15 1 author
  • Wait, Wait, Wait... Why Do Reasoning Models Loop?
    Microsoft / Massachusetts Institute of Technology
    Published on: 2025-12-15 1 author
  • World Models Can Leverage Human Videos for Dexterous Manipulation
    Meta Platforms / New York University
    Published on: 2025-12-15 1 author
  • Towards Scalable Pre-training of Visual Tokenizers for Generation
    MiniMax / Huazhong University of Science and Technology
    Published on: 2025-12-15 1 author
  • Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model
    Published on: 2025-12-15 1 author
  • Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal
    Sony Group Corporation (AIBO) / Massachusetts Institute of Technology
    Published on: 2025-12-14 1 author
  • Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
    Kuaishou Technology / Peking University
    Published on: 2025-12-14 1 author
  • Diffusion Language Model Inference with Monte Carlo Tree Search
    Amazon / Dartmouth College
    Published on: 2025-12-13 1 author
  • SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder
    Kuaishou Technology / Tsinghua University
    Published on: 2025-12-12 1 author
  • SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving
    Published on: 2025-12-11 9 authors
  • Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases
    Meta Platforms / Harvard University
    Published on: 2025-12-11 11 authors
  • CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving
    AMD / Columbia University, Yale University
    Published on: 2025-12-11 2 authors
  • BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models
    Sony Group Corporation (AIBO) / Boston University
    Published on: 2025-12-11 1 author
  • Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
    Snap / University of California
    Published on: 2025-12-11 1 author
  • Glance: Accelerating Diffusion Models with 1 Sample
    Microsoft / Wissenschaftliche Hochschule für Unternehmensführung
    Published on: 2025-12-11 1 author
  • Sharp Monocular View Synthesis in Less Than a Second
    Published on: 2025-12-11 1 author
  • On Learning-Curve Monotonicity for Maximum Likelihood Estimators
    Published on: 2025-12-11 1 author
  • Matrix-game 2.0: An open-source real-time and streaming interactive world model
    Published on: 2025-12-10 1 author
  • UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving
    Published on: 2025-12-10 1 author
  • Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
    Google / University College London
    Published on: 2025-12-10 1 author
  • PAVAS: Physics-Aware Video-to-Audio Synthesis
    Sony Group Corporation (AIBO) / Korea Advanced Institute of Science & Technology
    Published on: 2025-12-09 1 author