Papers
- Prompt Repetition Improves Non-Reasoning LLMs
- Towards a Science of Scaling Agent Systems
- Fast and Accurate Causal Parallel Decoding using Jacobi Forcing
- TalkVerse: Democratizing Minute-Long Audio-Driven Video Generation
- GLM-TTS Technical Report
- Native and Compact Structured Latents for 3D Generation
- One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation
- T5Gemma 2: Seeing, Reading, and Understanding Longer
- Evaluating AI’s ability to perform scientific research tasks
- AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path
- Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation
- Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10
- GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training
- KlingAvatar 2.0 Technical Report
- Wait, Wait, Wait... Why Do Reasoning Models Loop?
- World Models Can Leverage Human Videos for Dexterous Manipulation
- Towards Scalable Pre-training of Visual Tokenizers for Generation
- Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model
- Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal
- Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
- Diffusion Language Model Inference with Monte Carlo Tree Search
- SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder
- Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases
- CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving
- BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models
- Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
- Glance: Accelerating Diffusion Models with 1 Sample
- Sharp Monocular View Synthesis in Less Than a Second
- On Learning-Curve Monotonicity for Maximum Likelihood Estimators
- Matrix-game 2.0: An open-source real-time and streaming interactive world model
- UniUGP: Unifying Understanding, Generation, and Planning for End-to-end Autonomous Driving
- Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
- PAVAS: Physics-Aware Video-to-Audio Synthesis
- Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation
- HybridToken-VLM: Hybrid Token Compression for Vision-Language Models
- MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models
- Process Reward Models That Think
- Distribution Matching Variational AutoEncoder
- Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
- UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation
- Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents
- Unsupervised decoding of encoded reasoning using language model interpretability
- EditThinker: Unlocking Iterative Reasoning for Any Image Editor
- Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
- EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
- Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability
- Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
- SIMA 2: A Generalist Embodied Agent for Virtual Worlds
- SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control
- Training LLMs for Honesty via Confessions
