Papers

D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning

Tencent / Shanghai Jiao Tong University

Published on: 2025-12-26 1 author
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration

Apple

Published on: 2025-12-26 1 author
DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO

Kuaishou Technology

Published on: 2025-12-25 1 author
SemanticGen: Video Generation in Semantic Space

Kuaishou Technology / Zhejiang University

Published on: 2025-12-25 1 author
Streaming Video Instruction Tuning

Tencent / Hong Kong Baptist University

Published on: 2025-12-24 1 author
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

DeepSeek

Published on: 2025-12-23 1 author
FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models

Snap / Sun Yat-sen University

Published on: 2025-12-23 1 author
GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation

ByteDance

Published on: 2025-12-23 1 author
COBRA: Catastrophic Bit-flip Reliability Analysis of State-Space Models

Intel

Published on: 2025-12-22 1 author
From Word to World: Can Large Language Models be Implicit Text-based World Models?

Microsoft / Southern University of Science and Technology

Published on: 2025-12-21 1 author
Secret mixtures of experts inside your LLM

University of Pennsylvania, Wharton School of Statistics and Data Science

Published on: 2025-12-20 1 author
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

ByteDance

Published on: 2025-12-19 22 authors
GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

Xiaomi / The University of Hong Kong

Published on: 2025-12-19 1 author
Diffusion Forcing for Multi-Agent Interaction Sequence Modeling

Sony Group Corporation (AIBO), Meta Platforms / UC Berkeley

Published on: 2025-12-19 1 author
Sigma-MoE-Tiny Technical Report

Microsoft / Microsoft Research

Published on: 2025-12-19 1 author
Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking

Amazon / University of Wisconsin-Madison

Published on: 2025-12-19 1 author
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Meta Platforms

Published on: 2025-12-19 1 author
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Google

Published on: 2025-12-19
DVGT: Driving Visual Geometry Transformer

Xiaomi / Tsinghua University

Published on: 2025-12-18 1 author
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing

Tencent / The Chinese University of Hong Kong

Published on: 2025-12-18 1 author
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Tencent / Hong Kong University of Science and Technology

Published on: 2025-12-18 1 author
Kling-Omni Technical Report

Kuaishou Technology

Published on: 2025-12-18 1 author
EasyV2V: A High-quality Instruction-based Video Editing Framework

Snap

Published on: 2025-12-18 1 author
FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction

Microsoft / Fudan University

Published on: 2025-12-18 1 author
GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation

Meta Platforms

Published on: 2025-12-18 1 author
Addendum to GPT-5.2 System Card: GPT-5.2-Codex

OpenAI

Published on: 2025-12-18 1 author
Monitoring Monitorability

OpenAI

Published on: 2025-12-18 1 author
Spatia: Video Generation with Updatable Spatial Memory

Microsoft / The University of Sydney

Published on: 2025-12-17 1 author
Prompt Repetition Improves Non-Reasoning LLMs

Google

Published on: 2025-12-17 1 author
Towards a Science of Scaling Agent Systems

Google / MIT

Published on: 2025-12-17 1 author
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Snowflake / UC San Diego

Published on: 2025-12-16 1 author
TalkVerse: Democratizing Minute-Long Audio-Driven Video Generation

Snap / The Chinese University of Hong Kong

Published on: 2025-12-16 1 author
GLM-TTS Technical Report

Z.ai / Tsinghua University

Published on: 2025-12-16 1 author
Native and Compact Structured Latents for 3D Generation

Microsoft / Tsinghua University

Published on: 2025-12-16 1 author
One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation

Apple

Published on: 2025-12-16 1 author
T5Gemma 2: Seeing, Reading, and Understanding Longer

Google

Published on: 2025-12-16 1 author
Evaluating AI’s ability to perform scientific research tasks

OpenAI

Published on: 2025-12-16 1 author
AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path

Tencent / Australian National University

Published on: 2025-12-15 1 author
Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation

Tencent

Published on: 2025-12-15 1 author
Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10

Tencent

Published on: 2025-12-15 1 author
GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training

Tencent / Tsinghua University

Published on: 2025-12-15 1 author
KlingAvatar 2.0 Technical Report

Kuaishou Technology

Published on: 2025-12-15 1 author
Wait, Wait, Wait... Why Do Reasoning Models Loop?

Microsoft / MIT

Published on: 2025-12-15 1 author
World Models Can Leverage Human Videos for Dexterous Manipulation

Meta Platforms / New York University

Published on: 2025-12-15 1 author
Towards Scalable Pre-training of Visual Tokenizers for Generation

MiniMax / Huazhong University of Science and Technology

Published on: 2025-12-15 1 author
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

ByteDance

Published on: 2025-12-15 1 author
Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal

Sony Group Corporation (AIBO) / MIT

Published on: 2025-12-14 1 author
Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Kuaishou Technology / Peking University

Published on: 2025-12-14 1 author
Diffusion Language Model Inference with Monte Carlo Tree Search

Amazon / Dartmouth College

Published on: 2025-12-13 1 author
SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Kuaishou Technology

Published on: 2025-12-12 1 author