Papers
-
BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation
-
Voxtral Realtime
-
Accelerating Mathematical and Scientific Discovery with Gemini Deep Think
-
OmniSapiens: A Foundation Model for Social Behavior Processing via HARPO
Massachusetts Institute of Technology, Nanyang Technological University, National University of Singapore, Qatar Computing Research Institute, University of Rochester
-
GameDevBench: Evaluating Agentic Capabilities Through Game Development
Wayne Chi¹, Yixiong Fang¹, Arnav Yayavaram¹, Siddharth Yayavaram¹, Seth Karten²
Carnegie Mellon University, Princeton University
-
ArtifactLens: Hundreds of Labels Are Enough for Artifact Detection with VLMs
-
EchoJEPA: A Latent Predictive Foundation Model for Echocardiography
Cohere / University Health Network, University of California, University of Toronto, Vector Institute
-
Federated Balanced Learning
-
MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
-
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
-
Autoregressive Image Generation with Masked Bit Modeling
-
iGRPO: Self-Feedback–Driven LLM Reasoning
-
ML-DCN: Masked Low-Rank Deep Crossing Network Towards Scalable Ads Click-through Rate Prediction at Pinterest
-
Agentic LLMs as Powerful Deanonymizers: Re-identification of Participants in the Anthropic Interviewer Dataset
-
HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing
-
VirtualEnv: A Platform for Embodied AI Research
-
Intelligence Explosion
Machine Intelligence Research Institute
-
DriveWorld-VLA: Unified Latent-Space World Modeling with Vision-Language-Action for Autonomous Driving
-
ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training
-
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs
Meituan / Nanjing University, National University of Singapore, Shanghai Jiao Tong University, Zhejiang University
-
CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation
-
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
NVIDIA / Hong Kong University of Science and Technology, Korea Advanced Institute of Science & Technology, Stanford University, University of California, University of Toronto, University of Washington
-
Can Post-Training Transform LLMs into Causal Reasoners?
Fudan University, Shanghai Artificial Intelligence Laboratory
-
Learning a Generative Meta-Model of LLM Activations
UC Berkeley
-
Self-Consistency Improves Chain of Thought Reasoning in Language Models
-
Large Language Model Reasoning Failures
Carleton College, Stanford University
-
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
TikTok / Hong Kong University of Science and Technology, Nanyang Technological University, The Chinese University of Hong Kong
-
Learning to Discover at Test Time
-
MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning
-
Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review
-
RISE-Video: Can Video Generators Decode Implicit World Rules?
-
Vector Quantization using Gaussian Variational Autoencoder
-
See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning
-
Using a GPT-5-driven autonomous lab to optimize the cost and titer of cell-free protein synthesis
-
Knowledge-Intensive Agents
Northeastern University, China
-
Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers
-
Asynchronous Reasoning: Training-Free Interactive Thinking LLMs
-
OpenOneRec Technical Report
-
Learning to Reason in 13 Parameters
-
A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces
-
LLMs as Orchestrators: Constraint-Compliant Multi-Agent Optimization for Recommendation Systems
-
IRIS: Implicit Reward-Guided Internal Sifting for Mitigating Multimodal Hallucination
-
BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations
-
ReMiT: RL-Guided Mid-Training for Iterative LLM Evolution
-
HY3D-Bench: Generation of 3D Assets
-
Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory
-
CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security Vulnerability
-
LIVE: Long-horizon Interactive Video World Modeling
-
AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent
-
Generative Engine Optimization: A VLM and Agent Framework for Pinterest Acquisition Growth
