Papers
-
Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning
-
Towards an AI-Augmented Textbook
-
Steering MoE LLMs via Expert (De)Activation
-
Robix: A Unified Model for Robot Interaction, Reasoning and Planning
-
AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data
-
An AI System to Help Scientists Write Expert-Level Empirical Software
-
Why Language Models Hallucinate
-
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
-
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
-
Measuring the environmental impact of delivering AI at Google Scale
-
3D-GENERALIST: Vision-Language-Action Models for Crafting 3D Worlds
-
X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms
-
NoProp: Training Neural Networks without Full Back-propagation or Full Forward-propagation
-
Matrix-3D: Omnidirectional Explorable 3D World Generation
-
Amazon Ads Multi-Touch Attribution
-
Scaling Laws for Native Multimodal Models
-
Devstral: Fine-tuning Language Models for Coding Agent Applications
-
Establishing Best Practices for Building Rigorous Agentic Benchmarks
-
No LLM Solved Yu Tsumura's 554th Problem
-
Why do LLMs attend to the first token?
-
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
-
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation
-
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
-
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
-
Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks
-
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
-
Kimi K2: Open Agentic Intelligence
-
Scaling Data-Constrained Language Models
-
Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
-
Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them
-
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
-
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
-
Voxtral
-
Apple Intelligence Foundation Language Models: Tech Report 2025
-
Non-preemptive Throughput Maximization under Time-varying Capacity
-
MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning
-
A Survey of Automatic Prompt Optimization with Instruction-focused Heuristic-based Search Algorithm
-
SEE: Strategic Exploration and Exploitation for Cohesive In-Context Prompt Optimization
-
Skywork-R1V3 Technical Report
-
SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents
-
Unconditional Diffusion for Generative Sequential Recommendation
-
Evaluating the Critical Risks of Amazon’s Nova Premier under the Frontier Model Safety Framework
-
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
-
‘For Argument’s Sake, Show Me How to Harm Myself!’: Jailbreaking LLMs in Suicide and Self-Harm Contexts
-
UMA: A Family of Universal Models for Atoms
-
Hierarchical Reasoning Model
-
Steering Your Diffusion Policy with Latent Space Reinforcement Learning
-
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
-
Kimi-VL Technical Report
-
TransAct V2: Lifelong User Action Sequence Modeling on Pinterest Recommendation
