Papers
-
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
-
The $qs$ Inequality: Quantifying the Double Penalty of Mixture-of-Experts at Inference
-
Dynamic Chunking Diffusion Transformer
-
AI+HW 2035: Shaping the Next Decade (NVIDIA, Google, AMD, IBM, Together AI, OpenAI, SEMRON, EnCharge AI, SambaNova, SK Hynix, Oracle / Agentrys, Brown University, California Institute of Technology, Carnegie Mellon University, Hewlett Packard Labs, New York University, Princeton University, Stanford University, University at Buffalo, University of California, University of Illinois Urbana-Champaign, University of Pennsylvania, University of Texas)
-
Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks
-
GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training
-
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
-
AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection
-
Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers
-
M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization
-
Zebra-Llama: Towards Extremely Efficient Hybrid Models
-
Power Aware Dynamic Reallocation For Inference
-
AIE4ML: An End-to-End Framework for Compiling Neural Networks for the Next Generation of AMD AI Engines
-
CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models
-
CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving
-
SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning
-
Efficient and Adaptable Overlapping for Computation and Communication via Signaling and Reordering
-
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
-
Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks
-
Agent Laboratory: Using LLM Agents as Research Assistants
