Papers

- TrajTok: Learning Trajectory Tokens enables better Video Understanding
- The Design Space of Tri-Modal Masked Diffusion Models
- Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining
- Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment
- DSO: Direct Steering Optimization for Bias Mitigation
- CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
- MemoryLLM: Plug-n-Play Interpretable Feed-Forward Memory for Transformers
- RayRoPE: Projective Ray Positional Encoding for Multi-view Attention
- GenCtrl -- A Formal Controllability Toolkit for Generative Models
- NarrativeTrack: Evaluating Video Language Models Beyond the Frame
- Delay-Tolerant Networking for Tsunami Evacuation on the Small Island of Hachijojima: A Study of Epidemic and Prophet Routing
- Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
- One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation
- Sharp Monocular View Synthesis in Less Than a Second
- Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency
- Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
- Apple Intelligence Foundation Language Models: Tech Report 2025
- FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
- FastVLM: Efficient Vision Encoding for Vision Language Models
- UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
- ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
- AToken: A Unified Tokenizer for Vision
- pfl-research: simulation framework for accelerating research in Private Federated Learning
- Controlling Language and Diffusion Models by Transporting Activations
- Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
- LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
- Efficient Large Language Model Inference with Limited Memory
- Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages
- Diffusion Models Without Attention
- Differentially Private Heavy Hitter Detection using Federated Analytics
- Application-Agnostic Language Modeling for On-Device ASR
- Stable Diffusion with Core ML on Apple Silicon
- Training a Tokenizer for Free with Private Federated Learning
- On-device Panoptic Segmentation for Camera Using Transformers
- Federated Evaluation and Tuning for On-Device Personalization: System Design & Applications
- Overton: A Data System for Monitoring and Improving Machine-Learned Products
- Learning with Privacy at Scale