Papers
-
Slim attention: cut your context memory in half without loss – K-cache is all you need for MHA
-
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
-
Qwen3 Technical Report
-
Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning
-
WizardLM: Empowering large pre-trained language models to follow complex instructions
-
Florence: A New Foundation Model for Computer Vision
-
Iterative Reranking as a Compute-Scaling Method for LLM-based Rankers
-
KG-CRAFT: Knowledge graph-based contrastive reasoning with LLMs for enhancing automated fact-checking
-
Pattern Discovery with Wide-Lens Analysis and Sharp-Focus Validation
-
ChatLLM network: More brains, more intelligence (Beijing Institute of Technology)
-
OmniSapiens: A Foundation Model for Social Behavior Processing via HARPO (MIT, National University of Singapore)
-
GameDevBench: Evaluating Agentic Capabilities Through Game Development (Wayne Chi, Yixiong Fang, Arnav Yayavaram, Siddharth Yayavaram, Seth Karten; Carnegie Mellon University, Princeton University)
-
Agentic LLMs as Powerful Deanonymizers: Re-identification of Participants in the Anthropic Interviewer Dataset
-
Intelligence Explosion
-
Can Post-Training Transform LLMs into Causal Reasoners? (Fudan University, Shanghai Artificial Intelligence Laboratory)
-
Learning a Generative Meta-Model of LLM Activations (UC Berkeley)
-
Self-Consistency Improves Chain of Thought Reasoning in Language Models
-
Using a GPT-5-driven autonomous lab to optimize the cost and titer of cell-free protein synthesis
-
Knowledge-Intensive Agents (Northeastern University, China)
-
Accelerating Scientific Research with Gemini: Case Studies and Common Techniques
-
Closing the Loop: Universal Repository Representation with RPG-Encoder
-
Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity
-
AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations
-
Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems
-
Generative AI for Enzyme Design and Biocatalysis
-
Argument Rarity-based Originality Assessment for AI-Assisted Writing (Ritsumeikan Global Innovation Research Organization)
-
AgentRx: Diagnosing AI Agent Failures from Execution Trajectories
-
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Complex Real-World Tasks
-
Lost in Transmission: When and Why LLMs Fail to Reason Globally
-
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
-
Latent Diffusion for Internet of Things Attack Data Generation in Intrusion Detection Systems (Universidad Rey Juan Carlos)
-
SOFAI-LM: A Cognitive Architecture for Building Efficient and Reliable Reasoning Systems with LLMs
-
Small Models, Big Impact: Tool-Augmented AI Agents for Wireless Network Planning (King Abdullah University of Science and Technology, KAUST)
-
Recurrent Confidence Chain: Temporal-Aware Uncertainty Quantification in Large Language Models
-
AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
-
Hardware Acceleration for Neural Networks: A Comprehensive Survey (Arizona State University)
-
Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models (The Hong Kong Polytechnic University)
-
AgriAgent: Contract-Driven Planning and Capability-Aware Tool Orchestration in Real-World Agriculture
-
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
-
Addendum to GPT-5.2 System Card: GPT-5.2-Codex
-
Monitoring Monitorability
-
Evaluating AI’s ability to perform scientific research tasks
-
On Learning-Curve Monotonicity for Maximum Likelihood Estimators
-
Training LLMs for Honesty via Confessions
-
Early Science Acceleration Experiments with GPT-5
-
Weight-Sparse Transformers Have Interpretable Circuits
-
SageServe: Optimizing LLM Serving on Cloud Data Centers with Forecast Aware Auto-Scaling
-
Signs of introspection in large language models
-
Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency
-
Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping
