Papers
-
CUA-Skill: Develop Skills for Computer Using Agent
-
ReasonCACHE: Teaching LLMs To Reason Without Weight Updates
-
SimMerge: Learning to Select Merge Operators from Similarity Signals
-
Argument Rarity-based Originality Assessment for AI-Assisted Writing (Ritsumeikan Global Innovation Research Organization)
-
AgentRx: Diagnosing AI Agent Failures from Execution Trajectories
-
X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests
-
What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-Zoom
-
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Complex Real-World Tasks
-
Toward Fully Autonomous Driving: AI, Challenges, Opportunities, and Needs
-
AlignGemini: Generalizable AI-Generated Image Detection Through Task-Model Alignment
-
SpanNorm: Reconciling Training Stability and Performance in Deep Transformers
-
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
-
MemoryLLM: Plug-n-Play Interpretable Feed-Forward Memory for Transformers
-
LLM-42: Enabling Determinism in LLM Inference with Verified Speculation
-
The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?
-
Lost in Transmission: When and Why LLMs Fail to Reason Globally
-
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing
-
PI-Light: Physics-Inspired Diffusion for Full-Image Relighting
-
M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization
-
Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding
-
Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models
-
DeepSeek-OCR 2: Visual Causal Flow
-
WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models
-
Efficient Autoregressive Video Diffusion with Dummy Head
-
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
-
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
-
Astra: General Interactive World Model with Autoregressive Denoising
-
Towards Pixel-Level VLM Perception via Simple Points Prediction
-
Differentiable Semantic ID for Generative Recommendation
-
Dep-Search: Learning Dependency-Aware Reasoning Traces with Persistent Memory
-
Agentic Very Long Video Understanding
-
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
-
AnyView: Synthesizing Any Novel View in Dynamic Scenes
-
Latent Diffusion for Internet of Things Attack Data Generation in Intrusion Detection Systems (Universidad Rey Juan Carlos)
-
DSGym: A Holistic Framework for Evaluating and Training Data Science Agents
-
CamPilot: Improving Camera Control in Video Diffusion Model with Efficient Camera Reward Feedback
-
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
-
DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice
-
RayRoPE: Projective Ray Positional Encoding for Multi-view Attention
-
OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis
-
Unified Text-Image Generation with Weakness-Targeted Post-Training
-
Zebra-Llama: Towards Extremely Efficient Hybrid Models
-
From Chains to Graphs: Self-Structured Reasoning for General-Domain LLMs
-
Learning Latent Action World Models In The Wild
-
SOFAI-LM: A Cognitive Architecture for Building Efficient and Reliable Reasoning Systems with LLMs
-
Small Models, Big Impact: Tool-Augmented AI Agents for Wireless Network Planning (King Abdullah University of Science and Technology (KAUST))
-
Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time
-
RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering
-
S2DiT: Sandwich Diffusion Transformer for Mobile Streaming Video Generation
-
Recurrent Confidence Chain: Temporal-Aware Uncertainty Quantification in Large Language Models
