Papers
-
Logics-Parsing-Omni Technical Report
-
CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR
-
Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows
-
Revealing Behavioral Plasticity in Large Language Models: A Token-Conditional Perspective
-
SecAgent: Efficient Mobile GUI Agent with Semantic Context
-
Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning
Alibaba / Central Michigan University, DAMO Academy, Hong Kong Baptist University, Shanghai Jiao Tong University
-
Efficient Vector Search in the Wild: One Model for Multi-K Queries
-
Making Training-Free Diffusion Segmentors Scale with the Generative Power
Alibaba / Chinese Academy of Sciences, Sun Yat-sen University, University of Chinese Academy of Sciences
-
On the Generalization Capacities of MLLMs for Spatial Intelligence
-
Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes
-
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
-
SSKG Hub: An Expert-Guided Platform for LLM-Empowered Sustainability Standards Knowledge Graphs
-
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Complex Real-World Tasks
-
Extracting books from production language models
-
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
-
Tongyi DeepResearch Technical Report
-
Robix: A Unified Model for Robot Interaction, Reasoning and Planning
-
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
-
Qwen3 Technical Report
-
Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration
-
Qwen2.5-Omni Technical Report
-
Qwen2.5-VL Technical Report
-
Qwen2.5 Technical Report
-
Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution
-
Qwen2-Audio Technical Report
-
Qwen2 Technical Report
-
mPLUG-Owl : Modularization Empowers Large Language Models with Multimodality
-
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
-
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
-
Qwen Technical Report
-
DAMO-YOLO : A Report on Real-Time Object Detection Design
-
VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning
-
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
-
Prompt Tuning for Generative Multimodal Pretrained Models
-
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
-
M6-Rec: Generative Pretrained Language Models are Open-Ended Recommender Systems
-
ML-Decoder: Scalable and Versatile Classification Head
-
M6: A Chinese Multimodal Pretrainer
-
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
