<div align="center">

# TensorRT Edge-LLM

**High-Performance Large Language Model Inference Framework for NVIDIA Edge Platforms**

Overview | Examples | Documentation | Roadmap

</div>
## Overview
TensorRT Edge-LLM is NVIDIA's high-performance C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs) on embedded platforms. It enables efficient deployment of state-of-the-art language models on resource-constrained devices such as NVIDIA Jetson and NVIDIA DRIVE platforms. TensorRT Edge-LLM provides convenient Python scripts for converting HuggingFace checkpoints to ONNX; engine building and end-to-end inference run entirely on the edge platform.
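The three-stage flow described above (host-side export to ONNX, on-device engine build, on-device inference) can be sketched as follows. All function names and signatures here are illustrative placeholders, not the actual TensorRT Edge-LLM API; see the Quick Start Guide for the real commands.

```python
# Hypothetical sketch of the export -> build -> infer flow. These functions
# only model the file hand-offs between stages; they are NOT the real
# TensorRT Edge-LLM API.

def export_to_onnx(hf_checkpoint: str) -> str:
    """Stage 1, on the host: convert a HuggingFace checkpoint to ONNX."""
    model_name = hf_checkpoint.rsplit("/", 1)[-1]
    return f"{model_name}.onnx"

def build_engine(onnx_path: str, precision: str) -> str:
    """Stage 2, on the edge device: compile the ONNX graph into an engine."""
    return onnx_path.replace(".onnx", f".{precision}.engine")

def run_inference(engine_path: str, prompt: str) -> str:
    """Stage 3, on the edge device: load the engine and generate text."""
    return f"(engine {engine_path}) completion for: {prompt!r}"

# Example hand-off: the checkpoint name is a placeholder, not a claim of support.
onnx_path = export_to_onnx("org/example-1B-instruct")
engine_path = build_engine(onnx_path, precision="fp8")
print(run_inference(engine_path, "Describe the scene ahead."))
```

The point of the split is that only the lightweight ONNX export needs a development host; the compute-heavy engine build and the inference loop both stay on the target device.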
## Getting Started
For supported platforms, models, and precisions, see the Overview. You can get started with TensorRT Edge-LLM in under 15 minutes; for complete installation and usage instructions, see the Quick Start Guide.
## Documentation

### Introduction
- Overview - What is TensorRT Edge-LLM and key features
- Supported Models - Complete model compatibility matrix
### User Guide
- Installation - Set up Python export pipeline and C++ runtime
- Quick Start Guide - Run your first inference in ~15 minutes
- Examples - End-to-end LLM, VLM, EAGLE, and LoRA workflows
- Input Format Guide - Request format and specifications
- Chat Template Format - Chat template configuration
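As a concrete illustration of what a chat template does, the sketch below flattens a list of role-tagged messages into a single prompt string using a generic ChatML-style layout. This layout is an assumption chosen for illustration only; the template configuration TensorRT Edge-LLM actually uses is specified in the Chat Template Format guide and may differ.

```python
# Minimal, generic chat-template illustration (ChatML-style markers).
# Not the TensorRT Edge-LLM template format; see the Chat Template Format guide.

def apply_chat_template(messages):
    """Render role-tagged messages into one prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    parts.append("<|im_start|>assistant")  # cue the model to respond
    return "\n".join(parts)

prompt = apply_chat_template([
    {"role": "system", "content": "You are a helpful in-vehicle assistant."},
    {"role": "user", "content": "Turn on the cabin lights."},
])
print(prompt)
```

Whatever the concrete markers are, the runtime needs this rendering step so that multi-turn requests arrive at the model in the turn structure it was trained on.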
### Developer Guide

#### Software Design
- Python Export Pipeline - Model export and quantization
- Engine Builder - Building TensorRT engines
- C++ Runtime Overview - Runtime system architecture
#### Advanced Topics
- Customization Guide - Customizing TensorRT Edge-LLM for your needs
- TensorRT Plugins - Custom plugin development
- Tests - Comprehensive test suite for contributors
## Use Cases

### 🚗 Automotive
- In-vehicle AI assistants
- Voice-controlled interfaces
- Scene understanding
- Driver assistance systems
### 🤖 Robotics
- Natural language interaction
- Task planning and reasoning
- Visual question answering
- Human-robot collaboration
### 🏭 Industrial IoT
- Equipment monitoring with NLP
- Automated inspection
- Predictive maintenance
- Voice-controlled machinery
### 📱 Edge Devices
- On-device chatbots
- Offline language processing
- Privacy-preserving AI
- Low-latency inference
## Tech Blogs
Coming soon! Stay tuned for technical deep-dives, optimization guides, and deployment best practices.
## Latest News
- [01/05] 🚀 Accelerate AI Inference for Edge and Robotics with NVIDIA Jetson T4000 and NVIDIA JetPack 7.1 ✨ ➡️ link
- [01/05] 🚀 Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM ✨ ➡️ link
Follow our GitHub repository for the latest updates, releases, and announcements.
## Support
- Documentation: Full Documentation
- Examples: Code Examples
- Roadmap: Developer Roadmap
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Forums: NVIDIA Developer Forums
## License
## Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
