TensorRT Edge-LLM

NVIDIA / TensorRT-Edge-LLM

High-performance, lightweight C++ LLM and VLM inference software for Physical AI

High-Performance Large Language Model Inference Framework for NVIDIA Edge Platforms

Overview

TensorRT Edge-LLM is NVIDIA's high-performance C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs) on embedded platforms. It enables efficient deployment of state-of-the-art language models on resource-constrained devices such as NVIDIA Jetson and NVIDIA DRIVE platforms. TensorRT Edge-LLM provides Python scripts to convert Hugging Face checkpoints to ONNX; engine build and end-to-end inference then run entirely on the edge platform.

Getting Started

For supported platforms, models, and precisions, see the Overview. You can get started with TensorRT Edge-LLM in under 15 minutes. For complete installation and usage instructions, see the Quick Start Guide.

Documentation

Introduction

Overview - What is TensorRT Edge-LLM and key features
Supported Models - Complete model compatibility matrix

User Guide

Installation - Set up Python export pipeline and C++ runtime
Quick Start Guide - Run your first inference in ~15 minutes
Examples - End-to-end LLM, VLM, EAGLE, and LoRA workflows
Input Format Guide - Request format and specifications
Chat Template Format - Chat template configuration

Developer Guide

Software Design

Python Export Pipeline - Model export and quantization
Engine Builder - Building TensorRT engines
C++ Runtime Overview - Runtime system architecture
- LLM Inference Runtime
- LLM SpecDecode Runtime

Advanced Topics

Customization Guide - Customizing TensorRT Edge-LLM for your needs
TensorRT Plugins - Custom plugin development
Tests - Comprehensive test suite for contributors

Use Cases

🚗 Automotive

In-vehicle AI assistants
Voice-controlled interfaces
Scene understanding
Driver assistance systems

🤖 Robotics

Natural language interaction
Task planning and reasoning
Visual question answering
Human-robot collaboration

🏭 Industrial IoT

Equipment monitoring with NLP
Automated inspection
Predictive maintenance
Voice-controlled machinery

📱 Edge Devices

On-device chatbots
Offline language processing
Privacy-preserving AI
Low-latency inference

Tech Blogs

Coming soon

Stay tuned for technical deep-dives, optimization guides, and deployment best practices.

Latest News

[01/05] 🚀 Accelerate AI Inference for Edge and Robotics with NVIDIA Jetson T4000 and NVIDIA JetPack 7.1 ✨ ➡️ link
[01/05] 🚀 Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM ✨ ➡️ link

Follow our GitHub repository for the latest updates, releases, and announcements.

Support

Documentation: Full Documentation
Examples: Code Examples
Roadmap: Developer Roadmap
Issues: GitHub Issues
Discussions: GitHub Discussions
Forums: NVIDIA Developer Forums

License

Apache License 2.0

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.
