AirLLM

AirLLM reduces peak GPU memory by executing models layer by layer (keeping most weights off the GPU until they are needed), enabling inference on constrained hardware without relying on model compression as its core technique.
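The idea can be illustrated with a minimal PyTorch sketch (not AirLLM's actual implementation). It assumes the model's layers have already been saved to disk as separate weight files, and `build_layer` is a hypothetical factory that constructs an empty layer module; only one layer's weights are resident on the GPU at any time.

```python
# Minimal sketch of layer-by-layer execution; not AirLLM's real code.
# Assumptions: `layer_files` are per-layer state_dict files on disk, and
# `build_layer()` is a user-supplied factory returning an empty layer module.
import torch

def run_layer_by_layer(layer_files, hidden_states, build_layer, device="cuda"):
    hidden_states = hidden_states.to(device)
    for path in layer_files:
        layer = build_layer()                          # empty module on CPU
        layer.load_state_dict(torch.load(path, map_location="cpu"))
        layer.to(device)                               # only this layer is on the GPU
        with torch.no_grad():
            hidden_states = layer(hidden_states)       # forward through one layer
        layer.to("cpu")                                # release GPU memory
        del layer
        torch.cuda.empty_cache()
    return hidden_states
```

Peak GPU usage is then roughly one layer's weights plus the activations, rather than the full set of model weights.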
Released: September 21, 2024

Overview

AirLLM is an inference system that enables running very large LLMs on low-VRAM GPUs by loading model layers from disk in a memory-optimized execution pipeline.
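A pipeline like this depends on the checkpoint being split into per-layer shards that can be read from disk one at a time. The sketch below shows one way such shards could be produced; it assumes a Llama-style Hugging Face model whose decoder layers sit at `model.model.layers`, and the file naming is illustrative rather than AirLLM's actual on-disk format.

```python
# Sketch of splitting a checkpoint into per-layer shard files (illustrative,
# not AirLLM's on-disk layout). Assumes a Llama-style Hugging Face model
# whose decoder layers are reachable at `model.model.layers`.
import os
import torch

def shard_checkpoint_by_layer(model, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    for i, layer in enumerate(model.model.layers):
        torch.save(layer.state_dict(), os.path.join(out_dir, f"layer_{i:03d}.pt"))
```

At inference time, shards produced this way can be streamed through the GPU one at a time, as in the loop sketched above.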

Tools using AirLLM

No tools found for this model yet.

Last updated: March 12, 2026