AirLLM

AirLLM reduces peak GPU memory by executing models layer by layer (keeping most weights off the GPU until they are needed), enabling inference on constrained hardware without relying on model compression as its core technique.
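The idea can be illustrated with a minimal PyTorch sketch (not AirLLM's actual implementation). It assumes the model's layers have already been saved to disk as separate weight files, and `build_layer` is a hypothetical factory that constructs an empty layer module; only one layer's weights are resident on the GPU at any time.

```python
# Minimal sketch of layer-by-layer execution; not AirLLM's real code.
# Assumptions: `layer_files` are per-layer state_dict files on disk, and
# `build_layer()` is a user-supplied factory returning an empty layer module.
import torch

def run_layer_by_layer(layer_files, hidden_states, build_layer, device="cuda"):
    hidden_states = hidden_states.to(device)
    for path in layer_files:
        layer = build_layer()                          # empty module on CPU
        layer.load_state_dict(torch.load(path, map_location="cpu"))
        layer.to(device)                               # only this layer is on the GPU
        with torch.no_grad():
            hidden_states = layer(hidden_states)       # forward through one layer
        layer.to("cpu")                                # release GPU memory
        del layer
        torch.cuda.empty_cache()
    return hidden_states
```

Peak GPU usage is then roughly one layer's weights plus the activations, rather than the full set of model weights.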
Released: September 21, 2024

Overview

AirLLM is an inference system that enables running very large LLMs on low-VRAM GPUs by loading model layers from disk in a memory-optimized execution pipeline.
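A pipeline like this depends on the checkpoint being split into per-layer shards that can be read from disk one at a time. The sketch below shows one way such shards could be produced; it assumes a Llama-style Hugging Face model whose decoder layers sit at `model.model.layers`, and the file naming is illustrative rather than AirLLM's actual on-disk format.

```python
# Sketch of splitting a checkpoint into per-layer shard files (illustrative,
# not AirLLM's on-disk layout). Assumes a Llama-style Hugging Face model
# whose decoder layers are reachable at `model.model.layers`.
import os
import torch

def shard_checkpoint_by_layer(model, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    for i, layer in enumerate(model.model.layers):
        torch.save(layer.state_dict(), os.path.join(out_dir, f"layer_{i:03d}.pt"))
```

At inference time, shards produced this way can be streamed through the GPU one at a time, as in the loop sketched above.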

Tools using AirLLM

No tools found for this model yet.

Last updated: March 12, 2026