ShadowLM Trainer
A fine-tuning SDK โ any open model, any harness, any method. 12 training methods behind one argument; pure-stdlib core.
README
ShadowLM Trainer
A fine-tuning SDK. Any open model โ with any method, on any hardware, for any harness.
Open source ยท built by Lyzr Research Labs ยท maintained by Khush Patel ยท slmโฅ
pip install shadowlm # batteries included โ the full training stack
import shadowlm as slm
ds = slm.Dataset.from_jsonl("data.jsonl").as_chat() # datasets
model = slm.load("mlx-community/Qwen2.5-0.5B-Instruct-4bit", # load
accelerator="shadow")
run = model.finetune(ds, method="lora", max_steps=60) # finetune
print(run.loss, run.sparkline()) # live metrics
print(model.generate("What is the capital of France?")) # inference
model.save("out/", fmt="adapter") # ship it
Change method="lora" to qlora, dora, full, dpo, grpo, more, bitfit,
prompt, ptuning, adapter, cpt โ and nothing else changes. That's the idea.
What ShadowLM is for
Your agent runs on a rented frontier model โ general, costly, someone else's.
ShadowLM moves one task to a small model you own, without touching the
agent: it keeps calling the same endpoint; only the model behind it changes.
What you end up with is a shadowLM โ a small fine-tuned model that shadows
the frontier model, runs in its shadow on real traffic until it does the job as
well, then takes over. Lower cost, data stays inside, the weights are yours.
- Baseline โ your agent runs on the frontier model.
- Capture & fine-tune โ
slm.capture()records the real traffic; train a small open model on it. - Shadow mode โ the shadowLM runs behind the same agent, answering in parallel so you can compare.
- Gradual switch โ once it holds up, route traffic to the shadowLM. You own it.
This repo is the engine for that loop. The orchestration that wraps it into a
one-click migration is ShadowLM Studio.
Agent tuning in three steps
with slm.capture(model) as proxy: # 1. record your agent, unchanged
run_my_agent(base_url=proxy.base_url) # any OpenAI-client harness
group = slm.judge_group( # 2. score whole episodes (LLM judge)
slm.TrajectoryGroup(proxy.trajectories()), judge=judge)
run = model.finetune([group], method="grpo") # 3. train the shadowLM on them
No reward math, no rewriting the agent into an RL framework โ the model API is
the one boundary every agent already has, so ShadowLM trains from it.
What you get today
The whole capture โ judge โ train โ own a shadowLM loop runs on these:
| Block | What it does | API |
|---|---|---|
| Capture proxy | drop-in OpenAI endpoint that records your agent's traffic into trajectories โ agent unchanged | slm.capture() |
| 12 methods | LoRA ยท QLoRA ยท DoRA ยท full ยท CPT ยท DPO ยท GRPO ยท MoRE ยท BitFit ยท prompt ยท p-tuning ยท adapter | method= |
| Judge โ train | score episodes with an LLM judge, train with trajectory-GRPO or DPO | judge_group |
| APO | optimize the prompt instead of weights โ same capture/judge front end, no GPU | slm.optimize_prompt() |
| VERL RL | production multi-GPU GRPO (vLLM rollouts + FSDP) for cluster-scale RL | backend="verl" |
| MoRE | facts fused into attention โ near-zero-hallucination recall | method="more" |
| Any hardware | CUDA ยท TPU ยท Trainium ยท Intel ยท Apple ยท CPU (whatever HF accelerate targets) | device= |
| Shadow accelerator | 4-bit, grad checkpointing, flash-attn, fused optimizer, optional Liger kernels โ logged, never silent | accelerator="shadow" |
| Checkpoints | save every N steps, then load or A/B any version โ step 200 vs final โ in the playground |
save_steps= ยท run.checkpoint_at(step) |
| Remote + server | train on a GPU box or fleet over one JSON protocol; metrics stream back | backend="remote" ยท shadowlm serve |
| Studio | datasets โ models โ guided train โ live runs (charts + console) โ playground compare | shadowlm serve โ / |
| CLI | finetune / runs / plot / chat / export / methods from the shell | shadowlm โฆ |
| Own the weights | adapter/merged export, run records that survive restarts, nothing leaves your box | model.save() |
Training methods
Each technique is a declarative spec under shadowlm/methods/; backends read the
spec (adapter kind, base requirements, data rendering), never the method name.
| method | what it does | base | default LR |
|---|---|---|---|
lora |
LoRA adapters | either | 2e-4 |
qlora |
LoRA on a 4-bit base, lowest memory | 4-bit | 2e-4 |
dora |
weight-decomposed LoRA, better at low rank | either | 2e-4 |
full |
update every transformer weight | unquantized | 2e-5 |
cpt |
continued pretraining on raw domain text | either | 5e-5 |
dpo |
preference optimization on {prompt, chosen, rejected} |
either | 5e-6 |
grpo |
RL from reward functions or scored TrajectoryGroups |
either | 5e-6 |
more |
mixture of retrieval experts โ facts fused into attention | either | 1e-4 |
bitfit |
train only the bias terms (~0.1% of params) | unquantized | 5e-4 |
prompt/ptuning |
soft prompts / p-tuning โ learned virtual tokens | either | 5e-3 |
adapter |
bottleneck adapter modules after each layer | either | 1e-4 |
Base requirements are enforced with clear errors (e.g. qlora on a 16-bit model
tells you to load a 4-bit one). Adding your own method is one file โ
methods.register(TrainingMethod(...)).
Backends & hardware
torch (CUDA) is the production backend; mlx is the local-dev loop on Apple
Silicon; remote runs the same API against any ShadowLM server; verl is the
production, multi-GPU RL engine (vLLM rollouts + FSDP) for cluster-scale GRPO โ
pip install shadowlm[verl], then slm.load(model, backend="verl").finetune(ds, method="grpo", reward_fns=[โฆ]). auto picks the right one for SFT/local work.
The torch path rides HuggingFace Trainer + accelerate, so it trains on any
accelerator HuggingFace supports โ pick it with device=:
| ecosystem | how |
|---|---|
| NVIDIA CUDA | device="cuda" (+ 4-bit, flash-attn, fused optim) |
| AWS Trainium ยท Google TPU | device="xla" (Neuron / torch-xla) |
| Intel GPU | device="xpu" ยท Apple backend="mlx" ยท CPU device="cpu" |
On Microsoft Azure / any cloud you run on NVIDIA GPUs โ the cuda path, nothing
to configure.
Install
One command โ installs the right backend for your machine and opens the studio:
curl -fsSL https://install.shadowlm.sh | sh
It detects your hardware and installs the matching stack โ Apple Silicon โ mlx,
NVIDIA โ torch + Liger fused kernels, otherwise torch CPU โ into an isolated env
in ~/.shadowlm/venv, then launches shadowlm serve at http://127.0.0.1:8329.
Re-run any time to upgrade. Override with SHADOWLM_EXTRAS=cli (UI only),
SHADOWLM_PORT=โฆ, or SHADOWLM_NO_SERVE=1 (install without launching).
Or with pip โ pip install shadowlm ships the full training stack (torch +
HuggingFace, retrieval, CLI). On Apple Silicon the mlx dev backend is pulled in
automatically. Two extras stay opt-in for specialized hardware:
| extra | adds |
|---|---|
[kernels] |
fused Triton kernels on NVIDIA (Liger, Apache-2.0) |
[verl] |
the VERL distributed-RL backend (backend="verl") |
git clone https://github.com/open-gitagent/shadowLM && cd shadowLM
python3 -m venv .venv && source .venv/bin/activate && pip install -e .
python examples/quickstart.py # datasets โ finetune โ inference, end to end
No hardware handy? Test-drive the whole thing โ checkpoints, faiss MoRE, APO โ
on a free Colab GPU:
Run output (mlx, a 0.5B model, ~3.5s):
[shadow] enabled: gradient checkpointing
[mlx:gpu] finetuning Qwen2.5-0.5B-Instruct-4bit ยท lora ยท 40 iters ยท lora r=16
[โโโโโโโโโโโโโโโโโโโโโโโโ] step 40/40 loss 0.0718 lr 5.00e-05 1,048 tok/s
loss โโโโโโโโโ
โ
โโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโ 4.2120 โ 0.0718
โฅ succeeded ยท 40 steps ยท 3.5s
CLI & studio
shadowlm finetune data.jsonl --model Qwen/Qwen2.5-0.5B-Instruct --method lora
shadowlm finetune --config run.yaml --dry-run # reproducible runs, preview first
shadowlm chat out/adapter/ # talk to what you trained
shadowlm serve # studio UI + API on one port
Headline hyperparameters are typed flags; every other TrainConfig field is
reachable via --set field=value or a --config file (flags override config
override defaults). shadowlm serve opens the studio at http://127.0.0.1:8329
โ Datasets (upload + HuggingFace) โ Models โ guided Train โ live Runs (loss
charts + training console) โ Playground (compare base โ finetuned). It's the
built React app, shipped in the wheel; the same JSON protocol powers
backend="remote".
The shadow accelerator
accelerator="shadow" turns on the optimizations that are safe for your model
and hardware โ gradient checkpointing, flash-attention-2, a fused 8-bit
optimizer, 4-bit QLoRA, and optional Liger
fused Triton kernels ([kernels] extra, NVIDIA). Modes: auto / shadow /
none. It logs exactly what it enabled and no-ops when something isn't
available โ ShadowLM integrates proven optimizations rather than shipping its own
GPU kernels, so no magic multipliers, just the standard wins turned on safely.
The road ahead
The engine ships first; ShadowLM Studio (the hosted tier) wraps this exact
API โ nothing reimplemented โ to turn the blocks into a one-click migration:
- Decision inbox โ captured traces surfaced for human approve/correct into chosen-vs-rejected pairs (today: auto-scored by an LLM judge).
- Eval gates โ advance only when quality holds and savings beat cost: task-level evals + cost-per-task on the run records.
- Shadow router โ the capture proxy evolved: run the shadowLM in parallel behind the live agent, then shift traffic % frontier โ owned.
- Fleet + teams โ GPU job queue, shared run history, dataset/adapter registry.
[x] SDK โ datasets โ finetune โ inference on mlx / torch / remote
[x] 12 methods incl. MoRE, trajectory GRPO, judge rewards
[x] Capture proxy ยท shadow accelerator ยท any-hardware
[x] Remote backend + reference server + the studio dashboard + CLI
[ ] Studio orchestration โ decision inbox ยท eval gates ยท shadow router ยท switch
Contributing
Adding a training method is one file; bug reports with a failing snippet are
gold. Fork โ branch โ PR. โญ the repo if it trains something for you โ it helps
others find it.
License
MIT ยท slmโฅ
MongoDB - Build AI That Scales
