Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

WeichenFan / Spectral_Forcing

Repository page

Code of "Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion"

23 1 Language: Python License: MIT Updated: 22d ago

🖼️Image generation

README

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

Weichen Fan1, Haiwen Diao1, Penghao Wu1, Ziwei Liu1✉

S-Lab, Nanyang Technological University1 ✉Corresponding Author.

</p>

Paper | Weights

Spectral Forcing (SF) is a parameter-free, input-side operator for pixel-space
diffusion / flow image generation. It applies a time-conditional 2D-DCT
low-pass mask to the denoiser's noisy input:

SF_t(z) = IDCT( DCT(z) ⊙ M(t) ),    M(t) = sigmoid( κ · (c(t) − r) )

where r is the normalized DCT radius and the radial cutoff c(t) = c_min + (c_max − c_min)·f(t) grows from c_min = 0.05 at t = 0 to c_max = 1.0
(identity) at t = 1, with sharpness κ = 30 and the linear schedule
f(t) = t. Early steps (mostly noise) keep only low frequencies; at the data
endpoint the mask is the identity. The forward process, v-prediction loss, EMA,
Heun sampler, and classifier-free guidance are all unchanged — SF is a drop-in
adapter costing ~0.5% extra compute (one forward + inverse DCT per step).

Left: the per-band data-to-noise ratio splits the (frequency, time) plane — a denoiser only does useful data-distribution work below the moving bandwidth front, while capacity above it is spent on a noise-dominated, closed-form solution. Right: Spectral Forcing applies a time-conditional 2D-DCT low-pass (DCT → mask → IDCT) to the network's input whose cutoff tracks that front, so only the signal-bearing low frequencies pass early and the full spectrum passes at the data endpoint.

Spectral Forcing transfers to native vision-language models: DPG-Bench overall and per-category.

Spectral-Forcing applied to SenseNovaU1 trained from scratch for 100k iters.

Installation

conda create -n sf python=3.10 -y
conda activate sf
pip install uv
uv pip install -r requirements.txt

Dataset

Download ImageNet and point IMAGENET_ROOT at
the directory that contains train/.

Training

Runs go through run_jit_multinode.sh, configured by environment variables (run
./run_jit_multinode.sh --help for the full list). Examples are for one 8-GPU
node; W&B defaults to offline. Flags after ./run_jit_multinode.sh are
forwarded to main_jit.py. The paper uses three fixed-patch configs:

Config	`MODEL`	patch size	tokens @ 256²
130M/32	`JiT-B/32`	32	64
130M/16	`JiT-B/16`	16	256
700M/32	`JiT-700M`	32	64

Baseline (vanilla JiT, no spectral forcing) — e.g. JiT-B/32:

IMAGENET_ROOT=/path/to/imagenet \
MODEL='JiT-B/32' IMG_SIZE=256 \
OUTPUT_DIR=output/jit_b32_baseline \
./run_jit_multinode.sh --disable_dct_patchify

Linear Spectral Forcing (full-image DCT) — JiT-B/32:

IMAGENET_ROOT=/path/to/imagenet \
MODEL='JiT-B/32' IMG_SIZE=256 \
OUTPUT_DIR=output/jit_b32_linear_sf \
./run_jit_multinode.sh

The DCT low-pass is applied over the full image — window_size defaults to
img_size, so no extra flag is needed; just change MODEL (and IMG_SIZE) for
the other configs. The SF settings (linear schedule, c_min=0.05, c_max=1.0,
κ=30) are the defaults in model_jit.py. To run SF with a time-independent
(constant) mask, add --disable_dct_scale_schedule --dct_fixed_scale 1.0.

Evaluation

Set EVALUATE_GEN=1 and point RESUME at a directory containing
checkpoint-last.pth. Pass the same architecture flags used for training so
the model matches the checkpoint (e.g. add --disable_dct_patchify when
evaluating a baseline checkpoint). Evaluation generates --num_images samples and
reports FID/IS against the precomputed reference stats in fid_stats/ (256/512).

IMAGENET_ROOT=/path/to/imagenet \
MODEL='JiT-B/32' IMG_SIZE=256 \
GEN_BSZ=125 EVALUATE_GEN=1 \
OUTPUT_DIR=output/jit_b32_linear_sf \
RESUME=output/jit_b32_linear_sf \
./run_jit_multinode.sh

prepare_ref.py can regenerate a reference image folder if you want to recompute
the FID statistics yourself.

Acknowledgements

Built on the JiT codebase.

BibTex

@misc{fan2026show,
  title         = {{Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion}},
  author        = {Fan, Weichen and Diao, Haiwen and Wu, Penghao and Liu, Ziwei},
  year          = {2026},
  eprint        = {2606.15236},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  doi           = {10.48550/arXiv.2606.15236},
  url           = {https://arxiv.org/abs/2606.15236}
}

Search

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

Report repository

README

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

Installation

Dataset

Training

Evaluation

Acknowledgements

BibTex

Go to section

Search

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

README

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

Installation

Dataset

Training

Evaluation

Acknowledgements

BibTex

Help

People also viewed

Create AI Tools

Mini Tool

Vibe code an AI Tool

Choose listing type: