Anthos

⚠️ IMPORTANT NOTICE

  1. It's tiny. 984K params. 120K steps on Oxford Flowers 102. 256×256 out.
  2. Petals are negotiable. It can draw a rose. Sometimes the rose has too many petals, or stamens in odd places, or a stem that becomes a vine. That's the territory at 0.98M params.
  3. Not Stable Diffusion. Class-conditional DiT-Nano/2. No text encoder, no safety filter, no upscaler. The 102 Oxford Flowers classes are the entire vocabulary.

Quick Stats

| Stat | Value |
| --- | --- |
| Parameters | 983,808 |
| Architecture | DiT-Nano/2 (6 blocks, hidden 96, 4 heads, patch 2, SwiGLU) |
| Training Steps | 120,000 |
| Training Time | ~18 min on an RTX Pro 6000 |
| Precision | bfloat16 |
| Output | 256 × 256 |
| Latents | 32 × 32 × 4 |
| Classes | 102 |
| Loss | 1.843 → 0.880 (flow-matching MSE) |
| Sampling | Heun, 50 steps, CFG 4.0 |

What Is This?

Anthos means flower in Greek. That's the whole naming story.

Trained it as a rectified flow rather than diffusion, so the network predicts the velocity field between noise and data. Architecture is just a regular DiT — adaLN-Zero, SwiGLU MLPs, sin-cos pos embed. SD-VAE for the latent side, Heun or Euler at sample time.
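
To make the velocity target concrete, here is a minimal pure-Python sketch of one training example. It assumes a common rectified-flow convention that the card does not spell out: t=0 is pure noise, t=1 is data, and the path between them is a straight line. The function names are illustrative and are not taken from modeling.py.

```python
import random

def interpolate(noise, data, t):
    """Point on the straight line from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]

def velocity_target(noise, data):
    """Constant velocity along that line: d(x_t)/dt = data - noise."""
    return [d - n for n, d in zip(noise, data)]

def flow_matching_mse(pred, target):
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

rng = random.Random(0)
data = [0.5, -1.0, 2.0, 0.0]                 # stand-in for a flattened latent
noise = [rng.gauss(0.0, 1.0) for _ in data]  # Gaussian noise of the same size
t = 0.3

x_t = interpolate(noise, data, t)    # what the network would see
v = velocity_target(noise, data)     # what the network is asked to predict

# Following a perfect prediction for the remaining time (1 - t)
# lands exactly on the data point.
recovered = [x + (1.0 - t) * vi for x, vi in zip(x_t, v)]
```

Under the opposite convention (t=0 is data, t=1 is noise) the target flips sign and the sampler integrates in the other direction; the loss has the same form either way.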

Dataset: every image in Oxford Flowers 102 (8,189 across train+val+test), plus a horizontal-flip copy of each. Encoded once through the VAE and stored in VRAM as BF16 channels_last, about 130 MB on the card. GPULatentLoader shuffles on-GPU and yields batches straight out of VRAM, so a training step never waits on data loading: it's just the forward pass, backward pass, and optimizer update.
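
The VRAM figure can be checked with a few lines of arithmetic. This sketch counts only the latent tensors (labels and loader bookkeeping are ignored) and uses the 32 × 32 × 4 latent shape from the stats table.

```python
NUM_IMAGES = 8_189          # train + val + test, from the card
COPIES = 2                  # identity + horizontal flip
LATENT_SHAPE = (4, 32, 32)  # channels, height, width
BYTES_PER_BF16 = 2

num_latents = NUM_IMAGES * COPIES
elems_per_latent = LATENT_SHAPE[0] * LATENT_SHAPE[1] * LATENT_SHAPE[2]
total_bytes = num_latents * elems_per_latent * BYTES_PER_BF16

print(num_latents)                    # 16378
print(round(total_bytes / 1e6, 1))    # 134.2 (MB)
print(round(total_bytes / 2**20, 1))  # 128.0 (MiB)
```

So "130 MB" is a fair round number whichever unit is meant.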

120K steps in 18 minutes on the RTX Pro 6000. Loss went 1.84 → 0.88. We saved a sample grid every 2K steps to watch the convergence.

Meant as a sanity check on the training loop. It turned out to be a flower generator.

Samples

Sample Grid

4×4, class-conditional, step 120K, CFG 4.0, Heun 50. Each tile is a different Oxford Flowers class. Some are clearly the right flower. Some are a guess.
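
For readers unfamiliar with the "CFG 4.0, Heun 50" shorthand, the sketch below shows the two ingredients in pure Python. It is not the code in modeling.py: the network is replaced by a toy velocity function, the time direction (noise at t=0, data at t=1) is an assumption, and all names are illustrative.

```python
import math

def cfg_velocity(v_cond, v_uncond, scale):
    """Classifier-free guidance: extrapolate from the null-class
    prediction toward the class-conditional one."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]

def heun_sample(velocity_fn, x, steps=50):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Heun's method."""
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        k1 = velocity_fn(x, t)
        x_pred = [xi + dt * ki for xi, ki in zip(x, k1)]   # Euler predictor
        k2 = velocity_fn(x_pred, t + dt)
        # Trapezoid corrector: average the two slopes.
        x = [xi + 0.5 * dt * (a + b) for xi, a, b in zip(x, k1, k2)]
    return x

# Toy stand-in for the network: a velocity that pulls x toward a target.
CLASS_TARGET = [1.0, -0.5]   # hypothetical "conditional" attractor
NULL_TARGET = [0.0, 0.0]     # hypothetical "null class" attractor

def toy_model(x, t, target):
    return [ti - xi for xi, ti in zip(x, target)]

def guided(x, t, scale=4.0):
    v_c = toy_model(x, t, CLASS_TARGET)
    v_u = toy_model(x, t, NULL_TARGET)
    return cfg_velocity(v_c, v_u, scale)

x0 = [0.3, 0.3]
sample = heun_sample(guided, x0, steps=50)
```

As written, each Heun step evaluates the velocity twice, so 50 steps cost roughly twice as many model calls as 50 Euler steps. The Euler sampler is the same loop with only the predictor line.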

Model Specifications

| Parameter | Value |
| --- | --- |
| Architecture | DiT |
| Variant | DiT-Nano/2 |
| Depth | 6 |
| Hidden Size | 96 |
| Heads | 4 (head dim 24) |
| Patch Size | 2 |
| Grid | 16 × 16 = 256 tokens |
| MLP | SwiGLU, ratio 2.0 |
| Norm | LayerNorm, no affine on block norms (adaLN) |
| Attention | QK-LayerNorm, SDPA |
| Conditioning | AdaLN-Zero, t + y |
| Class Dropout | 0.1 |
| Class Embed | 102 + 1 (null) |
| Pos Embed | 2D sin-cos, frozen |
| VAE | stabilityai/sd-vae-ft-ema, 8× downsample, 4 ch |
| VAE Scale | 0.18215 |
| Output Ch | 4 (no learned sigma) |
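
Several rows in this table follow from a handful of base settings. The sketch below recomputes them as a sanity check. The SwiGLU inner width is an assumption: it takes the ratio as applying directly to the hidden size, whereas some implementations rescale it.

```python
IMAGE_SIZE = 256
VAE_DOWNSAMPLE = 8
LATENT_CHANNELS = 4
PATCH_SIZE = 2
HIDDEN_SIZE = 96
NUM_HEADS = 4
MLP_RATIO = 2.0

latent_size = IMAGE_SIZE // VAE_DOWNSAMPLE             # 32 x 32 latent
grid = latent_size // PATCH_SIZE                       # 16 patches per side
num_tokens = grid * grid                               # 256 tokens per image
head_dim = HIDDEN_SIZE // NUM_HEADS                    # 24
patch_dim = PATCH_SIZE * PATCH_SIZE * LATENT_CHANNELS  # 16 values per token, in and out
mlp_hidden = int(HIDDEN_SIZE * MLP_RATIO)              # 192 (assumed, see above)

# The hidden size must split evenly across heads.
assert HIDDEN_SIZE % NUM_HEADS == 0
```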

Training Details

| Parameter | Value |
| --- | --- |
| Dataset | Oxford Flowers 102 (train+val+test, 8,189 imgs) |
| Aug | identity + hflip = 16,378 latents |
| Storage | all in VRAM, channels_last BF16 |
| Batch | 256 |
| Grad Accum | 1 |
| Optim | AdamW, β=(0.9, 0.95), wd=0, fused |
| LR | 1e-4, 1K-step linear warmup, then constant |
| Grad Clip | 1.0 |
| EMA | 0.9999 |
| t Sampler | logit-normal (μ=0, σ=1) |
| Loss | flow-matching MSE on velocity |
| CFG Dropout | 0.1 (10% labels → null token) |
| Precision | BF16 autocast, FP32 reductions |
| Compile | torch.compile(mode="max-autotune") |
| GPU | RTX Pro 6000, 96 GB, sm_120 |
| Step Rate | ~111 it/s |
| Wall | 1,078 s for 120K steps |
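
The schedule-related rows can be sketched in plain Python. These helpers are illustrative, not the training script: the exact warmup indexing, the position of the null token (index 102 is assumed), and the function names are assumptions.

```python
import math
import random

BASE_LR = 1e-4
WARMUP_STEPS = 1_000
EMA_DECAY = 0.9999
NUM_CLASSES = 102
NULL_CLASS = NUM_CLASSES   # assumed: null token stored at index 102
CLASS_DROPOUT = 0.1

def lr_at(step):
    """Linear warmup to BASE_LR over WARMUP_STEPS, then constant."""
    return BASE_LR * min(1.0, (step + 1) / WARMUP_STEPS)

def sample_t(rng, mu=0.0, sigma=1.0):
    """Logit-normal timestep: sigmoid of a Gaussian draw, so t lies in (0, 1)
    and mid-range timesteps are sampled more often than the endpoints."""
    z = rng.gauss(mu, sigma)
    return 1.0 / (1.0 + math.exp(-z))

def drop_label(rng, label):
    """Replace the class label with the null token 10% of the time (for CFG)."""
    return NULL_CLASS if rng.random() < CLASS_DROPOUT else label

def ema_update(ema, weights, decay=EMA_DECAY):
    """One exponential-moving-average step over a flat list of weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]

rng = random.Random(0)
ts = [sample_t(rng) for _ in range(10_000)]
labels = [drop_label(rng, 5) for _ in range(20_000)]

# Throughput implied by the wall-clock row.
steps_per_second = 120_000 / 1078   # about 111.3 it/s
```

A decay of 0.9999 averages over roughly 1 / (1 - 0.9999) = 10,000 recent steps, which is short relative to the 120K-step run.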

Benchmarks

Loss curve, sampled from the log: 1.843 → 1.71 (1K) → 1.31 (10K) → 1.04 (50K) → 0.91 (100K) → 0.880 (120K). Monotone down. Didn't bother with FID or IS.

Usage

from pipeline import AnthosPipeline

pipe = AnthosPipeline(repo_dir=".")

# every class
imgs = pipe(classes="all", seed=0)
imgs[0].save("out.png")  # class 0 is pink primrose, not a rose, by the way

# specific names or ids, comma-separated
imgs = pipe("rose,sunflower,daffodil", n_per_class=2, seed=42)
for i, im in enumerate(imgs):
    im.save(f"flower_{i:02d}.png")

# fiddling
imgs = pipe(73, steps=100, cfg_scale=2.5, sampler="euler", seed=7)

CLI:

python pipeline.py "rose,sunflower,daffodil" --n-per-class 2 --seed 42 --out out.png

For the Gradio demo, see the Space linked from this model page.

Files

| File | What it is |
| --- | --- |
| model.safetensors | EMA weights, 3.95 MB |
| config.json | arch + sampling config |
| modeling.py | DiT + samplers |
| pipeline.py | AnthosPipeline |
| classes.txt | 102 names, one per line as id<TAB>name |
| convert_checkpoint.py | final.pt → safetensors |
| sample_grid.png | 4×4 grid, step 120K |
| requirements.txt | deps |
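
If you want to map names to ids without going through the pipeline, classes.txt can be parsed directly. This is a sketch that assumes one tab-separated id and name per line; the helper names are hypothetical and the sample rows are placeholders (only id 0, pink primrose, is confirmed by this card).

```python
import io

def load_classes(fp):
    """Parse lines of the form 'id<TAB>name' into a {name: id} dict."""
    name_to_id = {}
    for line in fp:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        idx, name = line.split("\t", 1)
        name_to_id[name.strip().lower()] = int(idx)
    return name_to_id

def resolve(spec, name_to_id):
    """Turn a comma-separated mix of names and ids into integer class ids."""
    ids = []
    for token in spec.split(","):
        token = token.strip().lower()
        ids.append(int(token) if token.isdigit() else name_to_id[token])
    return ids

# Placeholder excerpt standing in for open("classes.txt").
sample = io.StringIO("0\tpink primrose\n73\trose\n")
names = load_classes(sample)
ids = resolve("pink primrose, 73", names)
```

An unknown name raises KeyError here; the real pipeline may handle that differently.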

Limitations

  • 102 classes, hard-coded. There's no prompt, so no "sunset over a meadow."
  • Output is 256×256. Bigger needs an upscaler.
  • 0.98M params, which is enough for a rose but absolutely not enough for Stable Diffusion.
  • A few classes stayed rough through training. "Barberton daisy" and "mexican petunia" in particular. Oxford Flowers 102 is class-imbalanced and we didn't rebalance it.
  • No FID/IS numbers. We looked at the samples.
  • Don't use it in a textbook or in production.

Citation

@misc{anthos2026,
  author = {Glint Research},
  title  = {Anthos: a 984K-parameter class-conditional DiT on Oxford Flowers 102},
  year   = {2026},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/Glint-Research/Anthos}
}

Built by Glint Research.
