Anthos

Anthos is a class-conditional latent diffusion model trained with rectified flow matching on the Oxford Flowers 102 dataset. It generates 256x256 images across 102 flower categories using a DiT-Nano/2 architecture of approximately 984K parameters. It is a research artifact: a minimal, transparent, end-to-end training and sampling demonstration.

Notice

Anthos is a research prototype. It is not Stable Diffusion: it has no text encoder, safety filter, or upscaler, and it is conditioned only on the 102 Oxford Flowers class labels. Output quality reflects the model's small scale, so treat samples as a demonstration rather than production-grade imagery.

At a Glance

Property	Value
Parameters	983,808
Architecture	DiT-Nano/2 (6 blocks, hidden dim 96, 4 heads, patch size 2, SwiGLU)
Training Steps	120,000
Training Duration	~18 minutes on an RTX Pro 6000
Precision	bfloat16
Output Resolution	256 x 256
Latent Shape	32 x 32 x 4
Number of Classes	102
Loss (step 0 → step 120,000)	1.843 → 0.880 (flow-matching MSE)
Sampler	Heun, 50 steps, CFG scale 4.0

Background

The name derives from the Greek word for flower. The model was built as a sanity check on a rectified flow training loop and turned into a functional flower generator in the process.

Rather than predicting noise, the network predicts the velocity field transporting a sample from Gaussian noise to the data distribution. The architecture is a standard DiT with adaLN-Zero conditioning, SwiGLU MLPs, and sinusoidal 2D positional embeddings. The latent space is provided by the Stability AI VAE (stabilityai/sd-vae-ft-ema), which compresses 256x256 images to 32x32x4 latents at an 8x spatial downsampling factor.
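
The velocity-prediction objective can be illustrated with a minimal sketch. It assumes the common rectified-flow convention in which t = 0 is pure noise and t = 1 is data, joined by a straight line; the model card does not state the time convention, and the function names are illustrative rather than taken from `modeling.py`. Plain Python lists stand in for latent tensors so the example runs without PyTorch.

```python
import random

def interpolate(noise, data, t):
    """Point on the straight line from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]

def velocity_target(noise, data):
    """Constant velocity along that line: d/dt x_t = data - noise."""
    return [d - n for n, d in zip(noise, data)]

def flow_matching_mse(predicted, target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)

rng = random.Random(0)
data = [0.5, -1.0, 2.0, 0.0]                 # stand-in for a flattened latent
noise = [rng.gauss(0.0, 1.0) for _ in data]  # Gaussian noise sample
t = 0.3
x_t = interpolate(noise, data, t)            # network input at time t
v = velocity_target(noise, data)             # regression target at (x_t, t)

# A perfect model would output v at (x_t, t), giving zero loss.
loss_perfect = flow_matching_mse(v, v)
```

Under this convention, following the target velocity from `x_t` for the remaining time `1 - t` lands exactly on the data sample, which is what makes the straight-line path convenient for few-step sampling.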

The entire Oxford Flowers 102 dataset, including train, validation, and test splits (8,189 images), was encoded once through the VAE, augmented with horizontal flips to yield 16,378 latents, and stored in VRAM as BF16 channels-last tensors. A custom GPULatentLoader shuffles and batches directly from VRAM, reducing each training step to a forward pass and an optimizer update.
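
The loader's shuffle-and-batch behaviour can be sketched in a few lines. This is not the actual `GPULatentLoader`, whose code is not shown here; it is a hypothetical index-only stand-in that assumes a fresh permutation per epoch and drops the final partial batch. In a real loader the indices would select rows of a tensor already resident in VRAM.

```python
import random

def epoch_batches(num_items, batch_size, rng):
    """Yield index lists covering one shuffled epoch (partial batch dropped)."""
    order = list(range(num_items))
    rng.shuffle(order)
    for start in range(0, num_items - batch_size + 1, batch_size):
        yield order[start:start + batch_size]

NUM_LATENTS = 8189 * 2   # originals plus horizontal flips
BATCH_SIZE = 256         # batch size from the training configuration

batches = list(epoch_batches(NUM_LATENTS, BATCH_SIZE, random.Random(0)))
steps_per_epoch = len(batches)                       # full batches per epoch
approx_epochs = 120_000 * BATCH_SIZE / NUM_LATENTS   # passes over the data
```

Under these assumptions an epoch is 63 steps, so the 120,000-step run corresponds to roughly 1,900 passes over the augmented latents.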

At approximately 111 steps per second, the 120,000 steps completed in just under 18 minutes (1,078 seconds). The logged loss decreased from 1.843 at step 0 to 0.880 at step 120,000, falling at every reported checkpoint.
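
The throughput and duration figures are related by simple arithmetic, shown here as a quick consistency check using the step count and the total wall time reported in the training details.

```python
TOTAL_STEPS = 120_000
WALL_SECONDS = 1_078   # total wall time reported in the training details

steps_per_second = TOTAL_STEPS / WALL_SECONDS
wall_minutes = WALL_SECONDS / 60

print(f"{steps_per_second:.1f} steps/s, {wall_minutes:.2f} minutes")
# prints: 111.3 steps/s, 17.97 minutes
```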

Sample Output

4x4 class-conditional grid, step 120,000, CFG scale 4.0, Heun sampler, 50 steps. Each tile corresponds to a distinct Oxford Flowers 102 class.
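
The sampler settings named in the caption can be sketched on a toy problem. The code below is an assumption-laden illustration, not the sampler in `modeling.py`: it assumes integration from t = 0 to t = 1 on a uniform grid, applies the Heun correction on every step, and uses the standard classifier-free guidance formula. The scalar `toy_velocity` is a hypothetical stand-in for the network.

```python
import math

def guided_velocity(v_cond, v_uncond, cfg_scale):
    """Classifier-free guidance: extrapolate away from the unconditional prediction."""
    return v_uncond + cfg_scale * (v_cond - v_uncond)

def heun_sample(velocity_fn, x0, steps=50):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Heun's method."""
    x, h = x0, 1.0 / steps
    for i in range(steps):
        t = i * h
        k1 = velocity_fn(x, t)               # slope at the start of the step
        k2 = velocity_fn(x + h * k1, t + h)  # slope at the Euler-predicted end
        x = x + 0.5 * h * (k1 + k2)          # advance with the averaged slope
    return x

def toy_velocity(x, t, cfg_scale=1.0):
    """Toy scalar field: the conditional branch decays x, the unconditional one is zero."""
    v_cond = -x
    v_uncond = 0.0
    return guided_velocity(v_cond, v_uncond, cfg_scale)

# With cfg_scale=1.0 the ODE is dx/dt = -x, whose exact solution at t=1 is exp(-1).
x_final = heun_sample(toy_velocity, x0=1.0, steps=50)
heun_error = abs(x_final - math.exp(-1.0))
```

With guidance, each of the two slope evaluations needs both a conditional and an unconditional prediction, so 50 Heun steps imply on the order of 200 network evaluations unless the two predictions are batched together.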

Model Specification

Parameter	Value
Architecture	Diffusion Transformer (DiT)
Variant	DiT-Nano/2
Depth	6 blocks
Hidden Size	96
Attention Heads	4 (head dimension 24)
Patch Size	2
Token Grid	16 x 16 = 256 tokens
MLP Type	SwiGLU, expansion ratio 2.0
Normalization	LayerNorm; adaLN-Zero on block norms
Attention	QK-LayerNorm, scaled dot-product attention
Conditioning	adaLN-Zero on timestep and class label
Class Dropout Rate	0.1
Class Embedding	102 classes + 1 null token
Positional Embedding	2D sinusoidal, frozen
VAE	`stabilityai/sd-vae-ft-ema`, 8x downsample, 4 channels
VAE Scaling Factor	0.18215
Output Channels	4 (velocity prediction; no learned sigma)
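
The token grid, per-token patch size, and head dimension in the table follow from the latent shape and patch size. The sketch below reproduces that arithmetic and shows one possible patchify ordering on nested lists; the row-major ordering and the helper name `patchify` are assumptions, not details taken from `modeling.py`.

```python
LATENT_SIDE, LATENT_CHANNELS = 32, 4
PATCH_SIZE, HIDDEN_SIZE, NUM_HEADS = 2, 96, 4

tokens_per_side = LATENT_SIDE // PATCH_SIZE
num_tokens = tokens_per_side ** 2
patch_dim = PATCH_SIZE * PATCH_SIZE * LATENT_CHANNELS  # values per token before projection
head_dim = HIDDEN_SIZE // NUM_HEADS

def patchify(latent, patch):
    """Split an H x W x C nested list into row-major flattened patch vectors."""
    side = len(latent)
    tokens = []
    for py in range(0, side, patch):
        for px in range(0, side, patch):
            tokens.append([c
                           for y in range(py, py + patch)
                           for x in range(px, px + patch)
                           for c in latent[y][x]])
    return tokens

# An all-zero latent is enough to check the shapes.
latent = [[[0.0] * LATENT_CHANNELS for _ in range(LATENT_SIDE)]
          for _ in range(LATENT_SIDE)]
tokens = patchify(latent, PATCH_SIZE)
```

Each of the 256 tokens carries 16 latent values, which a linear layer would project to the hidden size of 96; the output head reverses the mapping to produce 4 velocity channels per latent position.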

Training Details

Parameter	Value
Dataset	Oxford Flowers 102 (train + val + test, 8,189 images)
Augmentation	Identity + horizontal flip = 16,378 latents
Latent Storage	Full dataset in VRAM, channels-last BF16
Batch Size	256
Gradient Accumulation	1
Optimizer	AdamW, betas=(0.9, 0.95), weight decay=0, fused
Learning Rate	1e-4, 1,000-step linear warmup, then constant
Gradient Clipping	1.0
EMA Decay	0.9999
Timestep Sampler	Logit-normal (mu=0, sigma=1)
Loss Function	Flow-matching MSE on velocity field
CFG Dropout	0.1 (10% of labels replaced with null token)
Precision	BF16 autocast, FP32 reductions
Compilation	`torch.compile(mode="max-autotune")`
Hardware	RTX Pro 6000, 96 GB VRAM, sm_120
Throughput	~111 steps/second
Total Wall Time	1,078 seconds for 120,000 steps
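
Several rows of the table describe small, self-contained rules: the learning-rate schedule, the logit-normal timestep sampler, the EMA update, and label dropout. The sketch below expresses each in plain Python. The exact warmup formula, the use of index 102 for the null token, and all function names are assumptions made for illustration; the training script itself is not part of this card.

```python
import math
import random

BASE_LR, WARMUP_STEPS = 1e-4, 1_000
EMA_DECAY, CLASS_DROPOUT = 0.9999, 0.1
NULL_CLASS = 102   # assumed index of the null token (classes 0..101)

def lr_at(step):
    """Linear warmup to BASE_LR over WARMUP_STEPS, then constant."""
    return BASE_LR * min(1.0, (step + 1) / WARMUP_STEPS)

def sample_timestep(rng, mu=0.0, sigma=1.0):
    """Logit-normal sample in (0, 1): the sigmoid of a Gaussian draw."""
    return 1.0 / (1.0 + math.exp(-rng.gauss(mu, sigma)))

def ema_update(ema_weights, weights, decay=EMA_DECAY):
    """Exponential moving average of the weights after an optimizer step."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]

def drop_label(label, rng, p=CLASS_DROPOUT):
    """Replace the label with the null token with probability p."""
    return NULL_CLASS if rng.random() < p else label

rng = random.Random(0)
timesteps = [sample_timestep(rng) for _ in range(10_000)]
labels = [drop_label(7, rng) for _ in range(10_000)]
null_fraction = labels.count(NULL_CLASS) / len(labels)
mean_t = sum(timesteps) / len(timesteps)
```

With mu = 0 and sigma = 1 the logit-normal distribution is symmetric around 0.5, so training concentrates on intermediate timesteps rather than on the nearly-clean or nearly-pure-noise ends of the path.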

Loss Curve

Loss was logged throughout training. Selected values are reported below. No FID or Inception Score was computed; evaluation was performed by visual inspection of sample grids saved every 2,000 steps.

Step	Loss
0	1.843
1,000	1.710
10,000	1.310
50,000	1.040
100,000	0.910
120,000	0.880

Usage

Python API

from pipeline import AnthosPipeline

pipe = AnthosPipeline(repo_dir=".")

# Generate one image per class across all 102 classes
imgs = pipe(classes="all", seed=0)
imgs[0].save("out.png")

# Generate images for specific classes by name or ID
imgs = pipe("rose,sunflower,daffodil", n_per_class=2, seed=42)
for i, img in enumerate(imgs):
    img.save(f"flower_{i:02d}.png")

# Fine-grained sampler control
imgs = pipe(73, steps=100, cfg_scale=2.5, sampler="euler", seed=7)
imgs[0].save("class_73.png")

Command-Line Interface

python pipeline.py "rose,sunflower,daffodil" --n-per-class 2 --seed 42 --out out.png

Gradio Demo

An interactive demo is available at Glint-Research/Anthos-1.

Repository Contents

File	Description
`model.safetensors`	EMA weights, 3.95 MB
`config.json`	Architecture and sampling configuration
`modeling.py`	DiT implementation and sampler definitions
`pipeline.py`	`AnthosPipeline` inference wrapper
`classes.txt`	102 class names in `id\tname` format
`convert_checkpoint.py`	Converts `final.pt` training checkpoint to safetensors
`sample_grid.png`	4x4 sample grid at step 120,000
`requirements.txt`	Python dependencies
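
The tab-separated layout of `classes.txt` is straightforward to read without the pipeline. The helpers below are a hypothetical sketch: `parse_classes` and `resolve` are not functions from this repository, and how `AnthosPipeline` itself resolves names is not documented here. The sample rows use placeholder IDs rather than the real file contents.

```python
def parse_classes(text):
    """Parse tab-separated 'id<TAB>name' lines into a dict of {id: name}."""
    classes = {}
    for line in text.splitlines():
        if not line.strip():
            continue                          # skip blank lines
        class_id, name = line.split("\t", 1)
        classes[int(class_id)] = name.strip()
    return classes

def resolve(spec, classes):
    """Turn a spec such as 'rose, 2' into class IDs (names or numeric IDs)."""
    by_name = {name.lower(): cid for cid, name in classes.items()}
    ids = []
    for item in str(spec).split(","):
        item = item.strip()
        ids.append(int(item) if item.isdigit() else by_name[item.lower()])
    return ids

# Illustrative rows only; these IDs are placeholders, not the real mapping.
sample = "0\trose\n1\tsunflower\n2\tdaffodil\n"
classes = parse_classes(sample)
ids = resolve("rose, 2", classes)
```

An unknown name raises `KeyError` in this sketch, which is usually preferable to silently falling back to a default class.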

Limitations

Fixed vocabulary. The model conditions on one of 102 discrete class labels. Free-form text prompts are not supported.
Fixed resolution. Output is 256x256. Higher-resolution output requires an external upscaler.
Scale constraints. At 984K parameters, the model cannot match the fidelity of large-scale generative models. Fine structure, particularly complex petal arrangements and unusual stamen geometry, is occasionally incorrect.
Class imbalance. Oxford Flowers 102 is not class-balanced, and no rebalancing was applied. Several classes, including Barberton daisy and Mexican petunia, exhibit noticeably lower output quality.
No quantitative evaluation. FID and Inception Score were not computed. Assessment is based on visual inspection only.
Not for production or publication. This model is a research prototype and should not be used in production systems or as a primary source in academic or journalistic work.

Citation

@misc{anthos2026,
  author    = {Glint Research},
  title     = {Anthos: A 984K-Parameter Class-Conditional DiT on Oxford Flowers 102},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Glint-Research/Anthos}
}

Built by Glint Research
