# 🧠✨ TransformerLM (Diffusion 784, 32) — MNIST
Training-run artifacts from https://github.com/triloy8/transformerlm: a minimal masked discrete diffusion Transformer trained on MNIST with a fixed 784-token context (one token per pixel of a 28×28 image). Generation is conditioned on discrete class labels, with an additional null label that enables classifier-free guidance (CFG).
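As a rough illustration of how the null label supports CFG at sampling time, here is a minimal sketch of one common guidance formulation: the model is queried twice per step, once with the true label and once with the null label, and the two sets of logits are combined. The function name, the `NULL_LABEL` index, and the exact formulation are assumptions for illustration, not the repository's actual API.

```python
import numpy as np

NULL_LABEL = 10  # hypothetical: index reserved for the unconditional/null label

def cfg_logits(logits_cond, logits_null, guidance_scale):
    """Classifier-free guidance: push logits away from the null-label
    prediction and toward the label-conditional one.
    With guidance_scale = 0 this reduces to the conditional logits alone."""
    return logits_cond + guidance_scale * (logits_cond - logits_null)

# Toy example over the 32-token pixel vocabulary at one masked position.
rng = np.random.default_rng(0)
cond = rng.normal(size=32)   # stand-in for logits given the class label
null = rng.normal(size=32)   # stand-in for logits given NULL_LABEL
guided = cfg_logits(cond, null, guidance_scale=2.0)
```

Larger guidance scales sharpen class conditioning at the cost of sample diversity; a scale of 0 disables guidance entirely.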
## ✅ Key Facts
- Model type: Diffusion Transformer with LLaDA‑style objective
- Dataset: MNIST (pixel intensities binned to 32 levels)
- Context length: 784 tokens (28×28 image)
- Layers: 12
- Heads: 8
- d_model: 256
- d_ff: 1024
- Training setup: Single NVIDIA A40 (48GB)
- Runtime: ~2 hours ⏱️
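The 784-token context and 32-level vocabulary follow directly from flattening a 28×28 grayscale image and quantizing each pixel. A minimal sketch of that tokenization, assuming uniform binning of uint8 intensities (the repository's actual preprocessing may differ):

```python
import numpy as np

NUM_BINS = 32  # pixel intensities quantized to 32 discrete tokens

def image_to_tokens(image):
    """Flatten a 28x28 uint8 image into 784 tokens in [0, NUM_BINS)."""
    assert image.shape == (28, 28)
    # Uniform binning: maps 0..255 onto 0..31.
    return (image.astype(np.int64) * NUM_BINS // 256).reshape(-1)

img = np.zeros((28, 28), dtype=np.uint8)
img[0, 0] = 255  # one fully white pixel
tokens = image_to_tokens(img)
# tokens has length 784; values lie in [0, 32)
```

The inverse mapping (token index back to an approximate intensity) is what turns sampled sequences back into viewable images.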
## 📦 What’s Inside
- Checkpoint at 6k steps (full run), including:
  - Optimizer state
  - RNG state
  - Safetensors weights
  - Run config
## 🚀 Reproducibility
Exact commit that launched the run: https://github.com/triloy8/transformerlm/commit/84a190a106ecefb7cad49f47eac24963d97fe000