# 🧠✨ TransformerLM (Diffusion 784, 32) — MNIST
Training-run artifacts from https://github.com/triloy8/transformerlm: a minimal masked discrete diffusion Transformer trained on MNIST with a fixed 784-token context (one token per pixel of a 28×28 image). Generation is conditioned on discrete class labels, with an additional null label that enables classifier-free guidance (CFG).
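As a rough illustration of how the null label supports CFG at sampling time, here is a minimal sketch of one common guidance formulation: the model is queried twice per step, once with the true label and once with the null label, and the two sets of logits are combined. The function name, the `NULL_LABEL` index, and the exact formulation are assumptions for illustration, not the repository's actual API.

```python
import numpy as np

NULL_LABEL = 10  # hypothetical: index reserved for the unconditional/null label

def cfg_logits(logits_cond, logits_null, guidance_scale):
    """Classifier-free guidance: push logits away from the null-label
    prediction and toward the label-conditional one.
    With guidance_scale = 0 this reduces to the conditional logits alone."""
    return logits_cond + guidance_scale * (logits_cond - logits_null)

# Toy example over the 32-token pixel vocabulary at one masked position.
rng = np.random.default_rng(0)
cond = rng.normal(size=32)   # stand-in for logits given the class label
null = rng.normal(size=32)   # stand-in for logits given NULL_LABEL
guided = cfg_logits(cond, null, guidance_scale=2.0)
```

Larger guidance scales sharpen class conditioning at the cost of sample diversity; a scale of 0 disables guidance entirely.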
## ✅ Key Facts
- Model type: Diffusion Transformer with LLaDA‑style objective
- Dataset: MNIST (pixel intensities binned to 32 levels)
- Context length: 784 tokens (28×28 image)
- Layers: 12
- Heads: 8
- d_model: 256
- d_ff: 1024
- Training setup: Single NVIDIA A40 (48GB)
- Runtime: ~2 hours ⏱️
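The 784-token context and 32-level vocabulary follow directly from flattening a 28×28 grayscale image and quantizing each pixel. A minimal sketch of that tokenization, assuming uniform binning of uint8 intensities (the repository's actual preprocessing may differ):

```python
import numpy as np

NUM_BINS = 32  # pixel intensities quantized to 32 discrete tokens

def image_to_tokens(image):
    """Flatten a 28x28 uint8 image into 784 tokens in [0, NUM_BINS)."""
    assert image.shape == (28, 28)
    # Uniform binning: maps 0..255 onto 0..31.
    return (image.astype(np.int64) * NUM_BINS // 256).reshape(-1)

img = np.zeros((28, 28), dtype=np.uint8)
img[0, 0] = 255  # one fully white pixel
tokens = image_to_tokens(img)
# tokens has length 784; values lie in [0, 32)
```

The inverse mapping (token index back to an approximate intensity) is what turns sampled sequences back into viewable images.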
## 📦 What’s Inside
- Checkpoint at 6k steps (full run), including:
  - Optimizer state
  - RNG state
  - Safetensors weights
  - Run config
## 🚀 Reproducibility
Exact commit that launched the run: https://github.com/triloy8/transformerlm/commit/84a190a106ecefb7cad49f47eac24963d97fe000