🧠✨ TransformerLM (Diffusion 784, 32) — MNIST

Training run artifacts from https://github.com/triloy8/transformerlm: a minimal masked discrete diffusion Transformer trained on MNIST. It uses a fixed 784‑token context (28×28 image tokens) and generates conditionally on discrete class labels, with a null label reserved for classifier‑free guidance (CFG).
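
At sampling time, CFG mixes logits from a label-conditioned pass and a null-label pass. A minimal sketch of that mixing step (the function name and guidance-scale convention are assumptions for illustration, not taken from the repo):

```python
def cfg_logits(cond, uncond, scale):
    """Classifier-free guidance: move the conditional logits away from the
    null-label (unconditional) logits by the guidance scale.
    scale = 1.0 recovers the conditional logits; scale > 1.0 sharpens
    class conditioning."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

# Toy example over a 4-token vocabulary.
cond = [2.0, 0.5, 0.1, 0.0]    # logits with the class label
uncond = [1.0, 1.0, 0.1, 0.0]  # logits with the null label
guided = cfg_logits(cond, uncond, scale=2.0)
```

The same mixing is applied per masked position at every denoising step before sampling a token.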

✅ Key Facts

  • Model type: Diffusion Transformer with LLaDA‑style objective
  • Dataset: MNIST (binned to 32 levels)
  • Context length: 784 tokens (28×28 image)
  • Layers: 12
  • Heads: 8
  • d_model: 256
  • d_ff: 1024
  • Training setup: Single NVIDIA A40 (48GB)
  • Runtime: ~2 hours ⏱️
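
The context length and vocabulary in the facts above follow from the tokenization: each 28×28 grayscale image is flattened to 784 positions and each 8-bit pixel is binned to one of 32 levels. A minimal sketch of that mapping (the function name and exact binning rule are assumptions; the repo's preprocessing may differ):

```python
def image_to_tokens(pixels, num_levels=32):
    """Flatten a 28x28 grid of 8-bit grayscale pixels (0-255) row-major
    into a 784-token sequence, binning each pixel into num_levels
    discrete tokens (255 // 8 = 31, so ids span 0..31)."""
    flat = [p for row in pixels for p in row]
    return [min(p * num_levels // 256, num_levels - 1) for p in flat]

# A blank 28x28 image becomes 784 copies of token 0.
blank = [[0] * 28 for _ in range(28)]
tokens = image_to_tokens(blank)
```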

📦 What’s Inside

  • Checkpoint at 6k steps (full run), including:
    • Optimizer state
    • RNG state
    • Safetensors weights
  • Run config
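
The LLaDA-style objective behind this run trains the model to reconstruct tokens corrupted by a masking forward process. A minimal sketch of that corruption step, where each token is independently replaced by a mask token with probability t (the mask token id and schedule here are assumptions, not taken from the repo):

```python
import random

MASK = 32  # hypothetical mask token id, one past the 32 pixel levels

def mask_tokens(tokens, t, rng):
    """LLaDA-style forward process: independently mask each token with
    probability t (drawn ~ Uniform(0, 1) per training example); the loss
    is cross-entropy on predicting the originals at masked positions."""
    return [MASK if rng.random() < t else tok for tok in tokens]

rng = random.Random(0)
seq = [5, 1, 31, 0, 12]
noised = mask_tokens(seq, t=0.5, rng=rng)
```

Sampling reverses this: start from an all-mask 784-token sequence and unmask tokens over a series of denoising steps.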

🚀 Reproducibility

Exact commit that launched the run: https://github.com/triloy8/transformerlm/commit/84a190a106ecefb7cad49f47eac24963d97fe000


Dataset used to train trixyL/transformerlm-diff-32-mnist: MNIST