This contains the final 45M trained diffusion language model as well as the tokenizer trained from scratch on TinyStories Dataset. The model architecture is provided as well : ```python cfg = { "vocab_size": 26_000, "context_length": 256, "emb_dim": 512, "n_layers": 10, "n_heads": 8, "d_ff": 2048, # 4*emb_dim "dropout": 0.1, "diffusion_steps": 128 } ```