This contains the final 45M trained diffusion language model as well as the tokenizer trained from scratch on TinyStories Dataset.

The model architecture is provided as well : 
```python
cfg = {
    "vocab_size": 26_000,
    "context_length": 256,
    "emb_dim": 512,
    "n_layers": 10,
    "n_heads": 8,
    "d_ff": 2048, # 4*emb_dim
    "dropout": 0.1,
    "diffusion_steps": 128
}
```