| This contains the final 45M trained diffusion language model as well as the tokenizer trained from scratch on TinyStories Dataset. | |
| The model architecture is provided as well : | |
| ```python | |
| cfg = { | |
| "vocab_size": 26_000, | |
| "context_length": 256, | |
| "emb_dim": 512, | |
| "n_layers": 10, | |
| "n_heads": 8, | |
| "d_ff": 2048, # 4*emb_dim | |
| "dropout": 0.1, | |
| "diffusion_steps": 128 | |
| } | |
| ``` |