This repository contains the final trained 45M-parameter diffusion language model, together with the tokenizer trained from scratch on the TinyStories dataset.
The model architecture is defined by the following configuration:
```python
cfg = {
    "vocab_size": 26_000,
    "context_length": 256,
    "emb_dim": 512,
    "n_layers": 10,
    "n_heads": 8,
    "d_ff": 2048,  # 4 * emb_dim
    "dropout": 0.1,
    "diffusion_steps": 128,
}
```
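As a rough sanity check on the "45M" figure, the sketch below estimates the parameter count from this configuration. It is not the repository's model code: it assumes a GPT-2-style Transformer block (learned positional embeddings, two LayerNorms per block plus a final one, biases on every linear layer, no separate timestep embedding), and `estimate_params` is a hypothetical helper introduced only for illustration.

```python
def estimate_params(cfg, tie_embeddings=True):
    """Rough parameter count for a GPT-2-style Transformer stack.

    Assumptions (not confirmed by this model card): learned positional
    embeddings, two LayerNorms per block plus a final LayerNorm, biases on
    all attention/FFN linear layers, no timestep embedding, and an output
    head that shares weights with the token embedding when tie_embeddings=True.
    """
    V, L = cfg["vocab_size"], cfg["context_length"]
    d, d_ff, n = cfg["emb_dim"], cfg["d_ff"], cfg["n_layers"]

    tok_emb = V * d                               # token embedding table
    pos_emb = L * d                               # learned positional embeddings (assumed)
    layer_norms = 2 * (2 * d)                     # two LayerNorms (weight + bias) per block
    attn = (3 * d * d + 3 * d) + (d * d + d)      # fused Q/K/V projection + output projection
    ffn = (d * d_ff + d_ff) + (d_ff * d + d)      # the MLP's two linear layers
    per_layer = layer_norms + attn + ffn

    final_norm = 2 * d                            # final LayerNorm before the output head
    head = 0 if tie_embeddings else V * d         # untied head adds a V x d matrix (no bias assumed)
    return tok_emb + pos_emb + n * per_layer + final_norm + head


cfg = {
    "vocab_size": 26_000,
    "context_length": 256,
    "emb_dim": 512,
    "n_layers": 10,
    "n_heads": 8,
    "d_ff": 2048,
    "dropout": 0.1,
    "diffusion_steps": 128,
}

# Each attention head works on emb_dim / n_heads = 64 dimensions.
assert cfg["emb_dim"] % cfg["n_heads"] == 0

print(f"{estimate_params(cfg) / 1e6:.1f}M")                        # prints 45.0M
print(f"{estimate_params(cfg, tie_embeddings=False) / 1e6:.1f}M")  # prints 58.3M
```

Note that `n_heads`, `dropout`, and `diffusion_steps` do not change the count under these assumptions. The tied-embedding estimate lands at about 45.0M, consistent with the stated model size, while an untied output head would add roughly 13.3M more; this suggests, but does not confirm, that the input and output embeddings are shared.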