Mouad

This repository contains the final trained 45M-parameter diffusion language model, as well as the tokenizer trained from scratch on the TinyStories dataset.

The model architecture configuration is provided as well:

cfg = {
    "vocab_size": 26_000,
    "context_length": 256,
    "emb_dim": 512,
    "n_layers": 10,
    "n_heads": 8,
    "d_ff": 2048, # 4*emb_dim
    "dropout": 0.1,
    "diffusion_steps": 128
}
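
As a rough sanity check on the 45M figure, the sketch below estimates the parameter count implied by this config. It is not the author's code. It assumes a standard Transformer layout that the README does not confirm: learned positional embeddings, bias-free attention and feed-forward projections, two LayerNorms per block plus a final one, and an output head tied to the token embedding. Any timestep embedding used for the diffusion process is ignored. The function name `estimate_params` and the `tie_embeddings` flag are hypothetical.

```python
# Rough parameter-count estimate for the config above.
# Assumptions (not confirmed by the README): learned positional embeddings,
# no biases in attention/MLP projections, two LayerNorms per block,
# a final LayerNorm, and an output head tied to the token embedding.

cfg = {
    "vocab_size": 26_000,
    "context_length": 256,
    "emb_dim": 512,
    "n_layers": 10,
    "n_heads": 8,
    "d_ff": 2048,
    "dropout": 0.1,
    "diffusion_steps": 128,
}


def estimate_params(cfg, tie_embeddings=True):
    d = cfg["emb_dim"]
    tok_emb = cfg["vocab_size"] * d          # token embedding table
    pos_emb = cfg["context_length"] * d      # learned positions (assumed)
    attn = 4 * d * d                         # Q, K, V and output projections
    mlp = 2 * d * cfg["d_ff"]                # up- and down-projection
    norms = 2 * 2 * d                        # two LayerNorms (scale + shift)
    per_layer = attn + mlp + norms
    final_norm = 2 * d
    head = 0 if tie_embeddings else cfg["vocab_size"] * d
    return tok_emb + pos_emb + cfg["n_layers"] * per_layer + final_norm + head


# Per-head dimension implied by emb_dim and n_heads.
head_dim = cfg["emb_dim"] // cfg["n_heads"]

print(f"head_dim = {head_dim}")
print(f"tied head:   {estimate_params(cfg):,} parameters")
print(f"untied head: {estimate_params(cfg, tie_embeddings=False):,} parameters")
```

Under these assumptions the tied-head estimate comes to about 44.9M parameters, which is consistent with the stated 45M. An untied output head would add another `vocab_size * emb_dim` weights (about 13.3M), so the tied variant is the closer match, though the actual implementation may differ.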