This repository contains the final trained 45M-parameter diffusion language model, as well as the tokenizer trained from scratch on the TinyStories dataset.

The model architecture configuration is provided as well:

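cfg = {
    "vocab_size": 26_000,     # tokenizer vocabulary size
    "context_length": 256,    # maximum sequence length in tokens
    "emb_dim": 512,           # embedding (hidden) dimension
    "n_layers": 10,           # number of transformer layers
    "n_heads": 8,             # attention heads per layer
    "d_ff": 2048,             # feed-forward width, 4 * emb_dim
    "dropout": 0.1,           # dropout probability
    "diffusion_steps": 128,   # number of diffusion steps
}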
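
As a rough sanity check on the 45M figure, the configuration above can be turned into a parameter estimate. This is a minimal sketch, not the repository's own code: the function `estimate_params` is hypothetical, and it assumes a standard transformer block (four `emb_dim x emb_dim` attention projections, a two-layer feed-forward network, two LayerNorms per block), learned positional embeddings, no bias terms, and an output head tied to the token embedding. Any timestep embedding or other diffusion-specific parameters are not counted.

```python
cfg = {
    "vocab_size": 26_000,
    "context_length": 256,
    "emb_dim": 512,
    "n_layers": 10,
    "n_heads": 8,
    "d_ff": 2048,
    "dropout": 0.1,
    "diffusion_steps": 128,
}


def estimate_params(cfg, tied_output=True):
    """Estimate the parameter count under the assumptions stated above."""
    d = cfg["emb_dim"]
    vocab = cfg["vocab_size"]
    d_ff = cfg["d_ff"]

    # Heads must divide the embedding dimension evenly (here 512 / 8 = 64).
    assert d % cfg["n_heads"] == 0

    tok_emb = vocab * d                   # token embedding table
    pos_emb = cfg["context_length"] * d   # learned positions (assumption)

    attn = 4 * d * d                      # Q, K, V and output projections
    ffn = 2 * d * d_ff                    # up and down projections
    norms = 2 * 2 * d                     # two LayerNorms, scale + shift each
    per_layer = attn + ffn + norms

    # An untied output head adds a second vocab x emb_dim matrix.
    out_head = 0 if tied_output else vocab * d

    return tok_emb + pos_emb + cfg["n_layers"] * per_layer + out_head


print(f"tied head:   {estimate_params(cfg) / 1e6:.1f}M")                     # 44.9M
print(f"untied head: {estimate_params(cfg, tied_output=False) / 1e6:.1f}M")  # 58.2M
```

Under these assumptions the tied-head estimate lands close to the stated 45M, while an untied head would give roughly 58M. The actual layout of the released model may differ.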