BurnyCoder committed on
Commit df0959a · verified · 1 Parent(s): 438d6bd

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -16,7 +16,7 @@ pipeline_tag: text-classification
 
 # Modular Multiplication Transformer
 
-A 1-layer, 4-head transformer trained on **(a x b) mod 113** that exhibits **grokking** delayed generalization after memorization. This checkpoint includes full training history (400 checkpoints across 40,000 epochs).
+A 1-layer, 4-head transformer trained on **(a x b) mod 113** that exhibits **grokking** (delayed generalization after memorization). This checkpoint includes full training history (400 checkpoints across 40,000 epochs).
 
 ## Model Architecture
 