Upload README.md with huggingface_hub
README.md
CHANGED
@@ -16,7 +16,7 @@ pipeline_tag: text-classification
 
 # Modular Multiplication Transformer
 
-A 1-layer, 4-head transformer trained on **(a x b) mod 113** that exhibits **grokking**
+A 1-layer, 4-head transformer trained on **(a x b) mod 113** that exhibits **grokking** (delayed generalization after memorization). This checkpoint includes full training history (400 checkpoints across 40,000 epochs).
 
 ## Model Architecture
 
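
For readers unfamiliar with the task named in the changed line, the following is a minimal sketch of what the **(a x b) mod 113** dataset might look like. It is not taken from the repository: the function name `make_dataset`, the operand range `0..112`, and the even spacing of checkpoints are all assumptions made for illustration.

```python
# Illustrative sketch only; names and ranges here are assumptions,
# not the repository's actual training code.

P = 113  # modulus stated in the README description


def make_dataset(p: int = P):
    """Return every (a, b) pair with 0 <= a, b < p and its label (a * b) % p."""
    pairs = [(a, b) for a in range(p) for b in range(p)]
    labels = [(a * b) % p for a, b in pairs]
    return pairs, labels


pairs, labels = make_dataset()

# Assumption: the 400 checkpoints are evenly spaced over the 40,000 epochs,
# which would correspond to saving once every 100 epochs.
CHECKPOINT_INTERVAL = 40_000 // 400
```

Under these assumptions the full input space has 113 x 113 pairs. How the repository splits them into training and held-out sets is not stated in this diff.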