| Model specification | |
|---|---|
| Params | 21 million |
| Architecture | Decoder-only transformer |
| Training data | 1.1 million tokens of Shakespeare text |
| Context length | 256 tokens |
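The table gives only a total parameter count. This sketch shows how a decoder-only transformer reaches a count in the tens of millions. It uses the common GPT-2-style estimate of roughly 12·d² + 13·d parameters per block, plus token and positional embeddings. Only the 256-token context length comes from the table. The layer count, model width, and character-level vocabulary of 65 symbols are hypothetical, so the result is not the model's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class DecoderConfig:
    # Context length is taken from the specification table.
    context_length: int = 256
    # The values below are HYPOTHETICAL: the table does not state them.
    n_layer: int = 6
    d_model: int = 512
    vocab_size: int = 65  # assumes a character-level vocabulary


def estimate_params(cfg: DecoderConfig) -> int:
    """Approximate parameter count of a GPT-2-style decoder-only transformer.

    Assumes biases in every linear layer, a 4x MLP expansion, two LayerNorms
    per block, a final LayerNorm, learned positional embeddings, and an output
    head tied to the token embedding (so the head adds no extra parameters).
    """
    d = cfg.d_model
    attention = 4 * d * d + 4 * d  # fused Q/K/V projection plus output projection
    mlp = 8 * d * d + 5 * d        # d -> 4d -> d, both layers with biases
    layer_norms = 2 * 2 * d        # two LayerNorms, each with scale and shift
    per_block = attention + mlp + layer_norms

    token_embedding = cfg.vocab_size * d
    position_embedding = cfg.context_length * d  # one learned vector per position
    final_norm = 2 * d
    return cfg.n_layer * per_block + token_embedding + position_embedding + final_norm


if __name__ == "__main__":
    total = estimate_params(DecoderConfig())
    print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 19,079,680 parameters (~19.1M)
```

With these made-up dimensions the estimate is about 19.1 million. Almost all of it comes from the 12·d² term in each block. At character level the vocabulary and the 256-position embedding table contribute comparatively little.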