Model specification
- Parameters: 21 million
- Architecture: Decoder-only transformer
- Training data: 1.1 million tokens of Shakespeare text
- Context length: 256 tokens
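
To put the figures above in proportion, here is a minimal sketch that derives two ratios from them: training tokens per parameter, and the number of full, non-overlapping context windows in the training data. The `ModelSpec` class and the helper names are hypothetical, introduced only for illustration. The `approx_block_params` helper uses the common rule of thumb of roughly 12 * n_layer * d_model^2 parameters for the transformer blocks (embeddings, biases, and layer norms excluded). The layer count and width passed to it below are assumptions, not values stated in the specification. They are just one combination, among many, that lands near 21 million.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Figures taken from the specification list above."""
    n_params: int = 21_000_000        # 21 million parameters
    n_train_tokens: int = 1_100_000   # 1.1 million training tokens
    context_length: int = 256         # tokens per context window


def tokens_per_param(spec: ModelSpec) -> float:
    """Training tokens available per model parameter."""
    return spec.n_train_tokens / spec.n_params


def full_windows(spec: ModelSpec) -> int:
    """Number of complete, non-overlapping context windows in the data."""
    return spec.n_train_tokens // spec.context_length


def approx_block_params(n_layer: int, d_model: int) -> int:
    """Rule-of-thumb parameter count for the transformer blocks only.

    Per layer: about 4 * d_model^2 for attention plus 8 * d_model^2 for
    a 4x-wide MLP. Embeddings, biases, and layer norms are ignored.
    """
    return 12 * n_layer * d_model ** 2


if __name__ == "__main__":
    spec = ModelSpec()
    print(f"tokens per parameter: {tokens_per_param(spec):.3f}")
    print(f"full context windows: {full_windows(spec)}")
    # ASSUMPTION: n_layer=12 and d_model=384 are hypothetical values,
    # not taken from the specification.
    print(f"approx. block params: {approx_block_params(12, 384):,}")
```

With far fewer training tokens than parameters, a model of this size can plausibly memorize much of its training text, so held-out loss is likely a more informative metric than training loss.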