Source code
This is a small language model trained on the text of Julius Caesar. The source code of the model is available at - Link to the codebase
The model configurations are as follows -
Model configuration
- learning rate = 1e-4
- max iters = 10000
- warmup steps = 2000
- min lr = 5e-4
- eval iters = 500
- batch size = 8
- block size = 128
- vocab size = 50257
- number of layers = 4
- number of heads = 4
- embedding dimension = 768
- dropout = 0.01
- bias = True
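The configuration above can be sketched as two plain config objects, in the style many small GPT trainers (e.g. nanoGPT) use. This is a minimal illustration only; the class and field names here are assumptions and may not match the actual codebase.

```python
from dataclasses import dataclass

# Hypothetical config objects mirroring the listed hyperparameters.
# Field names are illustrative, not taken from the actual codebase.

@dataclass
class GPTConfig:
    vocab_size: int = 50257   # GPT-2 BPE vocabulary size
    block_size: int = 128     # context length in tokens
    n_layer: int = 4          # number of transformer layers
    n_head: int = 4           # attention heads per layer
    n_embd: int = 768         # embedding dimension
    dropout: float = 0.01
    bias: bool = True         # use bias in Linear/LayerNorm

@dataclass
class TrainConfig:
    learning_rate: float = 1e-4
    max_iters: int = 10000
    warmup_steps: int = 2000
    min_lr: float = 5e-4
    eval_iters: int = 500
    batch_size: int = 8

model_cfg = GPTConfig()
train_cfg = TrainConfig()

# The per-head dimension follows from the embedding size and head count.
head_dim = model_cfg.n_embd // model_cfg.n_head
print(head_dim)  # 192
```

Note that with 768 embedding dimensions split across only 4 heads, each head works in a relatively wide 192-dimensional subspace, which is an unusual but valid choice for a model this small.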
Test vs. Validation Loss
