A 17M-parameter language model trained for CS336 Assignment 1, with the following hyperparameters:
- beta0: 0.9
- beta1: 0.95
- context_length: 256
- d_ff: 1344
- d_model: 512
- eval_batches: 200
- log_interval: 100
- lr: 0.001
- max_steps: 30000
- num_heads: 16
- num_layers: 4
- precision: "bf16"
- theta: 10000
- vocab_size: 10000
Train and validation loss are both around 1.5. W&B run: https://wandb.ai/rrisoncai-na/cs336-assignment1/runs/vvr52w7e?nw=nwuserrrisoncai
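As a rough sanity check on the 17M figure, the parameter count can be estimated from the hyperparameters listed above. This is only a sketch under assumptions that this card does not state: a pre-norm Transformer with no bias terms, a SwiGLU feed-forward layer (three weight matrices), two RMSNorm gains per block plus a final one, rotary position embeddings with no learned parameters, and an output projection that is not tied to the token embedding. The function name `count_params` is hypothetical.

```python
def count_params(vocab_size, d_model, d_ff, num_layers):
    """Estimate weight counts for an ASSUMED bias-free, pre-norm Transformer LM."""
    embedding = vocab_size * d_model      # token embedding table
    attention = 4 * d_model * d_model     # Q, K, V and output projections
    ffn = 3 * d_model * d_ff              # assumption: SwiGLU with three matrices
    norms = 2 * d_model                   # assumption: two RMSNorm gains per block
    blocks = num_layers * (attention + ffn + norms)
    final_norm = d_model                  # assumption: one RMSNorm before the head
    lm_head = d_model * vocab_size        # assumption: untied output projection
    total = embedding + blocks + final_norm + lm_head
    return {
        "embedding": embedding,
        "blocks": blocks,
        "lm_head": lm_head,
        "total": total,
        "non_embedding": total - embedding,
    }


# Values taken from the hyperparameter list in this card.
counts = count_params(vocab_size=10_000, d_model=512, d_ff=1_344, num_layers=4)
for name, value in counts.items():
    print(f"{name}: {value:,}")
```

Under these assumptions, the count excluding the token embedding table comes to roughly 17.6M, which is in line with the 17M figure, while the total including the embedding table is roughly 22.7M. `num_heads` does not appear in the estimate because splitting `d_model` across heads does not change the size of the projection matrices.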