A 17M-parameter language model trained for CS336 Assignment 1.

  • beta0: 0.9
  • beta1: 0.95
  • context_length: 256
  • d_ff: 1,344
  • d_model: 512
  • eval_batches: 200
  • log_interval: 100
  • lr: 0.001
  • max_steps: 30,000
  • num_heads: 16
  • num_layers: 4
  • precision: "bf16"
  • theta: 10,000
  • vocab_size: 10,000
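
As a rough sanity check on the model size, the hyperparameters above can be turned into a parameter count. This is only a sketch under assumptions the card does not confirm: bias-free linear layers, a three-matrix SwiGLU feed-forward block, two RMSNorm gains per layer plus one final RMSNorm, RoPE with no learned weights, and untied input and output embeddings.

```python
# Hyperparameters copied from the list above.
vocab_size = 10_000
d_model = 512
d_ff = 1_344
num_layers = 4

# Assumed layout (not confirmed by the source): no biases, SwiGLU with
# three weight matrices, RMSNorm gains only, untied embeddings.
attention = 4 * d_model * d_model      # Q, K, V and output projections
feed_forward = 3 * d_model * d_ff      # three SwiGLU weight matrices
norms = 2 * d_model                    # two RMSNorm gain vectors per layer
per_layer = attention + feed_forward + norms

embedding = vocab_size * d_model       # token embedding table
lm_head = vocab_size * d_model         # output projection to the vocabulary
final_norm = d_model                   # RMSNorm before the output projection

total = num_layers * per_layer + final_norm + embedding + lm_head
without_input_embedding = total - embedding

print(f"per layer: {per_layer:,}")
print(f"total: {total:,}")
print(f"excluding input embedding: {without_input_embedding:,}")
```

Under these assumptions the total comes to about 22.7M, and about 17.6M when one embedding matrix is left out, which is close to the 17M figure in the title. Which counting convention was actually used is not stated here.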

Train and validation loss are both around 1.5. W&B run: https://wandb.ai/rrisoncai-na/cs336-assignment1/runs/vvr52w7e?nw=nwuserrrisoncai
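
To put the loss figure in more familiar terms, it can be converted to perplexity and bits per token. The sketch below assumes the reported loss is mean cross-entropy per token in nats (natural log), which the card does not state explicitly.

```python
import math

loss = 1.5  # approximate train/val loss reported above (assumed nats per token)

perplexity = math.exp(loss)            # per-token perplexity
bits_per_token = loss / math.log(2)    # same quantity expressed in bits

print(f"perplexity: {perplexity:.2f}")
print(f"bits per token: {bits_per_token:.2f}")
```

With a loss of 1.5 this gives a perplexity of roughly 4.5.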
