checkpoint @ 31,404,851,200 tokens trained; settings used:

batch_size: 512
context_length: 1024
learning_rate: 2e-4
schedule: cosine with 10% warmup from 0 to 2e-4, cooldown to 0
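The schedule above can be sketched as a plain function of the step index. This is a hedged sketch, not the training code: the peak learning rate (2e-4), the 10% linear warmup from 0, and the cosine cooldown to 0 come from the settings above, while the function name and the use of a step-indexed closure are assumptions for illustration.

```python
import math

def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 2e-4, warmup_frac: float = 0.10) -> float:
    """Cosine schedule with linear warmup, matching the settings above.

    Assumed shape: linear warmup 0 -> peak_lr over the first 10% of
    steps, then cosine decay ("cooldown") from peak_lr back to 0.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

A function like this can be handed to `torch.optim.lr_scheduler.LambdaLR` (as a multiplier of a base LR of 1.0), or replaced with `transformers.get_cosine_schedule_with_warmup`, which implements the same warmup-then-cosine shape.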

requirements:

pytorch
transformers
einops
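The requirements above can be installed in one line; note that the PyPI package for PyTorch is named `torch` (pinned versions are not given in this card, so none are assumed here):

```shell
pip install torch transformers einops
```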