BEE-spoke-data/smol_llama-101M-GQA
Text Generation • 0.1B params • 1.95k downloads • 32 likes
small-scale pretraining experiments of mine
Note: smol_llama-220M-GQA, continued pretraining (CPT) on fineweb-edu for 10 billion tokens
Note: this is a mid-training checkpoint of what is now smol_llama-220M