A newer version of this model is available: qikp/kite-3.1-20m

Kite

🎉 You are looking at Kite 3, which now uses FineWeb-Edu, the pika 3 tokenizer, and a lower learning rate, and is released under a public-domain-like license!

Kite is a small 15-million-parameter language model trained without any special optimizations.
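
If the checkpoint follows the standard Hugging Face transformers causal-LM layout (an assumption; the card does not state the architecture or loading API), a minimal load-and-generate sketch might look like this:

```python
# Hypothetical usage sketch: assumes the repository is loadable with the
# standard transformers AutoModelForCausalLM / AutoTokenizer classes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qikp/kite-3-15m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Short educational-style prompt, in line with the FineWeb-Edu training data.
inputs = tokenizer("The water cycle begins when", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```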

Training

It was trained on 50K rows of FineWeb-Edu for 1 epoch with a batch size of 4, a learning rate of 1.5e-4, and the pika 3 tokenizer.
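
The actual training script is not published in this card; the sketch below is a hypothetical Hugging Face Trainer setup that mirrors the stated recipe (50K FineWeb-Edu rows, 1 epoch, batch size 4, learning rate 1.5e-4). The dataset config, tokenizer path, sequence length, and model loading are assumptions, not the code used for Kite.

```python
# Hypothetical training sketch matching the hyperparameters above.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumed dataset config ("sample-10BT") and row count; the card only says "50K rows".
dataset = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT", split="train[:50000]")

# The pika 3 tokenizer is assumed to ship with the model repository.
tokenizer = AutoTokenizer.from_pretrained("qikp/kite-3-15m")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Assumed starting point; the original run may have initialized from a fresh config.
model = AutoModelForCausalLM.from_pretrained("qikp/kite-3-15m")

args = TrainingArguments(
    output_dir="kite-3-15m",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=1.5e-4,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```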

Limitations

Due to its size, the model is not suitable for production workloads.

Model size: 16.8M parameters (F32, Safetensors)
