> A newer version of this model is available: qikp/kite-4.1-14m
# Kite
🎉 You are looking at Kite 3.5, which now uses the Qwen3 architecture, a different dataset, and a more efficient configuration!
Kite is a small language model with 10 million parameters.
## Training
It was trained for 1 epoch on 50K rows of a tokenized version of Cosmopedia v0.1, with a batch size of 32, a learning rate of 1.5e-4, and the pika 3 tokenizer.
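For a rough sense of scale, the hyperparameters above imply roughly 1.6K optimizer steps. This is a sketch only: it assumes no gradient accumulation and that a final partial batch is kept, neither of which is stated in this card.

```python
import math

# Stated training setup: 50K rows, batch size 32, 1 epoch.
rows = 50_000
batch_size = 32
epochs = 1

# Assumption: one optimizer step per batch, partial final batch included.
steps_per_epoch = math.ceil(rows / batch_size)
total_steps = steps_per_epoch * epochs
print(total_steps)  # 1563
```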
## Limitations
Due to its small size, the model is not suitable for production workloads.