A newer version of this model is available: qikp/hummingbird-3-110m
Hummingbird
🎉 You are looking at Hummingbird 2, trained on a much more efficient corpus and achieving similar performance with 3x fewer parameters!
Hummingbird is a GPT-2 derivative trained to be conversational.
Training
The model was trained with the paged_adamw_8bit optimizer, gradient checkpointing, 500 training steps, a batch size of 1, and 4 gradient accumulation steps.
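A minimal sketch of what these settings look like as a 🤗 Transformers training configuration. Only the hyperparameters themselves come from this card; the output directory and everything else is an assumption, since the actual training script is not published.

```python
# Sketch of a training configuration matching the card's settings.
# Requires `transformers` (and `bitsandbytes` for the 8-bit optimizer).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hummingbird-2",        # hypothetical output path
    optim="paged_adamw_8bit",          # paged 8-bit AdamW via bitsandbytes
    gradient_checkpointing=True,       # trade compute for memory
    max_steps=500,                     # 500 optimizer steps
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,     # effective batch size of 4
)
```

With a batch size of 1 and 4 accumulation steps, gradients are summed over 4 forward/backward passes before each optimizer update, giving an effective batch size of 4 on limited memory.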
Datasets
The training corpus is made up of:
- First 1400 rows of qikp/reborn-5k-no-thoughts
- First 500 rows of HuggingFaceTB/smol-smoltalk
- First 100 rows of HuggingFaceTB/everyday-conversations-llama3.1-2k
The train / train_sft splits were used.
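The row counts above map directly onto the 🤗 Datasets split-slicing syntax. A small sketch, assuming the repo ids and counts from the list above (the card does not state which split pairs with which dataset, so the commented call is illustrative only):

```python
def slice_expr(split, n):
    """Build a datasets split-slicing expression, e.g. 'train[:1400]'."""
    return f"{split}[:{n}]"

# e.g., for the first corpus entry:
expr = slice_expr("train", 1400)   # -> "train[:1400]"

# A dataset could then be loaded with (network call, sketch only):
# from datasets import load_dataset
# ds = load_dataset("qikp/reborn-5k-no-thoughts", split=expr)
```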
Chat template
The Zephyr chat template was used.
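The Zephyr format wraps each message in a `<|role|>` header followed by the content and an `</s>` end-of-sequence token. A minimal pure-Python sketch of that rendering (in practice `tokenizer.apply_chat_template` handles this for you):

```python
def apply_zephyr_template(messages, add_generation_prompt=True):
    """Render a list of {'role', 'content'} dicts in the Zephyr chat format."""
    out = ""
    for m in messages:
        out += f"<|{m['role']}|>\n{m['content']}</s>\n"
    if add_generation_prompt:
        out += "<|assistant|>\n"   # cue the model to answer next
    return out

prompt = apply_zephyr_template([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
])
```

When generating with this model, the prompt should end with the `<|assistant|>\n` header so the model continues as the assistant.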
Limitations
The model frequently outputs incorrect information; verifying its answers with a larger, more mature model is advised.
Benchmark
This model was tested on GAIA, with its outputs compared to reference answers using embedding similarity. See the results here.