Treeswift
Treeswift is a derivative of DistilGPT2 trained to be conversational. It is also designed to be similar to GPT-3.5.
Training
The model was trained using 2,750 steps, and 4 batch size.
Datasets
The training corpus is made up of:
The train / train_sft splits were used.
Chat template
The Zephyr chat template was used, but most notably, chat template tokens were added to enhance performance.
Limitations
The model frequently outputs incorrect information, confirmation with a larger, mature model is advised. In addition, it may subtly repeat.
- Downloads last month
- 350