Text Generation
Transformers
Safetensors
English
gpt2
conversational
text-generation-inference

Treeswift

Treeswift is a derivative of DistilGPT2 trained to be conversational. It is also designed to be similar to GPT-3.5.

Training

The model was trained using 2,750 steps, and 4 batch size.

Datasets

The training corpus is made up of:

The train / train_sft splits were used.

Chat template

The Zephyr chat template was used, but most notably, chat template tokens were added to enhance performance.

Limitations

The model frequently outputs incorrect information, confirmation with a larger, mature model is advised. In addition, it may subtly repeat.

Downloads last month
350
Safetensors
Model size
81.9M params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for qikp/treeswift-90m

Finetuned
(1461)
this model
Quantizations
1 model

Datasets used to train qikp/treeswift-90m