Update README.md
README.md
CHANGED
|
@@ -20,13 +20,12 @@ tags:
|
|
| 20 |
|
| 21 |
# Model Card for Phoenix
|
| 22 |
|
| 23 |
-
|
| 24 |
**Phoenix** is a model trained using Direct Preference Optimization (DPO) for the German language. Its training procedure follows the alignment-handbook from Hugging Face.
|
| 25 |
In contrast to Zephyr and Notus, this model has been trained using German instruction and DPO data. In detail, German translations of HuggingFaceH4/ultrachat_200k
|
| 26 |
and HuggingFaceH4/ultrafeedback_binarized were created in addition to a series of already available instruction datasets. The LLM haoranxu/ALMA-13B was used for the translation.
|
| 27 |
While the Mistral model performs very well, it is not well suited to the German language. Therefore, we used the fantastic LeoLM/leo-mistral-hessianai-7b.
|
| 28 |
Thanks to this training approach, Phoenix is not only able to compete with the Mistral model from LeoLM but also **beats the Llama-70b-chat model in two MT-Bench categories**.
|
| 29 |
-
This model **wouldn't have been possible without the amazing work of Hugging Face, LeoLM, openbnb,
|
| 30 |
I would like to personally thank all AI researchers who make the training of such models possible.
|
| 31 |
|
| 32 |
## MT-Bench-DE Scores
|
|
@@ -72,7 +71,7 @@ Florian Leurer compared Phoenix to other LLMs. Check it out here:
|
|
| 72 |
### Model Sources
|
| 73 |
|
| 74 |
- **Repository:** -
|
| 75 |
-
- **Paper:**
|
| 76 |
- **Demo:** -
|
| 77 |
|
| 78 |
## Training Details
|
|
@@ -116,8 +115,8 @@ You will first need to install `transformers` and `accelerate` (just to ease the
|
|
| 116 |
```python
|
| 117 |
import torch
|
| 118 |
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 119 |
-
model = AutoModelForCausalLM.from_pretrained("DRXD1000/Phoenix
|
| 120 |
-
tokenizer = AutoTokenizer.from_pretrained("DRXD1000/Phoenix
|
| 121 |
prompt = """<|system|>
|
| 122 |
</s>
|
| 123 |
<|user|>
|
|
@@ -131,9 +130,9 @@ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
|
| 131 |
|
| 132 |
## Ethical Considerations and Limitations
|
| 133 |
|
| 134 |
-
As with all LLMs, the potential outputs of `DRXD1000/Phoenix
|
| 135 |
in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses
|
| 136 |
-
to user prompts. Therefore, before deploying any applications of `DRXD1000/Phoenix
|
| 137 |
perform safety testing and tuning tailored to their specific applications of the model.
|
| 138 |
Please see Meta's [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/).
|
| 139 |
|
|
@@ -144,6 +143,22 @@ Please see Meta's [Responsible Use Guide](https://ai.meta.com/llama/responsible-
|
|
| 144 |
### Training hyperparameters
|
| 145 |
|
| 146 |
The following hyperparameters were used during training:
|
| 147 |
- learning_rate: 5e-07
|
| 148 |
- train_batch_size: 8
|
| 149 |
- eval_batch_size: 4
|
|
@@ -157,6 +172,18 @@ The following hyperparameters were used during training:
|
|
| 157 |
- lr_scheduler_warmup_ratio: 0.1
|
| 158 |
- num_epochs: 1
|
| 159 |
|
| 160 |
|
| 161 |
### Framework versions
|
| 162 |
|
|
|
|
| 20 |
|
| 21 |
# Model Card for Phoenix
|
| 22 |
|
|
|
|
| 23 |
**Phoenix** is a model trained using Direct Preference Optimization (DPO) for the German language. Its training procedure follows the alignment-handbook from Hugging Face.
|
| 24 |
In contrast to Zephyr and Notus, this model has been trained using German instruction and DPO data. In detail, German translations of HuggingFaceH4/ultrachat_200k
|
| 25 |
and HuggingFaceH4/ultrafeedback_binarized were created in addition to a series of already available instruction datasets. The LLM haoranxu/ALMA-13B was used for the translation.
|
| 26 |
While the Mistral model performs very well, it is not well suited to the German language. Therefore, we used the fantastic LeoLM/leo-mistral-hessianai-7b.
|
| 27 |
Thanks to this training approach, Phoenix is not only able to compete with the Mistral model from LeoLM but also **beats the Llama-70b-chat model in two MT-Bench categories**.
|
| 28 |
+
This model **wouldn't have been possible without the amazing work of Hugging Face, LeoLM, openbnb, argilla, the Alma-Team and many others of the AI community**.
|
| 29 |
I would like to personally thank all AI researchers who make the training of such models possible.
|
| 30 |
|
| 31 |
## MT-Bench-DE Scores
|
|
|
|
| 71 |
### Model Sources
|
| 72 |
|
| 73 |
- **Repository:** -
|
| 74 |
+
- **Paper:** [`PHOENIX: Open-Source Language Adaption for Direct Preference Optimization`](https://arxiv.org/abs/2401.10580)
|
| 75 |
- **Demo:** -
|
| 76 |
|
| 77 |
## Training Details
|
|
|
|
| 115 |
```python
|
| 116 |
import torch
|
| 117 |
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 118 |
+
model = AutoModelForCausalLM.from_pretrained("DRXD1000/Phoenix", torch_dtype=torch.bfloat16, device_map="auto")
|
| 119 |
+
tokenizer = AutoTokenizer.from_pretrained("DRXD1000/Phoenix")
|
| 120 |
prompt = """<|system|>
|
| 121 |
</s>
|
| 122 |
<|user|>
|
|
|
|
| 130 |
|
| 131 |
## Ethical Considerations and Limitations
|
| 132 |
|
| 133 |
+
As with all LLMs, the potential outputs of `DRXD1000/Phoenix` cannot be predicted
|
| 134 |
in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses
|
| 135 |
+
to user prompts. Therefore, before deploying any applications of `DRXD1000/Phoenix`, developers should
|
| 136 |
perform safety testing and tuning tailored to their specific applications of the model.
|
| 137 |
Please see Meta's [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/).
|
| 138 |
|
|
|
|
| 143 |
### Training hyperparameters
|
| 144 |
|
| 145 |
The following hyperparameters were used during training:
|
| 146 |
+
|
| 147 |
+
#### SFT Training
|
| 148 |
+
- learning_rate: 2e-05
|
| 149 |
+
- train_batch_size: 32
|
| 150 |
+
- eval_batch_size: 16
|
| 151 |
+
- seed: 42
|
| 152 |
+
- distributed_type: multi-GPU
|
| 153 |
+
- num_devices: 8
|
| 154 |
+
- gradient_accumulation_steps: 2
|
| 155 |
+
- total_train_batch_size: 512
|
| 156 |
+
- total_eval_batch_size: 128
|
| 157 |
+
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
|
| 158 |
+
- lr_scheduler_type: cosine
|
| 159 |
+
- num_epochs: 1
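
The SFT totals listed above can be checked with a little arithmetic. The sketch below assumes that `train_batch_size` and `eval_batch_size` are per-device values, as is the usual convention in the Hugging Face Trainer, and that gradient accumulation applies only to training. The helper `effective_batch_size` is a hypothetical name used for illustration, not part of any library.

```python
# Values copied from the SFT hyperparameter list above.
train_batch_size = 32            # assumed to be per device
eval_batch_size = 16             # assumed to be per device
num_devices = 8
gradient_accumulation_steps = 2


def effective_batch_size(per_device: int, devices: int, accumulation_steps: int = 1) -> int:
    """Hypothetical helper: samples contributing to one step across all devices."""
    return per_device * devices * accumulation_steps


# Training accumulates gradients over several micro-batches before each update.
total_train_batch_size = effective_batch_size(
    train_batch_size, num_devices, gradient_accumulation_steps
)
# Evaluation has no gradient accumulation.
total_eval_batch_size = effective_batch_size(eval_batch_size, num_devices)

print(total_train_batch_size)  # 512
print(total_eval_batch_size)   # 128
```

Both results agree with the `total_train_batch_size` and `total_eval_batch_size` entries in the list.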
|
| 160 |
+
|
| 161 |
+
#### DPO Training
|
| 162 |
- learning_rate: 5e-07
|
| 163 |
- train_batch_size: 8
|
| 164 |
- eval_batch_size: 4
|
|
|
|
| 172 |
- lr_scheduler_warmup_ratio: 0.1
|
| 173 |
- num_epochs: 1
|
| 174 |
|
| 175 |
+
### Citation
|
| 176 |
+
```
|
| 177 |
+
@misc{uhlig2024phoenix,
|
| 178 |
+
title={PHOENIX: Open-Source Language Adaption for Direct Preference Optimization},
|
| 179 |
+
author={Matthias Uhlig and Sigurd Schacht and Sudarshan Kamath Barkur},
|
| 180 |
+
year={2024},
|
| 181 |
+
eprint={2401.10580},
|
| 182 |
+
archivePrefix={arXiv},
|
| 183 |
+
primaryClass={cs.CL}
|
| 184 |
+
}
|
| 185 |
+
```
|
| 186 |
+
|
| 187 |
|
| 188 |
### Framework versions
|
| 189 |
|