Update README.md

README.md CHANGED
@@ -28,7 +28,7 @@ This model is a fine-tuned version of TODO on [ReBatch/ultrafeedback_nl](https:/
 
 ## Model description
 
-This model is a Dutch chat model, originally developed from Mistral 7B v0.3 Instruct and further finetuned with QLoRA.
+This model is a Dutch chat model, originally developed from Mistral 7B v0.3 Instruct and further finetuned with QLoRA: first with SFT on a chat dataset, and then with DPO on a chat feedback dataset.
 
 
 ## Intended uses & limitations
@@ -56,7 +56,7 @@ Mistral-7B-v0.3-Instruct | 60.76 / 45.39 | 13.20 / 34.26 | 23.23 / 59.26 | 48.94
 Finetuned by [Julien Van den Avenne](https://huggingface.co/vandeju)
 
 
-
+## Training hyperparameters
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-06
@@ -70,8 +70,8 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 1
-
-
+
+## Framework versions
 
 - PEFT 0.11.1
 - Transformers 4.41.2
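The scheduler settings listed in the hyperparameters hunk (learning_rate 5e-06, cosine schedule, warmup ratio 0.1) can be sketched as a standalone function. This is a minimal illustration of how such a schedule behaves, not the training code itself; the function name and `total_steps` parameter are hypothetical.

```python
import math

def cosine_lr_with_warmup(step, total_steps, base_lr=5e-06, warmup_ratio=0.1):
    """Linear warmup then cosine decay, mirroring the listed settings:
    learning_rate=5e-06, lr_scheduler_type=cosine, warmup_ratio=0.1."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In the actual run the schedule would be produced by the training framework from the hyperparameters above; this sketch only shows the resulting learning-rate curve.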