Elaine committed: Update README.md

README.md CHANGED
@@ -24,7 +24,7 @@ We use a curated subset of Open Assistant 2 and translated the dataset into Finn
 
 ### DPO
 
-We use the HelpSteer2 preference binarized into chosen-rejected pairs using the helpfulness score as
+We use the HelpSteer2 preference binarized into chosen-rejected pairs using the helpfulness score as recommended in the [HelpSteer2](https://arxiv.org/abs/2406.08673) paper. We translated the dataset into Finnish using Poro.
 
 - **English**: [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2)
 
@@ -32,7 +32,7 @@ We use the HelpSteer2 preference binarized into chosen-rejected pairs using the
 
 ## Recipes
 
-We used 4 nodes (8 x AMD MI250X) to obtain a global batch size of 128 for SFT and 64 for DPO.
+We used 4 nodes (8 x AMD MI250X) to obtain a global batch size of 128 for SFT and 64 for DPO. We used the [Alignment Handbook](https://github.com/huggingface/alignment-handbook/) codebase for finetuning.
 
 **SFT**
 
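The binarization step added in the first hunk can be sketched as follows. This is a minimal illustration, not the repository's actual preprocessing script: it assumes HelpSteer2-style rows with `prompt`, `response`, and `helpfulness` fields, groups the candidate responses per prompt, and keeps the highest-scoring one as `chosen` and the lowest as `rejected`, dropping ties.

```python
from collections import defaultdict

def binarize_by_helpfulness(rows):
    """Turn HelpSteer2-style scored rows into chosen/rejected pairs.

    Assumed row schema (illustrative): {"prompt", "response", "helpfulness"}.
    Pairs with equal helpfulness scores are discarded, since they give
    no preference signal for DPO.
    """
    by_prompt = defaultdict(list)
    for row in rows:
        by_prompt[row["prompt"]].append(row)

    pairs = []
    for prompt, candidates in by_prompt.items():
        candidates.sort(key=lambda r: r["helpfulness"], reverse=True)
        best, worst = candidates[0], candidates[-1]
        if best["helpfulness"] > worst["helpfulness"]:
            pairs.append({
                "prompt": prompt,
                "chosen": best["response"],
                "rejected": worst["response"],
            })
    return pairs

# Toy rows standing in for the real dataset:
rows = [
    {"prompt": "Q1", "response": "good answer", "helpfulness": 4},
    {"prompt": "Q1", "response": "weak answer", "helpfulness": 1},
    {"prompt": "Q2", "response": "a", "helpfulness": 2},
    {"prompt": "Q2", "response": "b", "helpfulness": 2},  # tie: dropped
]
print(binarize_by_helpfulness(rows))
```

The same idea applies when loading the dataset through `datasets`; only the iteration over rows changes.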
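The global batch sizes in the second hunk follow from the hardware layout: 4 nodes of 8 MI250X give a world size of 32 devices, so the per-device batch times gradient-accumulation steps must multiply out to 128/32 = 4 for SFT and 64/32 = 2 for DPO. A quick sanity check, with illustrative per-device settings (the actual training configs may split micro-batch and accumulation differently):

```python
def global_batch_size(nodes, gpus_per_node, per_device_batch, grad_accum_steps):
    # global batch = world size * per-device micro-batch * accumulation steps
    return nodes * gpus_per_node * per_device_batch * grad_accum_steps

# 4 nodes x 8 MI250X; the per-device values below are assumptions:
assert global_batch_size(4, 8, 4, 1) == 128  # SFT
assert global_batch_size(4, 8, 2, 1) == 64   # DPO
```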