Update README.md
README.md
@@ -50,7 +50,7 @@ The recommended environments include the following transfomer versions: 4.12.3 ,

 ### Training Data

-The
+The Basque-English data collected from the web was a combination of the following datasets:


 | Dataset | Sentences before cleaning |
@@ -67,7 +67,7 @@ The Catalan-Basque data collected from the web was a combination of the followin
 | WikiMatrix | 119,480 |
 | **Total** | **15,653,108** |

-The 9,033,998 sentence pairs of synthetic parallel data were created by translating a compendium of ES-EU parallel corpora into
+The 9,033,998 sentence pairs of synthetic parallel data were created by translating a compendium of ES-EU parallel corpora into Basque using the [ES-EU translator from HiTZ](https://huggingface.co/HiTZ/mt-hitz-es-eu).


 ### Training Procedure