Update README.md
README.md
CHANGED
@@ -73,7 +73,7 @@ Normalized: informazio gehiago hitz puntu e hatxe u puntu eus web horrian
 ## Training
 
 ### Data preparation
-The training data was compiled by our research group from multiple heterogeneous sources and consists of approximately 9,784,905 sentences.
+The training data was compiled by our research group from multiple heterogeneous sources and consists of approximately 9,784,905 sentences. This dataset is a subset of the data used in the training of the following machine translation model [mt-hitz-eu-es](https://huggingface.co/HiTZ/mt-hitz-eu-es)
 
 Prior to training, the data underwent preprocessing steps including cleaning, punctuation standardization, filtering, and the creation of aligned input–output sentence pairs for the capitalization and punctuation restoration task.
 
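
The README names the preprocessing steps (cleaning, punctuation standardization, filtering, aligned pair creation) but does not show them. The sketch below is one possible way such input–output pairs could be built, not the authors' actual pipeline: the function names, the punctuation set, the quote mapping, and the length thresholds are all illustrative assumptions.

```python
import re
import unicodedata
from typing import Tuple

# Assumption: marks stripped from the model input; the real set is not documented.
PUNCT_PATTERN = re.compile(r'[.,;:!?¿¡"()«»]')

# Assumption: typographic quotes are mapped to plain ASCII equivalents.
QUOTE_MAP = {"“": '"', "”": '"', "‘": "'", "’": "'"}


def standardize_punctuation(sentence: str) -> str:
    """Normalize Unicode, unify quote variants, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", sentence)
    for src, dst in QUOTE_MAP.items():
        text = text.replace(src, dst)
    return re.sub(r"\s+", " ", text).strip()


def make_pair(sentence: str) -> Tuple[str, str]:
    """Return (input, output): lowercased unpunctuated text and the cleaned original."""
    target = standardize_punctuation(sentence)
    source = PUNCT_PATTERN.sub("", target).lower()
    source = re.sub(r"\s+", " ", source).strip()
    return source, target


def keep_pair(source: str, min_tokens: int = 3, max_tokens: int = 100) -> bool:
    """Hypothetical length filter; the thresholds are placeholders."""
    return min_tokens <= len(source.split()) <= max_tokens


if __name__ == "__main__":
    src, tgt = make_pair("Kaixo,  mundua!  Zer moduz?")
    print(src)
    print(tgt)
```

In this sketch the model input is the lowercased, unpunctuated sentence and the target is the cleaned original, so the pair stays aligned sentence by sentence.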