Inigohm123 committed on
Commit
c3b76ce
verified
1 Parent(s): b24040a

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -73,7 +73,8 @@ Normalized: más información en uve doble uve doble uve doble punto e hache u p
  ## Training
 
  ### Data preparation
- The training data was compiled by our research group from multiple heterogeneous sources and consists of approximately 9,784,905 sentences. This dataset is a subset of the data used in the following machine translation model mt-hitz-eu-es.
+ The training data was compiled by our research group from multiple heterogeneous sources and consists of approximately 9,784,905 sentences. This dataset is a subset of the data used in the following machine translation model [mt-hitz-eu-es](https://huggingface.co/HiTZ/mt-hitz-eu-es)
+
  Prior to training, the data underwent preprocessing steps including cleaning, punctuation standardization, filtering, and the creation of aligned input–output sentence pairs for the capitalization and punctuation restoration task.
 
  To generate the input–output pairs, the target sentences were lowercased, punctuation was removed, and text normalization was applied using an in-house normalization tool.
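
The pair-generation step described in the README can be illustrated with a minimal sketch. It covers only the lowercasing and punctuation-removal parts. The in-house normalization tool is not public and is not reproduced here, so numbers, URLs and similar tokens are left untouched. The function names `strip_punctuation` and `make_pair` are hypothetical, and using Unicode punctuation categories is an assumption about what counts as punctuation, not the authors' actual rule.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Remove punctuation characters and tidy the remaining whitespace.

    Assumption: "punctuation" means any character whose Unicode general
    category starts with "P" (this covers , . ? ! and also the Spanish
    inverted marks).
    """
    kept = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    # Removing punctuation can leave double spaces; collapse them.
    return " ".join(kept.split())


def make_pair(target: str) -> tuple[str, str]:
    """Build one (input, output) pair from a clean target sentence.

    The output is the original sentence; the input is the same sentence
    lowercased and without punctuation. Text normalization (the in-house
    tool mentioned in the README) is intentionally omitted.
    """
    source = strip_punctuation(target.lower())
    return source, target


if __name__ == "__main__":
    for sentence in ["Kaixo, zer moduz?", "Hola, ¿qué tal?"]:
        print(make_pair(sentence))
```

A model for capitalization and punctuation restoration would then be trained to map the first element of each pair back to the second.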