DerivedFunction committed on
Commit e5b4fb6 · verified · 1 Parent(s): b357baa

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -162,13 +162,13 @@ factors were used to simulate messy text, and to reduce single character bias on
  - Random chance to change the casing of compatible language scripts, such as Latin and Cyrillic.
  - Low chance of simulating OCR and messy text with character mutation.
 
-
  To generalize well on both the target language and code switching, a curriculum is provided:
  - Pure documents 55%: Single language to learn its vocabulary.
  - Homogeneous 25%: Single language + one foreign sentence to learn simple code switching.
  - Spliced 10%: A foreign sentence is centered between two same-language sentences, with the first sentence's punctuation stripped and the second sentence forced to lowercase.
  - Mixed 10%: Generic mix of any languages.
- -
+
+
  ### Training Data Breakdown
  | lang | train sentences | train tokens | eval sentences | eval tokens | all sentences | all tokens |
  | :--- | ---: | ---: | ---: | ---: | ---: | ---: |
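The curriculum listed in this README hunk can be sketched as a document sampler. This is a minimal, hypothetical Python sketch, not the repository's actual data pipeline. Several things are assumptions: the function names (`strip_punctuation`, `splice`, `make_document`), the `corpora` dictionary mapping language codes to sentence lists, the default of three sentences per document, and the (text, language) segment output. It also reads "punctuation stripped" as removing every Unicode punctuation character and assumes at least two languages are available.

```python
import random
import unicodedata

# Curriculum weights as listed in the README: 55% pure, 25% homogeneous,
# 10% spliced, 10% mixed.
CURRICULUM = {"pure": 0.55, "homogeneous": 0.25, "spliced": 0.10, "mixed": 0.10}

# Hypothetical output format: each segment keeps its language code as a label.
Segment = tuple[str, str]  # (text, language code)


def strip_punctuation(text: str) -> str:
    # Assumption: drop every Unicode punctuation character (categories P*).
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def splice(first: str, foreign: str, second: str,
           lang: str, foreign_lang: str) -> list[Segment]:
    # Foreign sentence centered between two same-language sentences; the first
    # loses its punctuation and the second is lowercased, removing boundary cues.
    return [(strip_punctuation(first), lang),
            (foreign, foreign_lang),
            (second.lower(), lang)]


def make_document(rng: random.Random, corpora: dict[str, list[str]],
                  n_sents: int = 3) -> list[Segment]:
    # Hypothetical sampler; requires at least two languages in `corpora`.
    langs = list(corpora)
    kind = rng.choices(list(CURRICULUM), weights=list(CURRICULUM.values()))[0]
    lang = rng.choice(langs)
    foreign_lang = rng.choice([l for l in langs if l != lang])

    if kind == "pure":
        # Single language only, to learn its vocabulary.
        return [(s, lang) for s in rng.choices(corpora[lang], k=n_sents)]
    if kind == "homogeneous":
        # Single language plus exactly one foreign sentence at a random position.
        doc = [(s, lang) for s in rng.choices(corpora[lang], k=n_sents)]
        foreign = rng.choice(corpora[foreign_lang])
        doc.insert(rng.randrange(len(doc) + 1), (foreign, foreign_lang))
        return doc
    if kind == "spliced":
        first, second = rng.choices(corpora[lang], k=2)
        return splice(first, rng.choice(corpora[foreign_lang]), second,
                      lang, foreign_lang)
    # "mixed": every sentence drawn from an independently chosen language.
    return [(rng.choice(corpora[l]), l) for l in rng.choices(langs, k=n_sents)]
```

Keeping each segment's language code next to its text means token-level labels survive the splice even after the punctuation and casing cues are erased.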