DerivedFunction committed on
Commit b357baa · verified · 1 parent: afe7dc7

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -168,6 +168,8 @@ To generalize well on both the target language and code switching a curriculum is
  - Homogeneous 25%: A single language plus one foreign sentence, to learn simple code switching.
  - Spliced 10%: A foreign sentence is centered between two same-language sentences, with the first sentence's punctuation stripped and the second sentence forced to lowercase.
  - Mixed 10%: A generic mix of any languages.
+ -
+ ### Training Data Breakdown
  | lang | train sentences | train tokens | eval sentences | eval tokens | all sentences | all tokens |
  | :--- | ---: | ---: | ---: | ---: | ---: | ---: |
  | en | 342138 (2.14%) | 8515554 (1.58%) | 2925 (3.89%) | 29279 (1.57%) | 345063 (2.14%) | 8544833 (1.58%) |
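Of the three curriculum categories in this hunk, "Spliced" is the most procedural. The sketch below builds one such sample. It makes several assumptions that the README does not confirm:

- The text is split on whitespace.
- Labels are assigned per word.
- "Punctuation stripped" means trailing punctuation only.
- "Lowercased" applies to the whole second same-language sentence.

`strip_trailing_punct` and `make_spliced_sample` are hypothetical names for illustration, not functions from this repository.

```python
import string
from typing import List, Tuple


def strip_trailing_punct(sentence: str) -> str:
    # Assumption: only trailing punctuation/spaces are removed, so the first
    # sentence flows directly into the foreign one.
    return sentence.rstrip(string.punctuation + " ")


def make_spliced_sample(
    first: str, foreign: str, second: str, base_lang: str, foreign_lang: str
) -> List[Tuple[str, str]]:
    """Hypothetical 'Spliced' builder returning (word, language) pairs.

    `first` and `second` share `base_lang`; `foreign` is centered between them.
    """
    parts = [
        (strip_trailing_punct(first), base_lang),  # punctuation stripped
        (foreign, foreign_lang),                   # foreign sentence, unchanged
        (second.lower(), base_lang),               # forced to lowercase
    ]
    labeled: List[Tuple[str, str]] = []
    for text, lang in parts:
        # Assumption: whitespace tokenization with one label per word.
        labeled.extend((word, lang) for word in text.split())
    return labeled


sample = make_spliced_sample(
    "I went to the market.", "Ich habe Hunger.", "Then I went home.", "en", "de"
)
print(" ".join(word for word, _ in sample))
# prints: I went to the market Ich habe Hunger. then i went home.
```

Stripping the first sentence's punctuation and lowercasing the second removes the sentence-boundary cues. The model then has to detect the switch from the words themselves rather than from capitalization or a full stop.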