DerivedFunction/polyglot-tagger-60L-Experimental

Token Classification

language-detection

language-identification

DerivedFunction committed 3 days ago

Commit 6b7ec2e (verified) · 1 parent: 4a741b3

Update README.md

Files changed (1)

README.md +43 -0

@@ -127,6 +127,49 @@ The model supports the following ISO-coded languages:
 > Note that Romanized versions of languages, such as Romanized Russian and Hindi, are not included in the training set.
+Coverage measured on a sample:
+Per-group coverage (examples / tokens):
+  English                47 examples |    3947 tokens
+  Russian                47 examples |    3665 tokens
+  German                 58 examples |    4625 tokens
+  Japanese               50 examples |    4188 tokens
+  Chinese                60 examples |    4131 tokens
+  French                 40 examples |    3723 tokens
+  Spanish                44 examples |    4756 tokens
+  Portuguese             27 examples |    2130 tokens
+  Italian                57 examples |    5178 tokens
+  Polish                 25 examples |    1753 tokens
+  Dutch                  44 examples |    3082 tokens
+  Turkish                35 examples |    2315 tokens
+  SoutheastAsianLatin   114 examples |    8861 tokens
+  CentralEuropeanLatin  125 examples |    9761 tokens
+  Korean                 38 examples |    3958 tokens
+  EastSlavicCyrillic     85 examples |    7471 tokens
+  Arabic                 45 examples |    2508 tokens
+  NordicCore            194 examples |   14094 tokens
+  BalkanCyrillic         71 examples |    6231 tokens
+  ArabicOther            92 examples |    8010 tokens
+  Hindi                  33 examples |    3251 tokens
+  IndicOther            261 examples |   40630 tokens
+  CentralAsianCyrillic   57 examples |    3789 tokens
+  AfricanLatin           82 examples |    5910 tokens
+  OtherScripts          269 examples |   28603 tokens
+Top token languages:
+  ml      8197
+  it      5178
+  ta      4903
+  he      4873
+  es      4756
+  de      4625
+  kn      4613
+  pa      4457
+  ja      4188
+  zh      4131
+  uk      4007
+  ko      3958
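
The figures in this hunk suggest that a group's token count is the sum of its per-language token counts (for instance, `it` at 5178 matches the Italian group at 5178). Below is a minimal sketch of how such a tally could be produced, assuming each example is a list of per-token language codes. The `LANG_TO_GROUP` mapping, the `coverage` helper, the `Unmapped` fallback, and the rule that an example counts once for every group it touches are all illustrative assumptions, not the author's script.

```python
from collections import Counter

# Hypothetical mapping from ISO language code to coverage group.
# The real grouping is not shown in the README; these pairs are assumed.
LANG_TO_GROUP = {
    "it": "Italian",
    "es": "Spanish",
    "uk": "EastSlavicCyrillic",
}


def coverage(examples, lang_to_group):
    """Tally examples and tokens per group, and tokens per language.

    `examples` is an iterable of lists, each holding one language code
    per token. An example is counted once for every group it touches
    (an assumption; counting by primary language is equally plausible).
    """
    group_examples = Counter()
    group_tokens = Counter()
    lang_tokens = Counter()
    for token_langs in examples:
        lang_tokens.update(token_langs)
        groups_seen = set()
        for lang in token_langs:
            # Codes without a mapping land in an assumed "Unmapped" bucket.
            group = lang_to_group.get(lang, "Unmapped")
            group_tokens[group] += 1
            groups_seen.add(group)
        for group in groups_seen:
            group_examples[group] += 1
    return group_examples, group_tokens, lang_tokens


# Toy sample, not real data: three examples, the last one mixed-language.
sample = [
    ["it", "it", "it"],
    ["es", "es"],
    ["it", "uk", "uk", "uk"],
]

group_examples, group_tokens, lang_tokens = coverage(sample, LANG_TO_GROUP)

print("Per-group coverage (examples / tokens):")
for group in group_tokens:
    print(
        f"  {group:<21}{group_examples[group]:>4} examples | "
        f"{group_tokens[group]:>7} tokens"
    )

print("Top token languages:")
for lang, count in lang_tokens.most_common(3):
    print(f"  {lang}      {count}")
```

Counting tokens rather than examples matters for a token-classification model: a group with few but long examples (such as IndicOther in the listing) can still dominate the training signal.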
 ## Evaluation
 ### The model scored the following on `papulca/language-identification`'s test set
 |Language     | Correct  |  Total     | Accuracy    |