Update README.md
README.md
@@ -117,12 +117,12 @@ The data composition follows a strategic curriculum:
 * **10% Mixed with Noise:** Integration of "neutral" spans including code snippets, mathematical notation, emojis, symbols, and `rot_13` text tagged as `O` or their respective source to reduce hallucination.
 
 ### Supported Languages and Limitations (60)
-The model supports the following ISO-coded languages
-
+The model supports the following ISO-coded languages:
 `af, am, ar, as, be, bg, bn, cs, da, de, el, en, es, fa, fi, fr, gu, he, hi, hu, hy, id, is, it, ja, ka, kk, km, kn, ko, la, lo, ml, mk, mn, mr, ms, my, nl, no, or, pa, pl, ps, pt, ro, ru, sd, sq, sr, sv, ta, te, th, tr, ug, uk, ur, vi, zh`
 
+> Note that Romanized versions of any language is not included in the training set, such as Romanized Russian, and Hindi.
 
-### The model scored the following on `papulca/language-identification's test set
+### The model scored the following on `papulca/language-identification`'s test set
 |Language | Correct | Total | Accuracy |
 |-------------|----------|-------------|--------|
 |ar | 114 | 114 | 100.0% |
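The "10% Mixed with Noise" bullet can be illustrated with a minimal sketch. It assumes token-level labels (the `O` tag suggests a tagging scheme, but the actual training pipeline is not shown in the README), and the helper names `rot13` and `mix_noise` are hypothetical. ROT13 itself comes from Python's standard `codecs` module.

```python
import codecs
import random

# Hypothetical helpers illustrating the "10% Mixed with Noise" idea;
# this is not the project's actual data pipeline.


def rot13(text: str) -> str:
    """Apply ROT13 via the standard-library "rot_13" text codec."""
    return codecs.encode(text, "rot_13")


def mix_noise(tokens, labels, noise_tokens, rng, label="O"):
    """Insert a noise span at a random position in one tagged example.

    `tokens` and `labels` are parallel lists. Every inserted noise token gets
    `label` ("O" by default; a source-language code is the README's alternative).
    Returns new lists and leaves the inputs unchanged.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    pos = rng.randint(0, len(tokens))  # inclusive: the span may go at either end
    new_tokens = tokens[:pos] + list(noise_tokens) + tokens[pos:]
    new_labels = labels[:pos] + [label] * len(noise_tokens) + labels[pos:]
    return new_tokens, new_labels


if __name__ == "__main__":
    rng = random.Random(0)
    toks, labs = mix_noise(["Bonjour", "le", "monde"], ["fr", "fr", "fr"],
                           ["x", "=", "1"], rng)
    print(list(zip(toks, labs)))
    print(rot13("Hello world"))  # ROT13-encoded English text
```

Whether a `rot_13` span is labelled `O` or with its source language is left to the `label` argument, since the README allows either.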
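To check requested languages against the supported set before calling the model, the 60 codes listed in the README can be kept in a set. This is an illustrative sketch: `SUPPORTED_LANGUAGES` and `unsupported` are hypothetical names, not part of the model's API. The listed codes are two-letter ISO 639-1 codes.

```python
# The 60 two-letter ISO 639-1 codes listed in the README (illustrative copy).
SUPPORTED_LANGUAGES = frozenset(
    "af am ar as be bg bn cs da de el en es fa fi fr gu he hi hu "
    "hy id is it ja ka kk km kn ko la lo ml mk mn mr ms my nl no "
    "or pa pl ps pt ro ru sd sq sr sv ta te th tr ug uk ur vi zh".split()
)


def unsupported(codes):
    """Return, sorted and de-duplicated, the codes not in SUPPORTED_LANGUAGES."""
    return sorted({c.lower() for c in codes} - SUPPORTED_LANGUAGES)


if __name__ == "__main__":
    print(len(SUPPORTED_LANGUAGES))
    print(unsupported(["en", "sw", "zh", "xx"]))
```

A code being in the set does not mean every script for that language is covered: per the README note, Romanized text (for example Romanized Russian or Hindi) was not in the training set.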
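Each row of the evaluation table reports Correct / Total as a percentage with one decimal place (114 / 114 = 100.0% for `ar`). The sketch below shows one way such rows could be computed from gold and predicted labels. It is not the script used to produce the README's numbers, and `per_language_counts` and `accuracy_table` are hypothetical names.

```python
from collections import Counter


def per_language_counts(gold, pred):
    """Count correct and total predictions per gold-label language."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    correct, total = Counter(), Counter()
    for g, p in zip(gold, pred):
        total[g] += 1
        if g == p:
            correct[g] += 1
    # Counter returns 0 for languages that never had a correct prediction.
    return {lang: (correct[lang], total[lang]) for lang in total}


def accuracy_table(counts):
    """Render {lang: (correct, total)} in the README's table layout, sorted by code."""
    lines = [
        "|Language | Correct | Total | Accuracy |",
        "|-------------|----------|-------------|--------|",
    ]
    for lang in sorted(counts):
        ok, n = counts[lang]
        lines.append(f"|{lang} | {ok} | {n} | {100.0 * ok / n:.1f}% |")
    return "\n".join(lines)


if __name__ == "__main__":
    counts = per_language_counts(["ar", "ar", "en"], ["ar", "ar", "fr"])
    print(accuracy_table(counts))
```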