nice readme
README.md
CHANGED
@@ -18,12 +18,30 @@ datasets:
- oscar
---
# RoBERTa for Single Language Classification
## Training
A RoBERTa model fine-tuned on small subsets of the Open Subtitles, OSCAR and Tatoeba datasets (~9k samples per language).

| data source     | languages      |
|-----------------|----------------|
| open_subtitles  | ka, he, en, de |
| oscar           | be, kk, az, hu |
| tatoeba         | ru, uk         |

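The README does not say how the per-language subsets were drawn. As a minimal sketch, assuming the data is available as `(text, lang)` pairs, one way to take a fixed number of training and validation samples per language is shown below. The function name `split_per_language` and its parameters are hypothetical and are not taken from the original training code.

```python
import random
from collections import defaultdict


def split_per_language(samples, n_train, n_val, seed=0):
    """Draw n_train training and n_val validation samples per language.

    `samples` is an iterable of (text, lang) pairs. This is an
    illustrative helper, not the procedure used for the model above.
    """
    by_lang = defaultdict(list)
    for text, lang in samples:
        by_lang[lang].append(text)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for lang in sorted(by_lang):
        texts = by_lang[lang]
        rng.shuffle(texts)
        # The two slices do not overlap, so no item lands in both splits.
        train += [(t, lang) for t in texts[:n_train]]
        val += [(t, lang) for t in texts[n_train:n_train + n_val]]
    return train, val
```

With `n_train=9000` and `n_val=1000` this would give subsets of roughly the sizes quoted in this README, provided each language has at least 10,000 samples available.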
## Validation
Validation metrics were computed on a separate part of the datasets (~1k samples per language).

|index|class|f1-score|precision|recall|support|
|---|---|---|---|---|---|
|0|az|0.998|0.997|1.0|997|
|1|be|0.996|0.998|0.994|1004|
|2|de|0.976|0.966|0.987|979|
|3|en|0.976|0.986|0.967|1020|
|4|he|1.0|1.0|0.999|1001|
|5|hy|0.994|0.991|0.998|993|
|6|ka|0.999|0.999|0.999|1000|
|7|kk|0.996|0.998|0.993|1005|
|8|uk|0.982|0.997|0.968|1030|
|9|ru|0.982|0.968|0.997|971|
|10|macro avg|0.99|0.99|0.99|10000|
|11|weighted avg|0.99|0.99|0.99|10000|
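
The two summary rows can be checked against the per-class rows. The sketch below recomputes the macro average (unweighted mean over classes) and the weighted average (mean weighted by support) of the F1 scores, using the rounded values from the table above. Because the inputs are already rounded to three decimals, the result only agrees with the table to about two decimals.

```python
# (class, f1-score, support) copied from the validation table
rows = [
    ("az", 0.998, 997),
    ("be", 0.996, 1004),
    ("de", 0.976, 979),
    ("en", 0.976, 1020),
    ("he", 1.0, 1001),
    ("hy", 0.994, 993),
    ("ka", 0.999, 1000),
    ("kk", 0.996, 1005),
    ("uk", 0.982, 1030),
    ("ru", 0.982, 971),
]

# Macro average: every class counts equally.
macro_f1 = sum(f1 for _, f1, _ in rows) / len(rows)

# Weighted average: every class counts in proportion to its support.
total_support = sum(support for _, _, support in rows)
weighted_f1 = sum(f1 * support for _, f1, support in rows) / total_support

print(total_support)                              # 10000
print(round(macro_f1, 2), round(weighted_f1, 2))  # 0.99 0.99
```

The two averages nearly coincide here because the classes have almost equal support.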