Commit ·
3fdd67d
1
Parent(s): eb31495
Update README.md
README.md
CHANGED
|
@@ -3,8 +3,23 @@ license: mit
|
|
| 3 |
tags:
|
| 4 |
- feature-extraction
|
| 5 |
library_name: generic
|
| 6 |
---
|
| 7 |
|
| 8 |
Usage
|
| 9 |
```
|
| 10 |
import fasttext.util
|
|
|
|
| 3 |
tags:
|
| 4 |
- feature-extraction
|
| 5 |
library_name: generic
|
| 6 |
+
datasets:
|
| 7 |
+
- ubertext2.0
|
| 8 |
+
widget:
|
| 9 |
+
- text: "доброго вечора ми з україни"
|
| 10 |
---
|
| 11 |
|
| 12 |
+
_name_ is a set of pre-trained word vectors for the Ukrainian language, trained with fastText on the (as yet unreleased) UberText2.0 dataset and released by [lang-uk](https://lang.org.ua/en/). The model was trained with skip-gram in dimension 300, using character n-grams of length 2 to 5 and 15 negative samples.
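As a rough illustration of the hyperparameters above, the configuration could be expressed with the fastText Python bindings as follows. This is only a sketch: the actual training command is not given here, `train_vectors`, `corpus_path`, and `output_path` are hypothetical names, and leaving every other parameter at its fastText default is an assumption.

```python
# Hyperparameters taken from the model description; everything else is
# left at the fastText defaults (an assumption, not stated in the README).
TRAINING_PARAMS = {
    "model": "skipgram",  # skip-gram objective
    "dim": 300,           # vector dimension
    "minn": 2,            # shortest character n-gram
    "maxn": 5,            # longest character n-gram
    "neg": 15,            # number of negative samples
}


def train_vectors(corpus_path, output_path):
    """Train and save vectors (hypothetical helper, paths are placeholders)."""
    import fasttext  # imported lazily so the config can be inspected without it

    model = fasttext.train_unsupervised(input=corpus_path, **TRAINING_PARAMS)
    model.save_model(output_path)
    return model
```

Keeping the parameters in a plain dictionary makes it easy to compare them against the description before starting a long training run.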
|
| 13 |
+
|
| 14 |
+
Our model improves accuracy by 6.3% over the [Facebook Ukrainian word vectors](https://fasttext.cc/docs/en/crawl-vectors.html) on the word analogy task. The Ukrainian word analogy dataset is available [here](https://github.com/lang-uk/vecs/).
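For readers unfamiliar with the word analogy task, the sketch below shows one common way such accuracy is computed: for a question "a is to b as c is to ?", the answer is the word whose vector is closest (by cosine similarity) to `b - a + c`. The toy vectors and the single question are invented for illustration, and this simplified scoring is an assumption; it is not necessarily the exact procedure used for the figure above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors, a, b, c):
    """Return the word closest to b - a + c, excluding the query words."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in questions)
    return correct / len(questions)


# Toy 3-dimensional vectors, made up purely for illustration.
toy_vectors = {
    "король": [1.0, 1.0, 0.0],    # king
    "королева": [1.0, 0.0, 1.0],  # queen
    "чоловік": [0.0, 1.0, 0.0],   # man
    "жінка": [0.0, 0.0, 1.0],     # woman
    "місто": [0.5, 0.5, 0.5],     # city (distractor)
}
questions = [("чоловік", "жінка", "король", "королева")]
accuracy = analogy_accuracy(toy_vectors, questions)
```

With real vectors, `toy_vectors` would be replaced by a mapping from words to the model's 300-dimensional vectors and `questions` by the analogy dataset.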
|
| 15 |
+
|
| 16 |
+
Extrinsic evaluations were performed on two sequence labeling tasks: NER and POS tagging. The NER-UK dataset was released by lang-uk, and the Ukrainian (UD) corpus was developed by the non-profit organization Institute for Ukrainian.
|
| 17 |
+
|
| 18 |
+
Results:
|
| 19 |
+
1) spaCy NER F-score 0.818
|
| 20 |
+
2) Flair POS accuracy 0.824
|
| 21 |
+
3) spaCy POS accuracy 0.911
|
| 22 |
+
|
| 23 |
Usage
|
| 24 |
```
|
| 25 |
import fasttext.util
|