---
language:
- et
base_model:
- EMBEDDIA/est-roberta
pipeline_tag: token-classification
library_name: transformers
tags:
- NER
license: cc-by-4.0
---
# est-roberta-ud-ner

<!-- Provide a quick summary of what the model is/does. -->

### Model Description

<!-- Provide a longer summary of what this model is. -->
est-roberta-ud-ner is an [Est-RoBERTa](https://huggingface.co/EMBEDDIA/est-roberta)-based model fine-tuned for Estonian named entity recognition on the Universal Dependencies [EDT](https://github.com/UniversalDependencies/UD_Estonian-EDT) and [EWT](https://github.com/UniversalDependencies/UD_Estonian-EWT) treebanks.


### How to use
The model can be used with the Transformers `pipeline` for NER. Try it in Google Colab, where the Transformers library is pre-installed, or on your local machine (preferably in a virtual environment; see the setup section below), where you can install the library with `pip install transformers`.
```
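from transformers import pipeline

# Load the fine-tuned model into a token-classification (NER) pipeline
ner = pipeline("ner", model="vbius01/est-roberta-ud-ner")

# Example sentence in Estonian
text = "Eesti kuulub erinevalt Lätist ja Leedust kahtlemata Põhjamaade kultuuriruumi."
results = ner(text)

# Each result is a dict with the predicted tag, score, token and character offsets
print(results)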
```
```
[{'entity': 'B-GEP', 'score': np.float32(0.99339926), 'index': 1, 'word': '▁Eesti', 'start': 0, 'end': 5}, {'entity': 'B-GEP', 'score': np.float32(0.9923631), 'index': 4, 'word': '▁Lätist', 'start': 22, 'end': 29}, {'entity': 'B-GEP', 'score': np.float32(0.990756), 'index': 6, 'word': '▁Leedust', 'start': 32, 'end': 40}, {'entity': 'B-LOC', 'score': np.float32(0.61792), 'index': 8, 'word': '▁Põhjamaade', 'start': 51, 'end': 62}]
```
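
The raw output is token-level, and the `start` offset can include the space before a word (for example, `start` 22 for `Lätist`). The following is a minimal sketch, not part of the original repository, of how the results could be turned into readable `(text, label, score)` tuples. The helper name `extract_entities` and the `min_score` threshold are hypothetical, and the sample data is copied from the output above with the scores written as plain floats.

```python
def extract_entities(text, results, min_score=0.0):
    """Turn token-level pipeline results into (surface, label, score) tuples.

    Hypothetical helper for illustration: it does not merge multi-token
    entities, it only cleans up each predicted token.
    """
    entities = []
    for item in results:
        score = float(item["score"])
        if score < min_score:
            continue
        # Slice the original text by character offsets and drop the leading space
        surface = text[item["start"]:item["end"]].strip()
        # Drop the "B-" / "I-" prefix, keeping only the entity type
        label = item["entity"].split("-", 1)[-1]
        entities.append((surface, label, score))
    return entities


text = "Eesti kuulub erinevalt Lätist ja Leedust kahtlemata Põhjamaade kultuuriruumi."

# Sample results copied from the pipeline output shown above
sample_results = [
    {"entity": "B-GEP", "score": 0.99339926, "index": 1, "word": "▁Eesti", "start": 0, "end": 5},
    {"entity": "B-GEP", "score": 0.9923631, "index": 4, "word": "▁Lätist", "start": 22, "end": 29},
    {"entity": "B-GEP", "score": 0.990756, "index": 6, "word": "▁Leedust", "start": 32, "end": 40},
    {"entity": "B-LOC", "score": 0.61792, "index": 8, "word": "▁Põhjamaade", "start": 51, "end": 62},
]

for surface, label, score in extract_entities(text, sample_results, min_score=0.9):
    print(f"{surface}\t{label}\t{score:.3f}")
```

With `min_score=0.9`, the lower-confidence `Põhjamaade` prediction is filtered out. To merge tokens that belong to one entity, the Transformers pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`), which returns grouped entities instead of single tokens.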

<!-- Provide the basic links for the model. -->

- **Repository:** [github.com/martinkivisikk/ner_thesis](https://github.com/martinkivisikk/ner_thesis)
- **Paper:** [Developing a NER Model Based on Treebank Corpora]()

### Virtual environment setup
Create and activate a virtual environment in your project directory with venv.
```
python -m venv .env
source .env/bin/activate
```
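
To confirm that the environment is active before installing anything, a quick check can be run from Python. This is a small sketch using only the standard library; the function name `in_virtualenv` is a hypothetical helper, and the check assumes an environment created with `venv` as above.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment.

    Inside a virtual environment, sys.prefix points at the environment
    directory while sys.base_prefix points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if in_virtualenv():
    print(f"Virtual environment active: {sys.prefix}")
else:
    print("No virtual environment detected; packages would install globally.")
```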

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
This model can be used to find named entities in Estonian text.