metadata

language:
  - et
base_model:
  - EMBEDDIA/est-roberta
pipeline_tag: token-classification
library_name: transformers
tags:
  - NER
license: cc-by-4.0

est-roberta-ud-ner

Model Description

est-roberta-ud-ner is a model based on Est-RoBERTa (EMBEDDIA/est-roberta) and fine-tuned for named entity recognition (NER) in Estonian on the EDT and EWT datasets.

How to use

The model can be used with the Transformers pipeline for NER. Try it in Google Colab, where the Transformers library is pre-installed, or on your local machine, preferably in a virtual environment (see Virtual environment setup below). On a local machine, install the Transformers library with pip install transformers.

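from transformers import pipeline

# Load the fine-tuned model into a token-classification (NER) pipeline.
ner = pipeline("ner", model="vbius01/est-roberta-ud-ner")

# Run the pipeline on an Estonian sentence.
text = "Eesti kuulub erinevalt Lätist ja Leedust kahtlemata Põhjamaade kultuuriruumi."
results = ner(text)

# Each result is a dict describing one tagged token.
print(results)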

[{'entity': 'B-GEP', 'score': np.float32(0.99339926), 'index': 1, 'word': '▁Eesti', 'start': 0, 'end': 5},
 {'entity': 'B-GEP', 'score': np.float32(0.9923631), 'index': 4, 'word': '▁Lätist', 'start': 22, 'end': 29},
 {'entity': 'B-GEP', 'score': np.float32(0.990756), 'index': 6, 'word': '▁Leedust', 'start': 32, 'end': 40},
 {'entity': 'B-LOC', 'score': np.float32(0.61792), 'index': 8, 'word': '▁Põhjamaade', 'start': 51, 'end': 62}]
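The output above lists one entry per tagged token. If you would rather get whole entity spans, the sketch below may help. It assumes the standard aggregation_strategy parameter of the Transformers token-classification pipeline; build_grouped_ner is a hypothetical helper name used for illustration and is not part of the repository.

```python
def build_grouped_ner(model_id="vbius01/est-roberta-ud-ner"):
    """Return a NER pipeline that merges tagged tokens into entity spans."""
    # Imported lazily so that defining this helper does not require the library.
    from transformers import pipeline

    # aggregation_strategy="simple" groups adjacent tokens that belong to the
    # same entity; each result then carries 'entity_group' instead of 'entity'.
    return pipeline("ner", model=model_id, aggregation_strategy="simple")


# Example usage (downloads the model on first call):
# ner = build_grouped_ner()
# text = "Eesti kuulub erinevalt Lätist ja Leedust kahtlemata Põhjamaade kultuuriruumi."
# for span in ner(text):
#     print(span["entity_group"], span["word"], round(float(span["score"]), 3))
```

With this setting each result should contain the keys entity_group, score, word, start and end, so the B- and I- prefixes no longer need to be handled by hand.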

Repository: github.com/martinkivisikk/ner_thesis
Paper: Developing a NER Model Based on Treebank Corpora

Virtual environment setup

Create and activate a virtual environment in your project directory with venv.

python -m venv .env
source .env/bin/activate

Uses

This model can be used to find named entities in Estonian texts.
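As an illustration of working with the predictions, here is a minimal sketch that collects the detected entities by label. It reuses the sample output from the How to use section (scores rounded to plain floats) so that it runs without downloading the model. The helper name group_by_label and the min_score threshold are assumptions made for this example, not part of the model or the repository.

```python
from collections import defaultdict

text = "Eesti kuulub erinevalt Lätist ja Leedust kahtlemata Põhjamaade kultuuriruumi."

# Sample predictions copied from the How to use section (scores rounded).
results = [
    {"entity": "B-GEP", "score": 0.9934, "index": 1, "word": "▁Eesti", "start": 0, "end": 5},
    {"entity": "B-GEP", "score": 0.9924, "index": 4, "word": "▁Lätist", "start": 22, "end": 29},
    {"entity": "B-GEP", "score": 0.9908, "index": 6, "word": "▁Leedust", "start": 32, "end": 40},
    {"entity": "B-LOC", "score": 0.6179, "index": 8, "word": "▁Põhjamaade", "start": 51, "end": 62},
]


def group_by_label(text, predictions, min_score=0.0):
    """Map each entity label to the surface strings tagged with it."""
    grouped = defaultdict(list)
    for pred in predictions:
        # Skip low-confidence predictions (hypothetical threshold).
        if pred["score"] < min_score:
            continue
        # Drop the B-/I- prefix to get the bare label, e.g. "GEP".
        label = pred["entity"].split("-", 1)[-1]
        # Slice the original text by character offsets; the span may include
        # a leading space, so strip it.
        surface = text[pred["start"]:pred["end"]].strip()
        grouped[label].append(surface)
    return dict(grouped)


print(group_by_label(text, results))
print(group_by_label(text, results, min_score=0.9))
```

Slicing the original text by the start and end offsets avoids dealing with the ▁ marker in the word field. Note that this sketch treats every tagged token as its own entry, so a multi-token entity would appear as several strings.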