How to use auhide/bert-base-ner-bulgarian with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="auhide/bert-base-ner-bulgarian")
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("auhide/bert-base-ner-bulgarian")
model = AutoModelForTokenClassification.from_pretrained("auhide/bert-base-ner-bulgarian")

This model is rmihaylov/bert-base-bg fine-tuned on a Bulgarian subset of wikiann. It achieves an F1-score of 0.99 on that dataset.
Import the libraries:
from pprint import pprint
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
Load the model:
MODEL_ID = "auhide/bert-base-ner-bulgarian"
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
ner = pipeline(task="ner", model=model, tokenizer=tokenizer)
Do inference:
text = "Философът Барух Спиноза е роден в Амстердам."
pprint(ner(text))
[{'end': 13,
  'entity': 'B-PER',
  'index': 3,
  'score': 0.9954899,
  'start': 9,
  'word': '▁Бар'},
 {'end': 15,
  'entity': 'I-PER',
  'index': 4,
  'score': 0.9660787,
  'start': 13,
  'word': 'ух'},
 {'end': 23,
  'entity': 'I-PER',
  'index': 5,
  'score': 0.99728084,
  'start': 15,
  'word': '▁Спиноза'},
 {'end': 43,
  'entity': 'B-LOC',
  'index': 9,
  'score': 0.8990479,
  'start': 33,
  'word': '▁Амстердам'}]
Note: The model recognizes three entity types: PER (person), ORG (organization), and LOC (location).
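The output above is per subword token, so "Барух" arrives as two pieces ('▁Бар' and 'ух') and the full name spans three entries. Below is a minimal sketch of one way to merge those pieces into whole entities using the B-/I- tags and the character offsets. The helper name `merge_entities` is hypothetical and not part of the model or of Transformers, and the sketch assumes the input contains only B-/I- tagged tokens in reading order, as in the output above.

```python
def merge_entities(text, tokens):
    """Group B-/I- tagged token predictions into whole entity spans.

    Hypothetical helper: assumes `tokens` is ordered and contains only
    B-/I- tagged entries with 'entity', 'start', 'end' and 'score' keys.
    """
    spans = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        # Extend the current span only for an I- tag with the same label;
        # anything else (a B- tag or a label change) starts a new span.
        if prefix == "I" and spans and spans[-1]["label"] == label:
            spans[-1]["end"] = tok["end"]
            spans[-1]["scores"].append(tok["score"])
        else:
            spans.append({
                "label": label,
                "start": tok["start"],
                "end": tok["end"],
                "scores": [tok["score"]],
            })
    return [
        {
            "label": s["label"],
            # Slice the original text; strip the leading space that the
            # tokenizer folds into the first piece of each word.
            "text": text[s["start"]:s["end"]].strip(),
            # Report the weakest token score as a conservative confidence.
            "score": min(s["scores"]),
        }
        for s in spans
    ]


# Token-level predictions copied from the inference output above.
text = "Философът Барух Спиноза е роден в Амстердам."
tokens = [
    {"entity": "B-PER", "start": 9, "end": 13, "score": 0.9954899},
    {"entity": "I-PER", "start": 13, "end": 15, "score": 0.9660787},
    {"entity": "I-PER", "start": 15, "end": 23, "score": 0.99728084},
    {"entity": "B-LOC", "start": 33, "end": 43, "score": 0.8990479},
]

entities = merge_entities(text, tokens)
for entity in entities:
    print(entity["label"], entity["text"])
```

The Transformers token-classification pipeline also accepts an `aggregation_strategy` argument (for example `aggregation_strategy="simple"`) that performs a similar grouping internally, which is likely the more convenient option when the pipeline is already in use.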