This is a pipeline-friendly model for the Hugging Face transformers library: ivlcic/sour-sarma
Usage:
import re

from transformers import pipeline

ner = pipeline(
    task="token-classification",
    model="ivlcic/sour-sarma-pline",
    tokenizer="ivlcic/sour-sarma-pline",
    aggregation_strategy="simple",
)
text = " Janez Novak... Metka Kralj,,. in Boris A. Novak živijo v Ljubljani in delajo za Microsoft."
# Split into whitespace runs, words, and single punctuation marks, so that
# joining the tokens with an empty delimiter reproduces the original text.
tokens = re.findall(r"\s+|\w+|[^\w\s]", text, flags=re.UNICODE)
result = ner(tokens, is_split_into_words=True, delimiter="")
for r in result[0]:
    print(f'[{text[r["start"]:r["end"]]}]({r["entity_group"]}@{r["score"]:.2f}|{r["start"]}:{r["end"]})')
This should output something like:
[Janez Novak](PER@1.00|1:12)
[Metka Kralj](PER@1.00|16:27)
[Boris A. Novak](PER@1.00|34:48)
[Ljubljani](LOC@1.00|58:67)
[Microsoft](ORG@1.00|81:90)
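
The start:end offsets in the output index directly into the original text. A likely reason, sketched below without loading the model, is that the regex in the usage example keeps whitespace runs as tokens, so joining the tokens with an empty delimiter gives back the input string unchanged. The helper `token_spans` is hypothetical and only illustrates the bookkeeping; it is not part of the transformers library.

```python
import re

text = " Janez Novak... Metka Kralj,,. in Boris A. Novak živijo v Ljubljani in delajo za Microsoft."
tokens = re.findall(r"\s+|\w+|[^\w\s]", text, flags=re.UNICODE)

# Every character is whitespace, a word character, or neither,
# so the three alternatives together drop nothing from the input.
assert "".join(tokens) == text


def token_spans(tokens):
    """Hypothetical helper: (start, end) character span of each token."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


spans = token_spans(tokens)
print(tokens[:4])  # [' ', 'Janez', ' ', 'Novak']
print(spans[:4])   # [(0, 1), (1, 6), (6, 7), (7, 12)]
```

The first entity in the sample output, [Janez Novak] at 1:12, covers exactly the tokens 'Janez', ' ', and 'Novak' in this span list.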