treamyracle/nergrit-indo-ner
How to use treamyracle/indobert-ner-distilled with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="treamyracle/indobert-ner-distilled")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("treamyracle/indobert-ner-distilled")
model = AutoModelForTokenClassification.from_pretrained("treamyracle/indobert-ner-distilled")

This model is a fine-tuned version of indobenchmark/indobert-large-p1 for Indonesian Named Entity Recognition (NER) using a 39-tag BIO label scheme.
Training Data: Distilled NER data generated via knowledge distillation from a teacher model.
Training Size: 2,338 sentences
This model uses a 39-tag BIO tagging scheme: the O tag plus a B- (begin) and an I- (inside) tag for each of 19 entity types. The full label set is:
O, B-CRD, B-DAT, B-EVT, B-FAC, B-GPE, B-LAN, B-LAW, B-LOC, B-MON, B-NOR, B-ORD, B-ORG, B-PER, B-PRC, B-PRD, B-QTY, B-REG, B-TIM, B-WOA, I-CRD, I-DAT, I-EVT, I-FAC, I-GPE, I-LAN, I-LAW, I-LOC, I-MON, I-NOR, I-ORD, I-ORG, I-PER, I-PRC, I-PRD, I-QTY, I-REG, I-TIM, I-WOA
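
To illustrate how the label set above is put together and how a BIO sequence maps to entity spans, here is a minimal sketch in plain Python. The `bio_to_spans` helper and the hand-written example tags are assumptions for illustration, not part of the model or its repository. The order of `LABELS` simply mirrors the listing above; the model's actual id-to-label mapping is not confirmed here and should be read from `model.config.id2label`.

```python
from typing import List, Tuple

# The 19 entity types, taken from the label listing in this card.
ENTITY_TYPES = [
    "CRD", "DAT", "EVT", "FAC", "GPE", "LAN", "LAW", "LOC", "MON", "NOR",
    "ORD", "ORG", "PER", "PRC", "PRD", "QTY", "REG", "TIM", "WOA",
]

# O + one B- and one I- tag per type = 39 labels.
# Assumption: this ordering copies the card's listing, not the model config.
LABELS = ["O"] + [f"B-{t}" for t in ENTITY_TYPES] + [f"I-{t}" for t in ENTITY_TYPES]


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group a BIO tag sequence into (entity_type, start, end) spans, end exclusive."""
    spans = []
    current_type, start = None, 0
    for i, tag in enumerate(tags):
        prefix, _, ent_type = tag.partition("-")
        # An open span continues only on an I- tag of the same type.
        continues = prefix == "I" and ent_type == current_type
        if current_type is not None and not continues:
            spans.append((current_type, start, i))
            current_type = None
        if prefix == "B" or (prefix == "I" and current_type is None):
            # Lenient choice: a stray I- tag opens a new span instead of being dropped.
            current_type, start = ent_type, i
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


# Hand-written tags for illustration only (not actual model output).
tokens = ["Joko", "Widodo", "adalah", "presiden", "Indonesia",
          "yang", "tinggal", "di", "Jakarta", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-GPE", "O", "O", "O", "B-GPE", "O"]

for ent_type, start, end in bio_to_spans(tags):
    print(ent_type, " ".join(tokens[start:end]))
```

Treating a stray I- tag as the start of a new span is one of several possible conventions; a stricter decoder could discard such tags instead.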
| Benchmark | Precision | Recall | F1-Score |
|---|---|---|---|
| ner-ui | 0.4035 | 0.4475 | 0.4244 |
| ner-ugm | 0.3179 | 0.4257 | 0.3640 |
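
The F1 values in the table are consistent with the harmonic mean of the listed precision and recall, F1 = 2PR / (P + R). The short check below recomputes them. The `f1_score` helper is written here for illustration and is not taken from the evaluation code, which this card does not show. Because the inputs are already rounded to four decimals, the last digit could differ slightly in general.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table in this card.
reported = {
    "ner-ui": (0.4035, 0.4475, 0.4244),
    "ner-ugm": (0.3179, 0.4257, 0.3640),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed={f1_score(p, r):.4f} reported={f1:.4f}")
```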
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("treamyracle/indobert-ner-distilled")
model = AutoModelForTokenClassification.from_pretrained("treamyracle/indobert-ner-distilled")
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
result = nlp("Joko Widodo adalah presiden Indonesia yang tinggal di Jakarta.")
print(result)
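
With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. The sketch below shows one possible way to post-process such a list by grouping entity strings by type and dropping low-confidence predictions. The `group_entities` helper, the threshold, and the `sample` list are assumptions for illustration; in particular, `sample` is hand-written and is not output produced by this model.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(entities: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Group entity strings by type, keeping only predictions with score >= min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written stand-in for `result`; scores are made up, not model output.
sample = [
    {"entity_group": "PER", "score": 0.91, "word": "Joko Widodo", "start": 0, "end": 11},
    {"entity_group": "GPE", "score": 0.42, "word": "Indonesia", "start": 28, "end": 37},
    {"entity_group": "GPE", "score": 0.88, "word": "Jakarta", "start": 54, "end": 61},
]

print(group_entities(sample, min_score=0.5))
```

In practice the same call would be made on the real pipeline output, for example `group_entities(result, min_score=0.5)`, with the threshold chosen on held-out data.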
Base model: indobenchmark/indobert-large-p1