IndoBERT-Large NER — Distilled Data

Model Description

This model is a fine-tuned version of indobenchmark/indobert-large-p1 for Indonesian Named Entity Recognition (NER) using a 39-tag BIO label scheme.

Training Data: NER annotations distilled from a teacher model (knowledge distillation).

Training Size: 2,338 sentences

Label Space

This model uses a 39-tag BIO tagging scheme: the O tag plus a B- and an I- tag for each of 19 entity types. The full tag set is:

O, B-CRD, B-DAT, B-EVT, B-FAC, B-GPE, B-LAN, B-LAW, B-LOC, B-MON, B-NOR, B-ORD, B-ORG, B-PER, B-PRC, B-PRD, B-QTY, B-REG, B-TIM, B-WOA, I-CRD, I-DAT, I-EVT, I-FAC, I-GPE, I-LAN, I-LAW, I-LOC, I-MON, I-NOR, I-ORD, I-ORG, I-PER, I-PRC, I-PRD, I-QTY, I-REG, I-TIM, I-WOA
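The tag list above can be rebuilt from the 19 entity-type codes, and a BIO sequence can be decoded into entity spans. The following is a minimal sketch, not the author's code: the index order of `LABELS` simply mirrors the listing above and may differ from the checkpoint's actual `id2label`, `bio_to_spans` is a hypothetical helper, and the example tags are hand-written for illustration rather than model output.

```python
# The 19 entity-type codes that appear in the tag list above.
ENTITY_TYPES = [
    "CRD", "DAT", "EVT", "FAC", "GPE", "LAN", "LAW", "LOC", "MON", "NOR",
    "ORD", "ORG", "PER", "PRC", "PRD", "QTY", "REG", "TIM", "WOA",
]

# O first, then every B- tag, then every I- tag, mirroring the listing order.
# Assumption: the checkpoint's real id2label may use a different index order.
LABELS = ["O"] + [f"B-{t}" for t in ENTITY_TYPES] + [f"I-{t}" for t in ENTITY_TYPES]
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}


def bio_to_spans(tags):
    """Convert a BIO tag sequence into (entity_type, start, end) spans, end exclusive."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == current
        if not continues:
            # Close the span that was open, if any.
            if current is not None:
                spans.append((current, start, i))
            # Lenient choice: an I- tag with no matching open span starts a new one.
            if prefix in ("B", "I"):
                start, current = i, etype
            else:
                start, current = None, None
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


# Hand-written word-level tags for illustration only (not model output).
tags = ["B-PER", "I-PER", "O", "O", "B-GPE", "O", "O", "O", "B-GPE"]
print(len(LABELS))         # 39
print(bio_to_spans(tags))  # [('PER', 0, 2), ('GPE', 4, 5), ('GPE', 8, 9)]
```

To confirm the real mapping, compare against `model.config.id2label` after loading the checkpoint.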

Benchmark Results

Benchmark	Precision	Recall	F1-Score
ner-ui	0.4035	0.4475	0.4244
ner-ugm	0.3179	0.4257	0.3640
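As a sanity check on the table, F1 can be recomputed as the harmonic mean of precision and recall. This is only a sketch: the card does not say whether the scores are entity-level or token-level, or how they are averaged, so the code checks nothing beyond the arithmetic relation between the three columns.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "ner-ui": (0.4035, 0.4475, 0.4244),
    "ner-ugm": (0.3179, 0.4257, 0.3640),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.4f}, reported = {f1:.4f}")
```

Both recomputed values agree with the reported F1-Score column to four decimal places.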

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.
model_id = "treamyracle/indobert-ner-distilled"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# aggregation_strategy="simple" merges subword predictions into whole entity spans.
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
result = nlp("Joko Widodo adalah presiden Indonesia yang tinggal di Jakarta.")
print(result)  # list of dicts with entity_group, score, word, start, end
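With `aggregation_strategy="simple"`, the pipeline returns a list of dicts with the keys `entity_group`, `score`, `word`, `start`, and `end`. One possible way to post-process that list is sketched below. `group_entities` and the `min_score` threshold are hypothetical, and `sample` is a hand-written placeholder with made-up scores, not real output from this model.

```python
from collections import defaultdict


def group_entities(text, entities, min_score=0.0):
    """Group entity surface forms by type, dropping low-confidence spans."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            # Slice the original text so casing and spacing are preserved,
            # rather than relying on the tokenizer-reconstructed "word".
            grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


text = "Joko Widodo adalah presiden Indonesia yang tinggal di Jakarta."

# Placeholder in the pipeline's output format; labels and scores are invented.
sample = [
    {"entity_group": "PER", "score": 0.90, "word": "joko widodo", "start": 0, "end": 11},
    {"entity_group": "GPE", "score": 0.40, "word": "indonesia", "start": 28, "end": 37},
    {"entity_group": "GPE", "score": 0.80, "word": "jakarta", "start": 54, "end": 61},
]

print(group_entities(text, sample, min_score=0.5))
# {'PER': ['Joko Widodo'], 'GPE': ['Jakarta']}
```

In practice, pass the `result` list from the snippet above in place of `sample`. A suitable threshold, if any, depends on the application.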

Training Details

Base Model: indobenchmark/indobert-large-p1
Framework: Hugging Face Transformers
Optimizer: AdamW with cosine LR scheduler
Mixed Precision: FP16
Hardware: Google Colab T4 GPU
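The settings listed above map onto Hugging Face `TrainingArguments` roughly as follows. This is a sketch under stated assumptions, not the actual training script. Only the optimizer family, the scheduler type, and FP16 come from this card. The output path is hypothetical, `adamw_torch` is an assumed choice of AdamW implementation, and learning rate, batch size, and epoch count are not documented, so they are left at library defaults.

```python
# Only the optimizer family, scheduler type, and FP16 flag come from the model card.
training_kwargs = {
    "output_dir": "indobert-ner-distilled",  # hypothetical output path
    "optim": "adamw_torch",                  # AdamW; PyTorch implementation is an assumption
    "lr_scheduler_type": "cosine",           # cosine learning-rate schedule
    "fp16": True,                            # FP16 mixed precision; requires a CUDA GPU (e.g. a T4)
}


def build_training_arguments(kwargs=training_kwargs):
    """Create TrainingArguments lazily so transformers is imported only when training."""
    from transformers import TrainingArguments

    return TrainingArguments(**kwargs)
```

Calling `build_training_arguments()` on a machine without a CUDA device is expected to fail because of `fp16=True`. Set it to `False` for CPU-only experiments.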
