
IndoBERT-Large NER — Distilled Data

Model Description

This model is a fine-tuned version of indobenchmark/indobert-large-p1 for Indonesian Named Entity Recognition (NER) using a 39-tag BIO label scheme.

Training Data: NER annotations generated by a teacher model via knowledge distillation.

Training Size: 2,338 sentences

Label Space

The 39 tags consist of the outside tag O plus a B- (beginning) and an I- (inside) tag for each of 19 entity types:

O, B-CRD, B-DAT, B-EVT, B-FAC, B-GPE, B-LAN, B-LAW, B-LOC, B-MON, B-NOR, B-ORD, B-ORG, B-PER, B-PRC, B-PRD, B-QTY, B-REG, B-TIM, B-WOA, I-CRD, I-DAT, I-EVT, I-FAC, I-GPE, I-LAN, I-LAW, I-LOC, I-MON, I-NOR, I-ORD, I-ORG, I-PER, I-PRC, I-PRD, I-QTY, I-REG, I-TIM, I-WOA
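
As a minimal sketch, the tag inventory above can be rebuilt from its 19 entity-type codes. The id ordering below simply follows the order of the list in this card, which is an assumption; the authoritative mapping is whatever is stored in the model's own configuration.

```python
# Entity-type codes taken from the tag list in this card.
ENTITY_TYPES = [
    "CRD", "DAT", "EVT", "FAC", "GPE", "LAN", "LAW", "LOC", "MON", "NOR",
    "ORD", "ORG", "PER", "PRC", "PRD", "QTY", "REG", "TIM", "WOA",
]

# "O" first, then all B- tags, then all I- tags (ordering assumed from the card).
LABELS = ["O"] + [f"B-{t}" for t in ENTITY_TYPES] + [f"I-{t}" for t in ENTITY_TYPES]

# Illustrative mappings only; the model's config may assign different ids.
id2label = dict(enumerate(LABELS))
label2id = {label: i for i, label in id2label.items()}

print(len(LABELS))  # 39
```

The count works out as 1 + 2 × 19 = 39, matching the tag total stated in the description.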

Benchmark Results

Benchmark   Precision   Recall   F1-Score
ner-ui      0.4035      0.4475   0.4244
ner-ugm     0.3179      0.4257   0.3640
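
For readers checking the numbers, the F1 column is consistent with the usual harmonic mean of precision and recall. The short sketch below recomputes it from the reported values. The card does not say whether scoring is entity-level or token-level, so the helper `f1_score` is only an arithmetic check, not a reproduction of the evaluation.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the benchmark results.
reported = {
    "ner-ui": (0.4035, 0.4475),
    "ner-ugm": (0.3179, 0.4257),
}

f1_by_benchmark = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, f1 in f1_by_benchmark.items():
    print(f"{name}: F1 = {f1:.4f}")
```

Both recomputed values round to the reported F1 scores at four decimal places.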

Usage

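from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("treamyracle/indobert-ner-distilled")
model = AutoModelForTokenClassification.from_pretrained("treamyracle/indobert-ner-distilled")

# aggregation_strategy="simple" merges word pieces into whole entity spans.
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
result = nlp("Joko Widodo adalah presiden Indonesia yang tinggal di Jakarta.")
# Each item is a dict with entity_group, score, word, start, and end.
print(result)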
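
With `aggregation_strategy="simple"`, the pipeline returns a list of dicts carrying `entity_group`, `score`, `word`, `start`, and `end`. One possible way to post-process that list is sketched below. The helper `group_entities`, its `min_score` threshold, and the sample data are all hypothetical and hand-written for illustration; they are not output from this model.

```python
from collections import defaultdict

def group_entities(entities, min_score=0.5):
    """Group aggregated pipeline output by entity type, dropping low-score spans."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

# Hypothetical pipeline output, written by hand; not produced by the model.
sample = [
    {"entity_group": "PER", "score": 0.98, "word": "Joko Widodo", "start": 0, "end": 11},
    {"entity_group": "GPE", "score": 0.91, "word": "Indonesia", "start": 28, "end": 37},
    {"entity_group": "GPE", "score": 0.42, "word": "Jakarta", "start": 54, "end": 61},
]

grouped = group_entities(sample)
print(grouped)  # {'PER': ['Joko Widodo'], 'GPE': ['Indonesia']}
```

In practice, `group_entities(result)` would be called on the list returned by `nlp(...)`, with the threshold tuned on held-out data.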

Training Details

  • Base Model: indobenchmark/indobert-large-p1
  • Framework: Hugging Face Transformers
  • Optimizer: AdamW with cosine LR scheduler
  • Mixed Precision: FP16
  • Hardware: Google Colab T4 GPU
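
To illustrate what a cosine LR scheduler does, here is a plain-Python sketch of linear warmup followed by cosine decay to zero. The base learning rate, step count, and warmup length are placeholders; the card does not report them, and the actual training run may have used different settings or no warmup at all.

```python
import math

def cosine_lr(step, total_steps, base_lr, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Placeholder values for illustration; not taken from the training run.
BASE_LR = 2e-5
TOTAL_STEPS = 1000
WARMUP_STEPS = 100

for step in (0, 100, 550, 1000):
    print(step, cosine_lr(step, TOTAL_STEPS, BASE_LR, WARMUP_STEPS))
```

The rate rises linearly to `BASE_LR` at the end of warmup, falls to half of it midway through the decay phase, and reaches zero at the final step.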

Model Size: 0.3B parameters (Safetensors, F32 tensors)