# BERTurk-128k | HisTR NER v2

A BERTurk-128k encoder fine-tuned for Named-Entity Recognition on the HisTR corpus (person and location entities in historical Turkish). Training was performed with the Hugging Face `run_ner.py` token-classification example script.
## Evaluation

- Strict F1 (HisTR dev): 0.873 (seqeval with default settings)
- Strict F1 (Ruznamçe test, scored with nervaluate): to be filled
...
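For reference, a minimal sketch of the two scoring setups named above, using toy IOB2 tag sequences and assuming the corpus labels are `PER` and `LOC`. The seqeval call matches its documented API; the nervaluate snippet follows the list-loader example from its README, whose return signature may differ across releases.

```python
from seqeval.metrics import f1_score
from nervaluate import Evaluator

# Toy gold/predicted IOB2 tag sequences, one inner list per sentence
y_true = [["B-PER", "I-PER", "O", "O", "B-LOC", "O"]]
y_pred = [["B-PER", "I-PER", "O", "O", "B-LOC", "O"]]

# Dev-set metric: seqeval with default settings, as reported above
print("seqeval F1:", f1_score(y_true, y_pred))

# Ruznamçe test metric: nervaluate's "strict" schema requires an exact
# boundary-and-type match (two-value return as in the nervaluate README)
evaluator = Evaluator(y_true, y_pred, tags=["PER", "LOC"], loader="list")
results, results_by_tag = evaluator.evaluate()
print("nervaluate strict:", results["strict"])
```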
## Training configuration

| Parameter | Value |
|---|---|
| Base model | dbmdz/bert-base-turkish-128k-cased |
| Task | NER |
| Max sequence length | 128 |
| Train batch size | 16 |
| Eval batch size | 8 |
| Learning rate | 3 × 10⁻⁵ |
| Epochs | 5 (best checkpoint at epoch 4) |
| Total optimization steps | 145 |
| Gradient accumulation | 1 |
| Mixed precision | Disabled (fp16=False) |
| Device | 1 × A100 (40 GB) |
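As a rough guide, the table maps onto Hugging Face `TrainingArguments` as sketched below. This is not the exact `run_ner.py` invocation: `output_dir` and the per-epoch checkpoint-selection flags are assumptions (the card only states that epoch 4 produced the best model), and the maximum sequence length of 128 is a data argument of the script rather than a `TrainingArguments` field.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="berturk-histr-ner",   # hypothetical path, any local directory works
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=3e-5,
    num_train_epochs=5,
    gradient_accumulation_steps=1,
    fp16=False,                       # mixed precision disabled, as in the table
    # Assumed: evaluate/save each epoch so the best epoch (4) can be kept;
    # the keyword is `evaluation_strategy` in older transformers releases.
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)
```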
## Example use

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "cihanunlu/BERTurk_HisTR_NER_v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# "simple" aggregation merges subword pieces into whole entity spans
ner = pipeline("ner", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")

# "Mevlânâzâde Ahmed Hulûsî Efendi was appointed to the province of
# Aleppo in the year 1293."
sentence = "Mevlânâzâde Ahmed Hulûsî Efendi, 1293 senesinde Haleb vilâyetine memur edilmiştir."
print(ner(sentence))
```
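With `aggregation_strategy="simple"`, the pipeline returns one dictionary per detected entity containing `entity_group`, `score`, `word`, `start`, and `end`; on the sentence above the model is expected to tag `Mevlânâzâde Ahmed Hulûsî Efendi` as a person and `Haleb` as a location.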
## Model tree

Base model: [dbmdz/bert-base-turkish-128k-cased](https://huggingface.co/dbmdz/bert-base-turkish-128k-cased)