---
library_name: transformers
tags:
  - peft
  - lora
  - ottomanturkish
datasets:
  - BUCOLIN/HisTR
language:
  - tr
metrics:
  - f1
  - precision
  - recall
base_model:
  - cihanunlu/BerTurk_Ottoman_Full_DAPT
pipeline_tag: token-classification
---

# LoRA Adapter for Ottoman Turkish NER

## 1 Overview

| | |
|---|---|
| **Base model** | `cihanunlu/BerTurk_Ottoman_Full_DAPT` |
| **Adapter type** | LoRA (Low-Rank Adaptation) built with HF PEFT |
| **Task** | Named-entity recognition • BIO tags `PER` / `LOC` / `O` |
| **Language** | Ottoman / late-Ottoman Turkish (Latin transliteration) |
| **Repo contents** | ≈ 2 MB LoRA weights (`adapter_model.bin`, `adapter_config.json`) + tokenizer files |

Attach it to the `BerTurk_Ottoman_Full_DAPT` checkpoint to obtain a lightweight NER model fine-tuned on the HisTR corpus.

## 2 Intended use & limitations

- Suitable for historical/Ottoman Turkish NER focused on PERSON and LOCATION entities.
- Performance drops on modern Turkish or domain-specific jargon.
- The adapter inherits the ethical constraints and biases of the base BerTurk model.

## 3 Evaluation

| Split | Precision | Recall | F1 |
|---|---|---|---|
| Dev (HisTR), best checkpoint (epoch 4) | 77.3 % | 84.9 % | 80.9 % |
| Test (Rûznâmçe) | 54.4 % | 52.8 % | 53.6 % |
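The scores above are span-level precision/recall/F1: an entity counts as correct only on an exact boundary-and-type match, as in `seqeval`. A minimal sketch of that computation (the span tuples below are illustrative data, not from the corpus):

```python
# Span-level precision / recall / F1 over (type, start, end) entity tuples.
def prf(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                      # exact span + type matches
    p = tp / len(pred) if pred else 0.0        # precision
    r = tp / len(gold) if gold else 0.0        # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [("PER", 0, 2), ("LOC", 5, 6)]
pred = [("PER", 0, 2), ("LOC", 4, 6)]          # LOC boundary is off by one
print(prf(gold, pred))                         # -> (0.5, 0.5, 0.5)
```

Note that the near-miss `LOC` span earns no partial credit, which is why span-level F1 is stricter than token-level accuracy.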

## 4 Training hyper-parameters

  • LoRA rank r: 16
  • LoRA α: 16
  • Dropout: 0.10
  • Peak learning rate: 5 × 10⁻⁴
  • Effective batch size: 16
  • Epochs: 5
  • Mixed precision: FP16