# Turkish Lemmatizer with spaCy + Transformers
This is a spaCy pipeline for Turkish lemmatization, fine-tuned on top of a transformer backbone (`dbmdz/bert-base-turkish-cased`).
It pairs a `transformer` component with spaCy's `trainable_lemmatizer` to predict lemmas for Turkish text, reaching over 97% lemma accuracy during training (see Training Performance).
## ⚡ Dependencies
Install spaCy, spaCy Transformers, and `huggingface_hub` (used by the download step under Usage):
```bash
pip install -U spacy spacy-transformers huggingface_hub
```
---
## Usage
```python
import spacy
from huggingface_hub import snapshot_download

# Download the model from Hugging Face
model_path = snapshot_download("umit/turkish-lemmatizer")

# Load with spaCy
nlp = spacy.load(model_path)

# Test lemmatization
doc = nlp("Ayşe kitapları masanın üzerine koydu.")
print([token.lemma_ for token in doc])
# Expected output: ['Ayşe', 'kitap', 'masa', 'üzer', 'koy', '.']
```
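
For more than a handful of sentences, streaming texts through spaCy's `nlp.pipe` is usually faster than calling `nlp` once per text. The helper below is a minimal sketch of that pattern: the name `lemmatize_texts` and the default `batch_size` of 32 are illustrative assumptions, not part of this model, and `nlp` is assumed to be the pipeline loaded with `spacy.load(model_path)` above.

```python
from typing import Iterable, List


def lemmatize_texts(nlp, texts: Iterable[str], batch_size: int = 32) -> List[List[str]]:
    """Return one list of lemmas per input text.

    `nlp` is assumed to be a loaded spaCy pipeline; only its `pipe`
    method and each token's `lemma_` attribute are used here.
    """
    return [
        [token.lemma_ for token in doc]
        for doc in nlp.pipe(texts, batch_size=batch_size)
    ]
```

Called as `lemmatize_texts(nlp, ["Ayşe kitapları masanın üzerine koydu."])`, it would return a list containing one lemma list for that sentence. A suitable batch size depends on available memory and sentence length.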
---
## Training Details
- **Language:** Turkish (`tr`)
- **Pipeline components:** `transformer`, `trainable_lemmatizer`
- **Transformer backbone:** `dbmdz/bert-base-turkish-cased`
- **Batch size:** 128
- **Max epochs:** 10 (or 200,000 steps)
- **Lemmatizer:** trainable with `orth` backoff
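
As a quick sanity check after loading, the component list above can be compared against spaCy's `nlp.pipe_names`. This is a minimal sketch: the helper name `missing_components` and the constant `EXPECTED_COMPONENTS` are hypothetical and introduced only for illustration.

```python
from typing import Iterable, List

# Component names taken from the "Pipeline components" entry above.
EXPECTED_COMPONENTS = ("transformer", "trainable_lemmatizer")


def missing_components(nlp, expected: Iterable[str] = EXPECTED_COMPONENTS) -> List[str]:
    """List the expected component names that are absent from `nlp.pipe_names`."""
    present = set(nlp.pipe_names)
    return [name for name in expected if name not in present]
```

With the pipeline loaded as in the Usage section, an empty list from `missing_components(nlp)` would indicate that both components are present.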
---
## Training Performance
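The model was trained on **3,402,790** tokens (128,663 unique tokens).
It was evaluated on a **held-out test set** of 854,544 tokens that were **not seen during training**.
The table reports lemma accuracy at successive training steps; the Score column is the same accuracy on a 0 to 1 scale, rounded to two decimals.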
| Step | Lemma Accuracy (%) | Score |
|-------|------------------|-------|
| 0 | 55.66 | 0.56 |
| 2000 | 87.72 | 0.88 |
| 4000 | 94.28 | 0.94 |
| 6000 | 95.89 | 0.96 |
| 8000 | 96.57 | 0.97 |
| 10000 | 96.99 | 0.97 |
| 13000 | 97.29 | 0.97 |
> ✅ The model reached **97.29% lemmatization accuracy** by step 13,000, the last step reported in the table.
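
To make the two table columns concrete, the sketch below computes token-level lemma accuracy as the percentage of tokens whose predicted lemma exactly matches the gold lemma, and then converts it to a 0 to 1 score. It assumes this exact-match definition and that the Score column is the accuracy divided by 100 and rounded to two decimals, which agrees with every row above. The function names and the gold lemmas in the example are hypothetical, not taken from the model's evaluation data.

```python
from typing import Sequence


def lemma_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of positions where the predicted lemma equals the gold lemma."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of tokens")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


def accuracy_to_score(accuracy_percent: float) -> float:
    """Convert a percentage to a 0-1 score rounded to two decimals (assumed convention)."""
    return round(accuracy_percent / 100.0, 2)


# Hypothetical gold lemmas, for illustration only: one of six tokens differs.
predicted = ["Ayşe", "kitap", "masa", "üzer", "koy", "."]
gold = ["Ayşe", "kitap", "masa", "üzeri", "koy", "."]
example_accuracy = lemma_accuracy(predicted, gold)  # 5 of 6 lemmas match
```

Under this convention, the final row's 97.29% corresponds to `accuracy_to_score(97.29)`, which gives the 0.97 shown in the Score column.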
## Citation / Contributors
If you use this model, please cite the following contributors for the data and model development:
**Contributors:**
Ümit Atlamaz, Yasin Demirtaş, Oğuz Özgür Uğur, Feyza Budan, Özkan Yavuz
```bibtex
@misc{umit2026turkishlemmatizer,
  title={Turkish Lemmatizer with spaCy + Transformers},
  author={Ümit Atlamaz and Yasin Demirtaş and Oğuz Özgür Uğur and Feyza Budan and Özkan Yavuz},
  year={2026},
  howpublished={\url{https://huggingface.co/umit/turkish-lemmatizer}}
}
```