# 📚 Kadi Registers Ottoman-Turkish BERT Model (SerBERT)
This repository provides a fine-tuned BERT-based language model trained on a large-scale corpus of Kadi registers written in Latin transliteration and published by the Türkiye Diyanet Vakfı Center for Islamic Studies. The model is designed to support computational research in Ottoman Turkish studies, digital humanities, digital history, and historical linguistics.
## Model Description
Architecture: BERT (Masked Language Model)
Base model: dbmdz/bert-base-turkish-cased (BERT-based Turkish language model)
Script: Latin alphabet (Ottoman Turkish transliteration)
Training objective: Masked Language Modeling (MLM)
Framework: Hugging Face Transformers
Model format: safetensors
Repository: [İstanbul Kadı Sicilleri](https://kadisicilleri.org)
The model was fine-tuned on a domain-specific corpus of 54,445 text entries (rows), enabling it to capture lexical, morphological, and syntactic patterns specific to Ottoman Turkish written in Latin characters. The final model has 110,650,880 parameters.
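The parameter figure quoted above can be checked locally. The following is a minimal sketch, not part of the original card: the helper names `count_parameters` and `load_and_count` are hypothetical, and it assumes the checkpoint is loaded with its masked-language-modeling head, since loading a different head would change the total.

```python
MODEL_ID = "sumeyyeakca/serbert-ottoman-turkish-bert-model"


def count_parameters(model, trainable_only: bool = False) -> int:
    """Sum the element counts of all (or only the trainable) parameters.

    Works with any object exposing a PyTorch-style ``parameters()`` iterator.
    """
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


def load_and_count(model_id: str = MODEL_ID) -> int:
    """Download the checkpoint and count its parameters (hypothetical helper)."""
    # Imported lazily so count_parameters can be reused without transformers.
    from transformers import AutoModelForMaskedLM

    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return count_parameters(model)
```

Calling `load_and_count()` downloads the weights on first use. The result should be comparable to the figure reported above, provided the same model class is used.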
## Usage
from transformers import AutoTokenizer, AutoModelForMaskedLM
model_path = "sumeyyeakca/serbert-ottoman-turkish-bert-model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForMaskedLM.from_pretrained(model_path)
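Because the model is trained with a masked-language-modeling objective, a natural first test is to mask one word and inspect the predictions. The sketch below uses the Transformers `fill-mask` pipeline. The helper names `insert_mask` and `predict_masked` and the `{mask}` placeholder convention are assumptions made for illustration, not part of the original repository.

```python
MODEL_ID = "sumeyyeakca/serbert-ottoman-turkish-bert-model"


def insert_mask(template: str, mask_token: str, placeholder: str = "{mask}") -> str:
    """Swap a single placeholder for the tokenizer's mask token."""
    if template.count(placeholder) != 1:
        raise ValueError("template must contain exactly one placeholder")
    return template.replace(placeholder, mask_token)


def predict_masked(template: str, top_k: int = 5):
    """Return (token, score) pairs for the masked position (hypothetical helper)."""
    # Imported lazily: building the pipeline downloads and loads the model.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    sentence = insert_mask(template, fill.tokenizer.mask_token)
    # With exactly one mask, the pipeline returns a flat list of dicts.
    return [(p["token_str"], p["score"]) for p in fill(sentence, top_k=top_k)]
```

For example, `predict_masked("Medine-i Üsküdar'da {mask} mahallesinde sâkin")` would return the five highest-scoring candidates for the masked word. The sentence is an invented illustration in the style of the registers, not a quotation from the corpus.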
## Training Data
Kadi registers, which shed light on social life in the Ottoman Empire, are among the most important sources for today’s historians.
They were written in Turkish, Arabic, and Persian and maintained in book format.
These books, dating from the 15th to the 20th century, are key sources for Turkish culture and history and are closely linked to Turkish economic and political life.
The corpus consists of historical legal documents that are distinct from other Ottoman historical materials.
This model was trained on the Istanbul Kadi Registers (100 volumes), which were transliterated and published by the Türkiye Diyanet Vakfı Center for Islamic Studies (İSAM).
Model tree for sumeyyeakca/serbert-ottoman-turkish-bert-model: base model dbmdz/bert-base-turkish-cased