Joseon Level 1 Span Model - SillokBert

This is the promoted Level 1 span/type model for the Joseon-to-Day project. It labels historical Korean/Chinese-source text spans used downstream by the translation pipeline.

The model is a BertForTokenClassification checkpoint continued from ddokbaro/SillokBert-NER.

Labels

The checkpoint uses BIO labels:

  • PER: person/name spans
  • LOC: place spans
  • POH: book/title/institution-style historical evidence spans
  • DAT: date spans (inherited from the base model's label space)
  • O: outside
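The BIO scheme above decodes into typed spans in the usual way: a B- tag opens a span, matching I- tags extend it, and O (or a type change) closes it. A minimal decoding sketch, assuming the standard B-/I- prefixed form of the labels listed here (the helper itself is illustrative, not part of the checkpoint):

```python
def bio_to_spans(tags):
    """Collapse a BIO tag sequence into (type, start, end) spans, end exclusive."""
    spans = []
    start, ent_type = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:  # close any open span before starting a new one
                spans.append((ent_type, start, i))
            start, ent_type = i, tag[2:]
        elif tag.startswith("I-") and ent_type == tag[2:]:
            continue  # extend the current span
        else:  # "O", or an I- tag with a mismatched type, closes the open span
            if start is not None:
                spans.append((ent_type, start, i))
            start, ent_type = None, None
    if start is not None:
        spans.append((ent_type, start, len(tags)))
    return spans

tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(bio_to_spans(tags))  # [('PER', 0, 2), ('LOC', 3, 5)]
```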

Evaluation

Promoted dev results on the local project split:

Metric                        Value
span F1                       0.9888
token accuracy                0.9978
person (PER) F1               0.9962
place (LOC) F1                0.9293
book/title/evidence (POH) F1  0.8933

The full local metrics are included as training_metrics.json.

Intended Use

Use this model as the Level 1 span labeler in a Joseon/Hanmun historical-text translation pipeline. It is intended to provide structured spans for later reading selection and translation, not as a general-purpose modern NER system.

Limitations

This model was trained and evaluated on project-specific processed data. It may not generalize to unrelated domains, modern Korean NER, or arbitrary Classical Chinese text without additional validation.

Loading

from transformers import AutoModelForTokenClassification, AutoTokenizer

repo_id = "suhjae/joseon-level1-span-sillokbert"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForTokenClassification.from_pretrained(repo_id)
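After loading, per-token logits are mapped to BIO tags through the checkpoint's id2label mapping (stored in config.json; in the real pipeline it comes from model.config.id2label). A hedged sketch of that post-processing step, using a mock logits array and an assumed id2label so it runs without downloading the model:

import numpy as np

# Assumed label ids for illustration only; the authoritative mapping
# ships with the checkpoint in config.json.
id2label = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-LOC", 4: "I-LOC"}

# Stand-in for model(**inputs).logits[0]: one row of class scores per token.
logits = np.array([
    [0.1, 2.0, 0.0, 0.0, 0.0],  # token 0 -> B-PER
    [0.1, 0.0, 2.0, 0.0, 0.0],  # token 1 -> I-PER
    [2.0, 0.0, 0.0, 0.0, 0.0],  # token 2 -> O
])
tags = [id2label[i] for i in logits.argmax(axis=-1)]
print(tags)  # ['B-PER', 'I-PER', 'O']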
Model size

0.2B parameters, F32 tensors, stored in safetensors format.