Joseon Level 1 Span Model - SillokBert

This is the promoted Level 1 span/type model for the Joseon-to-Day project. It labels historical Korean/Chinese-source text spans used downstream by the translation pipeline.

The model is a BertForTokenClassification checkpoint continued from ddokbaro/SillokBert-NER.

Labels

The checkpoint uses BIO labels:

  • PER: person/name spans
  • LOC: place spans
  • POH: book/title/institution-style historical evidence spans
  • DAT: date spans (inherited from the base model's label space)
  • O: outside
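The BIO scheme above decodes into typed spans in the usual way: a B- tag opens a span, matching I- tags extend it, and O (or a type change) closes it. A minimal decoding sketch, assuming the standard B-/I- prefixed form of the labels listed here (the helper itself is illustrative, not part of the checkpoint):

```python
def bio_to_spans(tags):
    """Collapse a BIO tag sequence into (type, start, end) spans, end exclusive."""
    spans = []
    start, ent_type = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:  # close any open span before starting a new one
                spans.append((ent_type, start, i))
            start, ent_type = i, tag[2:]
        elif tag.startswith("I-") and ent_type == tag[2:]:
            continue  # extend the current span
        else:  # "O", or an I- tag with a mismatched type, closes the open span
            if start is not None:
                spans.append((ent_type, start, i))
            start, ent_type = None, None
    if start is not None:
        spans.append((ent_type, start, len(tags)))
    return spans

tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(bio_to_spans(tags))  # [('PER', 0, 2), ('LOC', 3, 5)]
```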

Evaluation

Promoted dev results on the local project split:

Metric                        Value
span F1                       0.9888
token accuracy                0.9978
person (PER) F1               0.9962
place (LOC) F1                0.9293
book/title/evidence (POH) F1  0.8933

The full local metrics are included as training_metrics.json.

Intended Use

Use this model as the Level 1 span labeler in a Joseon/Hanmun historical-text translation pipeline. It is intended to provide structured spans for later reading selection and translation, not as a general-purpose modern NER system.

Limitations

This model was trained and evaluated on project-specific processed data. It may not generalize to unrelated domains, modern Korean NER, or arbitrary Classical Chinese text without additional validation.

Loading

from transformers import AutoModelForTokenClassification, AutoTokenizer

repo_id = "suhjae/joseon-level1-span-sillokbert"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForTokenClassification.from_pretrained(repo_id)
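After loading, per-token logits are mapped to BIO tags through the checkpoint's id2label mapping (stored in config.json; in the real pipeline it comes from model.config.id2label). A hedged sketch of that post-processing step, using a mock logits array and an assumed id2label so it runs without downloading the model:

import numpy as np

# Assumed label ids for illustration only; the authoritative mapping
# ships with the checkpoint in config.json.
id2label = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-LOC", 4: "I-LOC"}

# Stand-in for model(**inputs).logits[0]: one row of class scores per token.
logits = np.array([
    [0.1, 2.0, 0.0, 0.0, 0.0],  # token 0 -> B-PER
    [0.1, 0.0, 2.0, 0.0, 0.0],  # token 1 -> I-PER
    [2.0, 0.0, 0.0, 0.0, 0.0],  # token 2 -> O
])
tags = [id2label[i] for i in logits.argmax(axis=-1)]
print(tags)  # ['B-PER', 'I-PER', 'O']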
Model size

0.2B parameters, F32 tensors, stored in safetensors format.