Joseon Level 1 Span Model - SillokBert
This is the promoted Level 1 span/type model for the Joseon-to-Day project. It labels spans in historical Korean and Chinese-source text; the translation pipeline consumes those spans downstream.
The model is a continued BertForTokenClassification checkpoint based on
ddokbaro/SillokBert-NER.
Labels
The checkpoint uses BIO labels:
- PER: person/name spans
- LOC: place spans
- POH: book/title/institution-style historical evidence spans
- DAT: date spans, inherited from the label space
- O: outside
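To make the BIO scheme concrete, here is a minimal sketch of how a per-token tag sequence could be collapsed into typed spans. It assumes the tags are written as `B-PER`, `I-PER`, and so on, which is the usual convention but is not confirmed by this card. The helper name `bio_to_spans` is hypothetical and is not part of the project code.

```python
from typing import List, Tuple

# (span type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collapse a BIO tag sequence into typed spans.

    Assumes tags look like "B-PER", "I-PER" or "O" (an assumption about
    the checkpoint's label strings, not taken from its config).
    """
    spans: List[Span] = []
    current_type = None
    start = 0
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        # An I- tag of the same type extends the open span.
        if prefix == "I" and label == current_type:
            continue
        # Anything else closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, start, i))
            current_type = None
        # B- opens a new span; a stray I- is treated leniently as a start.
        if prefix in ("B", "I") and label:
            current_type, start = label, i
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


example_tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-POH", "I-POH", "I-POH"]
example_spans = bio_to_spans(example_tags)
print(example_spans)
```

Treating a stray `I-` tag as the start of a span is one possible design choice; a stricter decoder could drop such tags instead.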
Evaluation
Promoted dev result from the local project split:
| Metric | Value |
|---|---|
| span F1 | 0.9888 |
| token accuracy | 0.9978 |
| person F1 | 0.9962 |
| place F1 | 0.9293 |
| book/title/evidence F1 | 0.8933 |
The full local metrics are included as training_metrics.json.
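This card does not state how span F1 was computed. As an illustration only, the sketch below shows one common definition: exact-match, micro-averaged F1 over `(type, start, end)` spans. The function name `span_f1` and the toy spans are hypothetical, and the project's own metric may differ.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (span type, start, end exclusive)


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match span F1: a predicted span counts only if its type
    and both boundaries match a gold span."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data, not taken from the project: the LOC span has a wrong right boundary.
gold = {("PER", 0, 2), ("LOC", 3, 4), ("POH", 5, 8)}
pred = {("PER", 0, 2), ("LOC", 3, 5), ("POH", 5, 8)}
score = span_f1(gold, pred)
print(round(score, 4))
```

Under this definition a boundary error costs both precision and recall, which is one reason span F1 can sit below token accuracy on the same data.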
Intended Use
Use this model as the Level 1 span labeler in a Joseon/Hanmun historical-text translation pipeline. It is intended to provide structured spans for later reading selection and translation, not as a general-purpose modern NER system.
Limitations
This model was trained and evaluated on project-specific processed data. It may not generalize to unrelated domains, modern Korean NER, or arbitrary Classical Chinese text without additional validation.
Loading
from transformers import AutoModelForTokenClassification, AutoTokenizer
repo_id = "suhjae/joseon-level1-span-sillokbert"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForTokenClassification.from_pretrained(repo_id)
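For inference, one option is the Transformers `token-classification` pipeline with `aggregation_strategy="simple"`, which merges sub-word predictions into grouped entities. The sketch below is an assumption about how the checkpoint might be called, not the project's own pipeline code. The helper names `to_span_tuples` and `label_text` are hypothetical, and the `transformers` import is deferred so the formatting helper can be used without downloading the model.

```python
from typing import Dict, List, Tuple

REPO_ID = "suhjae/joseon-level1-span-sillokbert"


def to_span_tuples(entities: List[Dict]) -> List[Tuple[str, str, int, int]]:
    """Convert aggregated pipeline output into (type, text, start, end) tuples.

    Relies on the keys the pipeline emits when an aggregation strategy
    is set: "entity_group", "word", "start", "end".
    """
    return [(e["entity_group"], e["word"], e["start"], e["end"]) for e in entities]


def label_text(text: str) -> List[Tuple[str, str, int, int]]:
    """Run the checkpoint on one passage and return typed character spans."""
    # Imported here so that importing this module does not require transformers.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=REPO_ID,
        aggregation_strategy="simple",
    )
    return to_span_tuples(tagger(text))
```

Calling `label_text(passage)` on a passage string downloads the checkpoint on first use. Long passages may exceed the model's maximum sequence length, so chunking the input beforehand is probably necessary.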