---
language:
- ko
license: other
library_name: transformers
base_model: jhgan/ko-sroberta-multitask
tags:
- token-classification
- named-entity-recognition
- timex
- korean
metrics:
- f1
pipeline_tag: token-classification
model-index:
- name: ko-sroberta-korean-time-expression-classifier
  results:
  - task:
      type: token-classification
      name: Korean TIMEX3 Detection
    dataset:
      name: 158.시간 표현 탐지 데이터
      type: private
      split: Validation
    metrics:
    - type: f1
      name: Entity F1
      value: 0.8266074116550786
    - type: precision
      name: Entity Precision
      value: 0.8264533883728931
    - type: recall
      name: Entity Recall
      value: 0.8267614923575464
---

# Korean Time Expression Classifier

This model detects Korean TIMEX3 time expressions using BIO token-classification labels.

The backbone is [`jhgan/ko-sroberta-multitask`](https://huggingface.co/jhgan/ko-sroberta-multitask), fine-tuned on the `158.시간 표현 탐지 데이터` dataset (Korean for "158. Time Expression Detection Data") to recognize four TIMEX3 entity types:

- `DATE`
- `TIME`
- `DURATION`
- `SET`

## Intended Use

Use this model to identify Korean time expressions in sentences or utterances. It predicts token-level BIO labels and can be used through the Hugging Face `token-classification` pipeline.

This is an experimental model trained for TIMEX3 span detection. It does not extract EVENT or TLINK annotations.
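
To make the BIO scheme concrete, the sketch below merges a sequence of per-token labels into typed spans. The label strings (`B-SET`, `I-SET`, and so on) and the helper name `bio_to_spans` are assumptions for illustration; the label names the model actually emits are defined by `id2label` in its config.

```python
from typing import List, Tuple


def bio_to_spans(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Merge BIO labels into (entity_type, start_token, end_token_exclusive)."""
    spans: List[Tuple[str, int, int]] = []
    current_type, start = None, None
    for i, label in enumerate(labels):
        prefix, _, entity_type = label.partition("-")
        # A span continues only on an I- tag of the same type as the open span.
        continues = prefix == "I" and entity_type == current_type
        if not continues:
            if current_type is not None:
                spans.append((current_type, start, i))  # close the open span
                current_type, start = None, None
            if prefix == "B":
                current_type, start = entity_type, i
            # Stray I- tags with no matching open span are ignored here,
            # which is one possible policy among several.
    if current_type is not None:
        spans.append((current_type, start, len(labels)))
    return spans


# Hypothetical label sequence for a five-token sentence.
example = ["B-SET", "I-SET", "B-TIME", "O", "O"]
print(bio_to_spans(example))
```

Token indices still have to be mapped back to character offsets; the pipeline's `aggregation_strategy` option performs that grouping for you.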

## Training Data

The model was trained on the official `Training` split and evaluated on the official `Validation` split of `158.시간 표현 탐지 데이터`.

Training/evaluation preprocessing:

- Unsupported, empty, malformed, or unalignable TIMEX3 spans are excluded.
- Records whose TIMEX3 span would be truncated by `max_length=256` are excluded.
- TIMEX-free records are retained as negative examples.
- JSON `text` fields are used as the source text.
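
The filtering rules above can be sketched as a single pass over a record's spans. This is a minimal illustration, not the project's actual preprocessing code: the record layout (`start`, `end`, `type` keys), the function name `filter_record`, the token offsets, and the definition of "alignable" (span boundaries coincide with token boundaries) are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

SUPPORTED_TYPES = {"DATE", "TIME", "DURATION", "SET"}


def filter_record(
    text: str,
    spans: List[Dict],
    offsets: List[Tuple[int, int]],
) -> Optional[List[Dict]]:
    """Return the spans to keep, or None if the whole record is dropped.

    `offsets` holds the (start, end) character range of each token that
    survived truncation to the maximum sequence length.
    """
    last_char = offsets[-1][1] if offsets else 0
    token_starts = {s for s, _ in offsets}
    token_ends = {e for _, e in offsets}
    kept = []
    for span in spans:
        start, end, label = span["start"], span["end"], span["type"]
        if label not in SUPPORTED_TYPES:
            continue  # unsupported type: drop the span
        if not (0 <= start < end <= len(text)):
            continue  # empty or malformed span: drop the span
        if end > last_char:
            return None  # span cut off by truncation: drop the record
        if start not in token_starts or end not in token_ends:
            continue  # unalignable span (assumed definition): drop the span
        kept.append(span)
    return kept  # may be empty: TIMEX-free records stay as negatives


text = "매주 토요일 저녁에 회의를 합니다."
# Hand-written token offsets, not produced by the real tokenizer.
offsets = [(0, 2), (3, 6), (7, 9), (9, 10), (11, 13), (13, 14), (15, 18), (18, 19)]
spans = [
    {"start": 0, "end": 6, "type": "SET"},
    {"start": 7, "end": 9, "type": "TIME"},
    {"start": 11, "end": 13, "type": "EVENT"},  # not a supported TIMEX3 type
]
kept = filter_record(text, spans, offsets)
print([s["type"] for s in kept])
```

Passing only the first two offsets simulates truncation: the `TIME` span then ends past the last surviving token and the function returns `None` for the whole record.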

## Training Configuration

```bash
python -m time_expression_classifier.train_token_classifier \
  --data-root "158.시간 표현 탐지 데이터" \
  --model-name jhgan/ko-sroberta-multitask \
  --output-dir outputs/official_epoch2 \
  --split-mode official \
  --epochs 2 \
  --learning-rate 3e-5 \
  --batch-size 16 \
  --max-length 256
```

Key settings:

| setting | value |
| --- | --- |
| backbone | `jhgan/ko-sroberta-multitask` |
| epochs | 2 |
| learning rate | 3e-5 |
| batch size | 16 |
| max length | 256 |
| weight decay | 0.01 |
| warmup ratio | 0.06 |
| seed | 42 |
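
Batch size, epoch count, and warmup ratio together determine how many optimizer steps the warmup lasts once the number of training records is known. The sketch below works through that arithmetic. The record count is a placeholder, since this card does not report it, and taking the ceiling of ratio × total steps is an assumption modeled on how the Hugging Face `Trainer` derives warmup steps, with a single device and no gradient accumulation.

```python
import math

# Values from the settings table above.
batch_size = 16
epochs = 2
warmup_ratio = 0.06

# Placeholder only: the card does not state the number of training records.
num_train_records = 100_000

steps_per_epoch = math.ceil(num_train_records / batch_size)
total_steps = steps_per_epoch * epochs
# Assumption: warmup length is the ceiling of ratio x total steps.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"steps/epoch={steps_per_epoch}, total={total_steps}, warmup={warmup_steps}")
```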

## Evaluation

Metrics are entity-level exact match on the official `Validation` split.

| metric | value |
| --- | ---: |
| entity precision | 0.8265 |
| entity recall | 0.8268 |
| entity F1 | 0.8266 |
| token accuracy | 0.9899 |
| eval loss | 0.0350 |
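
For readers unfamiliar with entity-level exact match, the following sketch shows one way to compute it: a predicted span counts as correct only if its type and both boundaries equal a gold span. The span representation and the function name `entity_prf` are assumptions; the card does not say which scorer produced the figures above.

```python
from typing import Dict, Set, Tuple

# (sentence_id, entity_type, start, end): an assumed span representation.
Span = Tuple[int, str, int, int]


def entity_prf(gold: Set[Span], pred: Set[Span]) -> Dict[str, float]:
    """Precision, recall, and F1 where only exact span matches count."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data: the TIME prediction is one character too long, so it is a miss.
gold = {(0, "SET", 0, 6), (0, "TIME", 7, 9), (1, "DATE", 0, 3)}
pred = {(0, "SET", 0, 6), (0, "TIME", 7, 10), (1, "DATE", 0, 3)}
scores = entity_prf(gold, pred)
print(scores)
```

The off-by-one `TIME` span lowers both precision and recall, because it is simultaneously a spurious prediction and a missed gold span.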

Per-label entity-level results:

| label | precision | recall | F1 | support |
| --- | ---: | ---: | ---: | ---: |
| DATE | 0.8495 | 0.8367 | 0.8430 | 23422 |
| TIME | 0.7933 | 0.8033 | 0.7983 | 3665 |
| DURATION | 0.7848 | 0.8247 | 0.8042 | 6810 |
| SET | 0.7107 | 0.6910 | 0.7007 | 974 |
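
As a consistency check, weighting each label's recall by its support should recover the overall entity recall if the overall figures are micro-averaged. The card does not state the averaging method, so that is an assumption; the numbers below are copied from the table above.

```python
# label: (recall, support), copied from the per-label table.
per_label = {
    "DATE": (0.8367, 23422),
    "TIME": (0.8033, 3665),
    "DURATION": (0.7848 * 0 + 0.8247, 6810),
    "SET": (0.6910, 974),
}

total_support = sum(support for _, support in per_label.values())
weighted_recall = (
    sum(recall * support for recall, support in per_label.values()) / total_support
)

print(f"total support: {total_support}")
print(f"support-weighted recall: {weighted_recall:.4f}")
```

The result agrees with the reported overall recall to within rounding of the per-label values, which is consistent with micro-averaging. `DATE` contributes roughly two thirds of all entities, so it dominates the overall scores.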

## Usage

```python
from transformers import pipeline

# aggregation_strategy="simple" merges token-level predictions into entity spans.
tagger = pipeline(
    "token-classification",
    model="kwoncho/ko-sroberta-korean-time-expression-classifier",
    aggregation_strategy="simple",
)

# "We hold a meeting every Saturday evening."
text = "매주 토요일 저녁에 회의를 합니다."
print(tagger(text))
```
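
With an aggregation strategy set, the pipeline returns a list of dictionaries with `entity_group`, `score`, `word`, `start`, and `end` keys. The sketch below turns such a list into plain records, slicing the original text by character offsets because `word` is rebuilt from tokens and may not match the source spacing. The entities shown are hand-written stand-ins, not actual predictions from this model, and the helper name `extract_timex` and the 0.5 score threshold are arbitrary choices for illustration.

```python
from typing import Dict, List


def extract_timex(text: str, entities: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep entities at or above min_score and recover their surface text."""
    results = []
    for ent in entities:
        if ent["score"] < min_score:
            continue  # drop low-confidence predictions
        results.append(
            {
                "type": ent["entity_group"],
                "text": text[ent["start"]:ent["end"]],
                "start": ent["start"],
                "end": ent["end"],
            }
        )
    return results


text = "매주 토요일 저녁에 회의를 합니다."
# Hand-written stand-in for pipeline output; scores and spans are made up.
mock_entities = [
    {"entity_group": "SET", "score": 0.98, "word": "매주 토요일", "start": 0, "end": 6},
    {"entity_group": "TIME", "score": 0.41, "word": "저녁", "start": 7, "end": 9},
]
print(extract_timex(text, mock_entities))
```

In real use, replace `mock_entities` with the list returned by `tagger(text)`.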

## Limitations

- The model is sensitive to ambiguous time expressions such as `주` (week), `하루` (a day), `시간` (time or hour), `한달` (one month), `일주일` (one week), and `매일` (every day).
- `SET` is the lowest-performing label, owing to its smaller support and to ambiguity between repeated events and duration expressions.
- The model predicts TIMEX3 spans only. Normalization to calendar values is not included.
- Evaluation uses exact span match, so partial boundary differences count as errors.

## Reproducibility

Repository: `git@github.com:hyun2019/ko-sroberta-korean-time-expression-classifier.git`

The local release artifact is tracked as `models/official_epoch2` via DVC.