---
language: en
license: apache-2.0
base_model: bert-base-cased
tags:
- bert
- token-classification
- ner
- conll2003
datasets:
- conll2003
metrics:
- seqeval
pipeline_tag: token-classification
---
# BERT fine-tuned on CoNLL-2003 (NER)

`bert-base-cased` fine-tuned for Named Entity Recognition on [CoNLL-2003](https://huggingface.co/datasets/conll2003).

Recognizes four entity types: **PER**, **ORG**, **LOC**, **MISC**.
## Evaluation results

| Metric    | Score  |
|-----------|--------|
| Precision | 0.7058 |
| Recall    | 0.5080 |
| F1        | 0.5908 |
| Accuracy  | 0.9015 |

Evaluated with [seqeval](https://github.com/chakki-works/seqeval) on the CoNLL-2003 test split.
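As a sanity check, the reported F1 is consistent with the precision and recall above, since seqeval's F1 is the harmonic mean of the two:

```python
# Verify that the reported F1 is the harmonic mean of precision and recall.
precision = 0.7058
recall = 0.5080

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.5908, matching the table
```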
## Usage

```python
from transformers import pipeline

ner = pipeline("ner", model="ZaharHR/bert-conll2003-ner", aggregation_strategy="simple")
ner("Elon Musk founded SpaceX in California.")
```
## Training details

- **Base model:** `bert-base-cased`
- **Dataset:** CoNLL-2003
- **Epochs:** 1
- **Effective batch size:** 16 (via gradient accumulation)
- **Optimizer:** AdamW, weight decay 0.01
- **Warmup steps:** 500
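The effective batch size is the per-device batch size multiplied by the number of gradient accumulation steps. The card only states the product (16); the split below is an illustrative assumption, not the actual training configuration:

```python
# Hypothetical split of the effective batch size of 16; the card does not
# state the actual per-device batch size or accumulation steps.
per_device_batch_size = 4        # assumption
gradient_accumulation_steps = 4  # assumption

effective_batch_size = per_device_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16
```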
## Label scheme

```
O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, B-MISC, I-MISC
```
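The labels follow the BIO scheme: `B-` marks the first token of an entity, `I-` continues it, and `O` marks non-entity tokens. A minimal sketch of how per-token BIO tags can be grouped into entity spans, roughly what `aggregation_strategy="simple"` does in the pipeline (the tokens and tags below are illustrative, not real model output):

```python
# Group per-token BIO tags into (entity_type, text) spans.
def group_bio(tokens, tags):
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity, closing any open one.
            if current:
                spans.append(current)
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            # An I- tag of the same type extends the open entity.
            current[1].append(token)
        else:
            # O (or a mismatched I-) closes any open entity.
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(etype, " ".join(words)) for etype, words in spans]

tokens = ["Elon", "Musk", "founded", "SpaceX", "in", "California", "."]
tags = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-LOC", "O"]
print(group_bio(tokens, tags))
# [('PER', 'Elon Musk'), ('ORG', 'SpaceX'), ('LOC', 'California')]
```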