---
language: en
license: mit
base_model: roberta-base
tags:
- token-classification
- ner
- named-entity-recognition
datasets:
- conll2003
metrics:
- f1
- precision
- recall
- accuracy
model-index:
- name: RoBERTa-base-NER-CoNLL2003
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: conll2003
      name: CoNLL-2003 (English)
    metrics:
    - type: f1
      value: 95.99
---

## Model description
This model is `roberta-base` fine-tuned for Named Entity Recognition (NER) on the English CoNLL-2003 dataset. It identifies four entity types: persons (PER), organizations (ORG), locations (LOC), and miscellaneous (MISC).
## Training procedure
* **Hardware:** NVIDIA V100 GPU
* **Optimizer:** AdamW
* **Learning Rate:** 2e-5
* **Batch Size:** 16
* **Weight Decay:** 0.01
* **Epochs:** 5
* **Mixed Precision Training:** FP16 enabled
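
The hyperparameters above could be expressed as Hugging Face `TrainingArguments` roughly as follows. This is a minimal sketch, not the original training script: it assumes the `Trainer` API was used, that the batch size of 16 is per device on a single GPU, and that AdamW means the PyTorch implementation. The `output_dir` value and the `build_training_arguments` helper are hypothetical.

```python
# Hyperparameters as listed in this card, keyed by the corresponding
# `transformers.TrainingArguments` keyword names.
training_kwargs = {
    "output_dir": "roberta-ner-conll2003",  # hypothetical path, not from the card
    "optim": "adamw_torch",                 # assumption: PyTorch AdamW
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,      # assumption: single GPU
    "weight_decay": 0.01,
    "num_train_epochs": 5,
    "fp16": True,                           # mixed precision; needs a CUDA device
}


def build_training_arguments():
    """Create TrainingArguments lazily so the dict can be inspected without a GPU."""
    from transformers import TrainingArguments  # imported only when actually needed

    return TrainingArguments(**training_kwargs)
```

Calling `build_training_arguments()` on a machine with a CUDA device returns an object that can be passed to `Trainer`; settings the card does not list (warmup, scheduler, seed) are left at the library defaults here.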
## Evaluation Results
| Metric | Value |
| :--- | :--- |
| **F1 Score** | **95.99%** |
| **Precision** | **95.61%** |
| **Recall** | **96.38%** |
| **Accuracy** | **99.29%** |
| **Eval Loss** | **0.0464** |
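
As a quick consistency check, the F1 score in the table can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The sketch below only uses the rounded figures from the table, so it checks agreement to two decimal places rather than reproducing the evaluation itself.

```python
# Reported values from the table above, in percent.
precision = 95.61
recall = 96.38

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.2f}%")  # prints "F1 = 95.99%"
```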
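## How to use
```python
from transformers import pipeline

model_id = "learnrr/roberta-NER-conll2003"

# aggregation_strategy="simple" merges sub-word tokens into whole entities
# and provides the `entity_group` key used below.
nlp = pipeline("token-classification", model=model_id, aggregation_strategy="simple")

text = "Apple is looking at buying U.K. startup for $1 billion"
results = nlp(text)

for entity in results:
    print(f"entity: {entity['word']} | class: {entity['entity_group']} | confidence: {entity['score']:.4f}")
```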