---
license: mit
datasets:
- ontonotes/conll2012_ontonotesv5
language:
- en
base_model:
- FacebookAI/roberta-large
pipeline_tag: token-classification
---

# RoBERTa-large fine-tuned on OntoNotes 5.0

This model is [FacebookAI/roberta-large](https://huggingface.co/FacebookAI/roberta-large) fine-tuned for Named Entity Recognition (NER) on the English subset of the **OntoNotes 5.0** (CoNLL-2012) dataset. RoBERTa-large has 24 transformer layers and ~355M parameters, compared with 12 layers and ~125M parameters for RoBERTa-base, which gives it more capacity for the 18 entity types annotated in OntoNotes.

## 📊 Performance
The following results were achieved on the OntoNotes 5.0 (v12) test set:

| **Entity** | **Precision** | **Recall** | **F1-Score** | **Support** |
| :--- | :---: | :---: | :---: | :---: |
| CARDINAL | 0.7769 | 0.7900 | 0.7834 | 1005 |
| DATE | 0.8211 | 0.8533 | 0.8369 | 1786 |
| EVENT | 0.5702 | 0.7647 | 0.6533 | 85 |
| FAC | 0.7123 | 0.6980 | 0.7051 | 149 |
| GPE | 0.9262 | 0.9470 | 0.9365 | 2546 |
| LANGUAGE | 0.7500 | 0.6818 | 0.7143 | 22 |
| LAW | 0.5000 | 0.6364 | 0.5600 | 44 |
| LOC | 0.6597 | 0.7302 | 0.6932 | 215 |
| MONEY | 0.8730 | 0.9099 | 0.8910 | 355 |
| NORP | 0.9029 | 0.9485 | 0.9251 | 990 |
| ORDINAL | 0.6936 | 0.7874 | 0.7376 | 207 |
| ORG | 0.8870 | 0.9101 | 0.8984 | 2002 |
| PERCENT | 0.8703 | 0.9066 | 0.8881 | 407 |
| PERSON | 0.9250 | 0.9246 | 0.9248 | 2134 |
| PRODUCT | 0.7356 | 0.7111 | 0.7232 | 90 |
| QUANTITY | 0.6933 | 0.6797 | 0.6865 | 153 |
| TIME | 0.6211 | 0.6267 | 0.6239 | 225 |
| WORK_OF_ART | 0.6686 | 0.6923 | 0.6802 | 169 |
| **micro avg** | **0.8581** | **0.8831** | **0.8704** | **12584** |
| **macro avg** | **0.7548** | **0.7888** | **0.7701** | **12584** |
| **weighted avg** | **0.8596** | **0.8831** | **0.8710** | **12584** |
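
The averaged rows follow from the per-entity rows: F1 is the harmonic mean of precision and recall, the macro average is the unweighted mean over the 18 entity types, and the weighted average weights each type by its support. The sketch below rechecks these relationships from the rounded table values. The function and variable names are illustrative and are not taken from the evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (F1, support) per entity type, copied from the table above.
per_entity = {
    "CARDINAL": (0.7834, 1005), "DATE": (0.8369, 1786),
    "EVENT": (0.6533, 85), "FAC": (0.7051, 149),
    "GPE": (0.9365, 2546), "LANGUAGE": (0.7143, 22),
    "LAW": (0.5600, 44), "LOC": (0.6932, 215),
    "MONEY": (0.8910, 355), "NORP": (0.9251, 990),
    "ORDINAL": (0.7376, 207), "ORG": (0.8984, 2002),
    "PERCENT": (0.8881, 407), "PERSON": (0.9248, 2134),
    "PRODUCT": (0.7232, 90), "QUANTITY": (0.6865, 153),
    "TIME": (0.6239, 225), "WORK_OF_ART": (0.6802, 169),
}

total_support = sum(n for _, n in per_entity.values())

# Macro average: every entity type counts equally.
macro_f1 = sum(f1 for f1, _ in per_entity.values()) / len(per_entity)

# Weighted average: each entity type counts in proportion to its support.
weighted_f1 = sum(f1 * n for f1, n in per_entity.values()) / total_support

# Micro-average F1 recomputed from the micro precision and recall in the table.
micro_f1 = f1_score(0.8581, 0.8831)

print(f"support={total_support} macro={macro_f1:.4f} "
      f"weighted={weighted_f1:.4f} micro={micro_f1:.4f}")
```

Because the inputs are rounded to four decimals, the recomputed values can differ from the table in the last digit. The gap between the macro and micro averages comes mostly from low-support types such as LAW and EVENT, which pull the macro average down.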

## 🛠 Training Details
Fine-tuning was run on 2× NVIDIA V100 GPUs with the following settings:
- **Architecture**: `RobertaForTokenClassification`
- **Tokenizer**: `RobertaTokenizerFast` (with `add_prefix_space=True`)
- **Learning Rate**: 1e-5
- **Effective Batch Size**: 32 (4 per device × 2 GPUs × 4 gradient accumulation steps)
- **Epochs**: 5
- **Warmup Ratio**: 0.1
- **Mixed Precision**: FP16 enabled
- **Optimizer**: AdamW with `weight_decay=0.01`
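
As a rough sketch, the settings above can be collected into a dictionary keyed by what should be the matching `transformers.TrainingArguments` parameter names. This mapping is an assumption and is not copied from the training script, so check the names against your installed `transformers` version. The helper `effective_batch_size` is hypothetical and only spells out the batch-size arithmetic.

```python
# Hyperparameters from the list above. The keys are assumed to match
# `transformers.TrainingArguments` parameters; verify before use.
training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "num_train_epochs": 5,
    "warmup_ratio": 0.1,
    "fp16": True,          # mixed precision
    "weight_decay": 0.01,  # applied by the AdamW optimizer
}

NUM_GPUS = 2  # 2x NVIDIA V100, as stated above


def effective_batch_size(per_device: int, num_devices: int, accumulation_steps: int) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device * num_devices * accumulation_steps


batch = effective_batch_size(
    training_kwargs["per_device_train_batch_size"],
    NUM_GPUS,
    training_kwargs["gradient_accumulation_steps"],
)
print(f"Effective batch size: {batch}")
```

If the names match, the dictionary can be unpacked into `TrainingArguments` together with an output directory of your choice.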

## 📂 Project Assets
- **GitHub Repository**: [Learnrr/ontonotes5_ner_evaluation](https://github.com/Learnrr/ontonotes5_ner_evaluation.git)

| **Asset** | **File** | **Description** |
| :--- | :--- | :--- |
| **Model Weights** | `model.safetensors` | Fine-tuned RoBERTa-large weights (~1.42 GB). |
| **Configuration** | `config.json` | 24-layer configuration and `id2label` map. |
| **Vocabulary** | `vocab.json` / `merges.txt` | BPE vocabulary and byte-level merge rules. |
| **Tokenizer** | `tokenizer.json` / `tokenizer_config.json` | Complete fast tokenizer setup. |
| **Special Tokens** | `special_tokens_map.json` | Definitions for BOS, EOS, and Padding tokens. |
| **Training Args** | `training_args.bin` | Serialized training arguments (hyperparameters) from the training run. |
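
The size of the weights file is consistent with the parameter count quoted earlier. The back-of-the-envelope check below assumes the weights are stored as 32-bit floats (4 bytes per parameter), which is not confirmed by this card, and uses the approximate figure of ~355M parameters. The helper name `checkpoint_size_gb` is made up for illustration.

```python
def checkpoint_size_gb(num_parameters: float, bytes_per_param: int = 4) -> float:
    """Approximate on-disk size in GB (10^9 bytes), ignoring file metadata."""
    return num_parameters * bytes_per_param / 1e9


# ~355M parameters, assumed to be stored as float32.
size = checkpoint_size_gb(355e6)
print(f"Estimated checkpoint size: {size:.2f} GB")
```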

## 🚀 Usage
```python
from transformers import pipeline

model_checkpoint = "learnrr/roberta-large-ontonotes5-ner"

# aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
token_classifier = pipeline(
    "token-classification",
    model=model_checkpoint,
    aggregation_strategy="simple",
)

text = "The United Nations is headquartered in New York City."
results = token_classifier(text)

# Each result is a dict with the entity text, its label, and a confidence score.
for entity in results:
    print(f"Entity: {entity['word']} | Label: {entity['entity_group']} | Score: {entity['score']:.4f}")
```
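
The pipeline output can be post-processed like any list of dictionaries. The sketch below drops low-confidence spans and groups the rest by label. `group_entities` and the `min_score` threshold are hypothetical helpers, and `sample_results` is hand-written in the shape the pipeline returns with `aggregation_strategy="simple"`; it is not real model output.

```python
from collections import defaultdict


def group_entities(results, min_score=0.5):
    """Group entity texts by label, skipping spans scored below min_score."""
    grouped = defaultdict(list)
    for entity in results:
        if entity["score"] >= min_score:
            # Byte-level BPE tokenizers may leave a leading space on the word.
            grouped[entity["entity_group"]].append(entity["word"].strip())
    return dict(grouped)


# Hand-written sample in the pipeline's output shape; NOT real model output.
sample_results = [
    {"entity_group": "ORG", "score": 0.99, "word": " United Nations", "start": 4, "end": 18},
    {"entity_group": "GPE", "score": 0.98, "word": " New York City", "start": 39, "end": 52},
    {"entity_group": "ORG", "score": 0.31, "word": " The", "start": 0, "end": 3},
]

grouped = group_entities(sample_results)
for label, words in grouped.items():
    print(f"{label}: {words}")
```

In practice, pass the `results` list from the pipeline call in place of `sample_results` and tune `min_score` on held-out data.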