---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
- named-entity-recognition
- token-classification
- modernbert
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: ModernBERT-base-NER
  results:
  - task:
      type: token-classification
    dataset:
      name: conll2003
      type: conll2003
    metrics:
    - name: Precision
      type: precision
      value: 0.8986
    - name: Recall
      type: recall
      value: 0.9295
    - name: F1
      type: f1
      value: 0.9138
    - name: Accuracy
      type: accuracy
      value: 0.984
datasets:
- lhoestq/conll2003
language:
- en
pipeline_tag: token-classification
---

# ModernBERT-base-NER

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) for Named Entity Recognition (NER) on the [conll2003](https://huggingface.co/datasets/lhoestq/conll2003) dataset.

## Model Description

ModernBERT-base-NER is a token classification model trained to identify and categorize named entities in text. Built on the ModernBERT-base architecture, this model leverages modern transformer optimizations for efficient and accurate entity extraction.

## Intended Uses

**Primary Use Cases:**
- Named Entity Recognition in text documents
- Information extraction pipelines

**Intended Users:**
- NLP researchers and practitioners
- Data scientists working with text data
- Developers building information extraction systems

## Limitations

**Known Limitations:**
- Performance may vary on domains significantly different from the training data
- Entity boundaries might be imperfect for complex or nested entities
- May require domain-specific fine-tuning for specialized applications (medical, legal, etc.)
- Performance on low-resource languages or code-switched text not evaluated

**Out-of-Scope Uses:**
- Real-time processing of sensitive personal information without proper privacy safeguards
- High-stakes decision making without human oversight
- Applications requiring 100% accuracy in entity detection

## Training and evaluation data

The model was fine-tuned on the English [CoNLL-2003](https://huggingface.co/datasets/lhoestq/conll2003) dataset, a standard NER benchmark built from Reuters newswire text. It is annotated with four entity types: persons (PER), organizations (ORG), locations (LOC), and miscellaneous entities (MISC).

## Performance

It achieves the following results on the evaluation set:
- Loss: 0.0638
- Precision: 0.8986
- Recall: 0.9295
- F1: 0.9138
- Accuracy: 0.9840
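
As a quick sanity check, the reported F1 is consistent with the reported precision and recall, since F1 is their harmonic mean:

```python
# F1 is the harmonic mean of precision and recall
precision, recall = 0.8986, 0.9295
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.9138, matching the reported F1
```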

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- num_epochs: 5
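
The hyperparameters above map onto a standard `Trainer` setup roughly as follows. This is a sketch, not the exact training script used; `output_dir` is an illustrative assumption, and argument names assume a recent Transformers version:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above as TrainingArguments
# (output_dir is an illustrative assumption)
training_args = TrainingArguments(
    output_dir="ModernBERT-base-NER",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    num_train_epochs=5,
)
```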

### Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log        | 1.0   | 439  | 0.0820          | 0.8431    | 0.8899 | 0.8659 | 0.9766   |
| 0.1769        | 2.0   | 878  | 0.0645          | 0.8895    | 0.9212 | 0.9051 | 0.9823   |
| 0.0415        | 3.0   | 1317 | 0.0638          | 0.8986    | 0.9295 | 0.9138 | 0.9840   |
| 0.0143        | 4.0   | 1756 | 0.0659          | 0.9037    | 0.9335 | 0.9184 | 0.9849   |
| 0.0051        | 5.0   | 2195 | 0.0672          | 0.9041    | 0.9329 | 0.9182 | 0.9849   |


### Framework versions

- Transformers 5.1.0
- PyTorch 2.7.0a0+ecf3bae40a.nv25.02
- Datasets 4.5.0
- Tokenizers 0.22.2

## How to Use

```python
import torch
from transformers import pipeline

# Create an NER pipeline; "simple" aggregation merges token-level
# predictions into entity spans
ner_pipeline = pipeline(
    "token-classification",
    model="MatteoFasulo/ModernBERT-base-NER",
    aggregation_strategy="simple",
    dtype=torch.bfloat16,
)

# Example usage
text = "Apple Inc. was founded by Steve Jobs in Cupertino, California."
entities = ner_pipeline(text)

for entity in entities:
    print(
        f"{entity['word']}: {entity['entity_group']} (confidence: {entity['score']:.4f})"
    )

# Apple Inc.: ORG (confidence: 0.9673)
# founded: MISC (confidence: 0.4503)
# by: PER (confidence: 0.6405)
# Steve Jobs: PER (confidence: 0.9905)
# Cupertino: LOC (confidence: 0.9894)
# California: LOC (confidence: 0.9859)
```
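
The `aggregation_strategy="simple"` option merges consecutive token-level predictions into entity spans. A minimal pure-Python sketch of the idea follows; it is a simplification of what the pipeline does internally (the real implementation operates on sub-word tokens and averages scores):

```python
def aggregate_simple(tokens, tags):
    """Group consecutive tokens sharing an entity type, using B-/I- BIO tags."""
    groups, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag == "O":  # outside any entity: close the open group, if any
            if current:
                groups.append(current)
                current = None
            continue
        prefix, etype = tag.split("-", 1)
        if current and current[1] == etype and prefix == "I":
            # Continuation of the current entity: extend its text
            current = (current[0] + " " + tok, etype)
        else:
            # New entity begins: close the previous group first
            if current:
                groups.append(current)
            current = (tok, etype)
    if current:
        groups.append(current)
    return groups

tokens = ["Steve", "Jobs", "founded", "Apple", "Inc."]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]
print(aggregate_simple(tokens, tags))
# [('Steve Jobs', 'PER'), ('Apple Inc.', 'ORG')]
```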

## Ethical Considerations

**Privacy:** This model may extract personal information (names, locations, organizations) from text. Users should:
- Implement appropriate data protection measures
- Comply with relevant privacy regulations (GDPR, CCPA, etc.)
- Obtain necessary consent before processing personal data

**Bias:** The model's performance may reflect biases present in the training data, potentially affecting:
- Recognition rates across different demographic groups
- Entity detection in various cultural contexts
- Performance on minority or underrepresented entities

Users should validate the model's performance on their specific use cases and implement bias mitigation strategies as needed.

## Citation

If you use this model in your research, please cite the ModernBERT paper:

```bibtex
@misc{modernbert,
      title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference}, 
      author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
      year={2024},
      eprint={2412.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13663}, 
}
```

## License

This model is released under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.

## Acknowledgments

This model was built using the ModernBERT-base architecture from Answer.AI and trained using the Hugging Face Transformers library.