---
tags:
- transformers
- token-classification
- ner
- bert
- conll2003
license: apache-2.0
datasets:
- conll2003
language:
- en
pipeline_tag: token-classification
authors:
- Karan D Vasa (https://huggingface.co/starkdv123)
---

# BERT (base-cased) for CoNLL-2003 NER: Full Fine-Tune

This repository contains a **BERT base cased** model fine-tuned on **CoNLL-2003** (parquet version).
Evaluated with **seqeval** (entity-level F1).
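
To make "entity-level" concrete, here is a minimal pure-Python sketch of span-based macro F1 over BIO tags. It is not seqeval's implementation, and `extract_spans` and `macro_f1` are hypothetical helper names used only for illustration. The reported score was computed with seqeval, whose handling of edge cases may differ.

```python
from collections import defaultdict


def extract_spans(tags):
    """Return a set of (type, start, end) spans from one BIO sequence (end exclusive)."""
    spans = set()
    start, current = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" flushes the last span
        prefix, _, etype = tag.partition("-")
        # An open span ends at "O", at any "B-", or when an "I-" changes type.
        if current is not None and (prefix != "I" or etype != current):
            spans.add((current, start, i))
            start, current = None, None
        # Assumption: an "I-" tag with no open span starts a new one (lenient reading).
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, etype
    return spans


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-type F1, counting a span as correct only on exact match."""
    tp, n_pred, n_true = defaultdict(int), defaultdict(int), defaultdict(int)
    for true_tags, pred_tags in zip(y_true, y_pred):
        true_spans = extract_spans(true_tags)
        pred_spans = extract_spans(pred_tags)
        for etype, _, _ in true_spans:
            n_true[etype] += 1
        for etype, _, _ in pred_spans:
            n_pred[etype] += 1
        for etype, _, _ in true_spans & pred_spans:
            tp[etype] += 1
    scores = []
    for etype in sorted(set(n_true) | set(n_pred)):
        p = tp[etype] / n_pred[etype] if n_pred[etype] else 0.0
        r = tp[etype] / n_true[etype] if n_true[etype] else 0.0
        scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: the PER span matches exactly, the LOC span is missed.
y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O"]]
print(macro_f1(y_true, y_pred))
```

A partially matched span (for example, predicting only the first token of a two-token name) counts as a miss under this scheme, which is why entity-level scores are typically lower than token-level accuracy.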

## 📊 Result (this run)
- **Entity Macro F1**: 0.9192

## Usage
```python
from transformers import pipeline

# "simple" aggregation groups adjacent tokens of the same entity type into spans
clf = pipeline("token-classification", model="starkdv123/conll2003-bert-ner-full", aggregation_strategy="simple")
print(clf("Chris Hoiles hit his 22nd homer for Baltimore."))
```
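
With an aggregation strategy set, the pipeline returns a list of dicts with the keys `entity_group`, `score`, `word`, `start`, and `end`. The sketch below shows one way to post-process that list. `group_by_type` is a hypothetical helper, and `sample` is a hand-written placeholder in the pipeline's output shape, not real output from this model.

```python
from collections import defaultdict


def group_by_type(results, min_score=0.0):
    """Map entity type -> list of surface strings, dropping low-confidence spans."""
    grouped = defaultdict(list)
    for ent in results:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Placeholder values for illustration only; labels and scores are made up.
sample = [
    {"entity_group": "PER", "score": 0.99, "word": "Chris Hoiles", "start": 0, "end": 12},
    {"entity_group": "ORG", "score": 0.80, "word": "Baltimore", "start": 36, "end": 45},
]

print(group_by_type(sample))
print(group_by_type(sample, min_score=0.9))
```

In practice you would pass the return value of `clf(...)` instead of `sample`. The `start` and `end` character offsets index into the input string, so `text[ent["start"]:ent["end"]]` recovers the original surface form.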

## Training summary

* Base: `bert-base-cased`
* Epochs: 3, LR: 3e-5, batch 16/32, max_len 256, weight_decay 0.01, fp16
* Label alignment: -100 for subword continuations
* Metric: seqeval F1 (entity-level)
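
The label-alignment step can be sketched as follows. This is a minimal illustration, not the training script used for this model: `align_labels` is a hypothetical helper, and it assumes `word_ids` has the shape returned by a fast tokenizer's `BatchEncoding.word_ids()` when the input is tokenized with `is_split_into_words=True`.

```python
def align_labels(word_ids, word_labels, ignore_index=-100):
    """Expand word-level label ids to token level.

    Only the first subword of each word keeps its label; special tokens
    (word id None) and subword continuations get ignore_index.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:              # special tokens such as [CLS] / [SEP]
            aligned.append(ignore_index)
        elif wid != previous:        # first subword of a new word
            aligned.append(word_labels[wid])
        else:                        # continuation of the same word
            aligned.append(ignore_index)
        previous = wid
    return aligned


# Three words; the second word is split into two subword tokens.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [1, 2, 0]              # illustrative label ids, one per word
print(align_labels(word_ids, word_labels))  # [-100, 1, 2, -100, 0, -100]
```

The value -100 matches the default `ignore_index` of PyTorch's cross-entropy loss, so masked positions contribute nothing to the loss and each word is scored once.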

## Confusion Matrix
```
        LOC   MISC      O    ORG    PER
LOC     411      6     21     32      3
MISC      9   2213     51     76     14
O        67    110  38063     58     17
ORG      31     77     32   2353     10
PER       3     42     15     24   2689
```
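
Per-class rates can be read off the matrix with a few lines of Python. The sketch below assumes rows are reference labels and columns are predictions (the scikit-learn convention); the card does not state the orientation, so swap the two if your matrix is transposed. Because the matrix includes the `O` class, it appears to count tokens, so these figures are not comparable to the entity-level F1 reported above. `per_class_rates` is a hypothetical helper name.

```python
labels = ["LOC", "MISC", "O", "ORG", "PER"]
matrix = [
    [411, 6, 21, 32, 3],
    [9, 2213, 51, 76, 14],
    [67, 110, 38063, 58, 17],
    [31, 77, 32, 2353, 10],
    [3, 42, 15, 24, 2689],
]


def per_class_rates(matrix, labels):
    """Recall = diagonal / row sum, precision = diagonal / column sum.

    Assumes rows are reference labels and columns are predicted labels.
    """
    col_sums = [sum(col) for col in zip(*matrix)]
    rates = {}
    for i, label in enumerate(labels):
        row_sum = sum(matrix[i])
        correct = matrix[i][i]
        rates[label] = {
            "recall": correct / row_sum if row_sum else 0.0,
            "precision": correct / col_sums[i] if col_sums[i] else 0.0,
        }
    return rates


rates = per_class_rates(matrix, labels)
for label in labels:
    r = rates[label]
    print(f"{label:5s} precision={r['precision']:.4f} recall={r['recall']:.4f}")
```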