---
tags:
- transformers
- token-classification
- ner
- bert
- conll2003
license: apache-2.0
datasets:
- conll2003
language:
- en
pipeline_tag: token-classification
authors:
- Karan D Vasa (https://huggingface.co/starkdv123)
---

# BERT (base-cased) for CoNLL-2003 NER — Full Fine-Tune

This repository contains a **BERT base cased** model fine-tuned on **CoNLL-2003** (parquet version), evaluated with **seqeval** (entity-level F1).

## 📊 Result (this run)

- **Entity Macro F1**: 0.9192

## Usage

```python
from transformers import pipeline

clf = pipeline(
    "token-classification",
    model="starkdv123/conll2003-bert-ner-full",
    aggregation_strategy="simple",
)
clf("Chris Hoiles hit his 22nd homer for Baltimore.")
```

## Training summary

* Base: `bert-base-cased`
* Epochs: 3, learning rate: 3e-5, batch size: 16/32, max length: 256, weight decay: 0.01, fp16
* Label alignment: `-100` for subword continuations (only the first subword of each word is labeled)
* Metric: seqeval F1 (entity-level)

## Confusion Matrix

Token-level counts; rows are reference labels, columns are predicted labels (the usual convention).

```
        LOC   MISC      O    ORG   PER
LOC     411      6     21     32     3
MISC      9   2213     51     76    14
O        67    110  38063     58    17
ORG      31     77     32   2353    10
PER       3     42     15     24  2689
```
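The `-100` label-alignment rule from the training summary can be sketched as below. This follows the common Hugging Face token-classification recipe, not necessarily this repo's exact training script; `word_ids` (the word index per subword token, `None` for special tokens, as returned by a fast tokenizer's `word_ids()`) is mocked with a literal list so the sketch stays self-contained:

```python
# Sketch of word-to-subword label alignment (assumption: the standard HF
# token-classification recipe). -100 is ignored by PyTorch's CrossEntropyLoss,
# so only the first subword of each word contributes to the loss.

def align_labels(word_ids, word_labels, ignore_index=-100):
    """Map word-level labels onto subword tokens."""
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:            # special tokens ([CLS], [SEP], padding)
            aligned.append(ignore_index)
        elif wid != previous:      # first subword of a word: keep its label
            aligned.append(word_labels[wid])
        else:                      # subword continuation: ignore
            aligned.append(ignore_index)
        previous = wid
    return aligned

# Hypothetical tokenization where "Hoiles" splits into two subwords:
word_ids = [None, 0, 1, 1, 2, None]   # [CLS] Chris Ho ##iles hit [SEP]
word_labels = [1, 2, 0]               # e.g. B-PER, I-PER, O as label ids
print(align_labels(word_ids, word_labels))
# → [-100, 1, 2, -100, 0, -100]
```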
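"Entity-level" F1 means predictions are scored on whole spans rather than individual tokens: an entity counts only if its type and exact boundaries both match. A minimal pure-Python sketch of what seqeval computes, assuming IOB2 tags (not seqeval's actual implementation):

```python
# Minimal sketch of entity-level F1 in the spirit of seqeval (IOB2 assumed).
# An entity is correct only if its type AND exact span match the reference.

def extract_entities(tags):
    """Return a set of (type, start, end) spans from an IOB2 tag sequence."""
    entities, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel flushes the last span
        if tag.startswith("B-") or tag == "O" or (
            tag.startswith("I-") and tag[2:] != etype
        ):
            if etype is not None:            # close the open entity
                entities.add((etype, start, i))
            etype, start = (tag[2:], i) if tag.startswith(("B-", "I-")) else (None, None)
        # otherwise: I- continuation of the current entity, keep scanning
    return entities

def entity_f1(true_tags, pred_tags):
    gold, pred = extract_entities(true_tags), extract_entities(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0

true_tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC"]
pred_tags = ["B-PER", "I-PER", "O", "O", "O", "B-ORG"]
print(entity_f1(true_tags, pred_tags))  # 1 of 2 entities exact-matched → 0.5
```

Note how the mislabeled `B-ORG` span scores zero even though its boundaries are right; that strictness is what separates entity-level from token-level scores.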
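As a sanity check, per-class F1 can be recomputed from the token-level confusion matrix above (rows taken as true labels, columns as predicted, per the usual convention). Token-level scores will not exactly reproduce the entity-level seqeval number reported above, since they ignore span boundaries:

```python
# Per-class token-level F1 from the confusion matrix in this card.
# Assumption: rows = true label, columns = predicted label.

labels = ["LOC", "MISC", "O", "ORG", "PER"]
matrix = [
    [411,     6,    21,    32,    3],
    [9,    2213,    51,    76,   14],
    [67,    110, 38063,    58,   17],
    [31,     77,    32,  2353,   10],
    [3,      42,    15,    24, 2689],
]

def per_class_f1(matrix):
    f1 = {}
    for i, name in enumerate(labels):
        tp = matrix[i][i]
        precision = tp / sum(row[i] for row in matrix)  # column sum: predicted as i
        recall = tp / sum(matrix[i])                    # row sum: truly i
        f1[name] = 2 * precision * recall / (precision + recall)
    return f1

scores = per_class_f1(matrix)
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

`LOC` stands out as the weakest class here, mostly from confusion with `ORG` and `O`.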