---
tags:
- transformers
- token-classification
- ner
- bert
- peft
- lora
- conll2003
license: apache-2.0
datasets:
- conll2003
language:
- en
pipeline_tag: token-classification
authors:
- Karan D Vasa (https://huggingface.co/starkdv123)
---

# BERT (base-cased) for CoNLL-2003 NER — LoRA Adapter (PEFT)

This repository contains **LoRA adapter weights** trained on **CoNLL-2003** for `bert-base-cased`.

## 📊 Reference result (merged model from same adapter)

- **Entity Macro F1**: 0.9052

## Usage (attach adapter)
|
|
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
from peft import PeftModel

base = "bert-base-cased"
adapter = "starkdv123/conll2003-bert-ner-lora"

# CoNLL-2003 uses 9 BIO tags. Passing id2label/label2id makes the pipeline
# emit readable tags instead of generic LABEL_i names (label order assumed
# to follow the HF `conll2003` dataset).
labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]
id2label = {i: l for i, l in enumerate(labels)}

tok = AutoTokenizer.from_pretrained(base)
base_model = AutoModelForTokenClassification.from_pretrained(
    base,
    num_labels=9,
    id2label=id2label,
    label2id={l: i for i, l in id2label.items()},
)
model = PeftModel.from_pretrained(base_model, adapter)

clf = pipeline("token-classification", model=model, tokenizer=tok, aggregation_strategy="simple")
clf("Chris Hoiles hit his 22nd homer for Baltimore.")
```
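Under the hood, the pipeline takes each token's argmax over the 9 logits and maps the id to a BIO tag. A dependency-free sketch with toy logits (the label order is an assumption, following the HF `conll2003` dataset):

```python
# CoNLL-2003 BIO tag set, in the order used by the HF `conll2003` dataset
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

def decode(logits_per_token):
    """Map each token's 9 logits to the highest-scoring BIO tag."""
    return [LABELS[max(range(len(row)), key=row.__getitem__)] for row in logits_per_token]

# Toy logits for three tokens; each row has one clear winner.
toy = [
    [0.1, 2.3, 0.0, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0],  # index 1 -> B-PER
    [0.1, 0.0, 1.9, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0],  # index 2 -> I-PER
    [3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # index 0 -> O
]
print(decode(toy))
```

With `aggregation_strategy="simple"`, the pipeline additionally merges consecutive subword tokens of the same entity into one span.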
|
|
|
|
|
## Training summary

* LoRA: r=8, alpha=16, dropout=0.1
* Targets: [query, key, value, output.dense]
* Epochs: 3, LR: 2e-4, warmup 0.1, batch 16/32
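The hyperparameters above correspond to a PEFT configuration along these lines (a sketch mirroring the listed values, not the exact training script):

```python
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,
    r=8,                 # LoRA rank
    lora_alpha=16,       # scaling factor
    lora_dropout=0.1,
    # Matches the attention projections and output dense layers by name suffix
    target_modules=["query", "key", "value", "output.dense"],
)
```

Note that in BERT the suffix `output.dense` matches both the attention output and the feed-forward output projections, so this targets more modules than query/key/value alone.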
|
|
|
|
|
## Confusion Matrix

```
        LOC   MISC      O   ORG   PER
LOC     384      6     35    43     5
MISC     12   2138     80   100    33
O        57    119  38060    58    21
ORG      43    109     36  2304    11
PER       1     27     18    22  2705
```
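Assuming rows are gold labels and columns are predictions (the card doesn't say), per-class token-level F1 can be read off this matrix. These token-level scores will not exactly reproduce the entity-level Macro F1 reported above, which is computed over whole entity spans:

```python
# Confusion matrix from above (assumed: rows = gold label, columns = predicted)
labels = ["LOC", "MISC", "O", "ORG", "PER"]
cm = [
    [384,    6,    35,   43,    5],
    [12,  2138,    80,  100,   33],
    [57,   119, 38060,   58,   21],
    [43,   109,    36, 2304,   11],
    [1,     27,    18,   22, 2705],
]

def f1(i):
    tp = cm[i][i]
    pred = sum(row[i] for row in cm)  # column sum = times predicted
    gold = sum(cm[i])                 # row sum = gold count
    return 2 * tp / (pred + gold)     # equivalent to 2PR/(P+R)

per_class = {l: round(f1(i), 4) for i, l in enumerate(labels)}
macro_no_o = sum(f1(i) for i, l in enumerate(labels) if l != "O") / 4
print(per_class)
print(round(macro_no_o, 4))
```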
|
|
|