---
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- token-classification
- ner
- bootstrap-labels
- eval-mentions-bootstrap-v2
metrics:
- seqeval
language:
- en
---

# davanstrien/eval-extraction-ner-v2

Token classifier trained on **bootstrap NER labels** from [`davanstrien/eval-mentions-bootstrap-v2`](https://huggingface.co/datasets/davanstrien/eval-mentions-bootstrap-v2). It demonstrates the `bootstrap-labels` skill workflow: GLiNER bootstraps coarse labels, and a small task-specific model is then trained on them.

## Training data

- Source: `davanstrien/eval-mentions-bootstrap-v2`
- Bootstrap model: GLiNER (via `uv-scripts/gliner`)
- Score threshold: 0.8 (entities scoring below this were dropped)
- Span blacklist: `learning_rate`, `eval_batch_size`, `epsilon`, `lr_scheduler_warmup_ratio`, `lr_scheduler_type`, `epoch`, `batch_size`, `optimizer`, `gradient_accumulation_steps`, `warmup_ratio`, `seed`, `weight_decay`, `model`, `dataset`, `transformers`, `training dataset`, `training data`, `unknown dataset`, `f1`
- Train rows: 306
- Val rows: 35
- Token-label distribution (excluding `O`):
  - BENCHMARK_NAME: 3663
  - EVALUATION_METRIC: 719
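
The threshold and blacklist filtering described above can be sketched roughly as follows. This is a minimal illustration, not the actual `uv-scripts/gliner` code: the entity dict shape (`text`, `label`, `score`), the function names, and the case-insensitive blacklist match are all assumptions, and only a subset of the blacklist is shown.

```python
# Hypothetical record shape: each bootstrap entity is a dict with
# "text", "label", and "score". The real pipeline may differ.
SCORE_THRESHOLD = 0.8
SPAN_BLACKLIST = {"learning_rate", "epoch", "model", "dataset", "f1"}  # subset only


def keep_entity(entity, threshold=SCORE_THRESHOLD, blacklist=SPAN_BLACKLIST):
    """Keep an entity if it clears the score threshold and is not blacklisted.

    Lower-casing the span before the blacklist lookup is an assumption.
    """
    text = entity["text"].strip().lower()
    return entity["score"] >= threshold and text not in blacklist


def filter_entities(entities):
    """Return only the entities that survive both filters."""
    return [e for e in entities if keep_entity(e)]


# Illustrative inputs, not taken from the dataset.
predictions = [
    {"text": "MMLU", "label": "BENCHMARK_NAME", "score": 0.93},
    {"text": "f1", "label": "EVALUATION_METRIC", "score": 0.95},      # blacklisted span
    {"text": "HellaSwag", "label": "BENCHMARK_NAME", "score": 0.42},  # below threshold
]
kept = filter_entities(predictions)
```

Here only the `MMLU` entity survives: `f1` is removed by the blacklist despite its high score, and `HellaSwag` falls below the threshold.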

## Eval results

| Metric | Value |
|---|---|
| F1 | 0.0000 |
| Precision | 0.0000 |
| Recall | 0.0000 |
| Accuracy | 0.9756 |

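Metrics are computed on a held-out 10% of the bootstrap labels. These are *silver labels*, not human-reviewed gold, so the numbers reflect agreement with GLiNER rather than absolute accuracy. Precision, recall, and F1 are seqeval's entity-level scores; a value of 0 means no predicted span matched a reference span on the validation split. Accuracy is token-level and is consistent with most tokens being `O`, so it should not be read as evidence that entities are being extracted.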
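
To see how token accuracy can be high while span-level F1 is 0, here is a small pure-Python sketch. It is a simplified stand-in for seqeval, not its implementation, and it assumes BIO-style tags (`B-`/`I-` prefixes), which this card does not confirm. The tag sequences are illustrative, not model output.

```python
def extract_spans(tags):
    """Collect (label, start, end) spans from a BIO tag sequence (end exclusive)."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing sentinel flushes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def token_accuracy(gold, pred):
    """Fraction of positions where the predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def span_f1(gold, pred):
    """Exact-match span F1: a span counts only if label and boundaries agree."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# One two-token entity in 40 tokens; the prediction is all "O".
gold = ["O"] * 38 + ["B-BENCHMARK_NAME", "I-BENCHMARK_NAME"]
pred = ["O"] * 40

accuracy = token_accuracy(gold, pred)  # 38 of 40 tokens agree
f1 = span_f1(gold, pred)               # no predicted spans, so F1 is 0.0
```

A model that predicts `O` everywhere gets 95% token accuracy in this toy case while finding no entities, which is why the span-level scores are the ones to watch.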

## Caveats

- This is a **V0 model** trained on bootstrap labels with no human review pass. Expect it to inherit GLiNER's failure modes.
- The intended use is *as the V1 in an active-learning loop*: deploy it as a Label Studio ML backend, route its disagreements with GLiNER to human reviewers, and retrain on the corrections. See the [bootstrap-labels skill](https://github.com/huggingface/skills) for the full workflow.
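
The disagreement-routing step could look something like the sketch below. Everything in it is hypothetical: the example record shape, the function names, and the exact-match comparison are assumptions for illustration, not part of the skill or of Label Studio's API.

```python
def spans_of(entities):
    """Reduce a list of entity dicts to a comparable set of (start, end, label)."""
    return {(e["start"], e["end"], e["label"]) for e in entities}


def needs_review(model_entities, gliner_entities):
    """Flag an example when the two systems do not produce identical spans."""
    return spans_of(model_entities) != spans_of(gliner_entities)


def route(examples):
    """Split example ids into (send to humans, accept automatically)."""
    review, auto = [], []
    for ex in examples:
        target = review if needs_review(ex["model"], ex["gliner"]) else auto
        target.append(ex["id"])
    return review, auto


# Illustrative records with made-up offsets.
examples = [
    {
        "id": "a",
        "model": [{"start": 28, "end": 32, "label": "BENCHMARK_NAME"}],
        "gliner": [{"start": 28, "end": 32, "label": "BENCHMARK_NAME"}],
    },
    {
        "id": "b",
        "model": [],
        "gliner": [{"start": 10, "end": 18, "label": "EVALUATION_METRIC"}],
    },
]
review_ids, auto_ids = route(examples)
```

Exact span equality is the strictest possible criterion; a looser overlap-based comparison would send fewer examples to reviewers, at the cost of missing boundary errors.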

## Usage

```python
from transformers import pipeline

# aggregation_strategy="simple" groups sub-word tokens into entity spans
ner = pipeline("token-classification", model="davanstrien/eval-extraction-ner-v2", aggregation_strategy="simple")
ner("This model was evaluated on MMLU and HellaSwag.")
```
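
With an aggregation strategy set, the pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` keys. A minimal sketch of grouping that output by label follows; the helper name `group_by_label` and the `min_score` cutoff are assumptions, and the sample list is a hand-written placeholder, not real output from this model.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group aggregated pipeline entities by label, dropping low-score spans."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Made-up placeholder in the pipeline's output format, NOT real predictions.
sample_output = [
    {"entity_group": "BENCHMARK_NAME", "score": 0.91, "word": "mmlu", "start": 28, "end": 32},
    {"entity_group": "BENCHMARK_NAME", "score": 0.55, "word": "hellaswag", "start": 37, "end": 46},
]
grouped = group_by_label(sample_output, min_score=0.6)
```

Given the eval results above, the real pipeline may return few or no entities for a given input, so calling code should handle an empty list.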