---
license: apache-2.0
base_model: microsoft/deberta-v3-base
tags:
- token-classification
- ner
- bootstrap-labels
- eval-mentions-bootstrap
metrics:
- seqeval
language:
- en
---

# davanstrien/eval-extraction-ner-v1

Token classifier trained on **bootstrap NER labels** from [`davanstrien/eval-mentions-bootstrap`](https://huggingface.co/datasets/davanstrien/eval-mentions-bootstrap). It demonstrates the `bootstrap-labels` skill workflow: GLiNER bootstraps coarse labels, and a small task-specific model is then trained on them.

## Training data

- Source: `davanstrien/eval-mentions-bootstrap`
- Bootstrap model: GLiNER (via `uv-scripts/gliner`)
- Score threshold: 0.75 (entities scoring below this are dropped)
- Span blacklist: ['learning_rate', 'eval_batch_size', 'epsilon', 'lr_scheduler_warmup_ratio', 'lr_scheduler_type', 'epoch', 'batch_size', 'optimizer', 'gradient_accumulation_steps', 'warmup_ratio', 'seed', 'weight_decay', 'model', 'dataset', 'transformers', 'training dataset', 'training data', 'unknown dataset', 'f1']
- Train rows: 1767
- Val rows: 197
- Token-label distribution (excluding `O`):
  - EVALUATION_METRIC: 9888
  - BENCHMARK_NAME: 4796
  - EVALUATION_DATASET: 4044
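
The thresholding, blacklisting and token-labelling steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used to build the dataset: it assumes GLiNER-style entity dicts with `start`, `end`, `text`, `label` and `score` keys, case-insensitive blacklist matching, whitespace tokenization and BIO tags. It also assumes the distribution collapses `B-`/`I-` prefixes. The helper names (`filter_entities`, `to_bio`, `label_distribution`) are hypothetical, and the example predictions are made up.

```python
from collections import Counter

# Threshold and blacklist copied from the card above.
SCORE_THRESHOLD = 0.75
SPAN_BLACKLIST = {
    "learning_rate", "eval_batch_size", "epsilon", "lr_scheduler_warmup_ratio",
    "lr_scheduler_type", "epoch", "batch_size", "optimizer",
    "gradient_accumulation_steps", "warmup_ratio", "seed", "weight_decay",
    "model", "dataset", "transformers", "training dataset", "training data",
    "unknown dataset", "f1",
}


def filter_entities(entities):
    """Keep spans scoring at least the threshold whose text is not blacklisted.

    Hypothetical helper; case-insensitive matching is an assumption.
    """
    return [
        e for e in entities
        if e["score"] >= SCORE_THRESHOLD and e["text"].lower() not in SPAN_BLACKLIST
    ]


def to_bio(text, entities):
    """Tag whitespace tokens with BIO labels from character-offset spans (assumed scheme)."""
    tokens, tags, seen = [], [], set()
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        pos = end
        tag = "O"
        for i, e in enumerate(entities):
            if start < e["end"] and end > e["start"]:  # token overlaps the span
                tag = ("I-" if i in seen else "B-") + e["label"]
                seen.add(i)
                break
        tokens.append(tok)
        tags.append(tag)
    return tokens, tags


def label_distribution(tag_sequences):
    """Count non-O tokens per entity type, collapsing B-/I- prefixes."""
    return Counter(tag[2:] for tags in tag_sequences for tag in tags if tag != "O")


# Illustrative input: made-up GLiNER-style predictions, not real model output.
text = "We report exact match and F1 on MMLU and HellaSwag."
raw = [
    {"start": 10, "end": 21, "text": "exact match", "label": "EVALUATION_METRIC", "score": 0.91},
    {"start": 26, "end": 28, "text": "F1", "label": "EVALUATION_METRIC", "score": 0.95},
    {"start": 32, "end": 36, "text": "MMLU", "label": "BENCHMARK_NAME", "score": 0.88},
    {"start": 41, "end": 50, "text": "HellaSwag", "label": "BENCHMARK_NAME", "score": 0.62},
]

kept = filter_entities(raw)  # drops "F1" (blacklisted) and "HellaSwag" (score < 0.75)
tokens, tags = to_bio(text, kept)
print(list(zip(tokens, tags)))
print(label_distribution([tags]))
```

In this sketch a dropped span does not disappear from the sentence: its tokens become `O`. Thresholding therefore trades recall of the silver labels for precision.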

## Eval results

| Metric | Value |
|---|---|
| F1 | 0.0000 |
| Precision | 0.0000 |
| Recall | 0.0000 |
| Accuracy | 0.9784 |

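(Note: metrics are computed on a held-out 10% split of the bootstrap labels. These are *silver labels*, not human-reviewed gold, so the numbers reflect agreement with GLiNER rather than absolute accuracy. Precision, recall and F1 are entity-level, while accuracy is token-level and includes the majority `O` class, so a high accuracy alongside an F1 of 0 means no predicted entity span matched a silver span.)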
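
To illustrate how token accuracy can stay high while entity-level scores are zero, here is a simplified, self-contained re-implementation of exact span matching. It is not the `seqeval` library itself. It assumes BIO tags, and the gold and predicted sequences are toy data rather than this model's outputs. A prediction of `O` everywhere gets most tokens right, because most tokens are `O`, but it recovers no spans.

```python
def extract_spans(tags):
    """Return (label, start, end) token spans from a BIO sequence (end is exclusive)."""
    spans, current = [], None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any open span
        if tag.startswith("I-") and current and current[0] == tag[2:]:
            continue  # continues the open span
        if current:
            spans.append((current[0], current[1], i))
            current = None
        if tag.startswith(("B-", "I-")):  # a stray I- opens a new span (conlleval-style)
            current = (tag[2:], i)
    return spans


def score(gold, pred):
    """Entity-level precision/recall/F1 plus token-level accuracy (simplified)."""
    gold_spans, pred_spans = set(), set()
    for s, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= {(s, *span) for span in extract_spans(g)}
        pred_spans |= {(s, *span) for span in extract_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    pairs = [(gt, pt) for g, p in zip(gold, pred) for gt, pt in zip(g, p)]
    accuracy = sum(gt == pt for gt, pt in pairs) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy data: one entity token out of twenty.
gold = [
    ["O", "O", "O", "O", "O", "O", "O", "B-BENCHMARK_NAME", "O", "O"],
    ["O"] * 10,
]
all_o = [["O"] * 10, ["O"] * 10]
result = score(gold, all_o)
print(result)  # {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'accuracy': 0.95}
```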

## Caveats

- This is a **V0 model** trained on bootstrap labels with no human review pass. Expect it to inherit GLiNER's failure modes.
- The intended use is *as the V1 in an active-learning loop*: deploy it as a Label Studio ML backend, route its disagreements with GLiNER to human annotators, and retrain on their corrections. See the [bootstrap-labels skill](https://github.com/huggingface/skills) for the full workflow.
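
The disagreement-routing step in this loop could be sketched as a set comparison between the two models' predicted spans. This is an illustration under assumptions, not part of Label Studio or GLiNER. Spans are assumed to be `(start, end, label)` character-offset tuples, and any difference between the two span sets flags an example for review. Ranking by the number of disagreeing spans is an assumed prioritisation, and `disagreement` and `route_for_review` are hypothetical helpers.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (char_start, char_end, label); assumed representation


def disagreement(model_spans: Set[Span], gliner_spans: Set[Span]) -> Set[Span]:
    """Spans predicted by exactly one of the two models (symmetric difference)."""
    return model_spans ^ gliner_spans


def route_for_review(batch: Dict[str, Tuple[Set[Span], Set[Span]]]) -> List[str]:
    """Return IDs of examples where the models disagree, most disagreements first."""
    scored = [(len(disagreement(m, g)), ex_id) for ex_id, (m, g) in batch.items()]
    # sorted() is stable, so ties keep their original order
    return [ex_id for n, ex_id in sorted(scored, key=lambda x: -x[0]) if n > 0]


# Toy batch: (this model's spans, GLiNER's spans) per example ID.
batch = {
    "doc-1": ({(28, 32, "BENCHMARK_NAME")}, {(28, 32, "BENCHMARK_NAME")}),  # agree
    "doc-2": ({(0, 8, "EVALUATION_METRIC")}, {(0, 8, "BENCHMARK_NAME")}),  # label mismatch
    "doc-3": (set(), {(5, 12, "EVALUATION_DATASET"), (20, 27, "BENCHMARK_NAME"),
                      (30, 38, "EVALUATION_METRIC")}),  # model missed all three
}
queue = route_for_review(batch)
print(queue)  # ['doc-3', 'doc-2']
```

A label mismatch on the same offsets counts as two disagreeing spans here. A real loop might weight boundary and label errors differently.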

## Usage

```python
from transformers import pipeline

# aggregation_strategy="simple" merges sub-word tokens into whole entity spans
ner = pipeline("token-classification", model="davanstrien/eval-extraction-ner-v1", aggregation_strategy="simple")
print(ner("This model was evaluated on MMLU and HellaSwag."))
```