davanstrien
/

eval-extraction-ner-v2

Token Classification

bootstrap-labels

eval-mentions-bootstrap-v2

Model card Files Files and versions

eval-extraction-ner-v2 / README.md

davanstrien's picture

davanstrien HF Staff

Upload README.md with huggingface_hub

9a01d32 verified 14 days ago

|

history blame contribute delete

2.11 kB

	---
	license: apache-2.0
	base_model: distilbert-base-uncased
	tags:
	- token-classification
	- ner
	- bootstrap-labels
	- eval-mentions-bootstrap-v2
	metrics:
	- seqeval
	language:
	- en
	---

	# davanstrien/eval-extraction-ner-v2

	Token classifier trained on bootstrap NER labels from [`davanstrien/eval-mentions-bootstrap-v2`](https://huggingface.co/datasets/davanstrien/eval-mentions-bootstrap-v2). Demonstrates the `bootstrap-labels` skill workflow: GLiNER bootstraps coarse labels, a small task-specific model is trained on them.

	## Training data

	- Source: `davanstrien/eval-mentions-bootstrap-v2`
	- Bootstrap model: GLiNER (via `uv-scripts/gliner`)
	- Score threshold: 0.8 (entities below this dropped)
	- Span blacklist: ['learning_rate', 'eval_batch_size', 'epsilon', 'lr_scheduler_warmup_ratio', 'lr_scheduler_type', 'epoch', 'batch_size', 'optimizer', 'gradient_accumulation_steps', 'warmup_ratio', 'seed', 'weight_decay', 'model', 'dataset', 'transformers', 'training dataset', 'training data', 'unknown dataset', 'f1']
	- Train rows: 306
	- Val rows: 35
	- Token-label distribution (excluding `O`):
	- BENCHMARK_NAME: 3663
	- EVALUATION_METRIC: 719

	## Eval results

	\| Metric \| Value \|
	\|---\|---\|
	\| F1 \| 0.0000 \|
	\| Precision \| 0.0000 \|
	\| Recall \| 0.0000 \|
	\| Accuracy \| 0.9756 \|

	(Note: held-out 10% of bootstrap labels — these are silver labels, not human-reviewed gold. Numbers reflect agreement with GLiNER, not absolute accuracy.)

	## Caveats

	- This is a V0 model trained on bootstrap labels with no human review pass. Expect it to inherit GLiNER's failure modes.
	- The intended use is as the V1 in an active-learning loop: deploy as Label Studio ML backend, route disagreements with GLiNER to humans, retrain on corrections. See the [bootstrap-labels skill](https://github.com/huggingface/skills) for the full workflow.

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

	ner = pipeline("token-classification", model="davanstrien/eval-extraction-ner-v2", aggregation_strategy="simple")
	ner("This model was evaluated on MMLU and HellaSwag.")
	```