	---
	license: apache-2.0
	base_model: distilbert-base-uncased
	tags:
	- token-classification
	- ner
	- bootstrap-labels
	- eval-mentions-bootstrap
	metrics:
	- seqeval
	language:
	- en
	---

	# davanstrien/eval-extraction-ner-v0

	A token classifier fine-tuned from `distilbert-base-uncased` on bootstrap NER labels from [`davanstrien/eval-mentions-bootstrap`](https://huggingface.co/datasets/davanstrien/eval-mentions-bootstrap). It demonstrates the `bootstrap-labels` skill workflow: GLiNER bootstraps coarse labels, and a small task-specific model is then trained on them.

	## Training data

	- Source: `davanstrien/eval-mentions-bootstrap`
	- Bootstrap model: GLiNER (via `uv-scripts/gliner`)
	- Score threshold: 0.8 (entities scoring below this were dropped)
	- Span blacklist: `learning_rate`, `eval_batch_size`, `epsilon`, `lr_scheduler_warmup_ratio`, `lr_scheduler_type`, `epoch`, `batch_size`, `optimizer`, `gradient_accumulation_steps`, `warmup_ratio`, `seed`, `weight_decay`, `model`, `dataset`, `transformers`, `training dataset`, `training data`, `unknown dataset`, `f1`
	- Train rows: 1194
	- Val rows: 133
	- Token-label distribution (excluding `O`):
	  - EVALUATION_METRIC: 7537
	  - BENCHMARK_NAME: 3104
	  - EVALUATION_DATASET: 1918
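
The two filters listed above (score threshold and span blacklist) can be sketched as follows. This is a minimal illustration, not the script used to build the dataset: the entity dict shape (`text`, `label`, `score`), the case-insensitive blacklist match, and the helper names are assumptions, and the blacklist is abbreviated.

```python
# Hypothetical sketch of the filtering step; not the actual bootstrap script.
SCORE_THRESHOLD = 0.8  # value stated in this card

# Abbreviated subset of the span blacklist listed in this card.
SPAN_BLACKLIST = {"learning_rate", "batch_size", "model", "dataset", "f1"}


def keep_entity(entity: dict) -> bool:
    """Return True if a predicted span survives both filters."""
    # Case-insensitive, whitespace-trimmed matching is an assumption.
    text = entity["text"].strip().lower()
    return entity["score"] >= SCORE_THRESHOLD and text not in SPAN_BLACKLIST


def filter_entities(entities: list[dict]) -> list[dict]:
    """Drop low-confidence and blacklisted spans."""
    return [e for e in entities if keep_entity(e)]


# Hand-written example predictions (illustrative values, not real GLiNER output).
predictions = [
    {"text": "MMLU", "label": "BENCHMARK_NAME", "score": 0.93},
    {"text": "accuracy", "label": "EVALUATION_METRIC", "score": 0.62},  # below threshold
    {"text": "dataset", "label": "EVALUATION_DATASET", "score": 0.91},  # blacklisted
]
kept = filter_entities(predictions)
print([e["text"] for e in kept])
```

Only the first prediction passes: the second falls below the threshold and the third is a blacklisted generic span.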

	## Eval results

	| Metric | Value |
	|---|---|
	| F1 | 0.5573 |
	| Precision | 0.5838 |
	| Recall | 0.5332 |
	| Accuracy | 0.9870 |

	Note: these metrics are computed on a held-out 10% of the bootstrap labels. They are silver labels, not human-reviewed gold, so the numbers reflect agreement with GLiNER rather than absolute accuracy.
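
As a quick sanity check on the reported numbers, the sketch below recomputes F1 as the harmonic mean of precision and recall, and the validation fraction from the row counts. It assumes the reported F1 is derived from the same overall precision and recall shown in the table, which the card does not state explicitly.

```python
# Values copied from this card.
precision, recall = 0.5838, 0.5332
train_rows, val_rows = 1194, 133

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Share of rows held out for validation.
val_fraction = val_rows / (train_rows + val_rows)

print(f"F1 recomputed from P and R: {f1:.4f}")
print(f"Validation fraction: {val_fraction:.3f}")
```

The recomputed F1 agrees with the reported 0.5573 to within the rounding of the inputs, and the validation fraction comes out at roughly 10%, matching the held-out split described above.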

	## Caveats

	- This is a V0 model trained on bootstrap labels with no human review pass. Expect it to inherit GLiNER's failure modes.
	- The intended use is as the V1 in an active-learning loop: deploy it as a Label Studio ML backend, route its disagreements with GLiNER to human annotators, and retrain on their corrections. See the [bootstrap-labels skill](https://github.com/huggingface/skills) for the full workflow.
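
The disagreement-routing step can be sketched as a span-set comparison. This is a hypothetical illustration, not part of the bootstrap-labels skill: the helper names are invented here, and it assumes both models expose character offsets, with the label under `entity_group` for the Transformers pipeline and under `label` for GLiNER.

```python
# Hypothetical sketch: flag a text for human review when the two models disagree.
Span = tuple[int, int, str]  # (start_char, end_char, label)


def spans_of(entities: list[dict], label_key: str) -> set[Span]:
    """Normalize predictions from either model into comparable span tuples."""
    return {(e["start"], e["end"], e[label_key]) for e in entities}


def disagreements(model_entities: list[dict], gliner_entities: list[dict]) -> set[Span]:
    """Spans predicted by exactly one of the two models."""
    return spans_of(model_entities, "entity_group") ^ spans_of(gliner_entities, "label")


def needs_review(model_entities: list[dict], gliner_entities: list[dict]) -> bool:
    return bool(disagreements(model_entities, gliner_entities))


# Hand-written predictions for "This model was evaluated on MMLU and HellaSwag."
# (illustrative only, not real model output).
model_out = [{"start": 28, "end": 32, "entity_group": "BENCHMARK_NAME"}]
gliner_out = [
    {"start": 28, "end": 32, "label": "BENCHMARK_NAME"},
    {"start": 37, "end": 46, "label": "BENCHMARK_NAME"},
]
print(needs_review(model_out, gliner_out))
print(disagreements(model_out, gliner_out))
```

Exact span matching is the strictest possible criterion; a real loop might prefer overlap-based matching so that small boundary differences do not flood annotators.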

	## Usage

	```python
	from transformers import pipeline

	# aggregation_strategy="simple" groups adjacent tokens with the same label into entity spans
	ner = pipeline("token-classification", model="davanstrien/eval-extraction-ner-v0", aggregation_strategy="simple")
	print(ner("This model was evaluated on MMLU and HellaSwag."))
	```
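
With an aggregation strategy set, the pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` keys. Because the base model is uncased, `word` is typically lowercased, so slicing the input text by `start` and `end` is one way to recover the original casing. The sketch below groups mentions by label; `group_mentions` is a hypothetical helper, and the sample list is a hand-written stand-in with placeholder scores, not real model output.

```python
from collections import defaultdict


def group_mentions(text: str, results: list[dict]) -> dict[str, list[str]]:
    """Group predicted spans by label, using offsets to keep original casing."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for r in results:
        grouped[r["entity_group"]].append(text[r["start"]:r["end"]])
    return dict(grouped)


text = "This model was evaluated on MMLU and HellaSwag."

# Hand-written stand-in for pipeline output (placeholder scores, not real predictions).
sample_results = [
    {"entity_group": "BENCHMARK_NAME", "score": 0.9, "word": "mmlu", "start": 28, "end": 32},
    {"entity_group": "BENCHMARK_NAME", "score": 0.9, "word": "hellaswag", "start": 37, "end": 46},
]

mentions = group_mentions(text, sample_results)
print(mentions)
```

In practice, `sample_results` would be replaced by the return value of `ner(text)` from the snippet above.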