---
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- token-classification
- ner
- bootstrap-labels
- eval-mentions-bootstrap-v2
metrics:
- seqeval
language:
- en
---

# davanstrien/eval-extraction-ner-v2

Token classifier trained on **bootstrap NER labels** from [`davanstrien/eval-mentions-bootstrap-v2`](https://huggingface.co/datasets/davanstrien/eval-mentions-bootstrap-v2).

It demonstrates the `bootstrap-labels` skill workflow: GLiNER bootstraps coarse labels, then a small task-specific model is trained on them.

## Training data

- Source: `davanstrien/eval-mentions-bootstrap-v2`
- Bootstrap model: GLiNER (via `uv-scripts/gliner`)
- Score threshold: 0.8 (entities scoring below this were dropped)
- Span blacklist: `['learning_rate', 'eval_batch_size', 'epsilon', 'lr_scheduler_warmup_ratio', 'lr_scheduler_type', 'epoch', 'batch_size', 'optimizer', 'gradient_accumulation_steps', 'warmup_ratio', 'seed', 'weight_decay', 'model', 'dataset', 'transformers', 'training dataset', 'training data', 'unknown dataset', 'f1']`
- Train rows: 306
- Val rows: 35
- Token-label distribution (excluding `O`):
  - BENCHMARK_NAME: 3663
  - EVALUATION_METRIC: 719

## Eval results

| Metric | Value |
|---|---|
| F1 | 0.0000 |
| Precision | 0.0000 |
| Recall | 0.0000 |
| Accuracy | 0.9756 |

Note: results are computed on a held-out 10% of the bootstrap labels. These are *silver labels*, not human-reviewed gold, so the numbers reflect agreement with GLiNER, not absolute accuracy. seqeval scores precision, recall, and F1 on whole entity spans, while accuracy is per token; an F1 of 0 alongside high accuracy is consistent with the model predicting `O` for nearly every token.

## Caveats

- This is a **V0 model** trained on bootstrap labels with no human review pass. Expect it to inherit GLiNER's failure modes.
- The intended use is *as the V1 in an active-learning loop*: deploy it as a Label Studio ML backend, route disagreements with GLiNER to humans, and retrain on the corrections. See the [bootstrap-labels skill](https://github.com/huggingface/skills) for the full workflow.

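
The disagreement-routing step of the active-learning loop described in the caveats could be sketched roughly as follows. This is an illustration only, not the skill's implementation: the function names `find_disagreements` and `needs_review`, the `(start, end, label)` span representation, and the sample spans are all hypothetical.

```python
from typing import List, Set, Tuple

# A span is (start_char, end_char, label). This representation is an assumption.
Span = Tuple[int, int, str]


def find_disagreements(model_spans: List[Span], gliner_spans: List[Span]) -> Set[Span]:
    """Return the spans predicted by exactly one of the two systems."""
    return set(model_spans) ^ set(gliner_spans)


def needs_review(model_spans: List[Span], gliner_spans: List[Span]) -> bool:
    """Flag a text for human review when the two systems disagree on any span."""
    return bool(find_disagreements(model_spans, gliner_spans))


# Hypothetical predictions, not real output from either model.
text = "This model was evaluated on MMLU and HellaSwag."
model_spans = [(28, 32, "BENCHMARK_NAME")]
gliner_spans = [(28, 32, "BENCHMARK_NAME"), (37, 46, "BENCHMARK_NAME")]

print(sorted(find_disagreements(model_spans, gliner_spans)))
# → [(37, 46, 'BENCHMARK_NAME')]
```

Exact-match comparison is the strictest choice: spans that overlap but differ in their boundaries also count as disagreements, which sends more texts to review.
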
## Usage

```python
from transformers import pipeline

# "simple" aggregation groups adjacent tokens with the same label into entity spans.
ner = pipeline("token-classification", model="davanstrien/eval-extraction-ner-v2", aggregation_strategy="simple")
print(ner("This model was evaluated on MMLU and HellaSwag."))
```
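
With `aggregation_strategy="simple"`, the pipeline returns a list of dicts with keys such as `entity_group`, `score`, `word`, `start`, and `end`. A minimal sketch of grouping those results by label and dropping low-confidence spans follows. The helper name `group_entities`, the 0.5 cutoff, and the sample predictions are illustrative assumptions, not output from this model.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(predictions: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group aggregated predictions by label, keeping spans scoring at least min_score."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)


# Hypothetical predictions in the shape the pipeline returns; not real model output.
sample = [
    {"entity_group": "EVALUATION_METRIC", "score": 0.77, "word": "accuracy", "start": 0, "end": 8},
    {"entity_group": "BENCHMARK_NAME", "score": 0.91, "word": "mmlu", "start": 12, "end": 16},
    {"entity_group": "BENCHMARK_NAME", "score": 0.42, "word": "hellaswag", "start": 21, "end": 30},
]
print(group_entities(sample))
```

Because the base model is uncased, the `word` values come back lowercased; slice the input text with `start` and `end` when the original casing matters.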