---
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- token-classification
- ner
- bootstrap-labels
- eval-mentions-bootstrap
metrics:
- seqeval
language:
- en
---

# davanstrien/eval-extraction-ner-v0

Token classifier trained on **bootstrap NER labels** from [`davanstrien/eval-mentions-bootstrap`](https://huggingface.co/datasets/davanstrien/eval-mentions-bootstrap).

Demonstrates the `bootstrap-labels` skill workflow: GLiNER bootstraps coarse labels, a small task-specific model is trained on them.

## Training data

- Source: `davanstrien/eval-mentions-bootstrap`
- Bootstrap model: GLiNER (via `uv-scripts/gliner`)
- Score threshold: 0.8 (entities below this dropped)
- Span blacklist: `['learning_rate', 'eval_batch_size', 'epsilon', 'lr_scheduler_warmup_ratio', 'lr_scheduler_type', 'epoch', 'batch_size', 'optimizer', 'gradient_accumulation_steps', 'warmup_ratio', 'seed', 'weight_decay', 'model', 'dataset', 'transformers', 'training dataset', 'training data', 'unknown dataset', 'f1']`
- Train rows: 1194
- Val rows: 133
- Token-label distribution (excluding `O`):
  - EVALUATION_METRIC: 7537
  - BENCHMARK_NAME: 3104
  - EVALUATION_DATASET: 1918

## Eval results

| Metric | Value |
|---|---|
| F1 | 0.5573 |
| Precision | 0.5838 |
| Recall | 0.5332 |
| Accuracy | 0.9870 |

(Note: held-out 10% of bootstrap labels. These are *silver labels*, not human-reviewed gold. Numbers reflect agreement with GLiNER, not absolute accuracy.)

## Caveats

- This is a **V0 model** trained on bootstrap labels with no human review pass. Expect it to inherit GLiNER's failure modes.
- The intended use is *as the V1 in an active-learning loop*: deploy as Label Studio ML backend, route disagreements with GLiNER to humans, retrain on corrections. See the [bootstrap-labels skill](https://github.com/huggingface/skills) for the full workflow.
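
The "route disagreements with GLiNER to humans" step can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation and not the Label Studio API: the `Span` tuple layout, the function names, and the example predictions are all hypothetical, and it assumes both systems emit character-offset spans keyed by a document ID.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical span representation: (start_char, end_char, label)
Span = Tuple[int, int, str]


def disagreements(model_spans: List[Span], gliner_spans: List[Span]) -> Set[Span]:
    """Return spans predicted by exactly one of the two systems."""
    return set(model_spans) ^ set(gliner_spans)


def route_for_review(
    model_preds: Dict[str, List[Span]],
    gliner_preds: Dict[str, List[Span]],
) -> List[str]:
    """Return IDs of documents where the two systems disagree on any span."""
    doc_ids = sorted(set(model_preds) | set(gliner_preds))
    return [
        doc_id
        for doc_id in doc_ids
        if disagreements(model_preds.get(doc_id, []), gliner_preds.get(doc_id, []))
    ]


# Hand-written example predictions, not real model output
model_preds = {
    "doc-1": [(28, 32, "BENCHMARK_NAME")],
    "doc-2": [(10, 18, "EVALUATION_METRIC")],
}
gliner_preds = {
    "doc-1": [(28, 32, "BENCHMARK_NAME")],
    "doc-2": [(10, 18, "EVALUATION_DATASET")],  # same span, different label
}

to_review = route_for_review(model_preds, gliner_preds)
print(to_review)
```

Exact-match comparison is the strictest choice: a span that differs only in its boundaries or only in its label counts as a disagreement. A looser overlap-based match would send fewer documents to reviewers.
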
## Usage

```python
from transformers import pipeline

# aggregation_strategy="simple" merges sub-word tokens into whole entity spans
ner = pipeline("token-classification", model="davanstrien/eval-extraction-ner-v0", aggregation_strategy="simple")
ner("This model was evaluated on MMLU and HellaSwag.")
```
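
With an aggregation strategy set, the pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` keys. A small helper for collecting mentions per label might look like the sketch below. The `group_mentions` function, its `min_score` parameter, and the example list are hypothetical: the list is a hand-written stand-in for pipeline output, not real predictions from this model.

```python
from collections import defaultdict
from typing import Dict, List


def group_mentions(entities: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Group predicted mention strings by label, dropping low-confidence spans."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written stand-in for the pipeline output; scores are made up
example_output = [
    {"entity_group": "EVALUATION_DATASET", "score": 0.35, "word": "model", "start": 5, "end": 10},
    {"entity_group": "BENCHMARK_NAME", "score": 0.97, "word": "mmlu", "start": 28, "end": 32},
    {"entity_group": "BENCHMARK_NAME", "score": 0.91, "word": "hellaswag", "start": 37, "end": 46},
]

grouped = group_mentions(example_output, min_score=0.5)
print(grouped)
```

In practice, `example_output` would be replaced by the return value of the `ner(...)` call, and `min_score` tuned against whatever tolerance for false positives the downstream use has.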