# Sentinel-D spaCy NER Model (Stage 1: NVD Parsing)

## Model Details

- **Base Model**: spaCy blank English pipeline (`spacy.blank("en")`)
- **Task**: Named Entity Recognition (NER)
- **Training Date**: 2026-03-04T21:49:41.890810
- **Framework**: spaCy 3.x
- **Training Data Size**: 550 descriptions + 50-example test set
- **Training Epochs**: 20
- **Dropout**: 0.35

## Custom NER Labels

1. **VERSION_RANGE**: Semantic version strings or version constraints (e.g., "1.2.3", "< 2.0.0")
2. **API_SYMBOL**: Method, class, or function names (e.g., "queryset.filter()", "X.509")
3. **BREAKING_CHANGE**: References to incompatible API changes or deprecations
4. **FIX_ACTION**: Specific remediation steps or upgrade instructions

## Evaluation Metrics

| Metric | Value |
|--------|-------|
| Precision | 0.9111 |
| Recall | 0.7885 |
| F1 Score | 0.8454 |
| True Positives | 41 |
| False Positives | 4 |
| False Negatives | 11 |

## Usage

```python
import spacy

# Load the trained pipeline from the extracted model directory.
nlp = spacy.load("./spacy-nvd-ner-v1")

text = "OpenSSL versions before 1.1.1n contain a buffer overflow in the X.509 verifier."
doc = nlp(text)

# Print each recognized entity with its label.
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")

# Output:
# 1.1.1n -> VERSION_RANGE
# X.509 -> API_SYMBOL
```

## Installation

1. Extract the zip archive to your project directory.
2. Load the model using spaCy:

   ```python
   import spacy
   nlp = spacy.load("./spacy-nvd-ner-v1")
   ```

## Architecture

The model consists of:

- **Input Layer**: Vectorized token representations
- **Hidden Layer**: Feed-forward network with 0.35 dropout
- **Output Layer**: 4-class NER tagger (softmax)

## Training Configuration

- **Optimizer**: SGD
- **Batch Size Range**: 8-32 (compounding)
- **Training Data**: Real NVD descriptions auto-annotated with a GLiNER teacher model
- **Constraint**: A held-out test set of exactly 50 examples (Master Document requirement)

## Known Limitations

- The model was trained on NVD descriptions only and may not generalize to other security domains
- Entity boundaries may not align perfectly with whitespace
- Requires English text input

## License

MIT
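
## Reproducing the Evaluation Metrics

The precision, recall, and F1 values reported under Evaluation Metrics follow directly from the true-positive, false-positive, and false-negative counts listed there. The snippet below is a minimal sketch of that arithmetic. The helper name `precision_recall_f1` is hypothetical and not part of the model package, and the sketch assumes the counts are entity-level matches on the 50-example test set.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw match counts.

    Hypothetical helper for illustration; not shipped with the model.
    """
    # Precision: share of predicted entities that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of gold entities that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Counts taken from the Evaluation Metrics table.
precision, recall, f1 = precision_recall_f1(tp=41, fp=4, fn=11)

print(f"Precision: {precision:.4f}")  # 0.9111
print(f"Recall:    {recall:.4f}")     # 0.7885
print(f"F1 Score:  {f1:.4f}")         # 0.8454
```

Because recall (0.7885) is noticeably lower than precision (0.9111), the model misses entities more often than it emits spurious ones, which is worth keeping in mind when downstream stages depend on complete extraction.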