| # Sentinel-D spaCy NER Model (Stage 1 — NVD Parsing) |
|
|
| ## Model Details |
| - **Base Model**: spaCy blank English pipeline (`spacy.blank("en")`) |
| - **Task**: Named Entity Recognition (NER) |
| - **Training Date**: 2026-03-04T21:49:41.890810 |
| - **Framework**: spaCy 3.x |
| - **Training Data Size**: 550 training descriptions plus a 50-example held-out test set |
| - **Training Epochs**: 20 |
| - **Dropout**: 0.35 |
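
A split of this shape (550 training descriptions, exactly 50 held out) could be produced with a seeded shuffle. The sketch below is an assumption about how such a split might be made, not the project's actual script: `split_holdout`, the seed, and the placeholder descriptions are all hypothetical.

```python
import random


def split_holdout(examples, test_size=50, seed=0):
    """Shuffle a copy of `examples` and hold out exactly `test_size` items.

    Hypothetical helper: the seed and the shuffling strategy are assumptions.
    """
    if test_size > len(examples):
        raise ValueError("test_size exceeds the number of examples")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # First `test_size` items form the test set, the rest the training set.
    return shuffled[test_size:], shuffled[:test_size]


# Placeholder data standing in for 600 annotated NVD descriptions.
descriptions = [f"description {i}" for i in range(600)]
train, test = split_holdout(descriptions)
print(len(train), len(test))
```

Fixing the seed keeps the held-out set identical across runs, which matters if the reported metrics are to stay comparable.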
|
|
| ## Custom NER Labels |
|
|
| 1. **VERSION_RANGE**: Semantic version strings or version constraints (e.g., "1.2.3", "< 2.0.0") |
| 2. **API_SYMBOL**: Method, class, or function names (e.g., "queryset.filter()", "X.509") |
| 3. **BREAKING_CHANGE**: References to incompatible API changes or deprecations |
| 4. **FIX_ACTION**: Specific remediation steps or upgrade instructions |
|
|
| ## Evaluation Metrics |
|
|
| | Metric | Value | |
| |--------|-------| |
| | Precision | 0.9111 | |
| | Recall | 0.7885 | |
| | F1 Score | 0.8454 | |
| | True Positives | 41 | |
| | False Positives | 4 | |
| | False Negatives | 11 | |
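
The precision, recall, and F1 values in the table follow directly from the three counts. A minimal sketch of that arithmetic, assuming the standard definitions (the function name `precision_recall_f1` is illustrative):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from entity-level counts."""
    precision = tp / (tp + fp)  # share of predicted entities that are correct
    recall = tp / (tp + fn)     # share of gold entities that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(tp=41, fp=4, fn=11)
print(f"{precision:.4f} {recall:.4f} {f1:.4f}")
# 0.9111 0.7885 0.8454
```

The counts also imply 52 gold entities (41 + 11) in the test set and 45 predicted entities (41 + 4).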
|
|
| ## Usage |
|
|
| ```python |
| import spacy |
| |
| nlp = spacy.load("./spacy-nvd-ner-v1") |
| |
| text = "OpenSSL versions before 1.1.1n contain a buffer overflow in the X.509 verifier." |
| doc = nlp(text) |
| |
| for ent in doc.ents: |
|     print(f"{ent.text} -> {ent.label_}") |
| # Expected output: |
| # 1.1.1n -> VERSION_RANGE |
| # X.509 -> API_SYMBOL |
| ``` |
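
Downstream stages will often want the entities grouped by label rather than printed one by one. The helper below is a hypothetical sketch of that step. It works on plain `(text, label)` pairs so that it runs without the model, and the sample pairs are copied from the expected output in the usage example.

```python
from collections import defaultdict


def group_by_label(pairs):
    """Group (text, label) pairs into {label: [text, ...]}, keeping order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


# With a loaded pipeline this would be:
#   group_by_label((ent.text, ent.label_) for ent in doc.ents)
pairs = [("1.1.1n", "VERSION_RANGE"), ("X.509", "API_SYMBOL")]
entities = group_by_label(pairs)
print(entities)
# {'VERSION_RANGE': ['1.1.1n'], 'API_SYMBOL': ['X.509']}
```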
|
|
| ## Installation |
|
|
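| 1. Install spaCy 3.x, for example with `pip install "spacy>=3,<4"`. |
| 2. Extract the zip archive into your project directory. The examples in this README assume the model folder ends up at `./spacy-nvd-ner-v1`. |
| 3. Load the model using spaCy: |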
| ```python |
| import spacy |
| nlp = spacy.load("./spacy-nvd-ner-v1") |
| ``` |
|
|
| ## Architecture |
|
|
| The model consists of: |
| - **Input Layer**: Vectorized token representations |
| - **Hidden Layer**: Feed-forward network with 0.35 dropout |
| - **Output Layer**: 4-class NER tagger (softmax) |
|
|
| ## Training Configuration |
|
|
| - **Optimizer**: SGD |
| - **Batch Size Range**: 8-32 (compounding) |
| - **Training Data**: Real NVD descriptions, auto-annotated with a GLiNER teacher model |
| - **Constraint**: The held-out test set contains exactly 50 examples (Master Document requirement) |
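
A compounding batch size starts small and grows by a constant factor per batch until it reaches a cap, here from 8 up to 32. spaCy 3.x provides `spacy.util.compounding` and `spacy.util.minibatch` for this, though this README does not state whether the training script used them or which growth factor it chose. The pure-Python sketch below mimics the idea without depending on spaCy. The factor of 1.5 is an assumption picked only to make the growth visible.

```python
from itertools import islice


def compounding(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop."""
    value = start
    while True:
        yield min(value, stop)
        value *= compound


def minibatch(items, sizes):
    """Split `items` into consecutive batches whose lengths come from `sizes`."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, int(next(sizes))))
        if not batch:
            return
        yield batch


# 100 dummy examples; the growth factor 1.5 is an illustrative assumption.
batch_lengths = [len(b) for b in minibatch(range(100), compounding(8.0, 32.0, 1.5))]
print(batch_lengths)
# [8, 12, 18, 27, 32, 3]
```

Small early batches give more frequent updates while the weights are still far from useful, and larger later batches give smoother gradient estimates.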
|
|
| ## Known Limitations |
|
|
| - The model was trained on NVD descriptions only and may not generalize to other security domains |
| - Predicted entity boundaries may not align exactly with whitespace |
| - Input text must be in English |
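
If whitespace alignment matters downstream, a consumer can check each predicted span before using it. The helper below is a hypothetical sketch, not part of the model package. It takes character offsets such as `ent.start_char` and `ent.end_char` and tests whether the span starts and ends at whitespace or at the edges of the string.

```python
def is_whitespace_aligned(text, start, end):
    """Return True if text[start:end] begins and ends at whitespace or string edges."""
    starts_clean = start == 0 or text[start - 1].isspace()
    ends_clean = end == len(text) or text[end].isspace()
    return starts_clean and ends_clean


text = "OpenSSL versions before 1.1.1n contain a buffer overflow in the X.509 verifier."

# A span covering the full version string is aligned.
start = text.index("1.1.1n")
full_ok = is_whitespace_aligned(text, start, start + len("1.1.1n"))

# A span that stops one character early ("1.1.1") is not.
partial_ok = is_whitespace_aligned(text, start, start + len("1.1.1"))

print(full_ok, partial_ok)
# True False
```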
|
|
| ## License |
|
|
| MIT |
|
|