# Sentinel-D spaCy NER Model (Stage 1 — NVD Parsing)

## Model Details
- **Base Model**: spaCy blank English pipeline (`spacy.blank("en")`)
- **Task**: Named Entity Recognition (NER)
- **Training Date**: 2026-03-04T21:49:41.890810
- **Framework**: spaCy 3.x
- **Training Data Size**: 550 descriptions + 50-example test set
- **Training Epochs**: 20
- **Dropout**: 0.35

## Custom NER Labels

1. **VERSION_RANGE**: Semantic version strings or version constraints (e.g., "1.2.3", "< 2.0.0")
2. **API_SYMBOL**: Method, class, or function names (e.g., "queryset.filter()", "X.509")
3. **BREAKING_CHANGE**: References to incompatible API changes or deprecations
4. **FIX_ACTION**: Specific remediation steps or upgrade instructions

## Evaluation Metrics

| Metric | Value |
|--------|-------|
| Precision | 0.9111 |
| Recall | 0.7885 |
| F1 Score | 0.8454 |
| True Positives | 41 |
| False Positives | 4 |
| False Negatives | 11 |
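
The precision, recall, and F1 values in the table follow directly from the three counts. A minimal sketch of that arithmetic is shown here; the function name `prf_from_counts` is illustrative and not part of the model package.

```python
def prf_from_counts(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from entity-level counts."""
    # Guard against division by zero when there are no predictions or no gold entities.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts reported in the table above.
precision, recall, f1 = prf_from_counts(tp=41, fp=4, fn=11)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
# P=0.9111 R=0.7885 F1=0.8454
```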

## Usage

```python
import spacy

nlp = spacy.load("./spacy-nvd-ner-v1")

text = "OpenSSL versions before 1.1.1n contain a buffer overflow in the X.509 verifier."
doc = nlp(text)

# Print each detected entity alongside its label.
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")

# Expected output:
# 1.1.1n -> VERSION_RANGE
# X.509 -> API_SYMBOL
```
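
Code that consumes the model will often want entities grouped by label rather than printed. The sketch below assumes only that the object passed in exposes `.ents` whose items have `.text` and `.label_`, as spaCy `Doc` objects do. `group_entities` is a hypothetical helper, and the stand-in document replaces a real `nlp(text)` call so the snippet runs without the model files.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities(doc) -> dict[str, list[str]]:
    """Map each entity label to the list of entity texts found in doc."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in for nlp(text); mirrors the entities shown in the usage example.
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="1.1.1n", label_="VERSION_RANGE"),
    SimpleNamespace(text="X.509", label_="API_SYMBOL"),
])
print(group_entities(fake_doc))
# {'VERSION_RANGE': ['1.1.1n'], 'API_SYMBOL': ['X.509']}
```

With the real model loaded, `group_entities(nlp(text))` should work the same way.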

## Installation

1. Extract the zip archive to your project directory
2. Load the model using spaCy:
   ```python
   import spacy
   nlp = spacy.load("./spacy-nvd-ner-v1")
   ```

## Architecture

The model consists of:
- **Input Layer**: Vectorized token representations
- **Hidden Layer**: Feed-forward network with 0.35 dropout
- **Output Layer**: 4-class NER tagger (softmax)

## Training Configuration

- **Optimizer**: SGD
- **Batch Size Range**: 8-32 (compounding)
- **Training Data**: Real NVD descriptions, auto-annotated by a GLiNER teacher model
- **Constraint**: Held-out test set of exactly 50 examples (Master Document requirement)
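
A compounding batch size starts small and grows by a constant factor each step until it reaches a cap. The sketch below illustrates the idea in plain Python for the 8 to 32 range; it is not the training script itself. The growth factor of 1.5 is an assumption chosen to make the growth visible (the actual factor is not stated here), and both helper functions are illustrative.

```python
from itertools import islice
from typing import Iterator


def compounding(start: float, stop: float, factor: float) -> Iterator[float]:
    """Yield start, start*factor, start*factor**2, ... capped at stop."""
    value = start
    while True:
        yield min(value, stop)
        value *= factor


def minibatches(items: list, sizes: Iterator[float]) -> Iterator[list]:
    """Split items into consecutive batches, taking each batch size from sizes."""
    it = iter(items)
    while True:
        batch = list(islice(it, int(next(sizes))))
        if not batch:
            return
        yield batch


# 550 placeholder examples, matching the training set size stated above.
examples = list(range(550))
# factor=1.5 is an assumed value, not taken from the training configuration.
batch_sizes = [len(b) for b in minibatches(examples, compounding(8.0, 32.0, 1.5))]
print(batch_sizes[:6])
# [8, 12, 18, 27, 32, 32]
```

Small early batches give more frequent updates while the weights are still far from a good solution; larger later batches smooth the gradient estimates.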

## Known Limitations

- Model trained on NVD descriptions only; may not generalize to other security domains
- Predicted entity boundaries may not coincide with whitespace-delimited word boundaries
- Requires English text input

## License

MIT