Exploit Intel committed 0b5983c (verified; parent 14f09b4)

Add model card with eval results (exact 0.676 / micro 0.702 / macro 0.511)

README.md (added, +96 -0)

---
license: apache-2.0
base_model: Qwen/Qwen3-8B
datasets:
- eiphuggincve/cve-cwe-consensus
language:
- en
tags:
- cybersecurity
- vulnerability
- cve
- cwe
- text-classification
- qlora
- unsloth
pipeline_tag: text-generation
library_name: transformers
---

# CVE → CWE Classifier (Qwen3-8B)

A QLoRA fine-tune of **Qwen3-8B** that maps a free-text **CVE description** to the **CWE weakness
ID(s)** it corresponds to. The LoRA adapter is merged into the base model and released in 16-bit
precision, so it loads directly with `transformers`.

Trained only on labels where **NVD and the CNA agree** after roll-up to **CWE View-1003**; see the
[`cve-cwe-consensus`](https://huggingface.co/datasets/eiphuggincve/cve-cwe-consensus) dataset.
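
A minimal sketch of that consensus filter, under stated assumptions: `PARENT` is a hypothetical single-parent excerpt of the CWE hierarchy (the real hierarchy is a graph with several relationship types), `VIEW_1003` is a small illustrative subset of the view, and "agree" is taken to mean the two rolled-up label sets are equal. None of these names come from the dataset's own code.

```python
from __future__ import annotations

from typing import Optional

# Illustrative only: a hypothetical child -> parent excerpt, NOT the full CWE hierarchy.
PARENT: dict[str, str] = {"CWE-564": "CWE-89"}
# Illustrative subset of CWE View-1003 entries (assumption, not the full view).
VIEW_1003: set[str] = {"CWE-22", "CWE-79", "CWE-89"}


def roll_up(cwe: str, max_hops: int = 10) -> Optional[str]:
    """Walk up PARENT until reaching an ID in VIEW_1003; None if it is never reached."""
    node: Optional[str] = cwe
    for _ in range(max_hops):  # hop limit guards against cycles in a bad map
        if node is None:
            return None
        if node in VIEW_1003:
            return node
        node = PARENT.get(node)
    return None


def consensus_label(nvd: set[str], cna: set[str]) -> Optional[set[str]]:
    """Return the rolled-up label set if NVD and the CNA agree; None means the row is dropped."""
    nvd_up = {r for r in map(roll_up, nvd) if r is not None}
    cna_up = {r for r in map(roll_up, cna) if r is not None}
    if nvd_up and nvd_up == cna_up:
        return nvd_up
    return None
```

With this toy map, an NVD label of CWE-564 and a CNA label of CWE-89 count as agreement after roll-up, while an outright mismatch (or a label outside the view) drops the row.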

## Results (held-out test split, 6,802 rows)

| Metric | This model | Prior baseline |
|---|---|---|
| Exact-match | **0.676** | 0.29 |
| Micro-F1 | **0.702** | 0.32 |
| Macro-F1 (125 CWEs) | **0.511** | 0.067 |

By difficulty (does the description *name* the weakness, or must it be inferred?):

| Stratum | n | Exact-match | Micro-F1 |
|---|---|---|---|
| Easy (weakness named) | 2,046 | 0.841 | 0.870 |
| Hard (must infer) | 4,756 | 0.605 | 0.628 |
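
Because exact-match is a per-row average, the overall score is the row-weighted mean of the two strata, and the stratified table is consistent with the headline figure (micro-F1 pools label counts rather than averaging rows, so it does not decompose exactly this way):

$$
\text{EM} = \frac{2046 \times 0.841 + 4756 \times 0.605}{2046 + 4756} = \frac{1720.686 + 2877.380}{6802} \approx 0.676
$$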

The macro-F1 gain (0.511 vs. 0.067) reflects a dataset that caps majority CWEs (e.g. CWE-79), so rare
weaknesses are actually learned rather than drowned out.
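
For reference, a minimal sketch of how the three headline metrics can be computed for multi-label CWE predictions. It assumes gold and predicted labels are sets of CWE-ID strings per row, and that macro-F1 averages per-CWE F1 over a given label space (defaulting to the CWE IDs seen in the gold labels). The card's own evaluation script is not shown, so details such as the handling of labels that never occur may differ.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Optional


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(gold: list[set[str]], pred: list[set[str]],
             label_space: Optional[Iterable[str]] = None) -> dict[str, float]:
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    # Exact-match: the predicted set must equal the gold set for the row.
    exact = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        tp.update(g & p)  # correctly predicted CWEs
        fp.update(p - g)  # spurious CWEs
        fn.update(g - p)  # missed CWEs
    # Micro-F1 pools counts across all CWEs before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 gives every CWE equal weight, so rare classes count as much as CWE-79.
    labels = set(label_space) if label_space is not None else set().union(*gold)
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels) if labels else 0.0
    return {"exact_match": exact, "micro_f1": micro, "macro_f1": macro}
```

With `gold = [{"CWE-79"}, {"CWE-89"}, {"CWE-79", "CWE-80"}]` and `pred = [{"CWE-79"}, {"CWE-22"}, {"CWE-79"}]` this gives exact-match 1/3, micro-F1 4/7 and macro-F1 1/3: two hits on one frequent CWE lift micro-F1, while the missed rare CWEs pull macro-F1 down.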

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

mid = "eiphuggincve/cve-cwe-qwen3-8b"
tok = AutoTokenizer.from_pretrained(mid)
# torch_dtype="auto" keeps the released 16-bit weights; device_map="auto" places them on available devices.
model = AutoModelForCausalLM.from_pretrained(mid, torch_dtype="auto", device_map="auto")

# Same fixed system prompt as in training; the CVE description is the only user input.
messages = [
    {"role": "system", "content": "You are a vulnerability analyst. Given a CVE description, "
                                  "reply with only the CWE ID(s) it maps to, comma-separated."},
    {"role": "user", "content": "A SQL injection vulnerability in the login endpoint allows an "
                                "unauthenticated attacker to execute arbitrary SQL via the username parameter."},
]
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # Qwen3 template option (ignored by templates without it); skips the <think> block
    return_tensors="pt",
).to(model.device)
# Greedy decoding: the answer is a short, deterministic list of IDs.
out = model.generate(inputs, max_new_tokens=32, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
# -> CWE-89
```

## Training

- **Base:** `Qwen/Qwen3-8B` (trained in 4-bit via `unsloth/qwen3-8b-unsloth-bnb-4bit`)
- **Method:** QLoRA (4-bit) with Unsloth, merged to 16-bit · released checkpoint: **checkpoint-960** (final; eval loss declined monotonically through training)
- **Dataset:** [`eiphuggincve/cve-cwe-consensus`](https://huggingface.co/datasets/eiphuggincve/cve-cwe-consensus) · 69,386 rows (train / validation / test: 55,810 / 6,774 / 6,802), majority CWEs capped at 2,500 examples each
- **Epochs:** 2 · **Context:** 512 tokens · **LR:** 2e-4 · **Optimizer:** AdamW 8-bit · **Scheduler:** linear · **Batch:** 32 · **Weight decay:** 0.01 · **Seed:** 3407
- **LoRA:** rank 16 / alpha 32 / dropout 0 · **Packing:** on · **Train-on-completions-only:** off
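
The 2,500-row cap and the 50-example floor (see Limitations) amount to per-label resampling. A minimal sketch, assuming single-label rows with a hypothetical `"cwe"` field and uniform random down-sampling; the dataset's actual procedure (ordering, split handling, multi-label rows, sampling seed) is not documented here.

```python
from __future__ import annotations

import random
from collections import defaultdict


def cap_and_floor(rows: list[dict], cap: int = 2500, floor: int = 50, seed: int = 0) -> list[dict]:
    """Drop CWEs with fewer than `floor` rows; randomly down-sample CWEs above `cap`."""
    by_cwe: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_cwe[row["cwe"]].append(row)  # "cwe" is a hypothetical field name
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept: list[dict] = []
    for cwe in sorted(by_cwe):  # sorted for deterministic iteration order
        group = by_cwe[cwe]
        if len(group) < floor:
            continue  # below the floor: the CWE leaves the label space entirely
        if len(group) > cap:
            group = rng.sample(group, cap)  # majority CWE: keep a random cap-sized subset
        kept.extend(group)
    return kept
```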

## Prompt format

ChatML (Qwen3 standard). The system prompt is fixed, and the CVE description is the only user input;
never include the label or the CVE ID.

- **system:** `You are a vulnerability analyst. Given a CVE description, reply with only the CWE ID(s) it maps to, comma-separated.`
- **user:** the CVE description
- **assistant (example):** `CWE-79, CWE-80`
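
A minimal sketch of assembling that input, assuming a description may arrive with its CVE ID embedded in the text. The regular expression and the `build_messages` helper are hypothetical; the result is the `messages` list that `apply_chat_template` takes in the Usage example.

```python
from __future__ import annotations

import re

SYSTEM_PROMPT = (
    "You are a vulnerability analyst. Given a CVE description, "
    "reply with only the CWE ID(s) it maps to, comma-separated."
)
# CVE IDs look like CVE-YYYY-NNNN, with four or more digits after the year.
CVE_ID = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def build_messages(description: str) -> list[dict[str, str]]:
    """Hypothetical helper: fixed system prompt plus the description with CVE IDs removed."""
    cleaned = CVE_ID.sub("", description)
    cleaned = " ".join(cleaned.split())  # collapse whitespace left behind by the removal
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": cleaned},
    ]
```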

## Limitations

- CWEs below the dataset's 50-example floor are not in the label space and won't be predicted.
- The model outputs CWE IDs as text and can occasionally emit a malformed or non-existent ID; validate
  predictions against the official CWE list.
- English-only; inputs are descriptions only (no code, CVSS, or references).
- A triage/assist aid, not an authoritative CWE assignment; have a human review the output before acting on it.
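
One way to apply that validation, as a sketch: split the comma-separated reply, check each token against the `CWE-<number>` shape, and keep only IDs found in a set of known IDs. The `parse_cwe_reply` helper is hypothetical, and `known` stands in for IDs loaded from the official CWE list (or the model's 125-CWE label space).

```python
from __future__ import annotations

import re

CWE_PATTERN = re.compile(r"CWE-[1-9][0-9]*")


def parse_cwe_reply(reply: str, known: set[str]) -> tuple[list[str], list[str]]:
    """Split a 'CWE-79, CWE-80' style reply into (accepted IDs, rejected tokens)."""
    accepted: list[str] = []
    rejected: list[str] = []
    for token in reply.split(","):
        token = token.strip().upper()
        if not token:
            continue  # tolerate trailing commas and empty items
        if CWE_PATTERN.fullmatch(token) and token in known:
            if token not in accepted:  # drop duplicates, keep first-seen order
                accepted.append(token)
        else:
            rejected.append(token)  # malformed or unknown ID: route to human review
    return accepted, rejected
```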

## License

Apache-2.0 (inherited from Qwen3-8B). The dataset derives from public upstream sources (NVD, MITRE CVE/CWE).