cedricbonhomme committed on
Commit
d226ee0
·
verified ·
1 Parent(s): a2fdad6

Update model card with evaluation metrics

Files changed (1)
  1. README.md +63 -54
README.md CHANGED
@@ -1,75 +1,84 @@
  ---
  library_name: transformers
- license: apache-2.0
- base_model: hfl/chinese-macbert-base
  tags:
- - generated_from_trainer
- metrics:
- - accuracy
- model-index:
- - name: vulnerability-severity-classification-chinese-macbert-base
- results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # vulnerability-severity-classification-chinese-macbert-base

- This model is a fine-tuned version of [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 2.5112
- - Accuracy: 0.7649
- - F1 Macro: 0.6854
- - Low Precision: 0.5950
- - Low Recall: 0.4213
- - Low F1: 0.4933
- - Medium Precision: 0.7735
- - Medium Recall: 0.8355
- - Medium F1: 0.8033
- - High Precision: 0.7812
- - High Recall: 0.7391
- - High F1: 0.7595

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 3e-05
- - train_batch_size: 64
- - eval_batch_size: 64
- - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - num_epochs: 5

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Macro | Low Precision | Low Recall | Low F1 | Medium Precision | Medium Recall | Medium F1 | High Precision | High Recall | High F1 |
- |:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:-------------:|:----------:|:------:|:----------------:|:-------------:|:---------:|:--------------:|:-----------:|:-------:|
- | 2.5562 | 1.0 | 1608 | 2.4217 | 0.7411 | 0.6552 | 0.5078 | 0.3882 | 0.44 | 0.7645 | 0.7937 | 0.7788 | 0.7474 | 0.7461 | 0.7467 |
- | 1.9715 | 2.0 | 3216 | 2.3503 | 0.7604 | 0.6720 | 0.6029 | 0.3710 | 0.4593 | 0.7701 | 0.8306 | 0.7992 | 0.7684 | 0.7467 | 0.7574 |
- | 1.9176 | 3.0 | 4824 | 2.2952 | 0.7594 | 0.6822 | 0.5748 | 0.4389 | 0.4977 | 0.7632 | 0.8405 | 0.8000 | 0.7914 | 0.7110 | 0.7490 |
- | 1.6097 | 4.0 | 6432 | 2.3841 | 0.7670 | 0.6829 | 0.6091 | 0.3966 | 0.4804 | 0.7743 | 0.8387 | 0.8052 | 0.7808 | 0.7461 | 0.7631 |
- | 1.3958 | 5.0 | 8040 | 2.5112 | 0.7649 | 0.6854 | 0.5950 | 0.4213 | 0.4933 | 0.7735 | 0.8355 | 0.8033 | 0.7812 | 0.7391 | 0.7595 |


- ### Framework versions

- - Transformers 5.8.1
- - Pytorch 2.12.0+cu130
- - Datasets 4.8.5
- - Tokenizers 0.22.2
 
  ---
+ language:
+ - zh
+ license: cc-by-4.0
  library_name: transformers
  tags:
+ - text-classification
+ - vulnerability
+ - severity
+ - cybersecurity
+ - cnvd
+ datasets:
+ - CIRCL/Vulnerability-CNVD
+ base_model: hfl/chinese-macbert-base
+ pipeline_tag: text-classification
  ---

+ # VLAI: Automated Vulnerability Severity Classification (Chinese Text)
+
+ A fine-tuned [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base) model for classifying Chinese vulnerability descriptions from the [China National Vulnerability Database (CNVD)](https://www.cnvd.org.cn/) into three severity levels: **Low**, **Medium**, and **High**.
+
+ Trained on the [CIRCL/Vulnerability-CNVD](https://huggingface.co/datasets/CIRCL/Vulnerability-CNVD) dataset as part of the [VulnTrain](https://github.com/vulnerability-lookup/VulnTrain) project.
+
+ ## Evaluation results
+
+ Evaluated on a **deduplicated test set** (25,664 samples) where no description text appears in both train and test splits, preventing data leakage from CNVD's reuse of boilerplate descriptions across different vulnerability IDs.
+
+ | Class | Precision | Recall | F1-score | Support |
+ |--------|-----------|--------|----------|---------|
+ | Low | 0.6091 | 0.3966 | 0.4804 | 2,267 |
+ | Medium | 0.7743 | 0.8387 | 0.8052 | 14,353 |
+ | High | 0.7808 | 0.7461 | 0.7631 | 9,044 |
+
+ - **Overall accuracy**: 76.70%
+ - **Macro F1**: 0.6829

+ ### Class distribution

+ The dataset is imbalanced: Low (8.8%), Medium (55.9%), High (35.2%).
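The summary figures can be cross-checked against the per-class table. The following is a minimal sketch rather than part of the training pipeline: it copies the table values by hand and recomputes macro F1 as the unweighted mean of per-class F1, accuracy as the support-weighted mean of per-class recall, and the class shares from the test-set support counts. The variable names are illustrative.

```python
# Per-class figures copied from the evaluation table: (precision, recall, F1, support).
per_class = {
    "Low": (0.6091, 0.3966, 0.4804, 2267),
    "Medium": (0.7743, 0.8387, 0.8052, 14353),
    "High": (0.7808, 0.7461, 0.7631, 9044),
}

# Total number of evaluated samples.
total = sum(support for _, _, _, support in per_class.values())

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1, _ in per_class.values()) / len(per_class)

# Accuracy equals the support-weighted mean of per-class recall.
accuracy = sum(recall * support for _, recall, _, support in per_class.values()) / total

# Share of each class among the evaluated samples, in percent.
shares = {
    name: round(100 * support / total, 1)
    for name, (_, _, _, support) in per_class.items()
}

print(total)
print(round(macro_f1, 4), round(accuracy, 4))
print(shares)
```

The recomputed values agree with the reported 25,664 samples, 0.6829 macro F1, 76.70% accuracy, and the 8.8% / 55.9% / 35.2% class shares. The gap between accuracy and macro F1 comes from the Low class, which contributes a third of the macro average but under 9% of the samples.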
 
 
 
 
 
 
 
 
 
 
 
 
 

+ ## Usage

+ ```python
+ from transformers import pipeline

+ classifier = pipeline(
+     "text-classification",
+     model="CIRCL/vulnerability-severity-classification-chinese-macbert-base"
+ )

+ description = "TOTOLINK A3600R存在缓冲区溢出漏洞,攻击者可利用该漏洞在系统上执行任意代码或者导致拒绝服务。"
+ result = classifier(description)
+ print(result)
+ ```
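The example description in the usage snippet reads roughly: "TOTOLINK A3600R has a buffer overflow vulnerability that an attacker can exploit to execute arbitrary code on the system or cause a denial of service." By default the pipeline returns only the top label. Because Low and Medium are often confused, it can help to look at all three scores. The sketch below assumes the `top_k=None` call argument of the Transformers text-classification pipeline; the helper names `summarize_scores` and `classify_with_margin` are hypothetical, and the label strings in the sample data are placeholders (the real ones are in the model's `config.id2label`).

```python
from typing import Any, Dict, List


def summarize_scores(scores: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Rank class scores and report the margin between the top two.

    `scores` is expected to look like the output of a text-classification
    pipeline called with top_k=None on a single string:
    [{"label": ..., "score": ...}, ...].
    """
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    return {
        "label": top["label"],
        "score": top["score"],
        "runner_up": runner_up["label"],
        "margin": top["score"] - runner_up["score"],
    }


def classify_with_margin(description: str) -> Dict[str, Any]:
    """Run the model and summarize its full score distribution."""
    # Imported lazily so summarize_scores can be used without loading the model.
    from transformers import pipeline

    classifier = pipeline(
        "text-classification",
        model="CIRCL/vulnerability-severity-classification-chinese-macbert-base",
    )
    # top_k=None asks the pipeline for every class score, not just the best one.
    scores = classifier(description, top_k=None)
    # Some versions wrap single-input results in an extra list.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    return summarize_scores(scores)


# Illustrative scores only, not real model output; label strings are placeholders.
example = [
    {"label": "Medium", "score": 0.48},
    {"label": "Low", "score": 0.41},
    {"label": "High", "score": 0.11},
]
summary = summarize_scores(example)
print(summary["label"], summary["runner_up"], round(summary["margin"], 2))
```

A small margin between the top two classes is one possible signal for routing a prediction to manual review; any threshold would need to be chosen on held-out data.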

+ ## Known limitations

+ - **Low severity recall**: the Low class has the lowest recall (0.3966). Approximately 60% of Low-severity entries are misclassified, mostly as Medium. This reflects the vocabulary overlap between Low and Medium descriptions in CNVD data. Class-weighted loss and focal loss were both tested, but each degraded Medium recall disproportionately without a net benefit.

+ - **Keyword dependency**: the model is biased toward a vulnerability type's typical severity. For example, buffer overflow descriptions are predicted as High regardless of the actual assigned severity. On entries where the actual severity deviates from the type's typical severity, accuracy drops from ~89% to ~55%.

+ - **Negation blindness**: the model does not understand negation. Descriptions like "does NOT allow remote code execution" can still produce high-confidence High-severity predictions.

+ - **CVE overlap**: 81% of CNVD entries have a corresponding CVE. The model primarily adds value for the ~19% of CNVD-only entries (concentrated in Chinese domestic software) where no CVE severity assessment exists.

+ These limitations were identified through independent analysis in [VulnTrain#19](https://github.com/vulnerability-lookup/VulnTrain/issues/19).

+ ## Training details

+ - **Base model**: [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base)
+ - **Dataset**: [CIRCL/Vulnerability-CNVD](https://huggingface.co/datasets/CIRCL/Vulnerability-CNVD)
+ - **Train/test split**: deduplicated on description text (no leakage), 80/20 split
+ - **Loss**: uniform cross-entropy (no class weighting)
+ - **Learning rate**: 3e-05
+ - **Batch size**: 16
+ - **Epochs**: 5
+ - **Best model selection**: by accuracy
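The deduplicated 80/20 split listed above can be illustrated as follows. This is a sketch under stated assumptions, not the VulnTrain implementation: it assumes each record is a dict with a `description` field, keeps the first record seen for each distinct description, and then splits the unique descriptions at random. The function name `dedup_split`, the field names, and the placeholder records are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, str]


def dedup_split(
    records: List[Record],
    text_key: str = "description",
    test_fraction: float = 0.2,
    seed: int = 42,
) -> Tuple[List[Record], List[Record]]:
    """Split records so that no description text appears on both sides."""
    # Keep the first record seen for each distinct description (an assumption;
    # other tie-breaking rules are possible).
    first_by_text: Dict[str, Record] = {}
    for record in records:
        first_by_text.setdefault(record[text_key].strip(), record)

    unique = list(first_by_text.values())
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(unique)

    n_test = int(round(len(unique) * test_fraction))
    return unique[n_test:], unique[:n_test]


# Placeholder records: 12 entries sharing 10 distinct descriptions.
records = [
    {"id": f"example-{i}", "description": f"Placeholder description {i % 10}", "severity": "Medium"}
    for i in range(12)
]
train, test = dedup_split(records)

train_texts = {r["description"] for r in train}
test_texts = {r["description"] for r in test}
print(len(train), len(test), train_texts & test_texts)
```

Because the split is made over unique descriptions, boilerplate text reused across several vulnerability IDs cannot land in both train and test.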

+ ## References

+ - [Vulnerability-Lookup](https://vulnerability.circl.lu): the vulnerability data source
+ - [VulnTrain](https://github.com/vulnerability-lookup/VulnTrain): training pipeline
+ - [ML-Gateway](https://github.com/vulnerability-lookup/ML-Gateway): inference API
+ - [VLAI paper](https://arxiv.org/abs/2507.03607): Bonhomme, C., Dulaunoy, A. (2025)