cedricbonhomme committed on
Commit c082709 · verified · 1 Parent(s): efd3487

Update model card with honest metrics and known limitations

Files changed (1):
  1. README.md +64 -54
README.md CHANGED
@@ -1,75 +1,85 @@
  ---
- library_name: transformers
  license: apache-2.0
- base_model: hfl/chinese-macbert-base
  tags:
- - generated_from_trainer
- metrics:
- - accuracy
- model-index:
- - name: vulnerability-severity-classification-chinese-macbert-base-test
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # vulnerability-severity-classification-chinese-macbert-base-test

- This model is a fine-tuned version of [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 2.4903
- - Accuracy: 0.7699
- - F1 Macro: 0.6888
- - Low Precision: 0.6247
- - Low Recall: 0.4030
- - Low F1: 0.4899
- - Medium Precision: 0.7837
- - Medium Recall: 0.8259
- - Medium F1: 0.8042
- - High Precision: 0.7706
- - High Recall: 0.7740
- - High F1: 0.7723

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 3e-05
- - train_batch_size: 64
- - eval_batch_size: 64
- - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - num_epochs: 5

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Macro | Low Precision | Low Recall | Low F1 | Medium Precision | Medium Recall | Medium F1 | High Precision | High Recall | High F1 |
- |:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:-------------:|:----------:|:------:|:----------------:|:-------------:|:---------:|:--------------:|:-----------:|:-------:|
- | 2.2603        | 1.0   | 1590 | 2.3939          | 0.7494   | 0.6085   | 0.6964        | 0.1771     | 0.2823 | 0.7452           | 0.8456        | 0.7922    | 0.7604         | 0.7419      | 0.7510  |
- | 2.3054        | 2.0   | 3180 | 2.2652          | 0.7657   | 0.6696   | 0.6445        | 0.3345     | 0.4404 | 0.7669           | 0.8450        | 0.8040    | 0.7799         | 0.7494      | 0.7643  |
- | 1.7666        | 3.0   | 4770 | 2.3074          | 0.7663   | 0.6847   | 0.6148        | 0.3995     | 0.4843 | 0.7821           | 0.8202        | 0.8007    | 0.7649         | 0.7736      | 0.7692  |
- | 1.7766        | 4.0   | 6360 | 2.3694          | 0.7670   | 0.6888   | 0.5959        | 0.4200     | 0.4927 | 0.7901           | 0.8082        | 0.7990    | 0.7604         | 0.7893      | 0.7746  |
- | 1.4851        | 5.0   | 7950 | 2.4903          | 0.7699   | 0.6888   | 0.6247        | 0.4030     | 0.4899 | 0.7837           | 0.8259        | 0.8042    | 0.7706         | 0.7740      | 0.7723  |

- ### Framework versions

- - Transformers 5.4.0
- - Pytorch 2.11.0+cu130
- - Datasets 4.8.4
- - Tokenizers 0.22.2
 
  ---
+ language:
+ - zh
  license: apache-2.0
+ library_name: transformers
  tags:
+ - text-classification
+ - vulnerability
+ - severity
+ - cybersecurity
+ - cnvd
+ datasets:
+ - CIRCL/Vulnerability-CNVD
+ base_model: hfl/chinese-macbert-base
+ pipeline_tag: text-classification
  ---

+ # VLAI: Automated Vulnerability Severity Classification (Chinese Text)
+
+ A fine-tuned [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base) model for classifying Chinese vulnerability descriptions from the [China National Vulnerability Database (CNVD)](https://www.cnvd.org.cn/) into three severity levels: **Low**, **Medium**, and **High**.
+
+ Trained on the [CIRCL/Vulnerability-CNVD](https://huggingface.co/datasets/CIRCL/Vulnerability-CNVD) dataset as part of the [VulnTrain](https://github.com/vulnerability-lookup/VulnTrain) project.
+
+ ## Evaluation results
+
+ Evaluated on a **deduplicated test set** (25,845 samples) where no description text appears in both train and test splits, preventing data leakage from CNVD's reuse of boilerplate descriptions across different vulnerability IDs.
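A leakage-free split of this kind can be sketched in a few lines. This is illustrative only, not the actual VulnTrain code; the record layout and field names are assumptions:

```python
import random

def dedup_split(records, test_frac=0.2, seed=42):
    """Keep the first occurrence of each description, then split so that
    no description text can appear in both train and test."""
    seen, unique = set(), []
    for rec in records:
        if rec["description"] not in seen:
            seen.add(rec["description"])
            unique.append(rec)
    random.Random(seed).shuffle(unique)
    n_test = int(len(unique) * test_frac)
    return unique[n_test:], unique[:n_test]  # train, test

# Hypothetical CNVD-style records; the first two share boilerplate text
# under different vulnerability IDs, which a naive split would leak.
records = [
    {"id": "CNVD-1", "description": "缓冲区溢出漏洞", "severity": "High"},
    {"id": "CNVD-2", "description": "缓冲区溢出漏洞", "severity": "High"},
    {"id": "CNVD-3", "description": "信息泄露漏洞", "severity": "Low"},
    {"id": "CNVD-4", "description": "SQL注入漏洞", "severity": "Medium"},
    {"id": "CNVD-5", "description": "拒绝服务漏洞", "severity": "Medium"},
    {"id": "CNVD-6", "description": "跨站脚本漏洞", "severity": "Medium"},
]
train, test = dedup_split(records)
assert not {r["description"] for r in train} & {r["description"] for r in test}
```

Deduplicating before splitting (rather than after) is what guarantees the disjointness the card claims.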
+
+ | Class  | Precision | Recall | F1-score | Support |
+ |--------|-----------|--------|----------|---------|
+ | Low    | 0.5968    | 0.4099 | 0.4860   | 2,293   |
+ | Medium | 0.7867    | 0.8165 | 0.8013   | 14,351  |
+ | High   | 0.7662    | 0.7809 | 0.7735   | 9,201   |
+
+ - **Overall accuracy**: 76.8%
+ - **Macro F1**: 0.6870
+ - **Weighted F1**: 0.7634
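Both averages follow directly from the per-class table: macro-F1 treats each class equally, while weighted-F1 scales each class by its support (the tiny drift from the reported 0.6870 comes from per-class rounding in the table):

```python
# Per-class F1 and support, copied from the table above.
f1 = {"Low": 0.4860, "Medium": 0.8013, "High": 0.7735}
support = {"Low": 2293, "Medium": 14351, "High": 9201}

total = sum(support.values())  # 25,845 test samples
macro_f1 = sum(f1.values()) / len(f1)                      # classes weighted equally
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total  # classes weighted by support

print(f"macro={macro_f1:.4f} weighted={weighted_f1:.4f}")
# macro=0.6869 weighted=0.7634
```

The large gap between the two (0.69 vs 0.76) is itself a signal: the minority Low class drags the macro average down while barely affecting the support-weighted one.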
+
+ ### Class distribution
+
+ The dataset is imbalanced: Low (8.9%), Medium (55.5%), High (35.6%).
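For reference, the class-weighted loss mentioned under limitations would typically derive inverse-frequency weights from exactly this distribution. A sketch of the standard N/(K·n_c) scheme, not necessarily the exact weighting VulnTrain tested:

```python
# Class counts matching the distribution above (test-set supports).
counts = {"Low": 2293, "Medium": 14351, "High": 9201}
total = sum(counts.values())

# Inverse-frequency weights: w_c = N / (K * n_c), so rare classes
# contribute more to the loss.
weights = {c: total / (len(counts) * n) for c, n in counts.items()}
print({c: round(w, 2) for c, w in weights.items()})
# {'Low': 3.76, 'Medium': 0.6, 'High': 0.94}
```

Low ends up weighted roughly six times as heavily as Medium, which is consistent with the failure mode reported below: the reweighted loss over-penalizes Medium errors and drags Medium recall down.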
 
 
 
 
 
 
 
 
 
 
 
 
 
+
+ ## Usage
+
+ ```python
+ from transformers import pipeline
+
+ classifier = pipeline(
+     "text-classification",
+     model="CIRCL/vulnerability-severity-classification-chinese-macbert-base"
+ )
+
+ # "TOTOLINK A3600R has a buffer overflow vulnerability; an attacker could
+ # exploit it to execute arbitrary code or cause a denial of service."
+ description = "TOTOLINK A3600R存在缓冲区溢出漏洞,攻击者可利用该漏洞在系统上执行任意代码或者导致拒绝服务。"
+ result = classifier(description)
+ print(result)
+ # [{'label': 'High', 'score': 0.98}]
+ ```
+
+ ## Known limitations
+
+ - **Low-severity recall is ~41%**: roughly 60% of Low-severity entries are misclassified, mostly as Medium, reflecting the heavy vocabulary overlap between Low and Medium descriptions in CNVD data. Class-weighted loss and focal loss were both tested; each degraded Medium recall disproportionately, with no net benefit.
+
+ - **Keyword dependency**: the model leans on a vulnerability type's typical severity. Buffer-overflow descriptions, for example, are predicted as High regardless of the actually assigned severity. On entries whose assigned severity deviates from the type's typical one, accuracy drops from ~89% to ~55%.
+
+ - **Negation blindness**: the model does not handle negation. A description like "does NOT allow remote code execution" can still yield a high-confidence High prediction.
+
+ - **CVE overlap**: 81% of CNVD entries have a corresponding CVE. The model mainly adds value for the ~19% of CNVD-only entries (concentrated in Chinese domestic software) for which no CVE severity assessment exists.
 
 
 
 
 
 
 
+
+ These limitations were identified through independent analysis in [VulnTrain#19](https://github.com/vulnerability-lookup/VulnTrain/issues/19).
+
+ ## Training details
+
+ - **Base model**: [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base)
+ - **Dataset**: [CIRCL/Vulnerability-CNVD](https://huggingface.co/datasets/CIRCL/Vulnerability-CNVD)
+ - **Train/test split**: 80/20, deduplicated on description text (no leakage)
+ - **Loss**: uniform cross-entropy (no class weighting)
+ - **Learning rate**: 3e-05
+ - **Batch size**: 16
+ - **Epochs**: 5
+ - **Best model selection**: by accuracy
+
+ ## References
+
+ - [Vulnerability-Lookup](https://vulnerability.circl.lu) - the vulnerability data source
+ - [VulnTrain](https://github.com/vulnerability-lookup/VulnTrain) - training pipeline
+ - [ML-Gateway](https://github.com/vulnerability-lookup/ML-Gateway) - inference API