CIRCL/vulnerability-severity-classification-chinese-macbert-base

@@ -1,84 +1,75 @@
 ---
-language:
-- zh
-license: cc-by-4.0
 library_name: transformers
-tags:
-- text-classification
-- vulnerability
-- severity
-- cybersecurity
-- cnvd
-datasets:
-- CIRCL/Vulnerability-CNVD
 base_model: hfl/chinese-macbert-base
-pipeline_tag: text-classification
 ---
-# VLAI: Automated Vulnerability Severity Classification (Chinese Text)
-A fine-tuned [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base) model for classifying Chinese vulnerability descriptions from the [China National Vulnerability Database (CNVD)](https://www.cnvd.org.cn/) into three severity levels: **Low**, **Medium**, and **High**.
-Trained on the [CIRCL/Vulnerability-CNVD](https://huggingface.co/datasets/CIRCL/Vulnerability-CNVD) dataset as part of the [VulnTrain](https://github.com/vulnerability-lookup/VulnTrain) project.
-## Evaluation results
-Evaluated on a **deduplicated test set** (25,664 samples) where no description text appears in both train and test splits, preventing data leakage from CNVD's reuse of boilerplate descriptions across different vulnerability IDs.
-| Class  | Precision | Recall | F1-score | Support |
-|--------|-----------|--------|----------|---------|
-| Low    | 0.6091    | 0.3966 | 0.4804   | 2,267   |
-| Medium | 0.7743    | 0.8387 | 0.8052   | 14,353  |
-| High   | 0.7808   | 0.7461 | 0.7631   | 9,044   |
-- **Overall accuracy**: 76.70%
-- **Macro F1**: 0.6829
-### Class distribution
-The dataset is imbalanced: Low (8.8%), Medium (55.9%), High (35.2%).
-## Usage
-```python
-from transformers import pipeline
-classifier = pipeline(
-    "text-classification",
-    model="CIRCL/vulnerability-severity-classification-chinese-macbert-base"
-)
-description = "TOTOLINK A3600R存在缓冲区溢出漏洞，攻击者可利用该漏洞在系统上执行任意代码或者导致拒绝服务。"
-result = classifier(description)
-print(result)
-```
-## Known limitations
-- **Low severity recall**: the Low class has the lowest recall. Approximately 60% of Low-severity entries are misclassified, mostly as Medium. This reflects the vocabulary overlap between Low and Medium descriptions in CNVD data. Class-weighted loss and focal loss were tested but all degraded Medium recall disproportionately without a net benefit.
-- **Keyword dependency**: the model biases toward a vulnerability type's typical severity. For example, buffer overflow descriptions are predicted as High regardless of the actual assigned severity. On entries where the actual severity deviates from the type's typical severity, accuracy drops from ~89% to ~55%.
-- **Negation blindness**: the model does not understand negation. Descriptions like "does NOT allow remote code execution" can still produce high-confidence High severity predictions.
-- **CVE overlap**: 81% of CNVD entries have a corresponding CVE. The model primarily adds value for the ~19% of CNVD-only entries (concentrated in Chinese domestic software) where no CVE severity assessment exists.
-These limitations were identified through independent analysis in [VulnTrain#19](https://github.com/vulnerability-lookup/VulnTrain/issues/19).
-## Training details
-- **Base model**: [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base)
-- **Dataset**: [CIRCL/Vulnerability-CNVD](https://huggingface.co/datasets/CIRCL/Vulnerability-CNVD)
-- **Train/test split**: deduplicated on description text (no leakage), 80/20 split
-- **Loss**: uniform cross-entropy (no class weighting)
-- **Learning rate**: 3e-05
-- **Batch size**: 16
-- **Epochs**: 5
-- **Best model selection**: by accuracy
-## References
-- [Vulnerability-Lookup](https://vulnerability.circl.lu) — the vulnerability data source
-- [VulnTrain](https://github.com/vulnerability-lookup/VulnTrain) — training pipeline
-- [ML-Gateway](https://github.com/vulnerability-lookup/ML-Gateway) — inference API
-- [VLAI paper](https://arxiv.org/abs/2507.03607) — Bonhomme, C., Dulaunoy, A. (2025)
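
The summary figures in the removed card can be cross-checked against its own per-class table. The sketch below is only a consistency check on the published numbers, not part of the VulnTrain pipeline. It assumes that macro F1 is the unweighted mean of the three per-class F1 scores and that overall accuracy is the support-weighted mean of per-class recall, which holds for single-label classification.

```python
# Per-class figures copied from the removed card's evaluation table.
per_class = {
    "Low":    {"recall": 0.3966, "f1": 0.4804, "support": 2267},
    "Medium": {"recall": 0.8387, "f1": 0.8052, "support": 14353},
    "High":   {"recall": 0.7461, "f1": 0.7631, "support": 9044},
}

# Total number of test samples across the three classes.
total = sum(c["support"] for c in per_class.values())

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(c["f1"] for c in per_class.values()) / len(per_class)

# Accuracy: correctly classified samples (recall * support) over all samples.
accuracy = sum(c["recall"] * c["support"] for c in per_class.values()) / total

# Share of each class in the test set.
distribution = {name: c["support"] / total for name, c in per_class.items()}

print(f"total support: {total}")
print(f"macro F1: {macro_f1:.4f}")
print(f"accuracy: {accuracy:.2%}")
for name, share in distribution.items():
    print(f"{name}: {share:.1%}")
```

The recomputed values agree with the card: 25,664 samples, macro F1 of 0.6829, accuracy of 76.70%, and a Low/Medium/High split of roughly 8.8% / 55.9% / 35.2%.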

 ---
 library_name: transformers
+license: apache-2.0
 base_model: hfl/chinese-macbert-base
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+model-index:
+- name: vulnerability-severity-classification-chinese-macbert-base
+  results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# vulnerability-severity-classification-chinese-macbert-base
+This model is a fine-tuned version of [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.3186
+- Accuracy: 0.7657
+- F1 Macro: 0.6796
+- Low Precision: 0.5544
+- Low Recall: 0.3987
+- Low F1: 0.4638
+- Medium Precision: 0.7805
+- Medium Recall: 0.8196
+- Medium F1: 0.7996
+- High Precision: 0.7787
+- High Recall: 0.7720
+- High F1: 0.7753
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 3e-05
+- train_batch_size: 32
+- eval_batch_size: 32
+- seed: 42
+- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: linear
+- num_epochs: 5
+### Training results
+| Training Loss | Epoch | Step  | Validation Loss | Accuracy | F1 Macro | Low Precision | Low Recall | Low F1 | Medium Precision | Medium Recall | Medium F1 | High Precision | High Recall | High F1 |
+|:-------------:|:-----:|:-----:|:---------------:|:--------:|:--------:|:-------------:|:----------:|:------:|:----------------:|:-------------:|:---------:|:--------------:|:-----------:|:-------:|
+| 1.2165        | 1.0   | 3221  | 1.2363          | 0.7429   | 0.5960   | 0.6380        | 0.1631     | 0.2598 | 0.7335           | 0.8549        | 0.7895    | 0.7679         | 0.7115      | 0.7386  |
+| 1.1430        | 2.0   | 6442  | 1.1676          | 0.7625   | 0.6464   | 0.6316        | 0.2643     | 0.3726 | 0.7548           | 0.8568        | 0.8026    | 0.7909         | 0.7386      | 0.7639  |
+| 0.8890        | 3.0   | 9663  | 1.1915          | 0.7631   | 0.6690   | 0.5884        | 0.3470     | 0.4365 | 0.7833           | 0.8091        | 0.7960    | 0.7564         | 0.7933      | 0.7744  |
+| 0.8253        | 4.0   | 12884 | 1.2354          | 0.7675   | 0.6796   | 0.5739        | 0.3874     | 0.4626 | 0.7765           | 0.8305        | 0.8026    | 0.7849         | 0.7630      | 0.7738  |
+| 0.5851        | 5.0   | 16105 | 1.3186          | 0.7657   | 0.6796   | 0.5544        | 0.3987     | 0.4638 | 0.7805           | 0.8196        | 0.7996    | 0.7787         | 0.7720      | 0.7753  |
+### Framework versions
+- Transformers 5.9.0
+- Pytorch 2.12.0+cu130
+- Datasets 4.8.5
+- Tokenizers 0.22.2
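
The headline metrics in the regenerated card match the epoch 5 row of the training results table, although epoch 4 reports a slightly higher accuracy and epoch 2 the lowest validation loss. The sketch below only reads the table; it does not claim which checkpoint was actually uploaded. The sample-count estimate assumes a single device and no gradient accumulation, neither of which the card states.

```python
# Rows copied from the "Training results" table:
# (epoch, step, validation_loss, accuracy, f1_macro)
rows = [
    (1, 3221, 1.2363, 0.7429, 0.5960),
    (2, 6442, 1.1676, 0.7625, 0.6464),
    (3, 9663, 1.1915, 0.7631, 0.6690),
    (4, 12884, 1.2354, 0.7675, 0.6796),
    (5, 16105, 1.3186, 0.7657, 0.6796),
]

# Epoch with the highest evaluation accuracy.
best_by_accuracy = max(rows, key=lambda r: r[3])

# Epoch with the lowest validation loss.
best_by_val_loss = min(rows, key=lambda r: r[2])

# Steps per epoch times train_batch_size gives an upper bound on the number
# of training samples (ASSUMPTION: one device, no gradient accumulation).
steps_per_epoch = rows[0][1]
train_batch_size = 32
approx_train_samples = steps_per_epoch * train_batch_size

print(f"best accuracy: epoch {best_by_accuracy[0]} ({best_by_accuracy[3]:.4f})")
print(f"lowest validation loss: epoch {best_by_val_loss[0]} ({best_by_val_loss[2]:.4f})")
print(f"approx. training samples: <= {approx_train_samples}")
```

Validation loss rises after epoch 2 while training loss keeps falling, which is consistent with some overfitting in the later epochs even though accuracy and macro F1 continue to improve slightly.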

config.json CHANGED

@@ -39,7 +39,7 @@
   "pooler_type": "first_token_transform",
   "problem_type": "single_label_classification",
   "tie_word_embeddings": true,
-  "transformers_version": "5.8.1",
   "type_vocab_size": 2,
   "use_cache": false,
   "vocab_size": 21128

   "pooler_type": "first_token_transform",
   "problem_type": "single_label_classification",
   "tie_word_embeddings": true,
+  "transformers_version": "5.9.0",
   "type_vocab_size": 2,
   "use_cache": false,
   "vocab_size": 21128

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:85b042c63a5b88954da96b16469622785fb8241352817f2c25b7685e1f663473
 size 409103316

 version https://git-lfs.github.com/spec/v1
+oid sha256:9c3c95c84cd5eda329a19bb85b5541b787efe845b3928db83a1fc51f11ec0245
 size 409103316

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:575baa156bdc5e5f07e63e7d51c0971eb2b304d73bb6c1c735b76e9ac8a00b4c
 size 5329

 version https://git-lfs.github.com/spec/v1
+oid sha256:fe676f030dfeaf223c576a6c298bff30cc2b00d682b7d519687ed39eaba7c8da
 size 5329
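
The `model.safetensors` and `training_args.bin` entries above are Git LFS pointer files: each records a SHA-256 `oid` and a byte `size` for the real object. A minimal sketch of checking a downloaded file against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and introduced here for illustration; they are not part of Git LFS or the Hugging Face tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = dict(
        line.strip().split(" ", 1) for line in text.splitlines() if line.strip()
    )
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and hash both match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so large weight files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# The new model.safetensors pointer from this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:9c3c95c84cd5eda329a19bb85b5541b787efe845b3928db83a1fc51f11ec0245\n"
    "size 409103316\n"
)
print(pointer["algo"], pointer["size"])
```

With a local copy of the weights, `matches_pointer(Path("model.safetensors"), pointer)` would report whether the download corresponds to this commit; the local path is an assumption. The unchanged `size` alongside a changed `oid` is what one would expect when the weights are retrained with the same architecture.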