CIRCL/vulnerability-severity-classification-roberta-base

@@ -1,68 +1,37 @@
 ---
 library_name: transformers
-license: cc-by-4.0
 base_model: roberta-base
-metrics:
-- accuracy
 tags:
 - generated_from_trainer
-- text-classification
-- classification
-- nlp
-- vulnerability
 model-index:
 - name: vulnerability-severity-classification-roberta-base
   results: []
-datasets:
-- CIRCL/vulnerability-scores
 ---
-# VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification
-# Severity classification
-This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the dataset [CIRCL/vulnerability-scores](https://huggingface.co/datasets/CIRCL/vulnerability-scores).
-The model was presented in the paper [VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification](https://huggingface.co/papers/2507.03607) [[arXiv](https://arxiv.org/abs/2507.03607)].
-**Abstract:** VLAI is a transformer-based model that predicts software vulnerability severity levels directly from text descriptions. Built on RoBERTa, VLAI is fine-tuned on over 600,000 real-world vulnerabilities and achieves over 82% accuracy in predicting severity categories, enabling faster and more consistent triage ahead of manual CVSS scoring. The model and dataset are open-source and integrated into the Vulnerability-Lookup service.
-You can read [this page](https://www.vulnerability-lookup.org/user-manual/ai/) for more information.
 ## Model description
-It is a classification model and is aimed to assist in classifying vulnerabilities by severity based on their descriptions.
-## How to get started with the model
-```python
-from transformers import AutoModelForSequenceClassification, AutoTokenizer
-import torch
-labels = ["low", "medium", "high", "critical"]
-model_name = "CIRCL/vulnerability-severity-classification-roberta-base"
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForSequenceClassification.from_pretrained(model_name)
-model.eval()
-test_description = "SAP NetWeaver Visual Composer Metadata Uploader is not protected with a proper authorization, allowing unauthenticated agent to upload potentially malicious executable binaries \
-that could severely harm the host system. This could significantly affect the confidentiality, integrity, and availability of the targeted system."
-inputs = tokenizer(test_description, return_tensors="pt", truncation=True, padding=True)
-# Run inference
-with torch.no_grad():
-    outputs = model(**inputs)
-    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
-# Print results
-print("Predictions:", predictions)
-predicted_class = torch.argmax(predictions, dim=-1).item()
-print("Predicted severity:", labels[predicted_class])
-```
 ## Training procedure
@@ -77,24 +46,20 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - num_epochs: 5
-It achieves the following results on the evaluation set:
-- Loss: 0.4941
-- Accuracy: 0.8231
 ### Training results
 | Training Loss | Epoch | Step  | Validation Loss | Accuracy |
 |:-------------:|:-----:|:-----:|:---------------:|:--------:|
-| 0.6002        | 1.0   | 15170 | 0.6387          | 0.7427   |
-| 0.5361        | 2.0   | 30340 | 0.5631          | 0.7751   |
-| 0.4608        | 3.0   | 45510 | 0.5208          | 0.7970   |
-| 0.3383        | 4.0   | 60680 | 0.4975          | 0.8150   |
-| 0.3325        | 5.0   | 75850 | 0.4941          | 0.8231   |
 ### Framework versions
-- Transformers 4.57.3
-- Pytorch 2.9.1+cu128
-- Datasets 4.4.2
 - Tokenizers 0.22.2

 ---
 library_name: transformers
+license: mit
 base_model: roberta-base
 tags:
 - generated_from_trainer
+metrics:
+- accuracy
 model-index:
 - name: vulnerability-severity-classification-roberta-base
   results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# vulnerability-severity-classification-roberta-base
+This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 2.0264
+- Accuracy: 0.8207
 ## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
 ## Training procedure
 - lr_scheduler_type: linear
 - num_epochs: 5
 ### Training results
 | Training Loss | Epoch | Step  | Validation Loss | Accuracy |
 |:-------------:|:-----:|:-----:|:---------------:|:--------:|
+| 2.3559        | 1.0   | 15202 | 2.5301          | 0.7425   |
+| 2.2821        | 2.0   | 30404 | 2.2508          | 0.7737   |
+| 2.0705        | 3.0   | 45606 | 2.1307          | 0.7943   |
+| 1.9612        | 4.0   | 60808 | 2.0244          | 0.8115   |
+| 1.3880        | 5.0   | 76010 | 2.0264          | 0.8207   |
 ### Framework versions
+- Transformers 5.0.0
+- Pytorch 2.10.0+cu128
+- Datasets 4.5.0
 - Tokenizers 0.22.2
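
Reviewer note: the two training tables can be compared epoch by epoch. The sketch below is only an illustration; the accuracy values are copied from the removed (-) and added (+) rows of this diff, and the names `old_accuracy`, `new_accuracy`, and `accuracy_deltas` are hypothetical.

```python
# Per-epoch accuracy copied from the removed (-) and added (+) table rows.
old_accuracy = [0.7427, 0.7751, 0.7970, 0.8150, 0.8231]
new_accuracy = [0.7425, 0.7737, 0.7943, 0.8115, 0.8207]


def accuracy_deltas(old, new):
    """Return new - old for each epoch, rounded to 4 decimals."""
    return [round(n - o, 4) for o, n in zip(old, new)]


deltas = accuracy_deltas(old_accuracy, new_accuracy)
for epoch, delta in enumerate(deltas, start=1):
    print(f"epoch {epoch}: {delta:+.4f}")
```

Accuracy moves by less than half a percentage point at every epoch, while the reported validation loss goes from 0.4941 to 2.0264. The diff itself does not say why the loss scale changed.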

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a6ebf29a82cfd62d9e9c545449c61b1fd0664ceb068baa1ef840a38afca676e4
 size 498618952

 version https://git-lfs.github.com/spec/v1
+oid sha256:6d7be584f282015b62437c57066da9868a258b1f50b087e3fec163cf1fb2f22e
 size 498618952
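
Reviewer note: a Git LFS pointer records the SHA-256 digest and byte size of the real file, so a downloaded `model.safetensors` can be checked against the new pointer. This is a minimal sketch using only the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is up to the caller.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash the pointer records."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Read in chunks so a ~500 MB file is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# The added (+) pointer from this diff.
new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:6d7be584f282015b62437c57066da9868a258b1f50b087e3fec163cf1fb2f22e\n"
    "size 498618952\n"
)
print(new_pointer["algo"], new_pointer["size"])
```

Calling `matches_pointer("model.safetensors", new_pointer)` on a local copy would confirm that it is the retrained weights rather than the previous file, which has the same size but a different digest.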

tokenizer.json CHANGED

@@ -59,7 +59,7 @@
       "single_word": false,
       "lstrip": true,
       "rstrip": false,
-      "normalized": false,
       "special": true
     }
   ],

       "single_word": false,
       "lstrip": true,
       "rstrip": false,
+      "normalized": true,
       "special": true
     }
   ],
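
Reviewer note: the only change in this hunk is the `normalized` flag flipping from `false` to `true` on one added token. The sketch below shows one way to list such flag changes between two `tokenizer.json` payloads. It assumes the usual top-level `added_tokens` list, and it assumes the affected entry is `<mask>` (inferred from the removed `added_tokens_decoder` entry 50264 in `tokenizer_config.json`, which also has `lstrip: true`); the hunk itself does not show the token content. The helper names and the miniature inputs are hypothetical.

```python
def added_token_flags(tokenizer_json, flag="normalized"):
    """Map each added token's content to the value of `flag`."""
    return {
        tok["content"]: tok.get(flag)
        for tok in tokenizer_json.get("added_tokens", [])
    }


def changed_flags(old, new, flag="normalized"):
    """Return {content: (old_value, new_value)} for tokens whose flag differs."""
    old_flags = added_token_flags(old, flag)
    new_flags = added_token_flags(new, flag)
    return {
        content: (old_flags[content], new_flags[content])
        for content in old_flags
        if content in new_flags and old_flags[content] != new_flags[content]
    }


# Miniature stand-ins for the two file versions (assumed to describe <mask>).
base_entry = {"id": 50264, "content": "<mask>", "single_word": False,
              "lstrip": True, "rstrip": False, "special": True}
old_file = {"added_tokens": [{**base_entry, "normalized": False}]}
new_file = {"added_tokens": [{**base_entry, "normalized": True}]}

print(changed_flags(old_file, new_file))
```

With real files, the two dicts would come from `json.load` on each version of `tokenizer.json`.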

tokenizer_config.json CHANGED

@@ -1,53 +1,11 @@
 {
   "add_prefix_space": false,
-  "added_tokens_decoder": {
-    "0": {
-      "content": "<s>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "1": {
-      "content": "<pad>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "2": {
-      "content": "</s>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "3": {
-      "content": "<unk>",
-      "lstrip": false,
-      "normalized": true,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "50264": {
-      "content": "<mask>",
-      "lstrip": true,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    }
-  },
   "bos_token": "<s>",
-  "clean_up_tokenization_spaces": false,
   "cls_token": "<s>",
   "eos_token": "</s>",
   "errors": "replace",
-  "extra_special_tokens": {},
   "mask_token": "<mask>",
   "model_max_length": 512,
   "pad_token": "<pad>",

 {
   "add_prefix_space": false,
+  "backend": "tokenizers",
   "bos_token": "<s>",
   "cls_token": "<s>",
   "eos_token": "</s>",
   "errors": "replace",
+  "is_local": false,
   "mask_token": "<mask>",
   "model_max_length": 512,
   "pad_token": "<pad>",