---
license: mit
language:
- en
tags:
- cybersecurity
- vulnerability
- mitre-attack
- text-classification
- fine-tuned
- securebert
base_model: ehsanaghaei/SecureBERT
---
# SecureBERT - CVE-LMTune ATT&CK Classifier (Flat)
<div align="center" style="display:inline-flex; gap:18px; align-items:center; flex-wrap:nowrap;"> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/Logo_Universit%C3%A9_de_Lorraine.svg/1280px-Logo_Universit%C3%A9_de_Lorraine.svg.png" alt="Universite de Lorraine" style="height:50px; width:auto;" /> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/95/Inr_logo_rouge.svg/1280px-Inr_logo_rouge.svg.png" alt="INRIA" style="height:50px; width:auto;" /> <img src="https://upload.wikimedia.org/wikipedia/fr/6/6e/Logo_loria_abrege_couleur.png" alt="LORIA" style="height:70px; width:auto;" /> <img src="https://www.pepr-cybersecurite.fr/wp-content/uploads/2023/09/pep-cybersecurite-550x250-1.png" alt="SuperViZ" style="height:70px; width:auto;" /> </div>
[![GitHub](https://img.shields.io/badge/GitHub-CVE--LMTune-black?logo=github)](https://github.com/terranovafr/CVE-LMTune)
[![Paper](https://img.shields.io/badge/Paper-HAL-green?logo=information&logoColor=white)](https://hal.science/hal-05500820)
[![PhD theses.fr](https://img.shields.io/badge/Project-theses.fr-orange?logo=university&logoColor=white)](https://theses.fr/s371241)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Zenodo Data](https://img.shields.io/badge/Zenodo-Data%20Repository-lightblue?logo=information&logoColor=white)](https://doi.org/10.5281/zenodo.16936476)
Part of the **CVE-LMTune** model suite, a collection of language models fine-tuned for multi-taxonomy vulnerability classification across widely used cybersecurity taxonomies, including CWE, CAPEC, and MITRE ATT&CK.
## Paper
> Franco Terranova, Sana Rekbi, Abdelkader Lahmadi, Isabelle Chrisment.
> *Multi-Taxonomy Vulnerability Classification with Hierarchically Finetuned Language Models.*
> The 23rd Conference on Detection of Intrusions and Malware & Vulnerability Assessment **(DIMVA '26)**.
## Overview
This model performs **multi-label ATT&CK classification** from vulnerability descriptions. Given a CVE-style description, it predicts one or more ATT&CK identifiers associated with the described vulnerability.
| Property | Value |
|----------|-------|
| Taxonomy | MITRE ATT&CK Enterprise Sub-techniques |
| Task | Multi-label text classification |
| Input | Vulnerability description (e.g., CVE summary) |
| Output | One or more ATT&CK identifiers |
| Number of labels | 175 |
| Number of samples | 231,009 |
| Latest CVE update included | 17/06/2026 |
| Split | train (60%), val (20%), test (20%) |
## Evaluation Results
The model was evaluated on the held-out test set with standard multi-label classification metrics. Per-label scores are obtained by applying a sigmoid activation to the logits, and the thresholded metrics use a default decision threshold of 0.5.
**Ranking Metrics**
| LRAP | MRR | Coverage Error | Label Ranking Loss | P@1 | P@3 | P@5 | R@1 | R@3 | R@5 |
|------|-----|----------------|--------------------|-----|-----|-----|-----|-----|-----|
| 0.9152 | 0.9460 | 18.79 | 0.0173 | 0.9321 | 0.9084 | 0.8458 | 0.1286 | 0.3779 | 0.5554 |
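
As a reading aid for the P@k and R@k columns, here is a minimal pure-Python sketch of precision and recall at k for a single sample. It assumes the common definitions (hits in the top k divided by k, and hits divided by the number of true labels); the evaluation code behind the table may differ in details such as tie handling or averaging. The helper names and the toy scores are hypothetical.

```python
def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def precision_at_k(scores, relevant, k):
    """Fraction of the top-k predicted labels that are true labels."""
    hits = sum(1 for i in top_k(scores, k) if i in relevant)
    return hits / k


def recall_at_k(scores, relevant, k):
    """Fraction of the true labels that appear in the top-k predictions."""
    if not relevant:
        return 0.0
    hits = sum(1 for i in top_k(scores, k) if i in relevant)
    return hits / len(relevant)


# Toy example (made-up values): 5 labels, 3 of them true for this sample.
scores = [0.9, 0.1, 0.8, 0.3, 0.7]
relevant = {0, 2, 3}

p1 = precision_at_k(scores, relevant, 1)
r1 = recall_at_k(scores, relevant, 1)
p3 = precision_at_k(scores, relevant, 3)
r3 = recall_at_k(scores, relevant, 3)
print(p1, r1, p3, r3)
```

Under these definitions, recall at k cannot exceed k divided by the number of true labels, so a high P@1 combined with a low R@1 is what one would expect when samples carry several labels each.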
**Threshold = 0.5**
| Micro P | Micro R | Micro F1 | Macro F1 | Weighted F1 | Hamming Loss | Subset Accuracy |
|--------|--------|----------|----------|------------|--------------|----------------|
| 0.8612 | 0.7767 | 0.8168 | 0.4286 | 0.8093 | 0.0264 | 0.6874 |
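
The gap between Micro F1 and Macro F1 is easier to interpret with the two averaging schemes side by side. The sketch below assumes the standard definitions: micro averaging pools true positives, false positives, and false negatives over all labels, while macro averaging takes the unweighted mean of per-label F1. The per-label counts are invented for illustration and are not taken from the evaluation. The last lines check that the reported Micro F1 is the harmonic mean of the reported Micro P and Micro R.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when the label never occurs or is never predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts):
    """counts: list of (tp, fp, fn) tuples, one tuple per label."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    micro = f1(tp, fp, fn)                              # pooled over all labels
    macro = sum(f1(*c) for c in counts) / len(counts)   # every label weighs the same
    return micro, macro


# Hypothetical counts: one frequent label predicted well, two rare labels predicted poorly.
toy_counts = [(900, 50, 50), (1, 2, 8), (0, 1, 5)]
micro, macro = micro_macro_f1(toy_counts)
print(f"micro={micro:.4f} macro={macro:.4f}")

# Consistency check on the reported table: F1 = 2PR / (P + R).
reported_p, reported_r = 0.8612, 0.7767
reported_f1 = 2 * reported_p * reported_r / (reported_p + reported_r)
print(f"{reported_f1:.4f}")
```

In the toy data, the frequent label dominates the micro average while the rare labels pull the macro average down. A similar effect may contribute to the gap in the table, though the per-label breakdown is not shown here.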
## Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the slow tokenizer (use_fast=False) and the fine-tuned classifier.
tokenizer = AutoTokenizer.from_pretrained("Sana9/securebert-vuln2attack-flat", use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained("Sana9/securebert-vuln2attack-flat")

text = "Buffer overflow vulnerability in OpenSSL allows remote attackers to execute arbitrary code."

# Multi-label inference: apply an independent sigmoid to each label's logit.
with torch.no_grad():
    probs = torch.sigmoid(
        model(**tokenizer(text, return_tensors="pt", truncation=True)).logits
    )[0]

# Keep every label whose probability exceeds the 0.5 threshold.
predictions = {
    model.config.id2label[i]: p.item()
    for i, p in enumerate(probs)
    if p > 0.5
}
print(predictions)
```
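
When no label clears the 0.5 threshold, or when a ranked shortlist is more useful than a thresholded set, the same probabilities can be sorted instead. The helper below is a hypothetical sketch and not part of the released code. It works on plain Python lists so it runs without downloading the model. The label names and probabilities are placeholders, not actual model output.

```python
def rank_labels(probs, id2label, k=5, threshold=None):
    """Return up to k (label, probability) pairs, highest probability first.

    If threshold is given, labels whose probability is not above it are dropped.
    """
    pairs = [(id2label[i], float(p)) for i, p in enumerate(probs)]
    if threshold is not None:
        pairs = [(label, p) for label, p in pairs if p > threshold]
    pairs.sort(key=lambda item: item[1], reverse=True)
    return pairs[:k]


# Toy stand-ins (hypothetical): placeholder label names and made-up probabilities.
toy_id2label = {0: "T1059", 1: "T1190", 2: "T1203", 3: "T1068"}
toy_probs = [0.12, 0.91, 0.67, 0.40]

top2 = rank_labels(toy_probs, toy_id2label, k=2)
confident = rank_labels(toy_probs, toy_id2label, k=4, threshold=0.5)
print(top2)
print(confident)
```

With the Quick Start variables, the call would be `rank_labels(probs.tolist(), model.config.id2label, k=5)`.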
## Citation
```bibtex
@inproceedings{terranova2026multitaxonomy,
author = {Franco Terranova and Sana Rekbi and Abdelkader Lahmadi and Isabelle Chrisment},
title = {Multi-Taxonomy Vulnerability Classification with Hierarchically Finetuned Language Models},
booktitle = {Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA)},
year = {2026},
month = jul,
address = {Chania, Crete, Greece},
note = {HAL identifier: hal-05500820v2}
}
```
## Related Resources
- 🤗 [Full model suite on Hugging Face](https://huggingface.co/Sana9)
- 💻 [CVE-LMTune - Training code (GitHub)](https://github.com/terranovafr/CVE-LMTune)
- 📦 [Zenodo - Data repository](https://doi.org/10.5281/zenodo.16936476)
## Disclaimers
- This product uses the NVD API but is not endorsed or certified by the NVD. The same applies to the CVE2CAPEC project and the Hugging Face API.
- This project relies on data publicly available from the CWE, CAPEC, and MITRE ATT&CK projects.
- This work has been partially supported by the French National Research Agency under the France 2030 label (Superviz ANR-22-PECY-0008). The views expressed herein do not necessarily reflect the opinion of the French government.