---
license: mit
language:
- en
tags:
- cybersecurity
- vulnerability
- mitre-attack
- text-classification
- fine-tuned
- securebert
base_model: ehsanaghaei/SecureBERT
---

# SecureBERT — CVE-LMTune ATT&CK Classifier (Flat)

[![GitHub](https://img.shields.io/badge/GitHub-CVE--LMTune-black?logo=github)](https://github.com/terranovafr/CVE-LMTune)
[![Paper](https://img.shields.io/badge/Paper-HAL-green?logo=information&logoColor=white)](https://hal.science/hal-05500820)
[![PhD theses.fr](https://img.shields.io/badge/Project-theses.fr-orange?logo=university&logoColor=white)](https://theses.fr/s371241)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Zenodo Data](https://img.shields.io/badge/Zenodo-Data%20Repository-lightblue?logo=information&logoColor=white)](https://doi.org/10.5281/zenodo.16936476)

Part of the **CVE-LMTune** model suite, a collection of language models fine-tuned for multi-taxonomy vulnerability classification across widely used cybersecurity taxonomies, including CWE, CAPEC, and MITRE ATT&CK.

## Paper

> Franco Terranova, Sana Rekbi, Abdelkader Lahmadi, Isabelle Chrisment.
> *Multi-Taxonomy Vulnerability Classification with Hierarchically Finetuned Language Models.*
> The 23rd Conference on Detection of Intrusions and Malware & Vulnerability Assessment **(DIMVA '26)**.

## Overview

This model performs **multi-label ATT&CK classification** from vulnerability descriptions. Given a CVE-style description, it predicts one or more ATT&CK identifiers associated with the described vulnerability.

| Property | Value |
|----------|-------|
| Taxonomy | MITRE ATT&CK Enterprise Subtechniques |
| Task | Multi-label text classification |
| Input | Vulnerability description (e.g., CVE summary) |
| Output | One or more ATT&CK identifiers |
| Number of labels | 175 |
| Number of samples | 231,009 |
| Latest CVE update included | 17/06/2026 |
| Split | train (60%), val (20%), test (20%) |

## Evaluation Results

The model was evaluated on the held-out test set with standard multi-label classification metrics. Logits are passed through a sigmoid activation, and the thresholded metrics use a default decision threshold of 0.5.
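To make the two metric families concrete, here is a minimal sketch of how sigmoid scores can be turned into label sets at a 0.5 threshold and then scored with micro-averaged precision, recall, and F1, plus precision at k for the ranking view. The helper names (`predict_labels`, `micro_prf`, `precision_at_k`) and the toy logits are illustrative assumptions, not the evaluation code used for the reported results.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Indices of labels whose sigmoid score exceeds the threshold (hypothetical helper)."""
    return {i for i, z in enumerate(logits) if sigmoid(z) > threshold}

def micro_prf(pred_sets, true_sets):
    """Micro-averaged precision, recall, and F1 pooled over all samples and labels."""
    tp = sum(len(p & t) for p, t in zip(pred_sets, true_sets))
    fp = sum(len(p - t) for p, t in zip(pred_sets, true_sets))
    fn = sum(len(t - p) for p, t in zip(pred_sets, true_sets))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def precision_at_k(logits, true_set, k):
    """Fraction of the k highest-scoring labels that are correct."""
    top_k = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return sum(i in true_set for i in top_k) / k

# Toy example with 4 labels and 2 samples (made-up values, not model outputs).
toy_logits = [[2.0, -1.0, 0.5, -3.0], [-2.0, 1.5, -0.5, 0.2]]
toy_truth = [{0, 1}, {1}]

preds = [predict_labels(z) for z in toy_logits]
p, r, f1 = micro_prf(preds, toy_truth)
print(preds)                            # [{0, 2}, {1, 3}]
print(f"{p:.3f} {r:.3f} {f1:.3f}")      # 0.500 0.667 0.571
```

Ranking metrics such as P@k depend only on the ordering of the scores, so they are unaffected by the choice of threshold, whereas the micro and macro scores change if the 0.5 cut-off is moved.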

**Ranking Metrics**

| LRAP | MRR | Coverage Error | Label Ranking Loss | P@1 | P@3 | P@5 | R@1 | R@3 | R@5 |
|------|-----|----------------|--------------------|-----|-----|-----|-----|-----|-----|
| 0.9152 | 0.9460 | 18.79 | 0.0173 | 0.9321 | 0.9084 | 0.8458 | 0.1286 | 0.3779 | 0.5554 |

**Threshold = 0.5**

| Micro P | Micro R | Micro F1 | Macro F1 | Weighted F1 | Hamming Loss | Subset Accuracy |
|---------|---------|----------|----------|-------------|--------------|-----------------|
| 0.8612 | 0.7767 | 0.8168 | 0.4286 | 0.8093 | 0.0264 | 0.6874 |

## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("Sana9/securebert-vuln2attack-flat", use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained("Sana9/securebert-vuln2attack-flat")

text = "Buffer overflow vulnerability in OpenSSL allows remote attackers to execute arbitrary code."

# Multi-label setup: apply a sigmoid to each logit independently.
with torch.no_grad():
    probs = torch.sigmoid(
        model(**tokenizer(text, return_tensors="pt", truncation=True)).logits
    )[0]

# Keep every ATT&CK identifier whose probability exceeds the 0.5 threshold.
predictions = {
    model.config.id2label[i]: p.item()
    for i, p in enumerate(probs)
    if p > 0.5
}
print(predictions)
```

## Citation

```bibtex
@inproceedings{terranova2026multitaxonomy,
  author    = {Franco Terranova and Sana Rekbi and Abdelkader Lahmadi and Isabelle Chrisment},
  title     = {Multi-Taxonomy Vulnerability Classification with Hierarchically Finetuned Language Models},
  booktitle = {Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA)},
  year      = {2026},
  month     = jul,
  address   = {Chania, Crete, Greece},
  note      = {HAL identifier: hal-05500820v2}
}
```

## Related Resources

- 🤗 [Full model suite on Hugging Face](https://huggingface.co/Sana9)
- 💻 [CVE-LMTune — Training code (GitHub)](https://github.com/terranovafr/CVE-LMTune)
- 📦 [Zenodo — Data repository](https://doi.org/10.5281/zenodo.16936476)

## Disclaimers

- This product uses the NVD API but is not endorsed or certified by the NVD. The same applies to the CVE2CAPEC project and the Hugging Face API.
- This project relies on data publicly available from the CWE, CAPEC, and MITRE ATT&CK projects.
- This work has been partially supported by the French National Research Agency under the France 2030 label (Superviz ANR-22-PECY-0008). The views expressed herein do not necessarily reflect the opinion of the French government.