Model Card for vuteco-cb-e2e
vuteco-cb-e2e is a fine-tuned CodeBERT that classifies pairs of JUnit test methods and vulnerability descriptions (from CVE) into two classes:
- Related: if the method is testing the vulnerability described.
- NotRelated: if the method is not testing the vulnerability described.
Model Details
Model Description
VuTeCo is a framework for finding vulnerability-witnessing test cases in Java repositories (Finding) and matching them with the right known vulnerability (Matching). More information is available in its GitHub repository.
This model (vuteco-cb-e2e) is a fine-tuned CodeBERT with a classification head on top of it.
This model is used in VuTeCo for the "Matching" task: it classifies a pair consisting of (1) a JUnit test method and (2) an English description of a vulnerability (e.g., a CVE description) into two classes. Strictly speaking, it returns a probability, and 0.5 is used as the classification threshold:
- Related: if the method is testing the vulnerability described.
- NotRelated: if the method is not testing the vulnerability described.
The model input is (1) the raw text of a JUnit test method and (2) the raw text of a vulnerability description, both with no preprocessing.
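The decision rule described above (two raw texts in, one probability out, 0.5 as the cut-off) can be sketched as follows. This is a minimal illustration, not VuTeCo's API: `ScoreFn`, `label_from_probability`, `classify_pair`, and `dummy_score` are hypothetical names, the scorer is a stand-in for the actual model (see VuTeCo's GitHub repository for loading it correctly), and treating a score of exactly 0.5 as Related is an assumption.

```python
from typing import Callable

# Hypothetical stand-in for the model (NOT VuTeCo's actual API): any function
# that maps (raw JUnit test method, raw vulnerability description) to a
# probability in [0, 1] that the pair is Related.
ScoreFn = Callable[[str, str], float]

THRESHOLD = 0.5  # classification threshold stated in this model card


def label_from_probability(probability: float, threshold: float = THRESHOLD) -> str:
    """Map a probability to one of the two classes.

    Assumption: a score exactly equal to the threshold counts as Related.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    return "Related" if probability >= threshold else "NotRelated"


def classify_pair(score_fn: ScoreFn, test_method: str, description: str) -> str:
    """Classify one (test method, vulnerability description) pair.

    Both texts are passed through unchanged, since the model takes raw
    text with no preprocessing.
    """
    return label_from_probability(score_fn(test_method, description))


def dummy_score(test_method: str, description: str) -> float:
    # Dummy scorer returning a fixed probability (not the real model).
    return 0.73


print(classify_pair(dummy_score,
                    "@Test public void rejectsExternalEntities() {}",
                    "XML external entity (XXE) injection in the XML parser."))
# prints: Related
```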
- Developed by: Hamburg University of Technology
- Funded by: Sec4AI4Sec (Horizon EU)
- Shared by: Hugging Face
- Model type: Text Classification
- Language(s) (NLP): en
- License: Apache-2.0
- Finetuned from model: CodeBERT
Model Sources
- Repository: VuTeCo's GitHub repository
- Paper: MSR'26 paper
Uses
Direct Use
The model can be used right away to determine whether a given JUnit test method is testing a given vulnerability, i.e., to match vulnerability-witnessing tests with the known vulnerabilities they witness.
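As an illustration of this matching use, the sketch below scores one vulnerability description against several candidate test methods and keeps those classified as Related, best first. It is a hypothetical helper, not part of VuTeCo: `match_tests`, `fake_score_fn`, and the scores are invented for illustration, and the scorer stands in for the real model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the model (NOT VuTeCo's actual API): maps
# (raw JUnit test method, raw vulnerability description) -> probability.
ScoreFn = Callable[[str, str], float]


def match_tests(
    score_fn: ScoreFn,
    description: str,
    test_methods: Dict[str, str],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (test id, probability) for every test classified as Related,
    sorted from most to least probable."""
    related = []
    for test_id, source in test_methods.items():
        probability = score_fn(source, description)
        if probability >= threshold:  # same 0.5 cut-off as the model
            related.append((test_id, probability))
    # Highest probability first; ties broken by test id for a stable order.
    return sorted(related, key=lambda item: (-item[1], item[0]))


# Made-up scores standing in for model outputs.
fake_scores = {"testParseXml": 0.91, "testToString": 0.12, "testEntityExpansion": 0.64}
tests = {name: f"@Test public void {name}() {{}}" for name in fake_scores}


def fake_score_fn(source: str, description: str) -> float:
    # Look the score up by the method name embedded in the source text.
    name = source.split("void ")[1].split("(")[0]
    return fake_scores[name]


print(match_tests(fake_score_fn, "XXE in the XML parser.", tests))
# prints: [('testParseXml', 0.91), ('testEntityExpansion', 0.64)]
```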
Downstream Use
The model can be further fine-tuned to classify specific types of vulnerability-witnessing tests, e.g., distinguishing the exact vulnerability type being tested.
It could also be fine-tuned for other testing frameworks (beyond JUnit) and other programming languages (e.g., Python).
Out-of-Scope Use
N/A
Bias, Risks, and Limitations
The model predictions may be inaccurate (i.e., pairs of test methods and vulnerabilities may be misclassified).
In particular, the reported performance shows that the model has limited recall: it often predicts NotRelated (i.e., returns low probability scores), even for pairs that are actually related.
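To make "limited recall" concrete: recall of the Related class is TP / (TP + FN), the fraction of truly related pairs that the model flags as Related, so every related pair scored below 0.5 lowers it. The sketch below computes it from labeled pairs; `related_recall` is a hypothetical name, and the sample data is invented for illustration and is not taken from the paper's results.

```python
from typing import Iterable, Tuple


def related_recall(pairs: Iterable[Tuple[bool, float]], threshold: float = 0.5) -> float:
    """Recall of the Related class.

    Each item is (is_actually_related, model_probability). Recall is
    TP / (TP + FN): the share of truly related pairs predicted Related.
    """
    true_positives = false_negatives = 0
    for is_related, probability in pairs:
        if not is_related:
            continue  # truly NotRelated pairs do not affect recall
        if probability >= threshold:
            true_positives += 1
        else:
            false_negatives += 1
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


# Illustrative, made-up pairs (NOT results from the paper): three truly
# related pairs, of which only one scores at or above the threshold.
sample = [(True, 0.82), (True, 0.31), (True, 0.08), (False, 0.12)]
print(related_recall(sample))  # prints 0.3333333333333333
```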
Recommendations
Manually validate the predictions made by the model.
How to Get Started with the Model
Please refer to VuTeCo's GitHub repository for loading and using the model in the correct way.
Training Details
Training Data
This model was fine-tuned on Java repositories and vulnerabilities from Vul4J. Please refer to VuTeCo's GitHub repository for loading the dataset in the correct way.
Training Procedure
Please refer to VuTeCo's GitHub repository for customizing the model training.
Evaluation
Please refer to VuTeCo's GitHub repository for customizing the model evaluation.
Results
Please refer to the MSR'26 paper for an overview of the main evaluation results. The complete raw results can be found in the paper's online appendix on Zenodo.
Environmental Impact
N/A
Citation
If you use this model, please cite the MSR'26 paper (the publisher's reference will be available soon):
BibTeX:
@misc{iannone2026matchheavenaidrivenmatching,
title={A Match Made in Heaven? AI-driven Matching of Vulnerabilities and Security Unit Tests},
author={Emanuele Iannone and Quang-Cuong Bui and Riccardo Scandariato},
year={2026},
eprint={2502.03365},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2502.03365},
}
Model Card Authors
Model tree for emaiannone/vuteco-cb-e2e
- Base model: microsoft/codebert-base