---
license: apache-2.0
language:
- en
base_model:
- microsoft/codebert-base
pipeline_tag: text-classification
library_name: transformers
tags:
- code
---

# Model Card for vuteco-cb-e2e

`vuteco-cb-e2e` is a fine-tuned [CodeBERT](https://huggingface.co/microsoft/codebert-base) that classifies pairs of JUnit test methods and vulnerability descriptions (from CVE) into two classes:

- `Related` if the method tests the described vulnerability.
- `NotRelated` if the method does not test the described vulnerability.

## Model Details

### Model Description

VuTeCo is a framework for finding vulnerability-witnessing test cases in Java repositories (Finding) and matching them with the right known vulnerability (Matching). More info is available in its [GitHub repository](https://github.com/tuhh-softsec/vuteco).

This model (`vuteco-cb-e2e`) is a fine-tuned [CodeBERT](https://huggingface.co/microsoft/codebert-base) with a classification head on top of it. VuTeCo uses this model for the "Matching" task: it classifies a pair made of (1) a JUnit test method and (2) an English description of a vulnerability (e.g., the one from CVE) into two classes. Strictly speaking, it returns a probability, and `0.5` is used as the classification threshold:

- `Related` if the method tests the described vulnerability.
- `NotRelated` if the method does not test the described vulnerability.

The model input is (1) the raw text of a JUnit test method and (2) the raw text of a vulnerability description, both with no preprocessing.
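The decision rule described above (one probability per pair, thresholded at `0.5`) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not VuTeCo's API: `classify_pair`, `match_tests`, and the `Scorer` type are hypothetical names, the scorer stands in for the fine-tuned model's forward pass, and treating a score of exactly `0.5` as `Related` is an assumption. Refer to VuTeCo's GitHub repository for how the actual model is loaded and invoked.

```python
from typing import Callable, List, Tuple

# Threshold stated in this model card: probabilities are mapped to labels at 0.5.
THRESHOLD = 0.5

# Hypothetical type: any function returning P(Related) for a (test method, description) pair.
Scorer = Callable[[str, str], float]


def classify_pair(test_method: str, description: str, score_pair: Scorer,
                  threshold: float = THRESHOLD) -> str:
    """Label one (JUnit test method, vulnerability description) pair.

    Both inputs are passed as raw text, with no preprocessing. `score_pair` is a
    hypothetical stand-in for the fine-tuned model's probability output.
    """
    probability = score_pair(test_method, description)
    # Assumption: a probability exactly equal to the threshold counts as Related.
    return "Related" if probability >= threshold else "NotRelated"


def match_tests(test_methods: List[str], description: str,
                score_pair: Scorer) -> List[Tuple[str, str]]:
    """Classify every candidate test method against a single vulnerability description."""
    return [(method, classify_pair(method, description, score_pair))
            for method in test_methods]


if __name__ == "__main__":
    # Toy scorer for demonstration only; it is NOT the model, just a keyword check.
    def toy_score(test_method: str, description: str) -> float:
        related = "xxe" in test_method.lower() and "xxe" in description.lower()
        return 0.9 if related else 0.1

    tests = [
        "@Test void testXxeIsBlocked() {}",
        "@Test void testParsesDate() {}",
    ]
    cve = "XML external entity (XXE) injection in the document parser."
    for method, label in match_tests(tests, cve, toy_score):
        print(f"{label}: {method}")
```

Keeping the scorer as a plain function argument separates the thresholding logic from model loading, so the same rule applies however the model is actually invoked; the toy keyword scorer in the `__main__` block exists only to make the sketch runnable.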
- **Developed by:** Hamburg University of Technology
- **Funded by:** [Sec4AI4Sec](https://www.sec4ai4sec-project.eu/) (Horizon EU)
- **Shared by:** Hugging Face
- **Model type:** Text Classification
- **Language(s) (NLP):** en
- **License:** Apache-2.0
- **Finetuned from model:** [CodeBERT](https://huggingface.co/microsoft/codebert-base)

### Model Sources

- **Repository:** [VuTeCo's GitHub repository](https://github.com/tuhh-softsec/vuteco)
- **Paper:** [MSR'26 paper](https://arxiv.org/abs/2502.03365)

## Uses

### Direct Use

The model can be used right away to decide whether a JUnit test method is testing a given vulnerability, i.e., to match vulnerability-witnessing tests with the description of the known vulnerability they witness.

### Downstream Use

The model can be further fine-tuned to classify specific types of vulnerability-witnessing tests, e.g., to distinguish the exact vulnerability type being tested. It could also be fine-tuned for other testing frameworks (beyond JUnit) and other programming languages (e.g., Python).

### Out-of-Scope Use

N/A

## Bias, Risks, and Limitations

The model predictions may be inaccurate (i.e., misclassified pairs of test methods and vulnerability descriptions). In particular, the reported performance shows that the model has limited recall, so it often predicts `NotRelated` (i.e., returns low probability scores) even for pairs that are actually related.

### Recommendations

Manually validate the predictions made by the model.

## How to Get Started with the Model

Please refer to [VuTeCo's GitHub repository](https://github.com/tuhh-softsec/vuteco) for loading and using the model correctly.

## Training Details

### Training Data

This model was fine-tuned on Java repositories and vulnerabilities from [Vul4J](https://github.com/tuhh-softsec/vul4j). Please refer to [VuTeCo's GitHub repository](https://github.com/tuhh-softsec/vuteco) for loading the dataset correctly.

### Training Procedure

Please refer to [VuTeCo's GitHub repository](https://github.com/tuhh-softsec/vuteco) for customizing the model training.
## Evaluation

Please refer to [VuTeCo's GitHub repository](https://github.com/tuhh-softsec/vuteco) for customizing the model evaluation.

### Results

Please refer to the [MSR'26 paper](https://arxiv.org/abs/2502.03365) for an overview of the main evaluation results. The complete raw results can be found in the paper's online appendix on [Zenodo](https://doi.org/10.5281/zenodo.18258566).

## Model Examination

[More Information Needed]

## Environmental Impact

N/A

## Citation

If you use this model, please cite the [MSR'26 paper](https://arxiv.org/abs/2502.03365) (the publisher's reference will be available soon):

**BibTeX:**

```
@misc{iannone2026matchheavenaidrivenmatching,
      title={A Match Made in Heaven? AI-driven Matching of Vulnerabilities and Security Unit Tests},
      author={Emanuele Iannone and Quang-Cuong Bui and Riccardo Scandariato},
      year={2026},
      eprint={2502.03365},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2502.03365},
}
```

## Model Card Authors

[emaiannone](https://huggingface.co/emaiannone)