# CodeT5+ Vulnerability Fixer
A code repair model that generates secure fixes for vulnerable code. Given vulnerable code, a CWE type, and the programming language, it produces a patched version.

Fine-tuned from Salesforce/codet5p-220m (220M parameters) on 7,374 vulnerable→fixed code pairs.
## Quick Start
```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "ayshajavd/codet5p-vuln-fixer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)
model.eval()

# CWE-aware input format
code = """
def get_user(username):
    query = f"SELECT * FROM users WHERE username = '{username}'"
    conn = sqlite3.connect('db.sqlite')
    return conn.execute(query).fetchone()
"""
input_text = f"fix SQL Injection vulnerability in python: {code}"
inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=512,
        num_beams=5,
        early_stopping=True,
        no_repeat_ngram_size=3,
    )

fixed_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(fixed_code)
```
## Model Details
| Property | Value |
|---|---|
| Architecture | T5ForConditionalGeneration (encoder-decoder, 8 layers each) |
| Base Model | Salesforce/codet5p-220m |
| Parameters | 222,882,048 (222M) |
| Task | Seq2Seq code repair (vulnerable → fixed) |
| Input Format | `fix <CWE_NAME> vulnerability in <language>: <code>` |
| Max Sequence Length | 512 tokens (input and output) |
| Generation | Beam search (num_beams=5) |
## Evaluation Results (Test Set, 941 samples)
| Metric | Score |
|---|---|
| BLEU | 81.0 |
| ROUGE-1 | 0.802 |
| ROUGE-2 | 0.745 |
| ROUGE-L | 0.788 |
| Exact Match | 1.4% |
| Eval Loss | 0.175 |
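For reference, ROUGE-L is the LCS-based F-measure over tokens. A minimal re-implementation is sketched below; it assumes plain whitespace tokenization, which may differ from the tokenizer used for the reported scores:

```python
def lcs_length(a, b):
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference):
    # ROUGE-L F1 over whitespace tokens: harmonic mean of LCS precision and recall.
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)

print(rouge_l("return conn.execute(query)", "return conn.execute(query)"))  # identical -> 1.0
```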
### vs Previous Model (flan-t5-small)
| | Old (v1) | New (v2) | Improvement |
|---|---|---|---|
| Base model | flan-t5-small (60M) | CodeT5+ 220M | 3.7x larger |
| Eval loss | 0.547 | 0.175 | 3.1x better |
| CWE-aware input | ❌ | ✅ | Context about vulnerability type |
| BLEU evaluation | ❌ | 81.0 | Proper code similarity metric |
## Supported Languages
Python, JavaScript, Java, C, C++, PHP, Go, Ruby
The model was trained on a diverse multi-language dataset. Performance is strongest on C/C++ (largest training subset from BigVul).
## Training Details
| Parameter | Value |
|---|---|
| Learning Rate | 1e-4 (constant schedule) |
| Effective Batch Size | 32 (8/device × 2 GPUs × 2 grad_accum) |
| Epochs | 6 (early stopped at epoch 3 best) |
| Best Epoch | 3 (eval_loss=0.1752) |
| Precision | fp16 |
| Gradient Checkpointing | Enabled |
| Early Stopping | Patience=3 |
| Optimizer | AdamW |
| Hardware | 2× NVIDIA T4 16GB (Kaggle) |
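The recipe above maps onto a `Seq2SeqTrainingArguments` configuration roughly as follows. This is an illustrative sketch, not the actual training script: `output_dir` and the evaluation/save strategies are assumptions, and some argument names (e.g. `evaluation_strategy`) vary across transformers versions.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the reported recipe; pair with EarlyStoppingCallback(early_stopping_patience=3).
args = Seq2SeqTrainingArguments(
    output_dir="codet5p-vuln-fixer",   # assumed name
    learning_rate=1e-4,
    lr_scheduler_type="constant",
    per_device_train_batch_size=8,     # 8/device x 2 GPUs x 2 grad accum = 32 effective
    gradient_accumulation_steps=2,
    num_train_epochs=6,
    fp16=True,
    gradient_checkpointing=True,
    evaluation_strategy="epoch",       # assumed; renamed eval_strategy in newer versions
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    predict_with_generate=True,
)
```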
### Training Recipe References
- T5APR (arXiv:2309.15742): lr=1e-4, constant scheduler; Optuna-validated for CodeT5 code repair
- MultiMend (arXiv:2501.16044): Same config, validated on 6 benchmarks
## Training Data
Trained on the code-security-vulnerability-dataset:
- 7,374 training samples (vulnerable code with fixes)
- 994 validation samples
- 941 test samples
Filtered from 175K total samples to only include vulnerable samples with meaningful code fixes (>10 characters).
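The filtering step can be sketched as below; the field names `is_vulnerable` and `fixed_code` are assumptions about the dataset schema, not confirmed column names:

```python
def keep_sample(sample):
    # Keep only vulnerable samples whose fix is substantive (>10 characters).
    fix = (sample.get("fixed_code") or "").strip()
    return sample.get("is_vulnerable", False) and len(fix) > 10

samples = [
    {"is_vulnerable": True,  "fixed_code": "query = parameterize(sql, args)"},
    {"is_vulnerable": True,  "fixed_code": "pass"},       # fix too short, dropped
    {"is_vulnerable": False, "fixed_code": "unchanged x"}, # not vulnerable, dropped
]
filtered = [s for s in samples if keep_sample(s)]
print(len(filtered))  # 1
```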
## Input Format
The model uses a CWE-aware input format that tells it what vulnerability to fix:
```
fix <Vulnerability Name> vulnerability in <language>: <vulnerable code>
```

Examples:

```
fix SQL Injection vulnerability in python: <code>
fix Buffer Overflow vulnerability in c: <code>
fix Cross-Site Scripting vulnerability in javascript: <code>
```
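A small helper (hypothetical, not part of the model repo) keeps the prompt format consistent across languages and CWE names:

```python
def build_prompt(cwe_name: str, language: str, code: str) -> str:
    """Assemble the CWE-aware input string the model expects."""
    return f"fix {cwe_name} vulnerability in {language}: {code}"

prompt = build_prompt("SQL Injection", "python", "conn.execute(query)")
print(prompt)  # fix SQL Injection vulnerability in python: conn.execute(query)
```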
## Limitations
- 512 token limit: Long functions are truncated — fix quality degrades for very long code
- Formatting: Generated fixes may lose original indentation/formatting
- Rare CWEs: Performance is lower on vulnerability types with few training examples
- Not a replacement: Should complement manual code review and established SAST tools
- Language bias: Strongest on C/C++ (largest training subset)
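Because inputs past 512 tokens are silently truncated, it can help to flag long code before generating. The sketch below uses a rough characters-per-token heuristic; the ratio of 3 is an assumption, and the exact count should come from `len(tokenizer(prompt).input_ids)` when the tokenizer is loaded:

```python
MAX_TOKENS = 512
CHARS_PER_TOKEN = 3  # rough assumption for code; verify against the real tokenizer

def likely_truncated(prompt: str) -> bool:
    # Estimate token count from character length as a cheap pre-check.
    return len(prompt) / CHARS_PER_TOKEN > MAX_TOKENS

print(likely_truncated("x" * 100))   # False
print(likely_truncated("x" * 5000))  # True
```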
## Interactive Demo
Try the model in our Code Security Analyzer Space — paste any code and get vulnerability detection + fix suggestions.
## Citation
```bibtex
@misc{codet5p-vuln-fixer,
  title={CodeT5+ Vulnerability Fixer: CWE-Aware Code Repair with Seq2Seq Generation},
  author={ayshajavd},
  year={2025},
  url={https://huggingface.co/ayshajavd/codet5p-vuln-fixer}
}
```