IntelliSA-220m

IntelliSA-220m is a fine-tuned Salesforce/codet5p-220m model for detecting security vulnerabilities in Infrastructure as Code (IaC) configurations across Chef, Ansible, and Puppet.

Model Details

  • Base Model: Salesforce/codet5p-220m (220M parameters)
  • Architecture: T5ForSequenceClassification
  • Task: Binary classification (secure vs vulnerable)
  • License: MIT

Performance

Technology | F1 Score
Ansible    | 0.884
Puppet     | 0.756
Chef       | 0.698
Combined   | 0.779
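
The Combined score agrees, to three decimals, with the unweighted mean of the three per-technology scores. The card does not state how Combined was computed, so reading it as a macro-average is an assumption; the short check below only confirms the arithmetic.

```python
# F1 scores copied from the Performance table above.
f1_by_technology = {"Ansible": 0.884, "Puppet": 0.756, "Chef": 0.698}

# Unweighted (macro) mean across technologies. Assumption: this is how the
# Combined figure was derived; the model card does not say so explicitly.
macro_f1 = sum(f1_by_technology.values()) / len(f1_by_technology)

print(f"Unweighted mean F1: {macro_f1:.3f}")
```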

Usage

from transformers import T5ForSequenceClassification, RobertaTokenizer
import torch

model = T5ForSequenceClassification.from_pretrained("colemei/IntelliSA-220m")
tokenizer = RobertaTokenizer.from_pretrained("colemei/IntelliSA-220m")

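THRESHOLD = 0.61  # Minimum "vulnerable" probability required to flag a snippet

def predict_vulnerability(code_snippet):
    """Return (score, is_vulnerable) for a single IaC code snippet."""
    # Inputs longer than 512 tokens are truncated, so only the start of a long file is scored.
    inputs = tokenizer(code_snippet, return_tensors="pt", max_length=512,
                       truncation=True, padding=True)

    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Index 1 holds the probability of the "vulnerable" class.
    score = predictions[0][1].item()
    is_vulnerable = score >= THRESHOLD
    return score, is_vulnerable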

# Example
code = """
cookbook_file '/tmp/file' do
  mode '0777'
end
"""
score, is_vulnerable = predict_vulnerability(code)
print(f"Vulnerability score: {score:.3f}, Vulnerable: {is_vulnerable}")
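
Because the threshold is 0.61 rather than 0.5, the decision above is stricter than picking the most likely class: a snippet whose "vulnerable" probability falls between 0.5 and 0.61 is not flagged. The sketch below reproduces the softmax-and-threshold step in plain Python so it can be run without downloading the model. The logits are hypothetical values chosen for illustration, not real model output, and `decide` is a helper name introduced here.

```python
import math

THRESHOLD = 0.61  # same value as in the usage snippet

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(logits, threshold=THRESHOLD):
    """Return (score, is_vulnerable), treating index 1 as the 'vulnerable' class."""
    score = softmax(logits)[1]
    return score, score >= threshold

# Hypothetical logits (illustration only).
borderline = [0.0, 0.3]  # vulnerable probability of about 0.574: above 0.5, below 0.61
confident = [0.0, 1.0]   # vulnerable probability of about 0.731: above the threshold

for name, logits in [("borderline", borderline), ("confident", confident)]:
    score, flagged = decide(logits)
    print(f"{name}: score={score:.3f}, flagged={flagged}")
```

The borderline case would be labeled vulnerable under an argmax rule but is left unflagged here, which trades some recall for fewer false positives.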

Training Data

Training data is maintained in a separate repository for transparency and reusability:

  • Dataset Repository: colemei/IntelliSA-dataset
  • Training Configuration:
    • Learning Rate: 4e-5, Batch Size: 8, Epochs: 6, Weight Decay: 0.01
    • Framework: Transformers 4.45.2, PyTorch
    • Training Data: 2,300 instances pseudo-labeled by Claude-4

For complete dataset information including oracle ground truth and detailed statistics, see the dataset repository.
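
For anyone reproducing the fine-tuning run, the listed hyperparameters map onto `transformers.TrainingArguments` keywords roughly as sketched below. This is a minimal sketch, not the author's training script: reading "Batch Size: 8" as the per-device batch size is an assumption, `build_training_arguments` is a helper name introduced here, and the output directory is left to the caller.

```python
# Hyperparameters as listed in the training configuration above.
# Keys follow transformers.TrainingArguments keyword names; treating
# "Batch Size: 8" as per-device batch size is an assumption.
hparams = {
    "learning_rate": 4e-5,
    "per_device_train_batch_size": 8,
    "num_train_epochs": 6,
    "weight_decay": 0.01,
}

def build_training_arguments(output_dir):
    """Build TrainingArguments from hparams (hypothetical helper).

    The import is deferred so the dictionary above can be inspected
    without transformers installed.
    """
    from transformers import TrainingArguments
    return TrainingArguments(output_dir=output_dir, **hparams)
```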

Citation

PLACEHOLDER