# DeBERTa-v3-Base for CVE → CWE Classification
Fine-tuned DeBERTa-v3-Base model for predicting Common Weakness Enumeration (CWE) IDs from Common Vulnerabilities and Exposures (CVE) descriptions.
## Model Details
- **Base Model:** microsoft/deberta-v3-base (86M parameters)
- **Task:** Multi-class text classification (695 CWE classes)
- **Training Dataset:** stasvinokur/cve-and-cwe-dataset-1999-2025
- **Cleaned Dataset:** LorenzoNava/cve-cwe-dataset-cleaned (225,144 samples)
## Training Configuration

### Hardware
- GPUs: 4x NVIDIA L4 (24GB each, 96GB total VRAM)
- Precision: bfloat16 (bf16)
### Hyperparameters

```python
learning_rate = 2e-5
num_train_epochs = 10
per_device_train_batch_size = 8   # 32 total across 4 GPUs
gradient_accumulation_steps = 1
warmup_ratio = 0.1
lr_scheduler_type = "cosine"
weight_decay = 0.01
max_sequence_length = 256
optimizer = "paged_adamw_8bit"
gradient_checkpointing = False    # disabled for stability
```
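As a sanity check, the effective batch size and warmup horizon implied by these settings can be derived with a little arithmetic. The sample count comes from the Training Details below; the exact split rounding is an assumption:

```python
import math

# Figures taken from this card: 4 GPUs, 90/10 split of 225,144 samples.
per_device_batch = 8
num_gpus = 4
grad_accum = 1
train_samples = int(225_144 * 0.9)   # ~202,629 training samples
epochs = 10
warmup_ratio = 0.1

effective_batch = per_device_batch * num_gpus * grad_accum   # 32
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
# → 32 6333 63330 6333
```

So roughly the first epoch's worth of optimizer steps is spent in warmup before the cosine decay takes over.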
## Training Details
- Total samples: 225,144 (after filtering "NVD-CWE-Other")
- Train/Val split: 90/10
- Early stopping: patience=5 on F1 score
- Evaluation metric: Weighted F1 score
- Training time: ~5-6 hours on 4x L4 GPUs
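For reference, the support-weighted F1 used for model selection weights each class's F1 by its share of the evaluation set. A minimal pure-Python sketch (the actual evaluation presumably uses a library such as scikit-learn's `f1_score` with `average="weighted"`):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted F1: per-class F1 weighted by class frequency."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (n / total) * f1
    return score

# Tiny illustrative example, not real model output.
y_true = ["CWE-79", "CWE-79", "CWE-89"]
y_pred = ["CWE-79", "CWE-89", "CWE-89"]
print(round(weighted_f1(y_true, y_pred), 3))  # → 0.667
```

Weighting by support matters here because the 695 CWE classes are highly imbalanced, so a macro average would be dominated by rare classes.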
## Dataset Preparation
The original dataset contained 280,694 samples, including 55,550 samples (19.79%) labeled as "NVD-CWE-Other" (non-standard CWE classification).
**Cleaning process:**
- Removed samples with `CWE-ID = "NVD-CWE-Other"`
- Removed samples with missing/null CWE-IDs
- Kept only the standard `CWE-XXXX` format (numeric IDs)
- Final dataset: 225,144 samples with 695 unique CWE classes
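The filtering rules above can be sketched as a simple predicate. This is a hypothetical re-implementation for illustration; the actual preprocessing script is not part of this card:

```python
import re

# Standard numeric CWE label, e.g. "CWE-79".
CWE_PATTERN = re.compile(r"^CWE-\d+$")

def keep_sample(cwe_id):
    """Return True only for rows with a standard CWE-XXXX label."""
    if cwe_id is None:
        return False                      # drop missing/null CWE-IDs
    cwe_id = cwe_id.strip()
    if cwe_id == "NVD-CWE-Other":
        return False                      # drop the non-standard bucket
    return bool(CWE_PATTERN.match(cwe_id))

labels = ["CWE-79", "NVD-CWE-Other", None, "CWE-89", "NVD-CWE-noinfo"]
print([l for l in labels if keep_sample(l)])  # → ['CWE-79', 'CWE-89']
```

Note that the anchored regex also drops other non-standard labels such as "NVD-CWE-noinfo", which is consistent with keeping only numeric `CWE-XXXX` IDs.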
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("LorenzoNava/deberta-v3-base-cve-cwe-classifier")
tokenizer = AutoTokenizer.from_pretrained("LorenzoNava/deberta-v3-base-cve-cwe-classifier")
model.eval()

# Example CVE description
cve_description = """
A buffer overflow vulnerability in the web server component allows
remote attackers to execute arbitrary code via a crafted HTTP request.
"""

# Tokenize and predict (no gradients needed at inference time)
inputs = tokenizer(cve_description, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=-1).item()
predicted_cwe = model.config.id2label[predicted_class]
print(f"Predicted CWE: {predicted_cwe}")
```
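For triage it is often useful to inspect the top few candidate CWEs rather than only the argmax. A stdlib-only sketch of that post-processing step, run here on hypothetical logits and an illustrative four-class label map (the real model's `id2label` has 695 entries):

```python
import math

def top_k_predictions(logits, id2label, k=3):
    """Softmax over raw logits, then return the k most likely labels."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]          # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Illustrative logits and label map, not real model output.
logits = [0.2, 3.1, 1.4, -0.5]
id2label = {0: "CWE-79", 1: "CWE-119", 2: "CWE-89", 3: "CWE-20"}
for label, p in top_k_predictions(logits, id2label):
    print(f"{label}: {p:.3f}")
```

With the real model, the same function can be applied to `outputs.logits[0].tolist()` and `model.config.id2label` to surface runner-up weaknesses for manual review.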
## Performance
| Metric | Score |
|---|---|
| Accuracy | TBD |
| Weighted F1 | TBD |
| Training Loss | TBD |
| Validation Loss | TBD |
*Metrics will be updated after training completes.*
## Training Script
The model was trained using the following configuration:
```bash
python3 train.py \
  --model deberta-v3-base \
  --epochs 10 \
  --batch-size 32 \
  --learning-rate 2e-5 \
  --max-length 256 \
  --early-stopping 5
```
The full training script is included in the model repository as `train.py`.
## CWE Classes
The model predicts from 695 unique CWE classes including:
- CWE-79 (Cross-site Scripting)
- CWE-89 (SQL Injection)
- CWE-119 (Buffer Errors)
- CWE-20 (Improper Input Validation)
- CWE-200 (Information Exposure)
- ... and 690 more
## Use Cases
- Automated vulnerability classification from CVE descriptions
- Security assessment and triage
- Weakness pattern identification in vulnerability reports
- CVE database enrichment and standardization
## Limitations
- Trained only on CVE descriptions (English text)
- Performance may vary on non-CVE vulnerability descriptions
- Does not predict "NVD-CWE-Other" or other non-standard classifications
- Limited to CWEs present in training data (695 classes)
## Citation

```bibtex
@misc{deberta-v3-cve-cwe-2024,
  author    = {Berghem - Smart Information Security},
  title     = {DeBERTa-v3-Base for CVE-CWE Classification},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/LorenzoNava/deberta-v3-base-cve-cwe-classifier}
}
```
## License
MIT License - See LICENSE file
## Developed By
Berghem - Smart Information Security
For issues or questions, visit the model repository.