DeBERTa-v3-Base for CVE → CWE Classification

Fine-tuned DeBERTa-v3-Base model for predicting Common Weakness Enumeration (CWE) IDs from Common Vulnerabilities and Exposures (CVE) descriptions.

Model Details

  • Base Model: microsoft/deberta-v3-base (86M parameters)
  • Task: Multi-class text classification (695 CWE classes)
  • Training Dataset: stasvinokur/cve-and-cwe-dataset-1999-2025
  • Cleaned Dataset: LorenzoNava/cve-cwe-dataset-cleaned (225,144 samples)

Training Configuration

Hardware

  • GPUs: 4x NVIDIA L4 (24GB each, 96GB total VRAM)
  • Precision: bfloat16 (bf16)

Hyperparameters

learning_rate = 2e-5
num_train_epochs = 10
per_device_train_batch_size = 8  # 32 total across 4 GPUs
gradient_accumulation_steps = 1
warmup_ratio = 0.1
lr_scheduler_type = "cosine"
weight_decay = 0.01
max_sequence_length = 256
optimizer = "paged_adamw_8bit"
gradient_checkpointing = False  # Disabled for stability
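
As a rough cross-check of the schedule these settings imply, the effective batch size and warmup length can be derived as follows. This is a sketch; the exact train-sample count depends on how the 90/10 split rounds, which is an assumption here.

```python
import math

# Values from the hyperparameter table above
per_device_train_batch_size = 8
num_gpus = 4                         # 4x NVIDIA L4
gradient_accumulation_steps = 1
num_train_epochs = 10
warmup_ratio = 0.1
train_samples = int(225_144 * 0.9)   # assumption: 90% split, floored

# Effective batch size across all GPUs
effective_batch = per_device_train_batch_size * num_gpus * gradient_accumulation_steps

# Optimizer steps per epoch, total steps, and warmup steps
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = int(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

With these numbers the cosine schedule warms up for roughly one epoch (warmup_ratio = 0.1 over 10 epochs) before decaying.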

Training Details

  • Total samples: 225,144 (after filtering "NVD-CWE-Other")
  • Train/Val split: 90/10
  • Early stopping: patience=5 on F1 score
  • Evaluation metric: Weighted F1 score
  • Training time: ~5-6 hours on 4x L4 GPUs
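
For reference, the weighted F1 metric used for evaluation and early stopping averages per-class F1 scores weighted by class support, which matters here because the 695 CWE classes are highly imbalanced. A minimal pure-Python sketch (equivalent in behavior to `sklearn.metrics.f1_score(average="weighted")`):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c in support:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (support[c] / total) * f1
    return score
```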

Dataset Preparation

The original dataset contained 280,694 samples, including 55,550 samples (19.79%) labeled as "NVD-CWE-Other" (non-standard CWE classification).

Cleaning process:

  1. Removed samples with CWE-ID = "NVD-CWE-Other"
  2. Removed samples with missing/null CWE-IDs
  3. Kept only standard CWE-XXXX format (numeric IDs)
  4. Final dataset: 225,144 samples with 695 unique CWE classes
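
The filtering rules above can be expressed as a simple predicate. This is an illustrative sketch (the actual column names and cleaning script in the dataset repo may differ); standard CWE IDs are matched with a `CWE-<number>` regex:

```python
import re

# Standard CWE IDs look like "CWE-79", "CWE-1021", etc.
CWE_PATTERN = re.compile(r"^CWE-\d+$")

def keep_sample(cwe_id):
    """Apply the cleaning rules: drop null, 'NVD-CWE-Other', and non-numeric IDs."""
    if cwe_id is None:
        return False
    cwe_id = cwe_id.strip()
    if cwe_id == "NVD-CWE-Other":
        return False
    return bool(CWE_PATTERN.match(cwe_id))

samples = ["CWE-79", "NVD-CWE-Other", None, "NVD-CWE-noinfo", "CWE-89"]
print([s for s in samples if keep_sample(s)])  # ['CWE-79', 'CWE-89']
```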

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("LorenzoNava/deberta-v3-base-cve-cwe-classifier")
tokenizer = AutoTokenizer.from_pretrained("LorenzoNava/deberta-v3-base-cve-cwe-classifier")

# Example CVE description
cve_description = """
A buffer overflow vulnerability in the web server component allows
remote attackers to execute arbitrary code via a crafted HTTP request.
"""

# Tokenize and predict (inference mode: no gradients needed)
model.eval()
inputs = tokenizer(cve_description, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=-1).item()
predicted_cwe = model.config.id2label[predicted_class]

print(f"Predicted CWE: {predicted_cwe}")
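
Since argmax discards confidence information, it can be useful to surface the top-k candidate CWEs with probabilities. A pure-Python helper (a sketch, not part of the model API) that works on `outputs.logits[0].tolist()` and `model.config.id2label` from the example above:

```python
import math

def top_k_predictions(logits, id2label, k=3):
    """Convert raw logits into the top-k (CWE label, probability) pairs."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Usage with the snippet above:
#   top_k_predictions(outputs.logits[0].tolist(), model.config.id2label)
```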

Performance

Metric           Score
---------------  -----
Accuracy         TBD
Weighted F1      TBD
Training Loss    TBD
Validation Loss  TBD

Metrics will be updated after training completes.

Training Script

The model was trained using the following configuration:

python3 train.py \
  --model deberta-v3-base \
  --epochs 10 \
  --batch-size 32 \
  --learning-rate 2e-5 \
  --max-length 256 \
  --early-stopping 5

Full training script included in model repository: train.py

CWE Classes

The model predicts from 695 unique CWE classes including:

  • CWE-79 (Cross-site Scripting)
  • CWE-89 (SQL Injection)
  • CWE-119 (Buffer Errors)
  • CWE-20 (Improper Input Validation)
  • CWE-200 (Information Exposure)
  • ... and 690 more

Use Cases

  • Automated vulnerability classification from CVE descriptions
  • Security assessment and triage
  • Weakness pattern identification in vulnerability reports
  • CVE database enrichment and standardization

Limitations

  • Trained only on CVE descriptions (English text)
  • Performance may vary on non-CVE vulnerability descriptions
  • Does not predict "NVD-CWE-Other" or other non-standard classifications
  • Limited to CWEs present in training data (695 classes)

Citation

@misc{deberta-v3-cve-cwe-2024,
  author = {Berghem - Smart Information Security},
  title = {DeBERTa-v3-Base for CVE-CWE Classification},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/LorenzoNava/deberta-v3-base-cve-cwe-classifier}
}

License

MIT License - See LICENSE file

Developed By

Berghem - Smart Information Security

For issues or questions, visit the model repository.
