---
title: DeBERTa CWE Classification Training
emoji: 🤖
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
license: mit
hardware: 4x-l4
python_version: '3.10'
disable_embedding: false
---

πŸ€– DeBERTa CWE Classification - Fine-Tuning Interface

Production-grade Gradio interface for training DeBERTa models on CVE→CWE classification task with real-time monitoring.

## Features

- 📊 **Real-time Training Monitoring** - live progress updates and metrics streaming
- 📈 **Interactive Dashboard** - visualize loss, accuracy, F1 score, and learning rate
- ⚙️ **Hyperparameter Configuration** - full control over training parameters
- 💾 **Model Export** - automatic export to a local directory
- 🎯 **Optimal Settings** - pre-configured with the best hyperparameters (10 epochs, batch 16)
- 🔥 **GPU Acceleration** - automatic CUDA/MPS/CPU detection
- ⏸️ **Training Control** - early stopping and checkpoint management
- 📝 **Live Logs** - real-time training log streaming

## Dataset

- **Name:** stasvinokur/cve-and-cwe-dataset-1999-2025
- **Size:** ~300K CVE-CWE pairs from 1999-2025
- **Task:** Single-label classification (CVE description → CWE-ID)
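For the single-label setup above, CWE IDs must be mapped to integer class labels. A minimal sketch with illustrative rows (the example rows and column names are assumptions; check the dataset card for the real schema):

```python
# Illustrative rows only -- the real dataset has ~300K CVE-CWE pairs.
rows = [
    {"description": "SQL injection in login form", "cwe": "CWE-89"},
    {"description": "Buffer overflow in parser", "cwe": "CWE-120"},
    {"description": "Stored XSS in comment field", "cwe": "CWE-79"},
]

# Deterministic label mapping: sort the unique CWE IDs, then index them.
label2id = {cwe: i for i, cwe in enumerate(sorted({r["cwe"] for r in rows}))}
id2label = {i: cwe for cwe, i in label2id.items()}
labels = [label2id[r["cwe"]] for r in rows]
```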

## Optimal Hyperparameters

- Epochs: 10 (for best quality)
- Batch Size: 16 (effective: 64 with gradient accumulation)
- Learning Rate: 2e-5 with cosine schedule
- Warmup Ratio: 0.1
- Gradient Accumulation: 4 steps
- Early Stopping: patience of 5

## Usage

  1. Select model architecture (Base recommended)
  2. Configure hyperparameters (or use defaults)
  3. Click "πŸš€ Start Training"
  4. Monitor real-time progress in dashboard
  5. Model exports automatically to local directory
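The real-time dashboard in steps 3-4 typically relies on a generator that yields metric snapshots, which a Gradio UI can stream. An illustrative sketch (not the Space's actual code; `stream_metrics` and the record fields are hypothetical):

```python
def stream_metrics(history):
    """Yield one dashboard snapshot per logged training step."""
    for step, record in enumerate(history, start=1):
        yield {"step": step, "loss": record["loss"], "f1": record.get("f1")}

# Example: two logged steps, the second including an eval F1 score.
history = [{"loss": 1.2}, {"loss": 0.8, "f1": 0.61}]
snapshots = list(stream_metrics(history))
```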

## Output

The trained model is saved within the Space to `./models/deberta-cwe-final/`.

To use it locally, download the files from the Space's "Files" tab and copy them to your local CWE MCP directory.

## Hardware

This Space requires a GPU for efficient training. It is configured with 4x NVIDIA L4 GPUs (`4x-l4`).

**Training Time Estimates:**

- DeBERTa-Base on A10G: ~2-3 hours
- DeBERTa-Large on A10G: ~6-8 hours
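A back-of-the-envelope check on these figures: with ~300K examples and an effective batch of 64, each epoch is roughly 4,700 optimizer steps (dataset size and batch figures come from this README; exact counts depend on the train/eval split):

```python
dataset_size = 300_000        # ~300K CVE-CWE pairs (from this README)
effective_batch = 16 * 4      # per-device batch 16 * 4 accumulation steps
steps_per_epoch = dataset_size // effective_batch
total_steps = steps_per_epoch * 10   # 10 epochs
```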

## Developed By

Berghem - Smart Information Security. Licensed under MIT.
