🔒 Vulnerability Classifier Training Notebooks

✅ ALL 4 NOTEBOOKS COMPLETE — Project Finished

Final Results

| Model | Key Metric | Score |
|---|---|---|
| GraphCodeBERT Classifier | Macro F1 | 0.476 (+311% vs baseline 0.116) |
| | Weighted F1 | 0.945 |
| | Safe Detection F1 | 0.982 |
| CodeT5+ Fixer | BLEU | 81.0 |
| | ROUGE-L | 0.788 |
| | Eval Loss | 0.175 (3.1x better than v1's 0.547) |
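
The gap between macro F1 (0.476) and weighted F1 (0.945) is expected under heavy class imbalance: macro averages the per-class scores equally, while weighted scales them by class support, so a dominant "safe" class lifts the weighted score. A minimal numpy sketch (helper names are illustrative, not from the notebooks):

```python
import numpy as np

def f1_per_class(y_true, y_pred, n_classes):
    """Per-class F1 for predictions encoded as class indices."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return np.array(f1s)

def macro_f1(y_true, y_pred, n_classes):
    # every class counts equally, rare CWEs included
    return f1_per_class(y_true, y_pred, n_classes).mean()

def weighted_f1(y_true, y_pred, n_classes):
    # classes weighted by support, so the ~90% safe class dominates
    f1s = f1_per_class(y_true, y_pred, n_classes)
    support = np.array([(y_true == c).sum() for c in range(n_classes)])
    return np.average(f1s, weights=support)
```

A degenerate model that always predicts the majority class still gets a high weighted F1 but a low macro F1, which is why macro F1 is the key classifier metric here.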

📋 Run Order (on Kaggle with a free T4 GPU)

Notebook 1: Classifier Phase 1 (~2-3 hours) ✅ COMPLETE

  • Loads microsoft/graphcodebert-base
  • Freezes bottom 8/12 layers
  • Trains top layers + classifier head with ASL loss (4 epochs)
  • Saved to HF Hub branch phase1-checkpoint
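
The freezing step above can be sketched as follows. To keep the snippet self-contained it builds a tiny randomly initialized RoBERTa from a config instead of downloading microsoft/graphcodebert-base; the freezing loop itself is the same (NUM_FROZEN and the config sizes are illustrative):

```python
# Phase 1 freezing sketch: a tiny random RoBERTa stands in for
# microsoft/graphcodebert-base so the snippet runs without a download.
from transformers import RobertaConfig, RobertaForSequenceClassification

config = RobertaConfig(
    num_hidden_layers=12, hidden_size=48, num_attention_heads=4,
    intermediate_size=96, num_labels=10,  # illustrative sizes
    problem_type="multi_label_classification",
)
model = RobertaForSequenceClassification(config)

NUM_FROZEN = 8  # bottom 8 of 12 encoder layers stay frozen in Phase 1
for p in model.roberta.embeddings.parameters():
    p.requires_grad = False
for layer in model.roberta.encoder.layer[:NUM_FROZEN]:
    for p in layer.parameters():
        p.requires_grad = False
```

Only the top four encoder layers and the classifier head receive gradients in Phase 1; Phase 2 later flips `requires_grad` back on for everything.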

Notebook 2: Classifier Phase 2 (~4-5 hours) ✅ COMPLETE

  • Loads Phase 1 checkpoint from HF Hub
  • Unfreezes ALL layers
  • Full fine-tuning with lower LR (5 epochs, early stopping)
  • Saved to HF Hub branch phase2-checkpoint

Notebook 3: Thresholds + Calibration (~30 min) ✅ COMPLETE

  • Loads Phase 2 model
  • Per-class threshold optimization on validation set
  • Temperature scaling calibration (T=0.6163)
  • Full test set evaluation with classification report
  • Pushed final model to main branch with all configs
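
A minimal sketch of the two optimization steps above, assuming sigmoid outputs in a `(num_samples, num_classes)` array; the search grid and helper names are illustrative, and T=0.6163 is the value reported in this doc:

```python
import numpy as np

def per_class_thresholds(probs, labels, grid=np.linspace(0.05, 0.95, 19)):
    """For each class, pick the decision threshold that maximizes F1 on the
    validation set (instead of one global cutoff such as 0.3)."""
    thresholds = []
    for c in range(probs.shape[1]):
        best_t, best_f1 = 0.5, -1.0
        for t in grid:
            pred = probs[:, c] >= t
            tp = np.sum(pred & (labels[:, c] == 1))
            fp = np.sum(pred & (labels[:, c] == 0))
            fn = np.sum(~pred & (labels[:, c] == 1))
            f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
            if f1 > best_f1:
                best_t, best_f1 = t, f1
        thresholds.append(best_t)
    return np.array(thresholds)

def calibrate(logits, T=0.6163):
    """Temperature scaling: divide logits by T before the sigmoid so the
    resulting probabilities are better calibrated."""
    return 1.0 / (1.0 + np.exp(-logits / T))
```

Because T < 1 here, scaling sharpens the probabilities: confident logits map to values closer to 0 or 1 than the raw sigmoid would give.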

Notebook 4: Fixer Training (~7 hours) ✅ COMPLETE

  • Use notebook4_fixer_training_v3_FINAL.py (the definitive version)
  • Trained CodeT5+ 220M with a CWE-aware input format
  • lr=1e-4, constant scheduler (validated by T5APR and MultiMend)
  • Early stopping ended training at epoch 6 (best checkpoint: epoch 3, eval_loss=0.1752)
  • BLEU=81.0, ROUGE-L=0.788 on test set
  • Pushed to HF Hub
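
The exact prompt template lives in the notebook; a hypothetical version of the CWE-aware input format might look like this (the function name and template are assumptions, not the notebook's verbatim format):

```python
def build_fixer_input(cwe_id: str, code: str) -> str:
    """Prepend the vulnerability class so the seq2seq fixer knows
    which kind of flaw it is expected to repair."""
    return f"fix {cwe_id}: {code}"

example = build_fixer_input(
    "CWE-89",
    'cursor.execute("SELECT * FROM users WHERE id=" + uid)',
)
```

Conditioning the encoder input on the CWE label is what distinguishes this fixer from a generic "broken code in, fixed code out" setup.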

🚀 Deployed Resources

| Resource | URL | Status |
|---|---|---|
| Classifier Model | graphcodebert-vuln-classifier | ✅ Live |
| Fixer Model | codet5p-vuln-fixer | ✅ Live |
| Dataset | code-security-vulnerability-dataset | ✅ 175K samples |
| Demo Space | code-security-analyzer | ✅ v2 deployed |

What Was Improved (v1 → v2)

| Improvement | Description |
|---|---|
| GraphCodeBERT-base | 125M params, 12 layers (was CodeBERTa-small: 83M, 6 layers) |
| Asymmetric Loss (ASL) | γ⁻=4, γ⁺=0 — designed for the 90% safe-class imbalance |
| Two-phase training | Phase 1: freeze bottom 8 layers → Phase 2: full fine-tune |
| Per-class thresholds | Optimal threshold per CWE (not a global 0.3) |
| Temperature calibration | Probabilities become meaningful (T=0.6163) |
| CodeT5+ 220M fixer | 3.7x larger than the old flan-t5-small |
| CWE-aware input | The fixer model knows which vulnerability to fix |
| lr=1e-4, constant | Research-validated (T5APR + MultiMend papers) |
| BLEU + ROUGE eval | Proper fix-quality evaluation |
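
A simplified sketch of the ASL objective described above. It omits the probability-shifting (clipping) term of the full Asymmetric Loss paper, but shows the key idea: a large negative focusing exponent (γ⁻=4) suppresses the gradient from easy negatives, while γ⁺=0 leaves positives as plain BCE.

```python
import torch

def asymmetric_loss(logits, targets, gamma_neg=4.0, gamma_pos=0.0, eps=1e-8):
    """Multi-label asymmetric focal loss (simplified sketch).
    With gamma_neg = gamma_pos = 0 it reduces to plain BCE."""
    p = torch.sigmoid(logits)
    # positive term: exponent 0 => ordinary log-likelihood on positives
    loss_pos = targets * (1 - p).pow(gamma_pos) * torch.log(p.clamp(min=eps))
    # negative term: p**gamma_neg shrinks toward 0 for confident negatives,
    # so the abundant easy 'safe' examples contribute little gradient
    loss_neg = (1 - targets) * p.pow(gamma_neg) * torch.log((1 - p).clamp(min=eps))
    return -(loss_pos + loss_neg).mean()
```

This asymmetry is what lets the classifier keep learning the rare CWE classes even though roughly 90% of samples are safe.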

⚠️ Notebook 4 Notes

Use notebook4_fixer_training_v3_FINAL.py — the other versions have bugs:

  • notebook4_fixer_training.py — ❌ Original (15 critical bugs)
  • notebook4_fixer_training_v2_FIXED.py — ❌ Partially fixed (still crashes)
  • notebook4_fixer_training_v3_FINAL.py — ✅ All bugs fixed

Key modifications needed for Kaggle (2025):

  1. Cell 1: Uninstall peft first (pip uninstall -y peft) to fix StrictDataclassDefinitionError
  2. Cell 8: Use lr=1e-4, lr_scheduler_type="constant", fp16=True, predict_with_generate=False
  3. Cell 9: Add trainer.args.predict_with_generate = True before trainer.predict()
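
The Cell 8 settings above can be sketched as follows. The field names are real `Seq2SeqTrainingArguments` parameters; `output_dir` is a placeholder and any omitted fields keep their defaults:

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="codet5p-vuln-fixer",  # placeholder path
    learning_rate=1e-4,               # T5APR/MultiMend-validated LR
    lr_scheduler_type="constant",     # no warmup decay schedule
    fp16=True,                        # mixed precision on the T4
    predict_with_generate=False,      # keep mid-training eval cheap
)

# Cell 9: re-enable generation only for the final test-set predictions
# trainer.args.predict_with_generate = True
# predictions = trainer.predict(test_dataset)
```

Leaving `predict_with_generate` off during training avoids slow autoregressive decoding on every eval pass; it is switched on once, right before the final `trainer.predict()`.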

πŸ“ Files

  • notebook1_classifier_phase1.py — Phase 1 training ✅
  • notebook2_classifier_phase2.py — Phase 2 fine-tuning ✅
  • notebook3_thresholds_calibration_eval.py — Optimization + evaluation ✅
  • notebook4_fixer_training_v3_FINAL.py — ✅ Fixer training (use this one)
  • notebook4_fixer_training.py — ❌ Original (broken)
  • notebook4_fixer_training_v2_FIXED.py — ❌ Partially fixed (still crashes)
  • updated_app.py — ✅ Deployed to the Space (v2 with calibration + thresholds)

📚 References

  • T5APR (arXiv:2309.15742) — lr=1e-4, constant scheduler for CodeT5 code repair
  • MultiMend (arXiv:2501.16044) — same config validated on 6 APR benchmarks