MultiMend: Multilingual Program Repair with Context Augmentation and Multi-Hunk Patch Generation
Paper • 2501.16044 • Published
| Model | Key Metric | Score |
|---|---|---|
| GraphCodeBERT Classifier | Macro F1 | 0.476 (+311% vs baseline 0.116) |
| | Weighted F1 | 0.945 |
| | Safe Detection F1 | 0.982 |
| CodeT5+ Fixer | BLEU | 81.0 |
| | ROUGE-L | 0.788 |
| | Eval Loss | 0.175 (3.1x better than v1's 0.547) |
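
The gap between macro F1 (0.476) and weighted F1 (0.945) is what one would expect under heavy class imbalance: weighted F1 is dominated by the majority "safe" class, while macro F1 gives every class equal weight. The sketch below illustrates the two averages in plain Python. The per-class scores and supports are hypothetical toy values, not numbers from this evaluation.

```python
def macro_and_weighted_f1(per_class_f1, support):
    """Return (macro, weighted) F1 from per-class F1 scores and class supports."""
    classes = list(per_class_f1)
    # Macro: unweighted mean, so rare classes count as much as the majority class.
    macro = sum(per_class_f1[c] for c in classes) / len(classes)
    # Weighted: mean weighted by the number of true samples in each class.
    total = sum(support[c] for c in classes)
    weighted = sum(per_class_f1[c] * support[c] for c in classes) / total
    return macro, weighted


# Hypothetical toy values: one dominant "safe" class and two rare CWE classes.
toy_f1 = {"safe": 0.98, "CWE-79": 0.30, "CWE-89": 0.20}
toy_support = {"safe": 900, "CWE-79": 60, "CWE-89": 40}

macro, weighted = macro_and_weighted_f1(toy_f1, toy_support)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```

On toy data like this, the weighted score stays close to the majority-class F1 even when the rare classes are detected poorly, which is why macro F1 is the more informative headline metric for the classifier.
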

- microsoft/graphcodebert-base
- phase1-checkpoint
- phase2-checkpoint
- notebook4_fixer_training_v3_FINAL.py (the definitive version)

| Resource | URL | Status |
|---|---|---|
| Classifier Model | graphcodebert-vuln-classifier | ✅ Live |
| Fixer Model | codet5p-vuln-fixer | ✅ Live |
| Dataset | code-security-vulnerability-dataset | ✅ 175K samples |
| Demo Space | code-security-analyzer | ✅ v2 deployed |

| Improvement | Description |
|---|---|
| GraphCodeBERT-base | 125M params, 12 layers (was CodeBERTa-small 83M, 6 layers) |
| Asymmetric Loss (ASL) | γ⁻=4, γ⁺=0; designed for 90% safe class imbalance |
| Two-phase training | Phase 1: freeze bottom 8 layers → Phase 2: full fine-tune |
| Per-class thresholds | Optimal threshold per CWE (not global 0.3) |
| Temperature calibration | Temperature scaling (T=0.6163) so that predicted probabilities are meaningful |
| CodeT5+ 220M fixer | 3.7x larger than old flan-t5-small |
| CWE-aware input | Fixer model knows what vulnerability to fix |
| lr=1e-4 constant | Research-validated (T5APR + MultiMend papers) |
| BLEU + ROUGE eval | Proper fix quality evaluation |
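
To make the Asymmetric Loss row concrete, here is a minimal pure-Python sketch of the ASL formula with γ⁻=4 and γ⁺=0. It assumes a multi-label setup with one sigmoid probability per label; the function name `asymmetric_loss` and the `margin` parameter (set to 0.0 here) are illustrative assumptions, not taken from the training notebooks.

```python
import math


def asymmetric_loss(probs, targets, gamma_neg=4.0, gamma_pos=0.0, margin=0.0, eps=1e-8):
    """Mean asymmetric loss over per-label probabilities.

    probs   -- predicted probabilities in [0, 1], one per label
    targets -- 0/1 ground-truth labels, same length as probs
    margin  -- probability shift applied to negatives (assumed 0.0; not stated in this card)
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive term: (1 - p)^gamma_pos * log(p).
            # With gamma_pos = 0 this is plain cross-entropy on positives.
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Negative term: shift p by the margin, then down-weight
            # easy negatives by the factor p_m^gamma_neg.
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total / len(probs)


easy_negative = asymmetric_loss([0.1], [0])    # confident, correct "not vulnerable"
hard_negative = asymmetric_loss([0.9], [0])    # confident false alarm
missed_positive = asymmetric_loss([0.1], [1])  # missed vulnerability
print(easy_negative, hard_negative, missed_positive)
```

With these settings, a correctly rejected negative contributes almost nothing to the loss, while a missed positive keeps its full cross-entropy penalty. That asymmetry is what keeps the abundant safe samples from swamping the gradient.
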
Use notebook4_fixer_training_v3_FINAL.py; the other versions have bugs:

- notebook4_fixer_training.py: ❌ Original (15 critical bugs)
- notebook4_fixer_training_v2_FIXED.py: ❌ Partially fixed (still crashes)
- notebook4_fixer_training_v3_FINAL.py: ✅ All bugs fixed

Key settings in the final version:

- Uninstall peft (pip uninstall -y peft) to fix StrictDataclassDefinitionError
- Training settings: lr=1e-4, lr_scheduler_type="constant", fp16=True, predict_with_generate=False
- Set trainer.args.predict_with_generate = True before trainer.predict()

Files:

- notebook1_classifier_phase1.py: ✅ Phase 1 training
- notebook2_classifier_phase2.py: ✅ Phase 2 fine-tuning
- notebook3_thresholds_calibration_eval.py: ✅ Optimization + evaluation
- notebook4_fixer_training_v3_FINAL.py: ✅ Fixer training (use this one)
- notebook4_fixer_training.py: ❌ Original (broken)
- notebook4_fixer_training_v2_FIXED.py: ❌ Partially fixed (still crashes)
- updated_app.py: ✅ Deployed to Space (v2 with calibration + thresholds)
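
The calibration and per-class threshold steps can be sketched as follows. This is an illustrative outline, not the code from notebook3_thresholds_calibration_eval.py: it assumes the temperature divides the logits before a sigmoid and that each threshold is chosen by a grid search maximizing per-class F1 on validation data. The helper names and the toy logits and labels are hypothetical; only T=0.6163 comes from this card.

```python
import math


def calibrated_probs(logits, temperature):
    """Sigmoid of logits divided by a scalar temperature."""
    return [1.0 / (1.0 + math.exp(-z / temperature)) for z in logits]


def f1_at_threshold(probs, labels, threshold):
    """Binary F1 for one class when predicting positive at prob >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs, labels, grid=None):
    """Grid-search the threshold that maximizes F1 for a single class."""
    grid = grid or [i / 100 for i in range(5, 96, 5)]
    # max() keeps the first (lowest) threshold among ties.
    return max(grid, key=lambda t: f1_at_threshold(probs, labels, t))


# Hypothetical validation logits and labels for two CWE classes.
val_logits = {"CWE-79": [2.0, 0.4, -0.2, -1.5], "CWE-89": [0.1, -0.3, -0.9, -2.0]}
val_labels = {"CWE-79": [1, 1, 0, 0], "CWE-89": [1, 1, 0, 0]}

T = 0.6163  # temperature reported in this card
thresholds = {
    cwe: best_threshold(calibrated_probs(val_logits[cwe], T), val_labels[cwe])
    for cwe in val_logits
}
print(thresholds)
```

On this toy data the two classes end up with different thresholds, which is the motivation for tuning one threshold per CWE instead of applying a single global cutoff such as 0.3.
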