# VericodingEBM: Hybrid-Averse checkpoint
A LoRA adapter plus a per-line scalar head, trained on top of Qwen2.5-Coder-1.5B-Instruct, that scores every line of a Verus implementation with an energy serving as a proxy for "this line is the bug."
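The scoring path can be pictured with a minimal, dependency-free sketch. It assumes (this card does not confirm it) that each source line ends in a sentinel token whose hidden state feeds a small two-layer ReLU MLP, and that the whole-implementation head softmax-pools all token states before a linear read-out. The dimensions, weights, activation, and helper names are hypothetical; the real heads live in the training code.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def gather_sentinels(hidden: List[Vector], sentinel_positions: List[int]) -> List[Vector]:
    """Pick the hidden state at each line's sentinel token (assumed: one sentinel per line)."""
    return [hidden[p] for p in sentinel_positions]


def mlp_score(h: Vector, w1: List[Vector], b1: Vector, w2: Vector, b2: float) -> float:
    """Hypothetical per-line head: ReLU(W1 h + b1), then a linear read-out to one scalar."""
    z = [max(0.0, dot(row, h) + b) for row, b in zip(w1, b1)]
    return dot(w2, z) + b2


def attention_pool_score(hidden: List[Vector], query: Vector, w_out: Vector, b_out: float) -> float:
    """Hypothetical whole-impl head: softmax attention over token states, then a linear read-out."""
    logits = [dot(query, h) for h in hidden]
    peak = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(s - peak) for s in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    pooled = [sum(w * h[i] for w, h in zip(weights, hidden)) for i in range(len(hidden[0]))]
    return dot(w_out, pooled) + b_out


# Toy example: 12 token states of width 8; three lines end at tokens 3, 7 and 11.
random.seed(0)
d, k = 8, 4
hidden = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(12)]
sentinels = [3, 7, 11]
w1 = [[random.gauss(0.0, 0.1) for _ in range(d)] for _ in range(k)]
b1 = [0.0] * k
w2 = [random.gauss(0.0, 0.1) for _ in range(k)]
line_scores = [mlp_score(h, w1, b1, w2, 0.0) for h in gather_sentinels(hidden, sentinels)]
query = [random.gauss(0.0, 0.1) for _ in range(d)]
w_out = [random.gauss(0.0, 0.1) for _ in range(d)]
impl_score = attention_pool_score(hidden, query, w_out, 0.0)
```

In this sketch both heads read the same hidden states, so a single forward pass yields one score per line plus one score for the whole implementation.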
This is the canonical Hybrid-Averse checkpoint reported in the paper: the post-fix model, which learned to anti-correlate with the `// FAILS` debug marker rather than rely on it.
Submitted to the Apart × Atlas Computing Secure Program Synthesis Hackathon, Track 3 (Vericoding).
📄 Paper: see paper/main.pdf
💾 Code + reproducibility: https://github.com/ozlabsai/VericodingEBM
📊 Training data: OzLabs/VericodingEBM-data
## Headline results (Hybrid-Averse)
| Measurement | Hybrid-Averse (this model) | Best frontier LLM |
|---|---|---|
| Per-line top-3 recall on Verus dev-test (n=609 FAILs) | 0.84 | 0.74 (Claude Opus 4.7) |
| Whole-impl discrimination AUROC | 0.78 | 0.91 (GPT-5.5) |
| Closed-loop CEGIS repair@1 (n=100) | 25% | 30% (LLM self-judged) |
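The first two rows can be read with a minimal sketch of the metrics. It assumes that per-line top-k recall counts a failing implementation as a hit when any gold bug line is among its k highest-scoring lines (higher score taken as more suspicious), and that AUROC is the standard Mann-Whitney pairwise form with failing implementations as positives. The paper's exact definitions may differ, and the toy scores below are made up.

```python
from typing import Sequence, Set, Tuple


def top_k_recall(examples: Sequence[Tuple[Sequence[float], Set[int]]], k: int = 3) -> float:
    """Fraction of failing impls whose k highest-scored lines include a gold bug line."""
    hits = 0
    for scores, bug_lines in examples:
        # Rank line indices by score, most suspicious first (assumed: higher = more suspicious).
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if bug_lines & set(ranked[:k]):
            hits += 1
    return hits / len(examples)


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Mann-Whitney AUROC: chance a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy data: (per-line scores, indices of gold bug lines) for two failing implementations.
examples = [([0.1, 0.9, 0.3, 0.2], {1}), ([0.5, 0.4, 0.3, 0.2, 0.8], {3})]
print(top_k_recall(examples, k=3))    # 0.5: the first impl is a hit, the second a miss
print(auroc([0.9, 0.6], [0.7, 0.1]))  # 0.75: 3 of the 4 pairs are ordered correctly
```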
## What's in this repo
- `adapter/`: LoRA adapter (PEFT format, rank 16, alpha 32, `embed_lora_rank` 8) for Qwen2.5-Coder-1.5B-Instruct
- `head.pt`: per-line scoring head weights (small MLP over sentinel-token hidden states)
- `scalar_head.pt`: whole-impl attention-pool head weights
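The adapter's stated rank and alpha translate into the standard LoRA update W' = W + (alpha / r) · B A, so r = 16 and alpha = 32 give a scaling factor of 2. The sketch below assumes the usual initialization (random A, zero B, so the adapter starts as an exact no-op); the toy layer size is hypothetical, and `embed_lora_rank` is not modeled here.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain nested-list matrix product x @ y."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def lora_merged_weight(w: Matrix, a: Matrix, b: Matrix, r: int, alpha: float) -> Matrix:
    """Standard LoRA merge: W' = W + (alpha / r) * B @ A, with B: d_out x r and A: r x d_in."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: r * (d_in + d_out)."""
    return r * (d_in + d_out)


R, ALPHA = 16, 32      # values stated for this adapter
d_in, d_out = 24, 20   # hypothetical toy layer size
random.seed(0)
W = [[random.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(R)]
B = [[0.0] * R for _ in range(d_out)]  # zero init: the adapter starts as a no-op
W_merged = lora_merged_weight(W, A, B, R, ALPHA)
```

With B at zero the merged weight equals W exactly; training moves only A and B, which for a square d × d layer is 2·r·d parameters instead of d².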
To run inference, you need all three artifacts plus the training code at https://github.com/ozlabsai/VericodingEBM.
## Marker-leak audit (paper §4.6)
This checkpoint is marker-AVERSE: per-line top-1 recall rises from 4% to 56% when the `// FAILS` debug markers are stripped from the input (delta, with markers minus without, = −52 pp). This behavior results from the counterfactual-marker augmentation described in paper §B. The pre-audit Sentinel-Reliant checkpoint (not released here) shows the opposite regime: its signal collapses without the markers, exposing the leak that motivated this work.
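The audit compares recall on the same inputs with and without the debug markers. A minimal sketch of the stripping step and the delta arithmetic follows, assuming markers appear as trailing `// FAILS` comments (the exact marker syntax handled by the training code is not specified in this card). Line count is preserved so per-line labels still line up; the regex is deliberately naive and would also strip a `// FAILS` inside a string literal.

```python
import re

# Hypothetical marker pattern: a trailing "// FAILS" comment, possibly followed by more text.
FAILS_MARKER = re.compile(r"\s*//\s*FAILS\b.*$")


def strip_fails_markers(source: str) -> str:
    """Remove trailing // FAILS debug comments, keeping every line (and its index) in place."""
    return "\n".join(FAILS_MARKER.sub("", line) for line in source.split("\n"))


def marker_delta_pp(recall_with: float, recall_stripped: float) -> float:
    """Delta in percentage points: recall with markers minus recall with markers stripped."""
    return round(100.0 * (recall_with - recall_stripped), 6)


src = "fn f(x: u32) -> u32 {\n    x + 1 // FAILS postcondition\n}"
print(strip_fails_markers(src))
print(marker_delta_pp(0.04, 0.56))  # -52.0, matching the reported 4% -> 56% swing
```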
## License
MIT (see GitHub repo).
Base model (Hub model tree): Qwen/Qwen2.5-1.5B