How to use from the
Use from the
PEFT library
Task type is invalid.

VericodingEBM — Hybrid-Averse checkpoint

A LoRA + per-line scalar head trained on top of Qwen2.5-Coder-1.5B-Instruct to score every line of a Verus implementation with an energy proxy for "this line is the bug."

This is the canonical Hybrid-Averse checkpoint reported in the paper — the post-fix model that learned to anti-correlate with the // FAILS debug marker rather than rely on it.

Submitted to the Apart × Atlas Computing Secure Program Synthesis Hackathon, Track 3 (Vericoding).

📄 Paper: see paper/main.pdf 💾 Code + reproducibility: https://github.com/ozlabsai/VericodingEBM 📊 Training data: OzLabs/VericodingEBM-data

Headline results (Hybrid-Averse)

Measurement Hybrid-Averse (this model) Best frontier LLM
Per-line top-3 recall on Verus dev-test (n=609 FAILs) 0.84 0.74 (Claude Opus 4.7)
Whole-impl discrimination AUROC 0.78 0.91 (GPT-5.5)
Closed-loop CEGIS repair@1 (n=100) 25% 30% (LLM self-judged)

What's in this repo

  • adapter/ — LoRA adapter (PEFT format, rank 16, alpha 32, embed_lora_rank 8) for Qwen2.5-Coder-1.5B-Instruct
  • head.pt — per-line scoring head weights (small MLP over sentinel-token hidden states)
  • scalar_head.pt — whole-impl attention-pool head weights

To run inference you need all three files plus the training code at https://github.com/ozlabsai/VericodingEBM.

Marker-leak audit (paper §4.6)

This checkpoint is marker-AVERSE: per-line top-1 recall jumps from 4% → 56% when the // FAILS debug markers are stripped from the input (delta = −52pp). This is the result of the counterfactual-marker augmentation described in paper §B. The pre-audit Sentinel-Reliant checkpoint (not released here) shows the opposite regime — signal collapses without markers, exposing the leak that motivated this work.

License

MIT (see GitHub repo).

Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for OzLabs/VericodingEBM

Adapter
(107)
this model