DeBERTa-v3-xsmall fine-tuned for three-way classification: **allow**, **deny**, or **abstain**. ONNX + INT8 quantized, under 80MB, p99 <30ms on CPU. Margin-based thresholds (not argmax) — uncertain queries route to clarification instead of forcing a guess.
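The margin-based routing above can be sketched in a few lines. The label order and the 0.25 margin threshold here are illustrative assumptions, not the model's actual configuration:

```python
import math

def route(logits, margin_threshold=0.25):
    """Three-way routing with a margin check instead of plain argmax.

    Hypothetical sketch: label order and threshold are assumptions.
    """
    labels = ["allow", "deny", "abstain"]
    # Softmax over the three class logits (max-subtracted for stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank classes by probability and compare top-1 vs. top-2.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top, second = ranked[0], ranked[1]
    # If the margin is too small, don't force a guess: route the
    # query to clarification instead of committing to a label.
    if probs[top] - probs[second] < margin_threshold:
        return "clarify"
    return labels[top]
```

A confident prediction like `route([5.0, 0.1, 0.1])` returns `"allow"`, while near-ties fall through to `"clarify"` rather than being forced into whichever class happens to edge ahead.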
**Eval results (adversarial test sets, ~470-480 examples per vertical):**
Quick update on the SecureCode dataset family. We've restructured things and fixed several issues:
**What changed:**
- The datasets are now properly split into three repos: [unified](scthornton/securecode) (2,185 examples), [web](scthornton/securecode-web) (1,378), and [AI/ML](scthornton/securecode-aiml) (750)
- All repos now use Parquet format, so `load_dataset()` just works with no deprecated loading scripts
- SecureCode Web now includes 219 framework-specific examples (Express, Django, Spring Boot, Flask, Rails, Laravel, ASP.NET Core, FastAPI, NestJS)
- Data cards have been corrected and split sizes fixed
**Why it matters:**
With AI-generated code now accounting for 60%+ of some codebases (Checkmarx, 2025), security-focused training data matters more than ever. Every SecureCode example is grounded in a real CVE and paired with a four-turn conversation that mirrors an actual developer-AI workflow.
If you're working on code generation models, I'd love to hear how you're approaching the security angle. Are there vulnerability categories or frameworks you'd like to see covered?