scthornton posted an update about 5 hours ago
SecureCode: security-aware code models (3B–20B), trained for review + remediation

I’ve been frustrated by how often code assistants recommend patterns that pass tests but fail security review (e.g., string-built SQL, brittle auth logic, unsafe parsing, insecure defaults). So I built **SecureCode**: a collection of **eight code models (3B–20B)** trained to behave more like a security reviewer.
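To make the first failure mode concrete, here's a minimal sketch contrasting string-built SQL with the parameterized rewrite, using Python's stdlib `sqlite3` (the `get_user_*` helpers are hypothetical names for illustration, not part of SecureCode):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

# Vulnerable: string-built SQL. An input like "' OR '1'='1" rewrites the
# query itself, so it passes a happy-path test but fails security review.
def get_user_unsafe(name):
    return conn.execute(
        f"SELECT role FROM users WHERE name = '{name}'"
    ).fetchall()

# Secure rewrite: parameterized query. The driver treats the input as
# data, never as SQL, so the same payload matches nothing.
def get_user_safe(name):
    return conn.execute(
        "SELECT role FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(get_user_unsafe(payload))  # leaks every row: [('alice', 'admin')] -> [('admin',)] per SELECT
print(get_user_safe(payload))    # []
```

This is exactly the kind of "passes tests, fails review" gap the models are trained to flag and rewrite.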

What you should expect from SecureCode:

- identify likely vuln patterns and explain *why* they’re risky
- outline plausible abuse paths (defensive framing)
- propose a secure rewrite (drop-in where possible)
- include defense-in-depth guidance + regression tests/checks
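For the last bullet, "regression tests/checks" means executable guards that keep a fixed vulnerability fixed. A minimal sketch for the unsafe-parsing case, using only Python's stdlib (`parse_config_value` and the test name are hypothetical, not SecureCode output):

```python
import ast

# Remediated helper: parse untrusted input with ast.literal_eval, which
# accepts only Python literals, instead of eval(), which executes code.
def parse_config_value(raw: str):
    return ast.literal_eval(raw)

# Regression check: a code-execution payload must be rejected, not run.
def test_parse_rejects_code():
    try:
        parse_config_value("__import__('os').system('id')")
    except (ValueError, SyntaxError):
        return True  # payload rejected -- the fix is still in place
    return False

print(parse_config_value("[1, 2, 3]"))  # [1, 2, 3]
print(test_parse_rejects_code())        # True
```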

Links:

- **Models:** https://huggingface.co/collections/scthornton/securecode
- **Dataset:** scthornton/securecode-v2
- **Paper:** SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models (arXiv:2512.18542), https://arxiv.org/html/2512.18542v1

**How to test it (copy/paste prompt):**


> You are a senior application security engineer. Review the code below.
> Output: (1) findings with severity, (2) likely exploit scenarios (high level), (3) secure rewrite,
> (4) defense-in-depth recommendations, (5) regression tests/checks.
> Code: `...`
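If you want to run the same review over many snippets, the prompt above is easy to template (a sketch; `build_review_prompt` is a hypothetical helper name, and the backticked placeholder is filled with whatever code you're testing):

```python
# Hypothetical helper: wraps a code snippet in the review prompt above so
# it can be sent to any chat-style model endpoint you already use.
REVIEW_PROMPT = (
    "You are a senior application security engineer. Review the code below.\n"
    "Output: (1) findings with severity, (2) likely exploit scenarios "
    "(high level), (3) secure rewrite,\n"
    "(4) defense-in-depth recommendations, (5) regression tests/checks.\n"
    "Code: `{code}`"
)

def build_review_prompt(code: str) -> str:
    # str.format only substitutes {code} in the template; braces inside
    # the snippet itself are passed through untouched.
    return REVIEW_PROMPT.format(code=code)

snippet = 'cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")'
print(build_review_prompt(snippet))
```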



**I’m looking for real-world feedback**

- Your “this slipped through review once” snippets (sanitized is fine)
- False positives / false negatives you observe
- Contributions of new CVE-grounded examples

If you drop a snippet, please include language/framework + what the *correct* remediation looks like in your environment. If you have any contributions or suggestions for the dataset, I'd be happy to hear them. I have some new features and enhancements planned for v3 that are already underway, but for now, I'm focused on testing as many use cases as possible. Appreciate you all!
