scthornton posted an update about 3 hours ago
SecureCode: security-aware code models (3B–20B), trained for review + remediation

I’ve been frustrated by how often code assistants recommend patterns that pass tests but fail security review (string-built SQL, brittle auth logic, unsafe parsing, insecure defaults). So I built **SecureCode**: a collection of **8 code models (3B–20B)** trained to behave more like a security reviewer.
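To make the first failure mode concrete, here is a minimal sketch of string-built SQL versus its secure rewrite (the function names and schema are illustrative, not from SecureCode):

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Vulnerable: user input is concatenated into the SQL string, so a
    # payload like "x' OR '1'='1" changes the query's logic (injection).
    return conn.execute(
        "SELECT id, name FROM users WHERE name = '" + username + "'"
    ).fetchall()

def find_user_safe(conn, username):
    # Secure rewrite: a parameterized query; the driver binds the value
    # as data, so the payload is matched as a literal string.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: injection dumps the table
print(len(find_user_safe(conn, payload)))    # 0: payload treated as a literal
```

Both versions pass a happy-path unit test with a benign username, which is exactly why this pattern slips through review.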

What you should expect from SecureCode:

- identify likely vuln patterns and explain *why* they’re risky
- outline plausible abuse paths (defensive framing)
- propose a secure rewrite (drop-in where possible)
- include defense-in-depth guidance + regression tests/checks
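A rough sketch of the finding/rewrite/regression-check shape described above, using unsafe parsing as the example (this is illustrative code, not SecureCode output):

```python
import ast

def parse_config_unsafe(text):
    # Finding (high severity): eval() executes arbitrary expressions,
    # so attacker-controlled input becomes remote code execution.
    return eval(text)

def parse_config_safe(text):
    # Secure rewrite: ast.literal_eval accepts only Python literals
    # (strings, numbers, tuples, lists, dicts, sets, booleans, None).
    return ast.literal_eval(text)

# Regression check: the safe parser still handles legitimate input...
assert parse_config_safe("{'debug': False}") == {"debug": False}

# ...and must reject anything executable.
try:
    parse_config_safe("__import__('os').system('id')")
    raise AssertionError("expected rejection of executable input")
except ValueError:
    pass  # literal_eval refuses anything that is not a pure literal
```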

Links:

- **Models:** https://huggingface.co/collections/scthornton/securecode
- **Dataset:** scthornton/securecode-v2
- **Paper:** *SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models*, https://arxiv.org/html/2512.18542v1 (arXiv:2512.18542)

**How to test it (copy/paste prompt):**


> You are a senior application security engineer. Review the code below.
> Output: (1) findings with severity, (2) likely exploit scenarios (high level), (3) secure rewrite,
> (4) defense-in-depth recommendations, (5) regression tests/checks.
> Code: `...`
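If you are running the prompt in a scripted evaluation loop rather than pasting it by hand, a small helper can wrap each snippet in the template (the helper name is hypothetical; only the prompt text comes from the post):

```python
def build_review_prompt(code: str) -> str:
    # Hypothetical helper: wraps a snippet in the copy/paste review
    # prompt above so it can be fed to a model programmatically.
    return (
        "You are a senior application security engineer. Review the code below.\n"
        "Output: (1) findings with severity, "
        "(2) likely exploit scenarios (high level), (3) secure rewrite,\n"
        "(4) defense-in-depth recommendations, (5) regression tests/checks.\n"
        f"Code: `{code}`"
    )

prompt = build_review_prompt('cur.execute("SELECT * FROM t WHERE id = " + uid)')
print(prompt)
```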



**I’m looking for real-world feedback:**

- Your “this slipped through review once” snippets (sanitized is fine)
- False positives / false negatives you observe
- Contributions of new CVE-grounded examples

If you drop a snippet, please include the language/framework and what the *correct* remediation looks like in your environment. If you have contributions or suggestions for the dataset, I'd be happy to hear them. New features and enhancements for v3 are already underway, but for now I'm focused on testing as many use cases as possible. Appreciate you all!