---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-14B-Instruct
tags:
- code
- security
- qwen
- securecode
- owasp
- vulnerability-detection
datasets:
- scthornton/securecode-v2
language:
- en
library_name: transformers
pipeline_tag: text-generation
arxiv: 2512.18542
---
# Qwen 2.5-Coder 14B - SecureCode Edition

<div align="center">

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) · [Dataset: SecureCode v2](https://huggingface.co/datasets/scthornton/securecode-v2) · [Base model: Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) · [perfecXion.ai](https://perfecxion.ai)

**Enterprise-grade code security - powerful reasoning with production efficiency**

[Paper](https://arxiv.org/abs/2512.18542) | [Model Card](https://huggingface.co/scthornton/qwen2.5-coder-14b-securecode) | [Dataset](https://huggingface.co/datasets/scthornton/securecode-v2) | [perfecXion.ai](https://perfecxion.ai)

</div>
---
## What is This?
This is **Qwen 2.5-Coder 14B Instruct** fine-tuned on the **SecureCode v2.0 dataset** - the sweet spot between code intelligence and computational efficiency, now enhanced with production-grade security knowledge.
Qwen 2.5-Coder 14B delivers exceptional code understanding from the same architecture that powers the best-in-class 7B model, scaled up for enterprise complexity. Combined with SecureCode training, this model delivers:
- ✅ **Advanced security reasoning** across complex codebases
- ✅ **Production-ready efficiency** - fits comfortably on a single GPU
- ✅ **Enterprise-scale analysis** with a 128K context window
- ✅ **Best-in-class code understanding** at the 14B parameter tier
**The Result:** An enterprise-ready security expert that runs efficiently on standard hardware.
**Why Qwen 2.5-Coder 14B?** This model offers the optimal balance:
- **Superior to smaller models** - More nuanced security analysis than 7B
- **More efficient than 32B+** - 2x faster training, lower deployment cost
- **92 programming languages** - Comprehensive language coverage
- **128K context window** - Analyze entire applications at once
- **Enterprise deployable** - Runs on a single A100 or 2x RTX 4090
---
## The Problem This Solves
**AI coding assistants produce vulnerable code in 45% of security-relevant scenarios** (Veracode 2025). While smaller models miss nuanced vulnerabilities and larger models demand excessive resources, the 14B tier delivers the security intelligence enterprises need with the efficiency they demand.
**Real-world enterprise impact:**
- Equifax breach: **$425 million** settlement + reputation damage
- Capital One: **100 million** customer records, $80M fine
- SolarWinds: **18,000** organizations compromised
Qwen 2.5-Coder 14B SecureCode Edition brings advanced security analysis to enterprise-scale codebases without the infrastructure costs of 32B+ models.
---
## Key Features
### Enterprise-Scale Code Intelligence
**Qwen 2.5-Coder 14B** delivers exceptional performance:
- HumanEval: **89.0%** pass@1 (surpasses many 30B+ models)
- MBPP: **77.6%** pass@1
- MultiPL-E: **82.1%** average across languages
- Matches or exceeds 32B models on most benchmarks
Now enhanced with **1,209 security-focused examples** covering OWASP Top 10:2025.
### Advanced Security Pattern Recognition
Trained on real-world security incidents:
- **224 examples** of Broken Access Control vulnerabilities
- **199 examples** of Authentication Failures
- **125 examples** of Injection attacks (SQL, Command, XSS)
- **115 examples** of Cryptographic Failures
- Complete **OWASP Top 10:2025** coverage
### Production-Ready Multi-Language Support
Fine-tuned on security examples across:
- Python (Django, Flask, FastAPI)
- JavaScript/TypeScript (Express, NestJS, React)
- Java (Spring Boot)
- Go (Gin framework)
- PHP (Laravel, Symfony)
- C# (ASP.NET Core)
- Ruby (Rails)
- Rust (Actix, Rocket)
- **Plus 84 more languages from Qwen's base training**
### Sophisticated Security Analysis
Every response includes:
1. **Multi-layered vulnerability analysis** with attack chain identification
2. **Defense-in-depth implementations** with enterprise patterns
3. **Concrete exploitation demonstrations** proving security flaws
4. **Operational guidance** including monitoring, logging, and SIEM integration
---
## Training Details
| Parameter | Value |
|-----------|-------|
| **Base Model** | Qwen/Qwen2.5-Coder-14B-Instruct |
| **Fine-tuning Method** | LoRA (Low-Rank Adaptation) |
| **Training Dataset** | [SecureCode v2.0](https://huggingface.co/datasets/scthornton/securecode-v2) |
| **Dataset Size** | 841 training examples |
| **Training Epochs** | 3 |
| **LoRA Rank (r)** | 16 |
| **LoRA Alpha** | 32 |
| **Learning Rate** | 2e-4 |
| **Quantization** | 4-bit (bitsandbytes) |
| **Trainable Parameters** | ~74M (0.53% of 14B total) |
| **Total Parameters** | 14B |
| **Context Window** | 128K tokens (inherited from base) |
| **GPU Used** | NVIDIA A100 40GB |
| **Training Time** | ~8 hours (estimated) |
### Training Methodology
**LoRA (Low-Rank Adaptation)** preserves Qwen's exceptional code abilities:
- Trains only 0.53% of model parameters
- Maintains SOTA code generation quality
- Adds security-specific knowledge without catastrophic forgetting
- Enables deployment with minimal memory overhead
**4-bit Quantization** enables efficient training while maintaining model quality.
**Extended Context:** Qwen's 128K context window allows analyzing entire applications, making it ideal for enterprise security audits.
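The trainable-parameter fraction in the table follows directly from the LoRA construction: each adapted weight matrix of shape `d_out × d_in` gains two low-rank factors `A (r × d_in)` and `B (d_out × r)`, adding `r · (d_in + d_out)` trainable parameters instead of the full `d_out · d_in`. A minimal sketch of the arithmetic (the 4096-wide projection is illustrative, not Qwen's actual layer shape):

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by a rank-r LoRA adapter on a d_out x d_in weight."""
    return r * (d_in + d_out)

def lora_fraction(d_in: int, d_out: int, r: int) -> float:
    """Trainable fraction relative to the full weight matrix."""
    return lora_trainable_params(d_in, d_out, r) / (d_in * d_out)

# Example: a square 4096x4096 projection with r=16 (illustrative dimensions)
added = lora_trainable_params(4096, 4096, 16)  # 131,072 added parameters
frac = lora_fraction(4096, 4096, 16)           # 0.78% of that matrix
```

Summed over only the adapted projections of a 14B model (embeddings and most layers untouched), per-matrix fractions in this range are consistent with the ~0.53% overall figure above.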
---
## Usage
### Quick Start
````python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = "Qwen/Qwen2.5-Coder-14B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# Load SecureCode LoRA adapter
model = PeftModel.from_pretrained(model, "scthornton/qwen2.5-coder-14b-securecode")

# Analyze enterprise codebase for vulnerabilities
prompt = """### User:
Perform a comprehensive security audit of this microservices authentication system:

```python
# auth-service/middleware.py
async def verify_token(request):
    token = request.headers.get('Authorization')
    if not token:
        return None
    payload = jwt.decode(token, settings.SECRET_KEY, algorithms=['HS256'])
    user = await User.get(id=payload['user_id'])
    return user

# payment-service/api.py
@app.post('/transfer')
async def transfer_funds(request):
    user = await verify_token(request)
    amount = request.json.get('amount')
    recipient = request.json.get('recipient_id')
    await process_transfer(user.id, recipient, amount)
    return {'status': 'success'}
```

### Assistant:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=3072,
    temperature=0.3,  # Lower temperature for precise analysis
    top_p=0.95,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
````
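All the examples in this card use the same plain `### User:` / `### Assistant:` prompt layout. A small helper (hypothetical, not part of the model's API) keeps that layout and the inner code fencing consistent when you audit many snippets:

```python
def build_audit_prompt(code: str, instruction: str, lang: str = "python") -> str:
    """Wrap a code snippet in the '### User:' / '### Assistant:' layout above."""
    return (
        "### User:\n"
        f"{instruction}\n"
        f"```{lang}\n{code}\n```\n"
        "### Assistant:\n"
    )

prompt = build_audit_prompt(
    "eval(request.args['q'])",
    "Perform a security audit of this Flask handler fragment:",
)
```

The returned string can be tokenized and passed to `model.generate` exactly as in the Quick Start example.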
### Enterprise Deployment (4-bit Quantization)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit quantization - runs on a 24GB GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16"
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-14B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
model = PeftModel.from_pretrained(base_model, "scthornton/qwen2.5-coder-14b-securecode")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-14B-Instruct", trust_remote_code=True)

# Production-ready: runs on RTX 4090, A5000, or A100
```
### Large-Scale Codebase Analysis
````python
# Analyze multiple related files with the 128K context window
files_to_review = {
    "auth.py": open("backend/auth.py").read(),
    "middleware.py": open("backend/middleware.py").read(),
    "models.py": open("backend/models.py").read(),
}
combined_code = "\n\n".join(f"# {name}\n{code}" for name, code in files_to_review.items())

prompt = f"""### User:
Perform a comprehensive security analysis of this authentication system. Identify:
1. All OWASP Top 10 vulnerabilities
2. Attack chains that combine multiple vulnerabilities
3. Race conditions and timing attacks
4. Authorization bypass opportunities

```python
{combined_code}
```

### Assistant:
"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=65536).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, temperature=0.3, do_sample=True)
analysis = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(analysis)
````
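Before packing many files into one prompt, it helps to estimate whether they fit the token budget. A rough rule of thumb of ~4 characters per token (an assumption for typical source code, not Qwen's actual tokenizer) gives a quick pre-check against the 128K window:

```python
CONTEXT_WINDOW = 131072      # 128K-token window
RESERVED_FOR_OUTPUT = 4096   # matches max_new_tokens above
CHARS_PER_TOKEN = 4          # rough heuristic, not the real tokenizer

def fits_in_context(files: dict) -> bool:
    """Rough pre-check that the combined files fit the input budget."""
    total_chars = sum(len(code) for code in files.values())
    est_tokens = total_chars / CHARS_PER_TOKEN
    return est_tokens <= CONTEXT_WINDOW - RESERVED_FOR_OUTPUT
```

For an exact count, tokenize the combined string with the model's tokenizer instead; the heuristic is only for fast triage before loading the model.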
---
## Use Cases
### 1. **Enterprise Security Architecture Review**
Analyze complex multi-service architectures:
```
Review this microservices platform for security vulnerabilities, focusing on authentication flows, service-to-service authorization, and data validation boundaries
```
### 2. **Large Codebase Vulnerability Scanning**
With 128K context, analyze entire modules:
```
Audit this 10,000-line payment processing system for injection attacks, authorization bypasses, and cryptographic failures
```
### 3. **Advanced Attack Chain Analysis**
Identify sophisticated multi-step attacks:
```
Analyze how an attacker could chain CSRF, XSS, and session fixation to achieve account takeover in this web application
```
### 4. **Production Security Hardening**
Get operational security recommendations:
```
Design a defense-in-depth security architecture for this e-commerce platform handling 1M+ transactions/day
```
### 5. **Compliance-Focused Code Generation**
Generate SOC 2, PCI-DSS, HIPAA-compliant code:
```
Create a HIPAA-compliant patient data API with comprehensive audit logging, encryption at rest and in transit, and role-based access control
```
---
## Limitations
### What This Model Does Well
- ✅ Complex security reasoning across large codebases
- ✅ Multi-file analysis with the 128K context window
- ✅ Advanced attack chain identification
- ✅ Enterprise-scale architecture security review
- ✅ Detailed operational guidance
### What This Model Doesn't Do
- ❌ **Not a security scanner** - Use dedicated tools like Semgrep, CodeQL, or Snyk
- ❌ **Not a penetration testing tool** - Cannot perform active exploitation
- ❌ **Not legal/compliance advice** - Consult qualified security professionals
- ❌ **Not a replacement for security experts** - Critical systems need professional review
### Known Characteristics
- Responses tend to be verbose, reflecting training on comprehensive security explanations
- Optimized for common vulnerability patterns (OWASP Top 10) rather than novel zero-days
- Strongest on code whose flaws fall within the OWASP taxonomy
---
## Performance Benchmarks
### Hardware Requirements
**Minimum:**
- 28GB RAM
- 20GB GPU VRAM (with 4-bit quantization)
**Recommended:**
- 48GB RAM
- 24GB+ GPU (RTX 4090, A5000, A100)
**Inference Speed (on A100 40GB):**
- ~55 tokens/second (4-bit quantization)
- ~75 tokens/second (bfloat16)
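The VRAM figures above follow from bytes-per-parameter arithmetic: 14B parameters at 4 bits is about 7 GB of weights, and about 28 GB at bfloat16. The stated minimums are higher because activations, the KV cache, and quantization constants add overhead on top of the weights. A back-of-envelope sketch:

```python
def weight_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB, ignoring activation/KV-cache overhead."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

four_bit = weight_gb(14, 4)    # 7.0 GB of weights (4-bit)
bf16 = weight_gb(14, 16)       # 28.0 GB of weights (bfloat16)
```

This is why 4-bit inference fits a 24GB card with headroom for the KV cache, while full bfloat16 needs an A100-class GPU.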
### Code Generation Benchmarks (Base Qwen 2.5-Coder)
| Benchmark | Score | Rank |
|-----------|-------|------|
| HumanEval | 89.0% | #1 in 14B class |
| MBPP | 77.6% | Top tier |
| LiveCodeBench | 38.4% | Top 5 overall |
| MultiPL-E | 82.1% | Best multi-language |
**Performance:** Matches or exceeds many 32B+ models while requiring half the compute.
---
## Dataset Information
Trained on **[SecureCode v2.0](https://huggingface.co/datasets/scthornton/securecode-v2)**:
- **1,209 examples** with real CVE grounding
- **100% incident validation**
- **OWASP Top 10:2025** complete coverage
- **Expert security review**
---
## License
**Model:** Apache 2.0 | **Dataset:** CC BY-NC-SA 4.0
---
## Citation
```bibtex
@misc{thornton2025securecode-qwen14b,
title={Qwen 2.5-Coder 14B - SecureCode Edition},
author={Thornton, Scott},
year={2025},
publisher={perfecXion.ai},
url={https://huggingface.co/scthornton/qwen2.5-coder-14b-securecode}
}
```
---
## Acknowledgments
- **Alibaba Cloud & Qwen Team** for the exceptional Qwen 2.5-Coder base model
- **OWASP Foundation** for vulnerability taxonomy
- **MITRE** for CVE database
- **Enterprise security community** for real-world validation
---
## Related Models
- **[llama-3.2-3b-securecode](https://huggingface.co/scthornton/llama-3.2-3b-securecode)** - Most accessible (3B)
- **[qwen-coder-7b-securecode](https://huggingface.co/scthornton/qwen-coder-7b-securecode)** - Smaller Qwen variant (7B)
- **[deepseek-coder-6.7b-securecode](https://huggingface.co/scthornton/deepseek-coder-6.7b-securecode)** - Security-optimized (6.7B)
- **[codellama-13b-securecode](https://huggingface.co/scthornton/codellama-13b-securecode)** - Enterprise trusted (13B)
- **[starcoder2-15b-securecode](https://huggingface.co/scthornton/starcoder2-15b-securecode)** - Multi-language (15B)
[View Collection](https://huggingface.co/collections/scthornton/securecode)
---
<div align="center">
**Built with ❤️ for secure enterprise software development**
[perfecXion.ai](https://perfecxion.ai) | [Contact](mailto:scott@perfecxion.ai)
</div>