---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-14B-Instruct
tags:
- code
- security
- qwen
- securecode
- owasp
- vulnerability-detection
datasets:
- scthornton/securecode-v2
language:
- en
library_name: transformers
pipeline_tag: text-generation
arxiv: 2512.18542
---

# Qwen 2.5-Coder 14B - SecureCode Edition

<div align="center">

[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Training Dataset](https://img.shields.io/badge/dataset-SecureCode%20v2.0-green.svg)](https://huggingface.co/datasets/scthornton/securecode-v2)
[![Base Model](https://img.shields.io/badge/base-Qwen%202.5%20Coder%2014B-orange.svg)](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct)
[![perfecXion.ai](https://img.shields.io/badge/by-perfecXion.ai-purple.svg)](https://perfecxion.ai)

**Enterprise-grade code security - powerful reasoning with production efficiency**

[πŸ“„ Paper](https://arxiv.org/abs/2512.18542) | [πŸ€— Model Card](https://huggingface.co/scthornton/qwen2.5-coder-14b-securecode) | [πŸ“Š Dataset](https://huggingface.co/datasets/scthornton/securecode-v2) | [πŸ’» perfecXion.ai](https://perfecxion.ai)

</div>

---

## 🎯 What is This?

This is **Qwen 2.5-Coder 14B Instruct** fine-tuned on the **SecureCode v2.0 dataset**: a model at the sweet spot between code intelligence and computational efficiency, now enhanced with production-grade security knowledge.

Qwen 2.5-Coder 14B builds on the same architecture that powers the best-in-class 7B model, scaled up for enterprise complexity. Combined with SecureCode training, this model delivers:

βœ… **Advanced security reasoning** across complex codebases
βœ… **Production-ready efficiency** - fits comfortably on a single GPU
βœ… **Enterprise-scale analysis** with 128K context window
βœ… **Best-in-class code understanding** at the 14B parameter tier

**The Result:** An enterprise-ready security expert that runs efficiently on standard hardware.

**Why Qwen 2.5-Coder 14B?** This model offers the optimal balance:
- 🎯 **Superior to smaller models** - More nuanced security analysis than 7B
- ⚑ **More efficient than 32B+** - 2x faster training, lower deployment cost
- 🌍 **92 programming languages** - Comprehensive language coverage
- πŸ“ **128K context window** - Analyze entire applications at once
- 🏒 **Enterprise deployable** - Runs on a single A100 or 2x RTX 4090

---

## 🚨 The Problem This Solves

**AI coding assistants produce vulnerable code in 45% of security-relevant scenarios** (Veracode 2025). Smaller models miss nuanced vulnerabilities and larger models demand excessive resources; the 14B tier delivers the security intelligence enterprises need with the efficiency they require.

**Real-world enterprise impact:**
- Equifax breach: **$425 million** settlement + reputation damage
- Capital One: **100 million** customer records, $80M fine
- SolarWinds: **18,000** organizations compromised

Qwen 2.5-Coder 14B SecureCode Edition brings advanced security analysis to enterprise-scale codebases without the infrastructure costs of 32B+ models.

---

## πŸ’‘ Key Features

### πŸ† Enterprise-Scale Code Intelligence

**Qwen 2.5-Coder 14B** delivers exceptional performance:
- HumanEval: **89.0%** pass@1 (surpasses many 30B+ models)
- MBPP: **77.6%** pass@1
- MultiPL-E: **82.1%** average across languages
- Matches or exceeds 32B models on most benchmarks

Now enhanced with **1,209 security-focused examples** covering OWASP Top 10:2025.

### πŸ” Advanced Security Pattern Recognition

Trained on real-world security incidents:
- **224 examples** of Broken Access Control vulnerabilities
- **199 examples** of Authentication Failures
- **125 examples** of Injection attacks (SQL, Command, XSS)
- **115 examples** of Cryptographic Failures
- Complete **OWASP Top 10:2025** coverage

### 🌍 Production-Ready Multi-Language Support

Fine-tuned on security examples across:
- Python (Django, Flask, FastAPI)
- JavaScript/TypeScript (Express, NestJS, React)
- Java (Spring Boot)
- Go (Gin framework)
- PHP (Laravel, Symfony)
- C# (ASP.NET Core)
- Ruby (Rails)
- Rust (Actix, Rocket)
- **Plus 84 more languages from Qwen's base training**

### πŸ“‹ Sophisticated Security Analysis

Every response includes:
1. **Multi-layered vulnerability analysis** with attack chain identification
2. **Defense-in-depth implementations** with enterprise patterns
3. **Concrete exploitation demonstrations** proving security flaws
4. **Operational guidance** including monitoring, logging, and SIEM integration

---

## πŸ“Š Training Details

| Parameter | Value |
|-----------|-------|
| **Base Model** | Qwen/Qwen2.5-Coder-14B-Instruct |
| **Fine-tuning Method** | LoRA (Low-Rank Adaptation) |
| **Training Dataset** | [SecureCode v2.0](https://huggingface.co/datasets/scthornton/securecode-v2) |
| **Training Examples** | 841 (from the 1,209-example dataset) |
| **Training Epochs** | 3 |
| **LoRA Rank (r)** | 16 |
| **LoRA Alpha** | 32 |
| **Learning Rate** | 2e-4 |
| **Quantization** | 4-bit (bitsandbytes) |
| **Trainable Parameters** | ~74M (0.53% of 14B total) |
| **Total Parameters** | 14B |
| **Context Window** | 128K tokens (inherited from base) |
| **GPU Used** | NVIDIA A100 40GB |
| **Training Time** | ~8 hours (estimated) |

### Training Methodology

**LoRA (Low-Rank Adaptation)** preserves Qwen's exceptional code abilities:
- Trains only 0.53% of model parameters
- Maintains SOTA code generation quality
- Adds security-specific knowledge without catastrophic forgetting
- Enables deployment with minimal memory overhead

**4-bit Quantization** (the QLoRA approach) keeps training memory-efficient while maintaining model quality.

**Extended Context:** Qwen's 128K context window allows analyzing entire applications, making it ideal for enterprise security audits.
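
For readers who want to reproduce a comparable setup, the sketch below assembles the hyperparameters from the table above into a PEFT configuration. It is a minimal sketch, not the exact training script: `target_modules`, dropout, and anything else not listed in the table are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantized base model, as in the table above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-14B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,           # LoRA rank, from the table
    lora_alpha=32,  # LoRA alpha, from the table
    lora_dropout=0.05,                                        # assumption: not listed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption: not listed
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # should report roughly 0.5% trainable
```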

---

## πŸš€ Usage

### Quick Start

````python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = "Qwen/Qwen2.5-Coder-14B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# Load SecureCode LoRA adapter
model = PeftModel.from_pretrained(model, "scthornton/qwen2.5-coder-14b-securecode")

# Analyze enterprise codebase for vulnerabilities
prompt = """### User:
Perform a comprehensive security audit of this microservices authentication system:

```python
# auth-service/middleware.py
async def verify_token(request):
    token = request.headers.get('Authorization')
    if not token:
        return None

    payload = jwt.decode(token, settings.SECRET_KEY, algorithms=['HS256'])
    user = await User.get(id=payload['user_id'])
    return user

# payment-service/api.py
@app.post('/transfer')
async def transfer_funds(request):
    user = await verify_token(request)
    amount = request.json.get('amount')
    recipient = request.json.get('recipient_id')

    await process_transfer(user.id, recipient, amount)
    return {'status': 'success'}
```

### Assistant:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=3072,
    temperature=0.3,  # Lower temperature for precise analysis
    top_p=0.95,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
````
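
The prompt above mirrors the `### User:` / `### Assistant:` format used throughout this card. Qwen instruct models also ship with a ChatML chat template; if you prefer that interface, a minimal sketch follows (the card does not state which format the adapter was trained on, so outputs may differ slightly):

```python
# Alternative: build the prompt with the tokenizer's built-in chat template.
messages = [
    {"role": "user", "content": "Audit this handler for SQL injection:\n\n"
                                "def find_user(name):\n"
                                "    return db.execute(f\"SELECT * FROM users WHERE name = '{name}'\")"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=1024, temperature=0.3, do_sample=True)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```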

### Enterprise Deployment (4-bit Quantization)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit quantization - runs on 24GB GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16"
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-14B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

model = PeftModel.from_pretrained(base_model, "scthornton/qwen2.5-coder-14b-securecode")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-14B-Instruct", trust_remote_code=True)

# Production-ready: Runs on RTX 4090, A5000, or A100
```
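
For latency-sensitive serving you can also merge the LoRA weights into the base model, so inference no longer goes through the PEFT indirection. Merging requires a full-precision base (merging into a 4-bit quantized model is not supported); a minimal sketch:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base in bf16, apply the adapter, merge, and save a standalone checkpoint
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-14B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "scthornton/qwen2.5-coder-14b-securecode")
merged = model.merge_and_unload()
merged.save_pretrained("qwen2.5-coder-14b-securecode-merged")
```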

### Large-Scale Codebase Analysis

````python
from pathlib import Path

# Analyze multiple related files with the 128K context window
# (assumes the model and tokenizer are loaded as in Quick Start)
files_to_review = {
    "auth.py": Path("backend/auth.py").read_text(),
    "middleware.py": Path("backend/middleware.py").read_text(),
    "models.py": Path("backend/models.py").read_text(),
}

combined_code = "\n\n".join([f"# {name}\n{code}" for name, code in files_to_review.items()])

prompt = f"""### User:
Perform a comprehensive security analysis of this authentication system. Identify:
1. All OWASP Top 10 vulnerabilities
2. Attack chains that combine multiple vulnerabilities
3. Race conditions and timing attacks
4. Authorization bypass opportunities

```python
{combined_code}
```

### Assistant:
"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=65536).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, temperature=0.3, do_sample=True)
analysis = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(analysis)
````
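
To audit a whole directory rather than a hand-picked file list, here is a hypothetical helper that gathers source files up to a rough character budget (at roughly 4 characters per token, ~400K characters stays well inside the 128K-token window):

```python
from pathlib import Path

def gather_sources(root: str, char_budget: int = 400_000) -> str:
    """Concatenate Python sources under `root`, stopping at a rough size budget."""
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*.py")):
        code = path.read_text(encoding="utf-8", errors="replace")
        if used + len(code) > char_budget:
            break
        parts.append(f"# {path}\n{code}")
        used += len(code)
    return "\n\n".join(parts)

combined_code = gather_sources("backend/")  # drop-in for the dict-based version above
```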

---

## 🎯 Use Cases

### 1. **Enterprise Security Architecture Review**
Analyze complex multi-service architectures:
```
Review this microservices platform for security vulnerabilities, focusing on authentication flows, service-to-service authorization, and data validation boundaries
```

### 2. **Large Codebase Vulnerability Scanning**
With 128K context, analyze entire modules:
```
Audit this 10,000-line payment processing system for injection attacks, authorization bypasses, and cryptographic failures
```

### 3. **Advanced Attack Chain Analysis**
Identify sophisticated multi-step attacks:
```
Analyze how an attacker could chain CSRF, XSS, and session fixation to achieve account takeover in this web application
```

### 4. **Production Security Hardening**
Get operational security recommendations:
```
Design a defense-in-depth security architecture for this e-commerce platform handling 1M+ transactions/day
```

### 5. **Compliance-Focused Code Generation**
Generate SOC 2, PCI-DSS, HIPAA-compliant code:
```
Create a HIPAA-compliant patient data API with comprehensive audit logging, encryption at rest and in transit, and role-based access control
```

---

## ⚠️ Limitations

### What This Model Does Well
βœ… Complex security reasoning across large codebases
βœ… Multi-file analysis with 128K context window
βœ… Advanced attack chain identification
βœ… Enterprise-scale architecture security review
βœ… Detailed operational guidance

### What This Model Doesn't Do
❌ **Not a security scanner** - Use tools like Semgrep, CodeQL, or Snyk
❌ **Not a penetration testing tool** - Cannot perform active exploitation
❌ **Not legal/compliance advice** - Consult security professionals
❌ **Not a replacement for security experts** - Critical systems need professional review

### Known Characteristics
- Responses tend to be verbose, since the model was trained on comprehensive security explanations
- Optimized for common vulnerability patterns (OWASP Top 10) rather than novel zero-days
- Best performance on code that falls within the OWASP taxonomy

---

## πŸ“ˆ Performance Benchmarks

### Hardware Requirements

**Minimum:**
- 28GB RAM
- 20GB GPU VRAM (with 4-bit quantization)

**Recommended:**
- 48GB RAM
- 24GB+ GPU (RTX 4090, A5000, A100)

**Inference Speed (on A100 40GB):**
- ~55 tokens/second (4-bit quantization)
- ~75 tokens/second (bfloat16)
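
Throughput varies with GPU, quantization, batch size, and prompt length; to measure it on your own hardware, a rough sketch (assumes the model and tokenizer are already loaded):

```python
import time

prompt = "Explain SQL injection in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

# Count only newly generated tokens
generated = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{generated / elapsed:.1f} tokens/second")
```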

### Code Generation Benchmarks (Base Qwen 2.5-Coder)

| Benchmark | Score | Rank |
|-----------|-------|------|
| HumanEval | 89.0% | #1 in 14B class |
| MBPP | 77.6% | Top tier |
| LiveCodeBench | 38.4% | Top 5 overall |
| MultiPL-E | 82.1% | Best multi-language |

**Performance:** Matches or exceeds many 32B+ models while requiring half the compute.

---

## πŸ”¬ Dataset Information

Trained on **[SecureCode v2.0](https://huggingface.co/datasets/scthornton/securecode-v2)**:
- **1,209 examples** with real CVE grounding
- **100% incident validation**
- **OWASP Top 10:2025** complete coverage
- **Expert security review**

---

## πŸ“„ License

**Model:** Apache 2.0 | **Dataset:** CC BY-NC-SA 4.0

---

## πŸ“š Citation

```bibtex
@misc{thornton2025securecode-qwen14b,
  title={Qwen 2.5-Coder 14B - SecureCode Edition},
  author={Thornton, Scott},
  year={2025},
  publisher={perfecXion.ai},
  url={https://huggingface.co/scthornton/qwen2.5-coder-14b-securecode}
}
```

---

## πŸ™ Acknowledgments

- **Alibaba Cloud & Qwen Team** for the exceptional Qwen 2.5-Coder base model
- **OWASP Foundation** for vulnerability taxonomy
- **MITRE** for CVE database
- **Enterprise security community** for real-world validation

---

## πŸ”— Related Models

- **[llama-3.2-3b-securecode](https://huggingface.co/scthornton/llama-3.2-3b-securecode)** - Most accessible (3B)
- **[qwen-coder-7b-securecode](https://huggingface.co/scthornton/qwen-coder-7b-securecode)** - Smaller Qwen variant (7B)
- **[deepseek-coder-6.7b-securecode](https://huggingface.co/scthornton/deepseek-coder-6.7b-securecode)** - Security-optimized (6.7B)
- **[codellama-13b-securecode](https://huggingface.co/scthornton/codellama-13b-securecode)** - Enterprise trusted (13B)
- **[starcoder2-15b-securecode](https://huggingface.co/scthornton/starcoder2-15b-securecode)** - Multi-language (15B)

[View Collection](https://huggingface.co/collections/scthornton/securecode)

---

<div align="center">

**Built with ❀️ for secure enterprise software development**

[perfecXion.ai](https://perfecxion.ai) | [Contact](mailto:scott@perfecxion.ai)

</div>