# Qwen 2.5-Coder 14B - SecureCode Edition


Enterprise-grade code security - powerful reasoning with production efficiency

📄 Paper | 🤗 Model Card | 📊 Dataset | 💻 perfecXion.ai


## 🎯 What is This?

This is Qwen 2.5-Coder 14B Instruct fine-tuned on the SecureCode v2.0 dataset - the sweet spot between code intelligence and computational efficiency, now enhanced with production-grade security knowledge.

Qwen 2.5-Coder 14B delivers exceptional code understanding from the same architecture that powers the best-in-class 7B model, scaled up for enterprise complexity. Combined with SecureCode training, this model delivers:

✅ Advanced security reasoning across complex codebases
✅ Production-ready efficiency - fits comfortably on a single GPU
✅ Enterprise-scale analysis with a 128K context window
✅ Best-in-class code understanding at the 14B parameter tier

The Result: An enterprise-ready security expert that runs efficiently on standard hardware.

**Why Qwen 2.5-Coder 14B?** This model offers the optimal balance:

- 🎯 Superior to smaller models - more nuanced security analysis than 7B
- ⚡ More efficient than 32B+ - roughly 2x faster training, lower deployment cost
- 🌍 92 programming languages - comprehensive language coverage
- 📏 128K context window - analyze entire applications at once
- 🏢 Enterprise deployable - runs on a single A100 or 2x RTX 4090

## 🚨 The Problem This Solves

AI coding assistants produce vulnerable code in 45% of security-relevant scenarios (Veracode 2025). While smaller models miss nuanced vulnerabilities and larger models demand excessive resources, the 14B tier delivers the security intelligence enterprises need with the efficiency they demand.

Real-world enterprise impact:

- Equifax breach: $425 million settlement plus reputation damage
- Capital One: 100 million customer records, $80M fine
- SolarWinds: 18,000 organizations compromised

Qwen 2.5-Coder 14B SecureCode Edition brings advanced security analysis to enterprise-scale codebases without the infrastructure costs of 32B+ models.


## 💡 Key Features

πŸ† Enterprise-Scale Code Intelligence

Qwen 2.5-Coder 14B delivers exceptional performance:

- HumanEval: 89.0% pass@1 (surpasses many 30B+ models)
- MBPP: 77.6% pass@1
- MultiPL-E: 82.1% average across languages
- Matches or exceeds 32B models on many benchmarks

Now enhanced with 1,209 security-focused examples covering OWASP Top 10:2025.

πŸ” Advanced Security Pattern Recognition

Trained on real-world security incidents:

- 224 examples of Broken Access Control vulnerabilities
- 199 examples of Authentication Failures
- 125 examples of Injection attacks (SQL, Command, XSS)
- 115 examples of Cryptographic Failures
- Complete OWASP Top 10:2025 coverage

### 🌍 Production-Ready Multi-Language Support

Fine-tuned on security examples across:

- Python (Django, Flask, FastAPI)
- JavaScript/TypeScript (Express, NestJS, React)
- Java (Spring Boot)
- Go (Gin framework)
- PHP (Laravel, Symfony)
- C# (ASP.NET Core)
- Ruby (Rails)
- Rust (Actix, Rocket)
- Plus 84 more languages from Qwen's base training

### 📋 Sophisticated Security Analysis

Every response includes:

  1. Multi-layered vulnerability analysis with attack chain identification
  2. Defense-in-depth implementations with enterprise patterns
  3. Concrete exploitation demonstrations proving security flaws
  4. Operational guidance including monitoring, logging, and SIEM integration

## 📊 Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | Qwen/Qwen2.5-Coder-14B-Instruct |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| Training Dataset | SecureCode v2.0 |
| Dataset Size | 841 training examples |
| Training Epochs | 3 |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| Learning Rate | 2e-4 |
| Quantization | 4-bit (bitsandbytes) |
| Trainable Parameters | ~74M (0.53% of 14B total) |
| Total Parameters | 14B |
| Context Window | 128K tokens (inherited from base) |
| GPU Used | NVIDIA A100 40GB |
| Training Time | ~8 hours (estimated) |

### Training Methodology

LoRA (Low-Rank Adaptation) preserves Qwen's exceptional code abilities:

- Trains only 0.53% of model parameters
- Maintains SOTA code generation quality
- Adds security-specific knowledge without catastrophic forgetting
- Enables deployment with minimal memory overhead

4-bit Quantization enables efficient training while maintaining model quality.

Extended Context: Qwen's 128K context window allows analyzing entire applications, making it ideal for enterprise security audits.
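The small trainable-parameter fraction in the table follows directly from LoRA arithmetic. A minimal sketch of that arithmetic (the 5120x5120 projection below is an illustrative assumption, not the exact Qwen layer shape):

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """LoRA freezes the original d_out x d_in weight matrix and trains
    two low-rank factors instead: A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank

# Illustrative: one 5120x5120 projection adapted at rank 16
full_weights = 5120 * 5120                       # 26,214,400 frozen
lora_weights = lora_param_count(5120, 5120, 16)  # 163,840 trainable
print(f"{lora_weights / full_weights:.3%} of this layer trains")
```

At rank 16, each adapted projection trains well under 1% of its weights, which is how a 14B model ends up with only ~74M trainable parameters.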


## 🚀 Usage

### Quick Start

````python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = "Qwen/Qwen2.5-Coder-14B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# Load SecureCode LoRA adapter
model = PeftModel.from_pretrained(model, "scthornton/qwen2.5-coder-14b-securecode")

# Analyze enterprise codebase for vulnerabilities
prompt = """### User:
Perform a comprehensive security audit of this microservices authentication system:

```python
# auth-service/middleware.py
async def verify_token(request):
    token = request.headers.get('Authorization')
    if not token:
        return None

    payload = jwt.decode(token, settings.SECRET_KEY, algorithms=['HS256'])
    user = await User.get(id=payload['user_id'])
    return user

# payment-service/api.py
@app.post('/transfer')
async def transfer_funds(request):
    user = await verify_token(request)
    amount = request.json.get('amount')
    recipient = request.json.get('recipient_id')

    await process_transfer(user.id, recipient, amount)
    return {'status': 'success'}
```

### Assistant:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=3072,
    temperature=0.3,  # Lower temperature for precise analysis
    top_p=0.95,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
````
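Note that the `verify_token` in the example prompt is intentionally flawed: it never strips a `Bearer ` prefix and never handles decode failures or tampered tokens. For intuition about what HS256 verification actually checks, here is a stdlib-only sketch (an illustration of the mechanism, not the PyJWT `jwt.decode` call used in the prompt, and not a substitute for a vetted JWT library):

```python
import base64
import hashlib
import hmac
import json

def b64url_decode(seg: str) -> bytes:
    # JWT segments use unpadded base64url; restore padding before decoding
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))

def verify_hs256(token: str, secret: bytes):
    """Return the payload dict only if the HS256 signature checks out,
    otherwise None (malformed tokens included)."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids timing side channels
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None
    return json.loads(b64url_decode(payload_b64))
```

A real middleware would additionally enforce expiry (`exp`), pin the accepted algorithm, and reject `alg: none` tokens, which is exactly the class of issue the model is trained to flag.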


### Enterprise Deployment (4-bit Quantization)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit quantization - runs on 24GB GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16"
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-14B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

model = PeftModel.from_pretrained(base_model, "scthornton/qwen2.5-coder-14b-securecode")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-14B-Instruct", trust_remote_code=True)

# Production-ready: runs on an RTX 4090, A5000, or A100
```

### Large-Scale Codebase Analysis

````python
# Analyze multiple related files with the 128K context window
files_to_review = {
    "auth.py": open("backend/auth.py").read(),
    "middleware.py": open("backend/middleware.py").read(),
    "models.py": open("backend/models.py").read(),
}

combined_code = "\n\n".join(f"# {name}\n{code}" for name, code in files_to_review.items())

prompt = f"""### User:
Perform a comprehensive security analysis of this authentication system. Identify:
1. All OWASP Top 10 vulnerabilities
2. Attack chains that combine multiple vulnerabilities
3. Race conditions and timing attacks
4. Authorization bypass opportunities

```python
{combined_code}
```

### Assistant:
"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=65536).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, temperature=0.3)
analysis = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(analysis)
````
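Since a real audit can exceed even a 128K window, a crude pre-flight size check is useful before building the prompt. The 4-characters-per-token ratio below is a rough heuristic assumption, not Qwen's actual tokenizer behavior:

```python
def fits_context(files: dict, max_tokens: int = 131072,
                 chars_per_token: float = 4.0, reserve: int = 4096) -> bool:
    """Rough check that the combined code plus a generation budget
    fits the context window. Heuristic estimate, not a real token count."""
    total_chars = sum(len(code) for code in files.values())
    est_tokens = total_chars / chars_per_token
    return est_tokens + reserve <= max_tokens

print(fits_context({"auth.py": "x" * 400_000}))  # ~100K est. tokens -> True
print(fits_context({"app.py": "x" * 600_000}))   # ~150K est. tokens -> False
```

For an exact count, tokenize the combined code with the model's own tokenizer instead of estimating.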


---

## 🎯 Use Cases

### 1. **Enterprise Security Architecture Review**
Analyze complex multi-service architectures:

```
Review this microservices platform for security vulnerabilities, focusing on authentication flows, service-to-service authorization, and data validation boundaries
```

### 2. **Large Codebase Vulnerability Scanning**
With 128K context, analyze entire modules:

```
Audit this 10,000-line payment processing system for injection attacks, authorization bypasses, and cryptographic failures
```

### 3. **Advanced Attack Chain Analysis**
Identify sophisticated multi-step attacks:

```
Analyze how an attacker could chain CSRF, XSS, and session fixation to achieve account takeover in this web application
```

### 4. **Production Security Hardening**
Get operational security recommendations:

```
Design a defense-in-depth security architecture for this e-commerce platform handling 1M+ transactions/day
```

### 5. **Compliance-Focused Code Generation**
Generate SOC 2, PCI-DSS, HIPAA-compliant code:

```
Create a HIPAA-compliant patient data API with comprehensive audit logging, encryption at rest and in transit, and role-based access control
```


---

## ⚠️ Limitations

### What This Model Does Well
✅ Complex security reasoning across large codebases
✅ Multi-file analysis with the 128K context window
✅ Advanced attack chain identification
✅ Enterprise-scale architecture security review
✅ Detailed operational guidance

### What This Model Doesn't Do
❌ **Not a security scanner** - Use tools like Semgrep, CodeQL, or Snyk
❌ **Not a penetration testing tool** - Cannot perform active exploitation
❌ **Not legal/compliance advice** - Consult security professionals
❌ **Not a replacement for security experts** - Critical systems need professional review

### Known Characteristics
- Analyses can be verbose (the model was trained on comprehensive security explanations)
- Optimized for common vulnerability patterns (OWASP Top 10) rather than novel zero-days
- Performs best on code that falls within the OWASP taxonomy

---

## 📈 Performance Benchmarks

### Hardware Requirements

**Minimum:**
- 28GB RAM
- 20GB GPU VRAM (with 4-bit quantization)

**Recommended:**
- 48GB RAM
- 24GB+ GPU (RTX 4090, A5000, A100)

**Inference Speed (on A100 40GB):**
- ~55 tokens/second (4-bit quantization)
- ~75 tokens/second (bfloat16)
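The VRAM requirements above follow from back-of-envelope arithmetic. This sketch estimates weight storage only, deliberately ignoring KV cache, activations, and framework overhead (which is why the real requirements listed above are higher):

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate GB needed to hold the model weights alone."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

print(f"bf16:  {weight_memory_gb(14, 16):.1f} GB")  # 28.0 GB -> A100 40GB territory
print(f"4-bit: {weight_memory_gb(14, 4):.1f} GB")   # 7.0 GB  -> fits a 24GB GPU with headroom
```

This is why 4-bit quantization is the difference between datacenter-only and single-workstation deployment for a 14B model.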

### Code Generation Benchmarks (Base Qwen 2.5-Coder)

| Benchmark | Score | Rank |
|-----------|-------|------|
| HumanEval | 89.0% | #1 in 14B class |
| MBPP | 77.6% | Top tier |
| LiveCodeBench | 38.4% | Top 5 overall |
| MultiPL-E | 82.1% | Best multi-language |

**Performance:** Matches or exceeds many 32B+ models while requiring half the compute.

---

## 🔬 Dataset Information

Trained on **[SecureCode v2.0](https://huggingface.co/datasets/scthornton/securecode-v2)**:
- **1,209 examples** with real CVE grounding
- **100% incident validation**
- **OWASP Top 10:2025** complete coverage
- **Expert security review**

---

## 📄 License

**Model:** Apache 2.0 | **Dataset:** CC BY-NC-SA 4.0

---

## 📚 Citation

```bibtex
@misc{thornton2025securecode-qwen14b,
  title={Qwen 2.5-Coder 14B - SecureCode Edition},
  author={Thornton, Scott},
  year={2025},
  publisher={perfecXion.ai},
  url={https://huggingface.co/scthornton/qwen2.5-coder-14b-securecode}
}
```

πŸ™ Acknowledgments

- Alibaba Cloud & Qwen Team for the exceptional Qwen 2.5-Coder base model
- OWASP Foundation for the vulnerability taxonomy
- MITRE for the CVE database
- The enterprise security community for real-world validation

## 🔗 Related Models

View Collection


Built with ❤️ for secure enterprise software development

perfecXion.ai | Contact
