---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- security
- code-review
- vulnerability-detection
- sast
- false-positive-reduction
- gguf
- qwen2
- ollama
language:
- en
pipeline_tag: text-generation
model-index:
- name: kon-security-v5
  results:
  - task:
      type: text-generation
      name: Security Code Review
    metrics:
    - name: Accuracy
      type: accuracy
      value: 98.1
    - name: F1 Score
      type: f1
      value: 0.99
    - name: False Positive Rate
      type: custom
      value: 0.0
    - name: JSON Compliance
      type: custom
      value: 100.0
---

# kon-security-v5

**Expert Security Code Reviewer** - A fine-tuned Qwen2.5-Coder-7B model specialized for security vulnerability detection and false positive reduction in SAST (Static Application Security Testing) pipelines.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) |
| Fine-tuning | QLoRA (LoRA adapters trained on a 4-bit quantized base model) |
| Quantization | Q4_K_M (GGUF) |
| Parameters | 7.6B |
| Context Length | 32,768 tokens |
| File Size | ~4.7 GB |
| Format | GGUF (Ollama-compatible) |

## Performance

| Metric | Score |
|--------|-------|
| Overall Accuracy | **98.1%** |
| F1 Score | **0.99** |
| False Positive Rate | **0.0%** |
| JSON Compliance | **100%** |
| Avg Response Time | **2.8s** |

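The card does not say how these scores were computed. As a minimal sketch, assuming the standard binary-classification definitions with `TRUE_POSITIVE` verdicts as the positive class, accuracy, F1, and false positive rate can be derived from confusion-matrix counts as follows. The function name and the counts are hypothetical and are not the model's evaluation data.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, F1, and false positive rate from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    return {
        # Share of all findings that were classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Harmonic mean of precision and recall.
        "f1": 2 * precision * recall / denominator if denominator else 0.0,
        # Share of safe findings that were wrongly reported as vulnerable.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Illustrative counts only (hypothetical, not taken from the model's benchmark).
metrics = classification_metrics(tp=90, fp=0, tn=100, fn=10)
print(metrics)
```

Note that a false positive rate of zero only requires `fp == 0`; accuracy and F1 can still fall below their maximum when some real vulnerabilities are missed (`fn > 0`).
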
## Capabilities

- Identifies true security vulnerabilities across 20+ vulnerability categories
- Flags false positives reported by SAST tools (SQL injection, XSS, command injection, etc.)
- Returns structured JSON output with verdict, confidence, CWE IDs, severity, and remediation
- Recognizes framework-specific safe patterns (React, Django, Express, Rails, etc.)
- Reasons about taint flow (source-to-sink tracking)

## Vulnerability Categories

- **CRITICAL**: SQL Injection (CWE-89), Command Injection (CWE-78), Deserialization (CWE-502), Hardcoded Secrets (CWE-798), Code Injection (CWE-94)
- **HIGH**: XSS (CWE-79), Path Traversal (CWE-22), SSRF (CWE-918), Timing Attacks (CWE-208), Buffer Overflow (CWE-120)
- **MEDIUM**: Weak Crypto (CWE-327), Insecure Random (CWE-330), Information Disclosure (CWE-200), Missing Auth (CWE-306)

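When post-processing model output, the severity tiers listed in this section can be kept as a small lookup table. The sketch below copies the CWE IDs from the tiers as given; the names `CWES_BY_SEVERITY`, `SEVERITY_BY_CWE`, and `severity_for`, and the `"UNKNOWN"` fallback, are hypothetical and not part of the model or the scanner.

```python
# CWE IDs grouped by the severity tiers listed in this model card.
CWES_BY_SEVERITY = {
    "CRITICAL": ["CWE-89", "CWE-78", "CWE-502", "CWE-798", "CWE-94"],
    "HIGH": ["CWE-79", "CWE-22", "CWE-918", "CWE-208", "CWE-120"],
    "MEDIUM": ["CWE-327", "CWE-330", "CWE-200", "CWE-306"],
}

# Inverted index: CWE ID -> severity tier.
SEVERITY_BY_CWE = {
    cwe: severity
    for severity, cwes in CWES_BY_SEVERITY.items()
    for cwe in cwes
}


def severity_for(cwe_id: str, default: str = "UNKNOWN") -> str:
    """Return the listed severity tier for a CWE ID, or a fallback if unlisted."""
    return SEVERITY_BY_CWE.get(cwe_id.upper(), default)


print(severity_for("CWE-89"))
print(severity_for("CWE-79"))
```
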
## Usage with Ollama

```bash
# Pull the model
ollama pull kon-security/kon-security-v5

# Or create from GGUF
ollama create kon-security-v5 -f Modelfile

# Run
ollama run kon-security-v5
```

### Example Prompt

```
<|im_start|>system
You are an expert security code reviewer...
<|im_end|>
<|im_start|>user
Analyze this code for SQL injection:
query = f"SELECT * FROM users WHERE id = {user_id}"
<|im_end|>
<|im_start|>assistant
```

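The same prompt can be sent programmatically through Ollama's local REST API. The sketch below is a minimal example under several assumptions: Ollama is listening on its default port, the model was created under the name `kon-security-v5`, and the model's chat template adds the `<|im_start|>` and `<|im_end|>` tokens so that only plain messages need to be sent. No system message is included, on the assumption that the model supplies its own. The helper names `build_chat_request` and `review` are hypothetical.

```python
import json
import urllib.request

# Ollama's default local chat endpoint (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(code: str, vulnerability_type: str,
                       model: str = "kon-security-v5") -> dict:
    """Build an Ollama /api/chat payload mirroring the example prompt."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": f"Analyze this code for {vulnerability_type}:\n{code}",
            },
        ],
        "format": "json",   # ask Ollama to constrain the output to valid JSON
        "stream": False,    # return a single response object instead of chunks
    }


def review(code: str, vulnerability_type: str) -> dict:
    """Send the request to a running Ollama server and decode the JSON verdict."""
    payload = json.dumps(build_chat_request(code, vulnerability_type)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # The assistant's reply is a JSON document serialized as a string.
    return json.loads(body["message"]["content"])
```

Calling `review('query = f"SELECT * FROM users WHERE id = {user_id}"', "SQL injection")` requires a running Ollama server with the model available.
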
### Example Response

```json
{
  "verdict": "TRUE_POSITIVE",
  "is_vulnerable": true,
  "confidence": 0.97,
  "cwe_ids": ["CWE-89"],
  "severity": "CRITICAL",
  "reasoning": "f-string interpolates user_id directly into SQL query without parameterization",
  "remediation": "cursor.execute('SELECT * FROM users WHERE id = ?', (user_id,))"
}
```

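Because downstream tooling depends on these fields, it can help to validate a response before acting on it. The following is a minimal sketch that checks for the seven fields shown in the example response. The `Review` dataclass and `parse_review` helper are hypothetical, and the 0 to 1 range for `confidence` is an assumption inferred from the example value.

```python
import json
from dataclasses import dataclass

# Field names taken from the example response in this model card.
REQUIRED_FIELDS = (
    "verdict", "is_vulnerable", "confidence", "cwe_ids",
    "severity", "reasoning", "remediation",
)


@dataclass
class Review:
    verdict: str
    is_vulnerable: bool
    confidence: float
    cwe_ids: list
    severity: str
    reasoning: str
    remediation: str


def parse_review(raw: str) -> Review:
    """Parse a model response and fail loudly if it is incomplete."""
    data = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    # Assumption: confidence is a probability-like score between 0 and 1.
    if not 0.0 <= float(data["confidence"]) <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return Review(**{name: data[name] for name in REQUIRED_FIELDS})


raw_response = """
{
  "verdict": "TRUE_POSITIVE",
  "is_vulnerable": true,
  "confidence": 0.97,
  "cwe_ids": ["CWE-89"],
  "severity": "CRITICAL",
  "reasoning": "f-string interpolates user_id directly into SQL query without parameterization",
  "remediation": "cursor.execute('SELECT * FROM users WHERE id = ?', (user_id,))"
}
"""
parsed = parse_review(raw_response)
print(parsed.verdict, parsed.cwe_ids, parsed.severity)
```
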
## System Prompt

The model is fine-tuned with the following system prompt baked in:

```
You are an expert security code reviewer specializing in identifying true
vulnerabilities and eliminating false positives. You analyze code with deep
understanding of security patterns across all languages and frameworks.

CRITICAL RULES:
1. Parameterized queries (?, $1, %s, :param) = SAFE from SQL injection
2. textContent, createTextNode = SAFE from XSS
3. React JSX {variable} = SAFE from XSS (React auto-escapes)
4. subprocess.run([list, args]) without shell=True = SAFE from command injection
5. json.loads/JSON.parse = SAFE (cannot execute code)
6. secure_filename() from werkzeug = SAFE from path traversal
7. bcrypt/argon2/scrypt for password hashing = SAFE
8. HMAC.compare_digest/timingSafeEqual = SAFE from timing attacks
9. DOMPurify.sanitize() = SAFE from XSS
10. MD5/SHA1 for non-security purposes (checksums, cache keys) = SAFE
11. Test files testing security scanners = SAFE
12. Environment variables for secrets = SAFE (not hardcoded)
13. ORM methods (Django .filter(), Rails .where(hash), SQLAlchemy) = SAFE from SQLi
14. Content-Security-Policy, helmet(), CORS allowlists = SAFE
```

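Rule 1 is the distinction the model is expected to draw most often. As a minimal illustration using Python's built-in `sqlite3` module (the table, rows, and input value are hypothetical), the sketch below contrasts an interpolated query with a parameterized one: interpolation lets the input rewrite the SQL text, while a bound parameter is only ever compared as a value.

```python
import sqlite3

# In-memory database with two hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)

user_id = "1 OR 1=1"  # attacker-controlled input

# Vulnerable: the input becomes part of the SQL text, so "OR 1=1" matches every row.
unsafe_rows = conn.execute(
    f"SELECT * FROM users WHERE id = {user_id}"
).fetchall()

# Safe: the input is bound as a single value and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT * FROM users WHERE id = ?", (user_id,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```
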
## Integration with Kon Security Scanner

This model is the default LLM for the [Kon Security Scanner](https://github.com/kon-security/kon), providing:

- SAST finding validation and false positive reduction
- CWE ID mapping
- Severity assessment
- Remediation suggestions

```python
from kon.core.ollama_analyzer import OllamaAnalyzer

analyzer = OllamaAnalyzer(model="kon-security-v5:latest")
result = analyzer.analyze_finding_enhanced(
    code_snippet="query = f'SELECT * FROM users WHERE id = {user_id}'",
    vulnerability_type="SQL Injection",
    file_path="app/db.py",
    line_number=42
)
print(result.verdict)  # TRUE_POSITIVE
```

## Training Details

- **Method**: QLoRA (LoRA adapters trained on a 4-bit quantized base model)
- **Base**: Qwen2.5-Coder-7B-Instruct
- **Dataset**: Curated security code review examples covering 20+ CWE categories
- **Hardware**: NVIDIA GPU with CUDA support
- **Quantization**: Q4_K_M via llama.cpp

## License

Apache 2.0 (same as base model)