# CyberCoder-7B-v1 🛡️
A cybersecurity-focused code model fine-tuned from Qwen/Qwen2.5-Coder-7B-Instruct for:
- CVE vulnerability analysis with structured JSON output
- AST-based code security review
- GDB crash trace analysis and exploitability assessment
- ROP chain construction and binary exploitation
- MITRE ATT&CK mapping and threat intelligence
- Code reasoning with chain-of-thought
## Training Recipe
Based on the CyberPal 2.0 methodology:
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-Coder-7B-Instruct |
| Method | SFT with LoRA (r=64, α=128) |
| Learning rate | 4e-5 |
| Warmup ratio | 0.15 |
| Epochs | 2 |
| Max seq length | 4096 |
| Optimizer | AdamW + cosine schedule |
| Dataset | moro72842/cybersecurity-sft-dataset (~20K examples) |
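
The table lists a peak learning rate, a warmup ratio, and a cosine schedule, but not the shape of the warmup. The sketch below assumes a linear warmup followed by cosine decay to zero, which is a common pairing; the function name `lr_at_step` and the step count of 1000 are hypothetical and only serve the illustration.

```python
import math

def lr_at_step(step, total_steps, peak_lr=4e-5, warmup_ratio=0.15):
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to peak_lr, then cosine decay to 0.
    """
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp during the first 15% of training.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000  # hypothetical number of optimizer steps
for s in (0, 75, 150, 575, 1000):
    print(s, f"{lr_at_step(s, total):.2e}")
```

With these settings the rate peaks at 4e-5 once 15% of the steps have elapsed and falls to half of that at the midpoint of the decay phase.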
### Dataset Composition
| Source | Count | Description |
|---|---|---|
| CVE Records | 10,000 | Multi-turn CVE analysis from 297K records |
| Code Feedback | 5,000 | Code reasoning with iterative refinement |
| OpenCodeReasoning | 5,000 | Chain-of-thought code problem solving |
| Synthetic Security | 8 | JSON-structured CVE, AST, GDB, ROP examples |
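
The counts in the table can be summed to check the dataset size and the share of each source. This is plain arithmetic on the figures above; nothing here reads the actual dataset.

```python
# Example counts copied from the Dataset Composition table.
counts = {
    "CVE Records": 10_000,
    "Code Feedback": 5_000,
    "OpenCodeReasoning": 5_000,
    "Synthetic Security": 8,
}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

print(f"total examples: {total}")
for name, share in shares.items():
    print(f"{name:<20} {share:.2%}")
```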
## Capabilities
### JSON Structured Output
The model was trained on examples whose answers consist of a `<reasoning>` block followed by a fenced JSON object. The expected output pattern is:
<reasoning>
Step-by-step analysis...
</reasoning>
```json
{...structured output...}
```
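
A reply in this pattern can be split into its reasoning text and its JSON payload with two regular expressions. The sketch below is a minimal illustration, assuming the model emits a `<reasoning>` block followed by a fenced `json` block as shown above; the helper name `parse_response` and the sample field names (`cve_id`, `severity`) are hypothetical, since the model card does not specify a schema.

```python
import json
import re

REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)
JSON_FENCE_RE = re.compile(r"```json\s*(.*?)\s*```", re.DOTALL)

def parse_response(text):
    """Return (reasoning, payload); either is None if missing or malformed."""
    reasoning_match = REASONING_RE.search(text)
    json_match = JSON_FENCE_RE.search(text)

    reasoning = reasoning_match.group(1).strip() if reasoning_match else None

    payload = None
    if json_match:
        try:
            payload = json.loads(json_match.group(1))
        except json.JSONDecodeError:
            payload = None  # the fenced block was not valid JSON

    return reasoning, payload

# Hypothetical reply; the field names are illustrative, not a documented schema.
sample = (
    "<reasoning>\nThe lookup feature allows remote class loading.\n</reasoning>\n"
    "```json\n{\"cve_id\": \"CVE-2021-44228\", \"severity\": \"critical\"}\n```"
)

reasoning, payload = parse_response(sample)
print(reasoning)
print(payload)
```

Because the format is learned rather than enforced, callers should be prepared for `None` in either position and retry or fall back as needed.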
### Cybersecurity Domains
- Vulnerability analysis (CVE/CWE)
- Static code analysis with AST parsing
- Binary exploitation (ROP chains, buffer overflows)
- Crash dump / GDB trace analysis
- Threat intelligence (MITRE ATT&CK mapping)
- Malware behavior classification
- Network intrusion detection
## Usage
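```python
from transformers import pipeline

# Load the model; dtype and device placement are selected automatically.
pipe = pipeline("text-generation", model="moro72842/CyberCoder-7B-v1", torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a cybersecurity expert. Provide detailed analysis with structured JSON output."},
    {"role": "user", "content": "Analyze CVE-2021-44228 and provide the analysis as JSON."},
]

# temperature only takes effect when sampling is enabled, so do_sample is set explicitly.
response = pipe(messages, max_new_tokens=2048, do_sample=True, temperature=0.1)

# The pipeline returns the whole conversation; the last message is the assistant's reply.
print(response[0]["generated_text"][-1]["content"])
```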
## Architecture & Efficiency Considerations
This model follows the approach described in the training documentation for building cybersecurity-capable models. Notes on efficiency and scaling:
- MoE consideration: For production models at 100B+ parameters, a sparse MoE design with 64-128 experts activates only a fraction of the weights per token. DeepSeek-V3, for example, activates ~37B of its 671B parameters.
- MLA attention: Multi-Head Latent Attention compresses the KV cache, which reduces memory use in long-context inference.
- LoRA efficiency: This 7B model was trained with LoRA (r=64), which updates only ~2% of the parameters while achieving strong domain performance.
- Structured output: JSON output is learned from SFT examples rather than enforced through constrained decoding (per RL-Struct findings).
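
The ~2% figure for LoRA can be reproduced with a short calculation. The sketch below assumes that adapters are attached to every attention and MLP projection, and it assumes the Qwen2.5-7B dimensions listed in the constants; neither is stated in this model card, so check the adapter config and the base model's `config.json` before relying on the numbers.

```python
# Assumed Qwen2.5-7B dimensions (not stated in this model card).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
NUM_HEADS = 28
NUM_KV_HEADS = 4
HEAD_DIM = HIDDEN // NUM_HEADS       # 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM     # 512, grouped-query attention
TOTAL_PARAMS = 7.6e9                 # approximate base model size

RANK = 64  # LoRA rank from the training recipe

# (in_features, out_features) per adapted projection.
# Assumption: all seven linear projections in each layer carry an adapter.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, rank):
    # A is (rank x in_features) and B is (out_features x rank).
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
trainable = per_layer * NUM_LAYERS
fraction = trainable / TOTAL_PARAMS

print(f"trainable LoRA parameters: {trainable:,}")
print(f"fraction of base model:    {fraction:.2%}")
```

Under these assumptions the adapters hold roughly 161M parameters, which is consistent with the ~2% stated above; adapting fewer projections would give a smaller fraction.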
## License
Apache 2.0 (inherited from Qwen2.5-Coder)