# CyberCoder-7B-v1 🛡️

A cybersecurity-focused code model fine-tuned from [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) for:

- **CVE vulnerability analysis** with structured JSON output
- **AST-based code security review**
- **GDB crash trace analysis** and exploitability assessment
- **ROP chain construction** and binary exploitation
- **MITRE ATT&CK mapping** and threat intelligence
- **Code reasoning** with chain-of-thought

## Training Recipe

Based on the [CyberPal 2.0](https://arxiv.org/abs/2510.14113) methodology:

| Parameter | Value |
|-----------|-------|
| Base model | Qwen/Qwen2.5-Coder-7B-Instruct |
| Method | SFT with LoRA (r=64, α=128) |
| Learning rate | 4e-5 |
| Warmup ratio | 0.15 |
| Epochs | 2 |
| Max seq length | 4096 |
| Optimizer | AdamW + cosine schedule |
| Dataset | moro72842/cybersecurity-sft-dataset (20K examples) |

## Dataset Composition

| Source | Count | Description |
|--------|-------|-------------|
| CVE Records | 10,000 | Multi-turn CVE analysis from 297K records |
| Code Feedback | 5,000 | Code reasoning with iterative refinement |
| OpenCodeReasoning | 5,000 | Chain-of-thought code problem solving |
| Synthetic Security | 8 | JSON-structured CVE, AST, GDB, ROP examples |

## Capabilities

### JSON Structured Output

Trained on examples that require structured output: a step-by-step reasoning block followed by a fenced JSON block. Pattern:

````
Step-by-step analysis...
```json
{...structured output...}
```
````

### Cybersecurity Domains

- Vulnerability analysis (CVE/CWE)
- Static code analysis with AST parsing
- Binary exploitation (ROP chains, buffer overflows)
- Crash dump / GDB trace analysis
- Threat intelligence (MITRE ATT&CK mapping)
- Malware behavior classification
- Network intrusion detection

## Usage

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="moro72842/CyberCoder-7B-v1", torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a cybersecurity expert. Provide detailed analysis with structured JSON output."},
    {"role": "user", "content": "Analyze CVE-2021-44228 and provide the analysis as JSON."},
]

# temperature only takes effect when sampling is enabled
response = pipe(messages, max_new_tokens=2048, do_sample=True, temperature=0.1)
# The last message in the returned conversation is the assistant's reply
print(response[0]["generated_text"][-1]["content"])
```

## Architecture & Efficiency Considerations

This model follows the approach described in the training documentation for building cybersecurity-capable models:

- **MoE consideration**: For production 100B+ models, sparse MoE (DeepSeek-V3 style) with 64-128 experts reduces active parameters to ~37B per token
- **MLA attention**: Multi-Head Latent Attention compresses the KV cache for long-context inference
- **LoRA efficiency**: This 7B model uses LoRA (r=64), training only ~2% of parameters while achieving strong domain performance
- **Structured output**: JSON structured output is trained via SFT examples rather than enforced with constrained decoding (per RL-Struct findings)

## License

Apache 2.0 (inherited from Qwen2.5-Coder)
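
## Example: Parsing the Output

Because the JSON format is learned from SFT examples rather than enforced by constrained decoding, downstream code may want to validate the reply before using it. The following is a minimal sketch, not part of the model's tooling, that pulls the last fenced JSON block out of a reply shaped like the pattern under "JSON Structured Output". The helper name `extract_json_block`, the sample reply, and its field names (`cve_id`, `severity`) are hypothetical and chosen only for illustration.

```python
import json
import re
from typing import Any, Optional

# Build the fence marker programmatically so this snippet nests cleanly in Markdown.
FENCE = "`" * 3

# Matches a fenced block tagged "json" and captures its body (non-greedy).
_JSON_FENCE = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)


def extract_json_block(text: str) -> Optional[Any]:
    """Return the parsed body of the last json fence in `text`, or None."""
    matches = _JSON_FENCE.findall(text)
    if not matches:
        return None
    try:
        return json.loads(matches[-1])
    except json.JSONDecodeError:
        # Output is not constrained to valid JSON, so parsing can fail.
        return None


# Hypothetical reply, written by hand to follow the documented pattern.
sample = (
    "Step-by-step analysis...\n"
    + FENCE + "json\n"
    + '{"cve_id": "CVE-2021-44228", "severity": "critical"}\n'
    + FENCE
)

result = extract_json_block(sample)
print(result)
```

With the pipeline from the Usage section, the same helper could be applied to `response[0]["generated_text"][-1]["content"]`. A `None` result signals that the reply should be retried or handled as free text.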