CyberStrike-OffSec-35B


The #1 Ranked Open-Source Model for Cybersecurity & Offensive Security




Outperforms GPT-4-turbo on SecEval | Outperforms GPT-4 on MITRE ATT&CK & CWE benchmarks


Quantized  •  Benchmarks  •  Quick Start  •  Model Details  •  Training  •  Architecture  •  Use Cases  •  Citation


What is CyberStrike?

CyberStrike-OffSec-35B is a domain-specialized large language model built for offensive security professionals, penetration testers, and security researchers. Fine-tuned from Qwen3.6-35B-A3B with a two-stage pipeline (SFT followed by DPO), it covers the entire offensive security lifecycle:

  • Vulnerability Discovery — SQL injection, XSS, SSRF, deserialization, business logic flaws
  • MITRE ATT&CK Operations — Technique identification, kill chain analysis, threat mapping
  • Exploit Development — PoC creation, payload crafting, evasion techniques
  • Cloud & Infrastructure — AWS/Azure/GCP misconfigurations, container escapes, IAM abuse
  • Red Team Operations — C2 setup, lateral movement, persistence, EDR evasion
  • Compliance & Standards — NIST, OWASP ASVS, CIS benchmarks, CVSS scoring

Model Format: This is the full-precision BF16 model (67 GB, 26 safetensors shards). For quantized versions, see below.

Available Versions

| Repo | Format | Size | Use Case |
|---|---|---|---|
| oyildirim/CyberStrike-OffSec-35B | BF16 (full precision) | 67 GB | Transformers, vLLM, fine-tuning |
| oyildirim/CyberStrike-OffSec-35B-GGUF | GGUF Q8_0 | 36 GB | llama.cpp, Ollama, LM Studio |
| oyildirim/CyberStrike-OffSec-35B-GGUF | GGUF Q6_K | 27 GB | llama.cpp, Ollama, LM Studio |
| oyildirim/CyberStrike-OffSec-35B-GGUF | GGUF Q5_K_M | 24 GB | llama.cpp, Ollama, LM Studio |
| oyildirim/CyberStrike-OffSec-35B-GGUF | GGUF Q4_K_M | 21 GB | llama.cpp, Ollama, LM Studio |
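
As a rough aid for choosing among the versions above, the sketch below picks the largest listed file that fits a given memory budget. The sizes come from the table; the `headroom_gb` allowance for KV cache and runtime buffers is an assumption for illustration, not a measured figure, and `pick_version` is a hypothetical helper.

```python
from typing import Optional

# (format label, file size in GB) copied from the table above, largest first.
VERSIONS = [
    ("BF16", 67),
    ("Q8_0", 36),
    ("Q6_K", 27),
    ("Q5_K_M", 24),
    ("Q4_K_M", 21),
]


def pick_version(available_gb: float, headroom_gb: float = 4.0) -> Optional[str]:
    """Return the largest listed version whose weights fit, or None.

    headroom_gb is an assumed allowance for KV cache and runtime buffers;
    it is not a figure from this model card.
    """
    budget = available_gb - headroom_gb
    for label, size_gb in VERSIONS:
        if size_gb <= budget:
            return label
    return None


if __name__ == "__main__":
    for mem in (24, 32, 48, 80):
        print(f"{mem} GB available -> {pick_version(mem)}")
```

Actual memory use depends on context length, batch size, and the runtime, so treat the result as a starting point rather than a guarantee.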

Benchmark Results

CyberStrike ranks first among the compared models on SecEval (ahead of GPT-4-turbo) and on the SECURE MAET and CWET tasks (ahead of GPT-4), and scores 86.61% on CyberMetric.

SecEval — #1 on Leaderboard

Outperforms GPT-4-turbo by +2.32 points across 9 cybersecurity domains, 2,189 questions.

| Rank | Model | Overall | Network Sec | Web Sec | PenTest | Cryptography |
|---|---|---|---|---|---|---|
| #1 | CyberStrike-OffSec-35B | 81.39% | 85.09% | 85.34% | 82.26% | 75.00% |
| #2 | GPT-4-turbo | 79.07% | 75.65% | 82.15% | 80.00% | 64.29% |
| #3 | GPT-3.5-turbo | 62.09% | 60.87% | 63.00% | 72.00% | 35.71% |
| #4 | Yi-6B | 53.57% | 56.52% | 54.98% | 69.26% | 35.71% |

Full SecEval Domain Breakdown (9 domains)

| Domain | CyberStrike | GPT-4-turbo | Delta (points) |
|---|---|---|---|
| Network Security | 85.09% | 75.65% | +9.44 |
| Web Security | 85.34% | 82.15% | +3.19 |
| Vulnerability | 83.33% | 76.05% | +7.28 |
| Application Security | 82.29% | 75.25% | +7.04 |
| PenTest | 82.26% | 80.00% | +2.26 |
| Software Security | 79.75% | 73.28% | +6.47 |
| System Security | 77.82% | 73.61% | +4.21 |
| Cryptography | 75.00% | 64.29% | +10.71 |
| Memory Safety | 71.43% | 70.83% | +0.60 |

CyberStrike leads in all 9 domains. Largest improvement: Cryptography (+10.71) and Network Security (+9.44).
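
The per-domain deltas can be rechecked directly from the reported accuracies. The following is a small sketch that recomputes them and lists the largest gains; the numbers are copied from the breakdown above, and the function names are illustrative only.

```python
# Per-domain SecEval accuracy (%), copied from the breakdown above:
# domain -> (CyberStrike, GPT-4-turbo)
SECEVAL = {
    "Network Security": (85.09, 75.65),
    "Web Security": (85.34, 82.15),
    "Vulnerability": (83.33, 76.05),
    "Application Security": (82.29, 75.25),
    "PenTest": (82.26, 80.00),
    "Software Security": (79.75, 73.28),
    "System Security": (77.82, 73.61),
    "Cryptography": (75.00, 64.29),
    "Memory Safety": (71.43, 70.83),
}


def deltas(scores):
    """Percentage-point difference (CyberStrike minus GPT-4-turbo) per domain."""
    return {name: round(ours - theirs, 2) for name, (ours, theirs) in scores.items()}


def top_gains(scores, n=2):
    """The n domains with the largest positive delta, largest first."""
    ranked = sorted(deltas(scores).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    for name, gain in top_gains(SECEVAL):
        print(f"{name}: +{gain:.2f}")
```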

SECURE — #1 on MITRE ATT&CK & CWE Tasks

Outperforms GPT-4 by +5.34 points on MITRE ATT&CK extraction. Evaluated on ICS cybersecurity scenarios.

| Task | CyberStrike | GPT-4 | Llama3-70B | Gemini-Pro |
|---|---|---|---|---|
| MAET (MITRE ATT&CK) | 93.94% | 88.6% | 86.3% | 86.2% |
| CWET (CWE Knowledge) | 93.05% | 89.6% | 90.4% | 87.8% |

CyberMetric-10000 — #6 out of 25 Models

9,189 expert-validated multiple-choice cybersecurity questions covering NIST, RFC, and industry standards.

| Rank | Model | Score |
|---|---|---|
| #1 | GPT-4o | 88.89% |
| #2 | GPT-4-turbo | 88.50% |
| #3 | GEMINI-pro 1.0 | 87.50% |
| #4 (tie) | Mixtral-8x7B-Instruct | 87.00% |
| #4 (tie) | Falcon-180B-Chat | 87.00% |
| #6 | CyberStrike-OffSec-35B | 86.61% |
| #7 | GPT-3.5-turbo | 80.30% |

General Benchmarks (lm-evaluation-harness, 0-shot)

| Benchmark | Score |
|---|---|
| MMLU (overall) | 76.94% |
| MMLU — Social Sciences | 86.81% |
| MMLU — Computer Security | 86.00% |
| MMLU — Other | 81.43% |
| MMLU — Security Studies | 80.00% |
| MMLU — STEM | 73.87% |
| MMLU — Humanities | 69.59% |
| HellaSwag (acc_norm) | 79.61% |
| ARC Easy | 81.86% |
| ARC Challenge (acc_norm) | 59.13% |
| WinoGrande | 72.22% |
| TruthfulQA MC2 | 49.64% |

Note: General benchmarks were run 0-shot. Few-shot performance is expected to be higher.


Quick Start

Ollama (Easiest)

# Download and run the Q4_K_M quantized version
ollama run hf.co/oyildirim/CyberStrike-OffSec-35B-GGUF:Q4_K_M
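
Once the model has been pulled, it can also be queried programmatically through Ollama's local REST API. The sketch below assumes a default Ollama install listening on `http://localhost:11434` and uses only the Python standard library; `build_chat_request` and `chat` are illustrative helper names, not part of any package.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint
MODEL = "hf.co/oyildirim/CyberStrike-OffSec-35B-GGUF:Q4_K_M"


def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build a non-streaming request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a chunk stream
    }


def chat(prompt: str, url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    return reply["message"]["content"]


if __name__ == "__main__":
    # Prints the payload only; call chat(...) once the Ollama server is running.
    payload = build_chat_request("Explain SSRF exploitation in cloud environments")
    print(json.dumps(payload, indent=2))
```

With the server running, `print(chat("Explain SSRF exploitation in cloud environments"))` would return the model's answer as a single string.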

llama.cpp

# Download the GGUF file from https://huggingface.co/oyildirim/CyberStrike-OffSec-35B-GGUF
./llama-cli -m CyberStrike-OffSec-35B-Q4_K_M.gguf \
  -p "Explain SSRF exploitation in cloud environments" \
  -n 512 --temp 0.7

Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "oyildirim/CyberStrike-OffSec-35B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "oyildirim/CyberStrike-OffSec-35B",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Explain SSRF exploitation in cloud environments with AWS metadata service abuse."}
]

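# Render the chat template to a prompt string, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))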

vLLM (Recommended for Production)

pip install vllm

vllm serve oyildirim/CyberStrike-OffSec-35B \
  --dtype bfloat16 \
  --max-model-len 4096 \
  --trust-remote-code \
  --served-model-name CyberStrike-OffSec-35B

Then use the OpenAI-compatible API:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="CyberStrike-OffSec-35B",
    messages=[{"role": "user", "content": "How to exploit deserialization vulnerabilities in Java applications?"}],
    max_tokens=2048,
)
print(response.choices[0].message.content)
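
Because the server above is started with `--max-model-len 4096`, the prompt and the completion together have to fit in 4,096 tokens, and a request that asks for more is rejected. A minimal sketch of clamping `max_tokens` on the client side follows; `clamp_max_tokens` is a hypothetical helper, and the prompt token count is assumed to come from the model's tokenizer.

```python
def clamp_max_tokens(prompt_tokens: int, requested: int, max_model_len: int = 4096) -> int:
    """Return the largest completion budget that still fits the context window.

    prompt_tokens: token count of the rendered chat prompt (from the tokenizer).
    requested:     the max_tokens value the caller would like to use.
    max_model_len: the value passed to --max-model-len when starting the server.
    """
    if prompt_tokens >= max_model_len:
        raise ValueError("prompt alone fills the context window; shorten it")
    return min(requested, max_model_len - prompt_tokens)


if __name__ == "__main__":
    # A 3,000-token prompt leaves room for at most 1,096 new tokens.
    print(clamp_max_tokens(prompt_tokens=3000, requested=2048))
```

The returned value can be passed as `max_tokens` in the `client.chat.completions.create(...)` call shown above.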

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen3.6-35B-A3B |
| Type | Mixture-of-Experts (MoE) |
| Total Parameters | 35 billion |
| Active Parameters | ~3 billion per token |
| Precision | BF16 (bfloat16) |
| Model Size | 67 GB (26 safetensors shards) |
| Context Length | 4,096 tokens (training) / 262,144 max (architecture) |
| Training Method | SFT + DPO (QLoRA) |
| Training Hardware | NVIDIA H200 141GB SXM |
| License | Apache 2.0 |
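
As a back-of-the-envelope check on the model size, BF16 stores each parameter in 2 bytes. The sketch below converts a parameter count into decimal GB and binary GiB; it uses the rounded 35-billion figure from the table, so the result is only expected to land near the listed 67 GB, and which unit convention the listed size uses is not stated here.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def weight_size(params: float, dtype: str = "bf16"):
    """Return (decimal GB, binary GiB) needed to store the raw weights."""
    total_bytes = params * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9, total_bytes / 2**30


if __name__ == "__main__":
    gb, gib = weight_size(35e9)  # rounded parameter count from the table
    print(f"{gb:.1f} GB / {gib:.1f} GiB")  # 70.0 GB / 65.2 GiB
```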

Training Pipeline

CyberStrike was trained using a two-stage alignment pipeline:

Stage 1: Supervised Fine-Tuning (SFT)

The base Qwen3.6-35B-A3B model was fine-tuned on a curated dataset of offensive security scenarios covering 10 categories:

web_app, cloud, post_exploitation, edr_evasion, malware_dev, network, social_engineering, full_kill_chain, lateral_movement, persistence

  • Method: QLoRA (4-bit NF4 quantization)
  • LoRA Config: r=64, alpha=128, dropout=0
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
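
The settings above can be written down as plain dictionaries whose keys mirror the keyword arguments of `peft.LoraConfig` and `transformers.BitsAndBytesConfig`. This is a sketch of how such a configuration might look, not the training script itself; entries marked as assumptions are not stated in this card, and the 2048-wide layer in the example is hypothetical.

```python
# Keys mirror peft.LoraConfig keyword arguments; values are the ones listed above.
SFT_LORA = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.0,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

# Keys mirror transformers.BitsAndBytesConfig keyword arguments.
QUANT_4BIT = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "bfloat16",  # assumption: not stated in the card
}


def lora_scaling(cfg: dict) -> float:
    """Standard LoRA scales the low-rank update by alpha / r."""
    return cfg["lora_alpha"] / cfg["r"]


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer (A and B)."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    print("scaling:", lora_scaling(SFT_LORA))
    # Hypothetical 2048 x 2048 projection, for illustration only.
    print("params per layer:", lora_params(2048, 2048, SFT_LORA["r"]))
```

With the libraries installed, the dictionaries would be unpacked as `LoraConfig(**SFT_LORA)` and `BitsAndBytesConfig(**QUANT_4BIT)`.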

Stage 2: Direct Preference Optimization (DPO)

The SFT model was further aligned using 115,250 preference pairs across 12 carefully designed axes, teaching the model to produce expert-level responses over superficial ones:

| Axis | Description | Examples |
|---|---|---|
| MITRE ATT&CK Depth | Deep technique analysis over surface-level summaries | T1059 sub-technique breakdowns |
| CVE Analysis | Detailed vulnerability analysis with CVSS scoring | CVE-2024-* exploit chains |
| OWASP Methodology | Structured testing methodology | ASVS compliance checks |
| Cloud Security | Provider-specific attack paths | AWS IAM, Azure AD, GCP abuse |
| Tool Usage | Proper tool invocation patterns | Nmap, Burp, sqlmap workflows |
| ReAct Reasoning | Step-by-step analytical thinking | Multi-stage attack planning |
| Multi-turn Engagement | Sustained deep conversation | Progressive pentest engagement |
| Code-first Approach | Working exploit code over theory | PoC development, payload crafting |
| Techstack Analysis | Technology-specific vulnerabilities | Framework-specific attacks |
| Sub-agent Coordination | Orchestrated multi-tool operations | Combined recon + exploit chains |
| Business Logic | Domain-aware vulnerability assessment | Sector-specific attack scenarios |
| NIST Compliance | Standards-aligned security assessment | SP 800-53 control mapping |

  • Method: QLoRA, LoRA r=32, alpha=64
  • DPO Beta: 0.1
  • Learning Rate: 5e-6 with cosine schedule
  • Effective Batch Size: 8
  • Training Steps: 9,142
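
For readers unfamiliar with the objective, the sketch below computes the standard DPO loss for a single preference pair with the beta value listed above. It takes summed sequence log-probabilities from the policy and from the frozen reference model as inputs; the example numbers are made up for illustration, and this is the textbook formula rather than the training code used for this model.

```python
import math


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are sequence log-probabilities: pi_* from the policy being trained,
    ref_* from the frozen reference (here, the SFT model).
    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(m)) == log(1 + exp(-m)), evaluated in a numerically stable way
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Made-up log-probabilities: the policy favors the chosen answer more
    # strongly than the reference does, so the loss drops below log(2).
    print(dpo_loss(pi_chosen=-10.0, pi_rejected=-20.0,
                   ref_chosen=-12.0, ref_rejected=-15.0))
```

A small beta such as 0.1 keeps the policy close to the reference model: the log-ratio margin has to grow large before the loss approaches zero.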

Architecture

Qwen3.6-35B-A3B (Mixture-of-Experts)
├── 35B total parameters
├── ~3B active parameters per token
├── 256 experts, top-k routing
├── Grouped Query Attention (GQA)
├── RoPE positional encoding (theta=10M)
├── Max position embeddings: 262,144
└── BF16 precision (67 GB on disk)

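The MoE architecture activates only about 3B of the 35B parameters for each token, so per-token compute is comparable to a 3B dense model while the full 35B parameters remain available as knowledge capacity. All 35B parameters must still be loaded into memory at inference time.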
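
To illustrate what top-k routing means, the sketch below scores all experts for one token, keeps the k highest, and renormalizes their softmax weights. The expert count comes from the summary above; the value of k is an assumption (it is not given here), and real implementations differ in details such as whether weights are renormalized after selection.

```python
import math
import random


def route(logits, k):
    """Pick the k highest-scoring experts and renormalize their softmax weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    random.seed(0)
    NUM_EXPERTS = 256  # from the architecture summary above
    TOP_K = 8          # assumption: the summary says "top-k" without giving k
    router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route(router_logits, TOP_K)
    print(f"{len(chosen)} of {NUM_EXPERTS} experts active for this token")
```

Only the selected experts' feed-forward blocks run for that token, which is where the gap between total and active parameters comes from.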


Use Cases

CyberStrike is designed for professionals conducting authorized security assessments:

  • Penetration Testing — Web app, network, cloud, and API security testing
  • Red Team Operations — Full kill chain simulation, C2 operations, evasion
  • Vulnerability Research — CVE analysis, exploit development, PoC creation
  • CTF Competitions — Challenge solving, reverse engineering, cryptography
  • Security Education — Training material generation, exam preparation
  • Threat Intelligence — MITRE ATT&CK mapping, threat actor TTPs
  • Compliance Assessment — NIST, OWASP, CIS benchmark evaluation

Ethical Use & Disclaimer

This model is intended exclusively for authorized security testing, education, and research purposes. Users must:

  • Obtain proper written authorization before testing any systems
  • Comply with all applicable laws and regulations
  • Follow responsible disclosure practices
  • Never use this model for unauthorized access or malicious activities

The authors are not responsible for any misuse of this model.


Citation

@misc{cyberstrike2025,
  title={CyberStrike-OffSec-35B: A Domain-Specialized LLM for Offensive Security},
  author={Orhan Yildirim},
  year={2025},
  url={https://huggingface.co/oyildirim/CyberStrike-OffSec-35B}
}

Built with purpose. Benchmarked with rigor. Designed for professionals.

