Mythoseek
Overview
Mythoseek is a 10B-parameter language model specialized for cybersecurity: vulnerability research, penetration testing, OSINT, and CWE-pattern reasoning. It is fine-tuned from DeepSeek V4 Pro-Qwen3.5 9B Distilled on enterprise pentest reports and frontier-model distillation traces, and aims to bring closed-source cyber AI capability to the open community.
Mythoseek was developed at Merlin Research (Stockholm, Sweden) as part of the KAON quantum-classical research program, a closed-loop framework connecting IBM Quantum (ibm_kingston, Heron r2) with edge LLM inference on Apple Silicon. OTOC (out-of-time-order correlator) scrambling measurements from real IBM QPU jobs informed the calibration of AER (Adaptive Entropy Regularization) coefficients during GRPO training.
Training Pipeline
| Stage | Method | Details |
|---|---|---|
| 1 | SFT Distillation | Frontier model trace distillation |
| 2 | GRPO / RL | Verifiable rewards on cyber tasks |
| 3 | Tool-use SFT | Agent-style tool calling |
| 4 | CWE Grounding | CWE-pattern structured reasoning |
Compute: Google Cloud TPU v6 pods
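Stage 2 pairs GRPO with verifiable rewards. As a rough illustration of the group-relative step that gives GRPO its name, the sketch below normalizes the rewards of several completions sampled for the same prompt. It is a minimal sketch under stated assumptions, not the training code used for Mythoseek: the binary reward (1.0 when a hypothetical verifier accepts the output, 0.0 otherwise), the function name `group_relative_advantages`, and the use of the population standard deviation are all illustrative choices, and the AER term is not modeled.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is scored relative to its siblings rather than by a
    learned value function.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of 4 sampled completions for one cyber task:
# 1.0 = the verifier accepted the output, 0.0 = it did not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# When every completion gets the same reward the group carries no signal,
# and all advantages collapse to zero.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

Successful completions receive positive advantages and failed ones negative advantages, which is what makes a pass/fail verifier usable as a training signal.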
Results
CyberGym (arXiv:2506.02548)
CyberGym is UC Berkeley's large-scale cybersecurity benchmark, covering 1,507 real-world vulnerabilities from Google OSS-Fuzz across 188 projects. There is no partial credit and no LLM judge: a pass requires a valid PoC that crashes the pre-patch build.
| Level | Scaffold | pass@4 |
|---|---|---|
| Level 0 | Full scaffolding | 62% |
| Level 1 | Partial scaffolding | 34% |
| Level 2 | Minimal scaffolding | 12% |
| Level 3 | No scaffolding | 3% |
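For reference, Claude Mythos Preview (a closed model) leads the public leaderboard at 83.1% overall pass@1. That figure is pass@1, whereas the table above reports pass@4 per level, so the numbers are not directly comparable. Mythoseek is a 10B open-weight alternative.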
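The card does not say how the pass@4 figures were computed. One common choice is the unbiased pass@k estimator: given `n` sampled attempts per task of which `c` succeed, it estimates the probability that at least one of `k` randomly drawn attempts passes. The sketch below assumes that estimator; the function name and the example counts are illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: attempts sampled for the task
    c: attempts that passed (here, a PoC that crashes the pre-patch build)
    k: attempt budget being evaluated
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k failures exist, so any k draws include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task with 10 attempts, 2 of which passed.
estimate = pass_at_k(n=10, c=2, k=4)

# With exactly k attempts, pass@k reduces to "did any attempt pass?".
any_pass = pass_at_k(n=4, c=1, k=4)
none_pass = pass_at_k(n=4, c=0, k=4)
```

A benchmark-level score is then the mean of the per-task estimates.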
IFBench
Intended Use
- Vulnerability research and CVE analysis
- Penetration testing assistance (OSINT, recon, XSS, SQLi)
- CWE classification and pattern recognition
- Security report generation
- Red team reasoning support
Not intended for: autonomous offensive operations, unauthorized access, or malicious use.
KAON Connection
This model is part of the KAON quantum-classical research program:
OTOC scrambling measurements on real quantum hardware (SYK model,
4–5 qubits, IBM job IDs: d7a40irc6das739jkmb0,
d7cj3c95a5qc73doqri0) produced entropy profiles that calibrated
AER coefficients during RL training. Correlation between OTOC decay
and token entropy: Spearman ρ = −0.733, p = 0.016 (n = 1000).
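For readers unfamiliar with the statistic quoted above, Spearman's ρ is the Pearson correlation of the ranks of two series. The sketch below computes it with the standard library only. The two input lists are made-up placeholder values, not the KAON measurements, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined for a constant series")
    return cov / sqrt(vx * vy)


# Placeholder data only: a perfectly monotone decreasing relationship.
otoc_decay = [0.1, 0.2, 0.3, 0.4]
token_entropy = [8.0, 6.0, 4.0, 2.0]
rho = spearman_rho(otoc_decay, token_entropy)
```

A negative ρ means that one quantity tends to fall as the other rises. The statistic depends only on rank order, so it is insensitive to any monotone rescaling of either series.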