PEFT
Safetensors
English
cybersecurity
malware-analysis
att&ck
threat-intelligence
mixtral
lora
expert-adapters
cape-sandbox
digital-forensics
Instructions to use umer07/fathom-mixtral with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use umer07/fathom-mixtral with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1") model = PeftModel.from_pretrained(base_model, "umer07/fathom-mixtral") - Notebooks
- Google Colab
- Kaggle
| language: en | |
| license: cc-by-nc-4.0 | |
| tags: | |
| - cybersecurity | |
| - malware-analysis | |
| - att&ck | |
| - threat-intelligence | |
| - mixtral | |
| - lora | |
| - peft | |
| - expert-adapters | |
| - cape-sandbox | |
| - digital-forensics | |
| library_name: peft | |
| base_model: mistralai/Mixtral-8x7B-Instruct-v0.1 | |
| inference: false | |
| metrics: | |
| - CHUPPPPPAAAAAA | |
| # **Fathom** β Specialized Cybersecurity Analysis Model | |
| **Mixtral-8x7B-Instruct-v0.1 + 10Γ LoRA adapters (rank=32, bf16)** | |
| **Primary adapter:** `unified-v2` (general cybersecurity + malware analysis) | |
| **9 expert adapters** for domain-specific routing (static/dynamic analysis, network, forensics, threat intel, etc.) | |
| **Fathom** turns raw sandbox reports (CAPE, Joe Sandbox, etc.) into high-quality ATT&CK-mapped malware analysis. It outperforms general-purpose models on cybersecurity tasks while remaining fully open-source and runnable on a single AMD MI300X / A100 80GB. | |
| --- | |
| ## Model Overview | |
| - **Base:** Mixtral-8x7B-Instruct-v0.1 (full bf16, no quantization) | |
| - **Training:** Direct PEFT+TRL | |
| - **Adapters:** 1 unified + 9 expert LoRA adapters (all rank=32, Ξ±=16) | |
| - **Hardware:** AMD MI300X (205.8 GB VRAM) β full bf16 training | |
| - **Key Innovation:** Evidence extraction layer + structured behavioral prompts β **9Γ improvement** in real ATT&CK mapping | |
| **Designed for:** | |
| - Malware analysts & threat hunters | |
| - SOC / DFIR teams | |
| - CAPE / sandbox report enrichment | |
| - Automated ATT&CK technique extraction | |
| --- | |
| ## Benchmark Results | |
| All results use the **real Fathom pipeline** (`[INST]` chat template + 8192 context + structured evidence from CAPE extraction layer v3). Greedy decoding, bf16. | |
| ### 1. General Cybersecurity Knowledge (vs. Closed & Open Models) | |
| | Benchmark | Fathom unified-v2 | GPT-4 (ref) | GPT-3.5 (ref) | Base Mixtral-8x7B | Llama-2-70B (ref) | | |
| |----------------------------|-------------------|-------------|---------------|-------------------|-------------------| | |
| | **CyberMetric-80** | **91.25%** | ~87% | ~67% | 82.5% | ~57% | | |
| | MMLU Computer Security | **79.0%** | ~82% | ~65% | β | ~54% | | |
| | MMLU Security Studies | **64.0%** | ~74% | ~60% | β | ~48% | | |
| | TruthfulQA MC1 | **65.0%** | | | | | | |
| **Visual bar comparison (CyberMetric-80):** | |
| ``` | |
| Fathom unified-v2 ββββββββββββββββββββ 91.25% | |
| GPT-4 ββββββββββββββββββ ~87% | |
| Base Mixtral βββββββββββββββββ 82.5% | |
| GPT-3.5 ββββββββββββββ ~67% | |
| Llama-2-70B ββββββββββββ ~57% | |
| ``` | |
| ### 2. Expert Adapter Comparison (CyberMetric-80) | |
| | Adapter | Score | Specialty | | |
| |--------------------------|---------|------------------------------------| | |
| | `unified-v2` | **91.25%** | All-domain baseline | | |
| | `expert-e8-analyst` | **91.25%** | Analyst Q&A & reporting | | |
| | `expert-e3-network` | 90.00% | Network traffic / C2 analysis | | |
| | `expert-e4-forensics` | 90.00% | Memory & disk forensics | | |
| | `expert-e6-detection` | 88.75% | Detection engineering | | |
| | `expert-e7-reports` | 88.75% | Structured report generation | | |
| | `expert-e2-dynamic` | 85.00% | Behavioral / sandbox analysis | | |
| | `expert-e1-static` | 83.75% | Static PE + evasion detection | | |
| | `expert-e9-cot` | 87.50% | Chain-of-thought reasoning | | |
| | `expert-e5-threatintel` | 81.25% | Threat intel & actor profiling | | |
| ### 3. Core Contribution: Real ATT&CK Mapping Accuracy | |
| **Progression table** (same model weights, only input pipeline improved): | |
| | Configuration | Exact F1 | Parent F1 | Improvement | | |
| |----------------------------------------|----------|-----------|-------------| | |
| | Raw API list (naive) | 0.083 | 0.095 | β | | |
| | Structured prompt (manual) | 0.370 | 0.429 | +0.334 | | |
| | Real Fathom evidence layer | 0.534 | 0.508 | +0.413 | | |
| | **Real pipeline + full context fix** | **0.868**| **0.841** | **+0.746** | | |
| **This proves the architecture (evidence extraction + structured prompts) matters more than additional fine-tuning.** | |
| ### 4. Real Malware Analysis β CAPE Pipeline ( malscore 10/10 samples) | |
| | Sample | Family | GT T-codes | Predicted T-codes | Exact F1 | Parent F1 | Family ID | | |
| |--------|----------|-----------------------------|--------------------------------------------|----------|-----------|-----------| | |
| | 12 | Emotet | T1012, T1071, T1071.004, T1083 | T1012, T1055, T1071, T1071.004, T1083 | 0.889 | 0.857 | 100% conf | | |
| | 15 | Formbook | T1012, T1055, T1071, T1071.004, T1083 | T1003, T1012, T1027.002, T1055, T1059, T1071, T1071.004, T1083, T1497 | 0.714 | 0.667 | 85% conf | | |
| | 16 | Dridex | T1012, T1055, T1071, T1071.004, T1083 | T1012, T1055, T1071, T1071.004, T1083 | **1.000**| **1.000** | 68% conf | | |
| | **Average** | | | | **0.868**| **0.841** | β | | |
| ### 5. Additional Benchmarks | |
| - **ATT&CK Mapping MCQ (30 handcrafted questions):** 80% | |
| - **MMLU Machine Learning:** 60% | |
| - **MMLU Electrical Engineering:** 64% | |
| - **Rigorous ground-truth F1 (23 test cases):** Exact = 0.184, Parent = 0.344 (synthetic); real CAPE = 0.841 after pipeline fixes | |
| ### 5. Key Discovery: Mal-API-2019 Analysis | |
| We evaluated Fathom on the public **Mal-API-2019** dataset (Catak & YazΔ±, arXiv:1905.01999) β 7,107 API call sequences from Cuckoo Sandbox. | |
| | Variant | Accuracy | Macro F1 | | |
| |--------------------------|----------|----------| | |
| | Raw API sequences | 12.6% | 0.030 | | |
| | Filtered behavioral groups | 10.9% | 0.052 | | |
| ### Insight: | |
| Raw API sequences alone are insufficient for reliable family classification. The dataset contains heavy loader noise and families share nearly identical behavioral APIs. Ground-truth labels come from static AV signatures, not behavioral semantics. | |
| > β In contrast, Fathomβs full evidence extraction pipeline achieves 0.841 Parent F1 on real CAPEv2 reports. This demonstrates that structured behavioral evidence + multi-source context (not raw API text) is the critical enabler for production-grade malware analysis.β | |
| --- | |
| ## How to Use | |
| ### Loading the unified model (recommended for most users) | |
| ```python | |
| from peft import PeftModel | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| import torch | |
| model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1" | |
| adapter = "umer07/fathom-mixtral" # unified-v2 at root | |
| tokenizer = AutoTokenizer.from_pretrained(model_name) | |
| model = AutoModelForCausalLM.from_pretrained( | |
| model_name, | |
| torch_dtype=torch.bfloat16, | |
| device_map="auto", | |
| trust_remote_code=True | |
| ) | |
| model = PeftModel.from_pretrained(model, adapter, adapter_name="unified-v2") | |
| model.eval() | |
| ``` | |
| --- | |
| ## Limitations | |
| - Sub-technique precision lower than parent techniques (standard across all LLMs) | |
| - Family identification improves significantly with KSPN enrichment | |
| - Rare/exotic TTPs (UAC bypass, ICMP C2) have low recall | |
| - Prompt injection / attribution hallucination remains a base-model weakness (mitigable with system prompt hardening) | |
| --- | |
| ## Training & Datasets | |
| - **Unified-v2:** 123,912 rows (1 epoch) | |
| - **Experts:** 9 specialized datasets (total > 200k rows after augmentation) | |
| - **Evasive dataset (NEW):** 25,160 obfuscated C++ samples (92 evasion combinations) | |
| - **ThreatIntel upgrade:** 9,532 rows (URLhaus + GTFOBins + MITRE CTI) | |
| --- | |
| ## Citation | |
| ```bibtex | |
| @misc{fathom2026, | |
| title={Fathom: Expert Cybersecurity Analysis with Mixtral LoRA Adapters}, | |
| author={Umer}, | |
| year={2026}, | |
| howpublished={\url{https://huggingface.co/umer07/fathom-mixtral}}, | |
| } | |
| ``` |