# Fathom Plan A LoRA Adapter (Mixtral-8x7B-Instruct)

This repository contains the Plan A LoRA adapter for the Fathom FYP project:
"Fathom: An LLM-Powered Automated Malware Analysis Framework".

The adapter is trained on a curated cybersecurity instruction-tuning corpus to improve analyst-style security outputs over the base mistralai/Mixtral-8x7B-Instruct-v0.1 model.
## What This Is

- Type: PEFT LoRA adapter (not a full standalone model)
- Base model required: mistralai/Mixtral-8x7B-Instruct-v0.1
- Training style: QLoRA (4-bit NF4 base loading, bf16 compute)
- Scope: Plan A MVP uplift for cybersecurity and malware-analysis assistance
## Key Training Setup
- Sequence length: 2048
- Batch: 2
- Gradient accumulation: 8 (effective batch size 16)
- Learning rate: 2e-4 (cosine scheduler)
- Steps: 3000 (completed run)
- LoRA rank/alpha: r=32, alpha=64
- LoRA targets: q_proj, k_proj, v_proj, o_proj (attention-only)
- Optimizer: paged_adamw_8bit
- Precision: bf16
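The hyperparameters above can be sketched as a `peft` configuration. This is an illustrative reconstruction, not the actual training script; values not stated in this card (e.g. `lora_dropout`) are assumptions.

```python
# Sketch of the Plan A QLoRA adapter config described above.
# Hyperparameters follow the list in "Key Training Setup"; lora_dropout
# and bias are assumed defaults, not documented in this card.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                     # LoRA rank
    lora_alpha=64,            # scaling alpha (alpha / r = 2.0)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention-only
    lora_dropout=0.05,        # assumed value
    bias="none",
    task_type="CAUSAL_LM",
)
```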
## Hardware Used

Training was run on RunPod:
- GPU: NVIDIA A100 PCIe 80GB (1x)
- vCPU: 8
- RAM: 125 GB
- Disk: 200 GB
- Location: CA
## Data Summary
Curated cybersecurity instruction corpus with mixed sources (CyberMetric, Trendyol CyberSec, ShareGPT Cybersecurity, NIST downsampled, MITRE ATT&CK, CVE/IR/malware-focused sets).
Final working files used:

- train.jsonl: 120,912 samples
- eval.jsonl: 1,915 samples
- cybermetric_80.jsonl: 80 held-out MCQs
- malware_eval_25.jsonl: 25 expert malware prompts
## Evaluation Results

### Standard post-eval settings

Generation settings used for a fair base-vs-adapter comparison:

- do_sample=False
- temperature=0.0
- max_new_eval=64
- max_new_cyber=48
- max_new_malware=256
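These settings amount to greedy decoding with a per-task output budget. A minimal sketch of how they could be organized (the names `GEN_KWARGS`, `MAX_NEW_TOKENS`, and `generation_kwargs` are illustrative, not from the Fathom codebase):

```python
# Shared greedy-decoding settings from the post-eval configuration above.
GEN_KWARGS = {"do_sample": False}

# Per-task max_new_tokens budgets listed in this card.
MAX_NEW_TOKENS = {
    "eval": 64,         # eval.jsonl prompts
    "cybermetric": 48,  # CyberMetric-80 MCQs
    "malware": 256,     # expert malware prompts (512 in the rerun below)
}

def generation_kwargs(task):
    """Merge the shared decoding settings with the task's token budget."""
    return {**GEN_KWARGS, "max_new_tokens": MAX_NEW_TOKENS[task]}
```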
### Baseline (corrected) vs Fine-tuned
| Metric | Baseline | Fine-tuned | Delta |
|---|---|---|---|
| Eval mean overlap | 0.3283 | 0.3631 | +0.0349 |
| Eval exact match rate | 0.0000 | 0.2193 | +0.2193 |
| CyberMetric-80 accuracy | 0.825 | 0.900 | +0.075 |
| Malware structure | 0.44 | 0.84 | +0.40 |
| Malware ATT&CK correctness | 0.16 | 0.20 | +0.04 |
| Malware reasoning | 0.24 | 0.20 | -0.04 |
| Malware evidence awareness | 0.48 | 0.52 | +0.04 |
| Malware analyst usefulness | 0.52 | 0.56 | +0.04 |
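The card does not define "mean overlap" or "exact match rate"; one plausible reading is token-level reference overlap and normalized string equality, sketched below under that assumption:

```python
def token_overlap(prediction, reference):
    """Fraction of reference tokens that also appear in the prediction.

    One plausible reading of the 'mean overlap' metric above; the exact
    definition used in the evaluation is not documented in this card.
    """
    pred_tokens = set(prediction.lower().split())
    ref_tokens = reference.lower().split()
    if not ref_tokens:
        return 0.0
    return sum(tok in pred_tokens for tok in ref_tokens) / len(ref_tokens)

def exact_match(prediction, reference):
    """1.0 if the whitespace-stripped strings are identical, else 0.0."""
    return float(prediction.strip() == reference.strip())
```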
### Malware-only rerun with longer output budget

To test truncation effects on the malware prompts, both the base and fine-tuned models were rerun with max_new_malware=512 (25 prompts only).
| Rubric axis | Base (512) | Fine-tuned (512) | Delta |
|---|---|---|---|
| Structure | 0.56 | 0.88 | +0.32 |
| ATT&CK correctness | 0.16 | 0.20 | +0.04 |
| Malware reasoning | 0.36 | 0.28 | -0.08 |
| Evidence awareness | 0.56 | 0.64 | +0.08 |
| Analyst usefulness | 0.64 | 0.80 | +0.16 |
Interpretation: structure, evidence awareness, and analyst usefulness improved strongly, but malware reasoning remains the main gap for future iterations.
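The rubric values above are consistent with per-prompt judgments averaged over the 25 malware prompts (e.g. 21/25 = 0.84 for structure). The averaging step, sketched here, is an assumption about the judging procedure, which is not documented in this card:

```python
def rubric_axis_score(per_prompt_scores):
    """Average per-prompt rubric judgments (e.g. 0/1 over 25 prompts)
    into a single axis score, as one plausible reading of the tables above."""
    return sum(per_prompt_scores) / len(per_prompt_scores)
```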
## Limitations
- This is a Plan A MVP adapter, not a fully specialized malware reverse-engineering model.
- Malware causal reasoning still needs improvement via targeted data and/or evidence-grounded training (Plan B).
- Outputs should be treated as analyst assistance, not an autonomous verdict.
## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
adapter_repo = "umer07/fathom-mixtral-lora-plan-a"

# Load the base model in 4-bit NF4, matching the QLoRA training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map={"": 0},
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)

# Attach the Plan A LoRA adapter on top of the quantized base
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()

prompt = """### Instruction:
Analyze the malware behavior and map likely ATT&CK techniques.
### Input:
Sample creates scheduled task persistence and launches encoded PowerShell.
### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():
    # Greedy decoding (do_sample=False), matching the evaluation settings;
    # temperature is omitted since it is ignored under greedy decoding
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Project Status
- Core Plan A training/evaluation cycle: completed
- GPU instance used for training has been deleted
- No additional training is currently in progress
## Citation
If you use this adapter, please cite your project report/thesis for Fathom Plan A and reference the base model (mistralai/Mixtral-8x7B-Instruct-v0.1).