---
language: en
license: cc-by-nc-4.0
tags:
- cybersecurity
- malware-analysis
- att&ck
- threat-intelligence
- mixtral
- lora
- peft
- expert-adapters
- cape-sandbox
- digital-forensics
library_name: peft
base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
inference: false
---

# **Fathom**: Specialized Cybersecurity Analysis Model

**Mixtral-8x7B-Instruct-v0.1 + 10× LoRA adapters (rank=32, bf16)**  
**Primary adapter:** `unified-v2` (general cybersecurity + malware analysis)  
**9 expert adapters** for domain-specific routing (static/dynamic analysis, network, forensics, threat intel, etc.)


**Fathom** turns raw sandbox reports (CAPE, Joe Sandbox, etc.) into ATT&CK-mapped malware analysis. On CyberMetric-80 the unified adapter scores 91.25%, compared with 82.5% for base Mixtral-8x7B, while the adapters remain openly released (CC BY-NC 4.0) and runnable on a single AMD MI300X / A100 80GB.

---

## Model Overview

- **Base:** Mixtral-8x7B-Instruct-v0.1 (full bf16, no quantization)
- **Training:** Direct PEFT + TRL
- **Adapters:** 1 unified + 9 expert LoRA adapters (all rank=32, α=16)
- **Hardware:** AMD MI300X (205.8 GB VRAM), full bf16 training
- **Key Innovation:** Evidence extraction layer + structured behavioral prompts → roughly **9× improvement** in Parent F1 on real ATT&CK mapping (0.095 → 0.841)

**Designed for:**
- Malware analysts & threat hunters
- SOC / DFIR teams
- CAPE / sandbox report enrichment
- Automated ATT&CK technique extraction

---

## Benchmark Results

All results use the **real Fathom pipeline** (`[INST]` chat template + 8192 context + structured evidence from CAPE extraction layer v3). Greedy decoding, bf16.
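
The prompt side of that setup can be pictured with a small builder. This is a minimal sketch, not the Fathom implementation: the `build_prompt` helper, the evidence category names, and the instruction wording are all hypothetical, and only the `[INST] ... [/INST]` wrapper follows the Mixtral-Instruct chat format.

```python
def build_prompt(evidence: dict[str, list[str]]) -> str:
    """Wrap structured sandbox evidence in the Mixtral-Instruct [INST] format.

    The category names and instruction text are illustrative assumptions,
    not the layout used by the real evidence extraction layer.
    """
    sections = []
    for category, items in evidence.items():
        if not items:
            continue  # skip categories with no extracted evidence
        bullet_lines = "\n".join(f"- {item}" for item in items)
        sections.append(f"{category.upper()}:\n{bullet_lines}")

    instruction = (
        "Map the following sandbox evidence to MITRE ATT&CK technique IDs.\n\n"
        + "\n\n".join(sections)
    )
    # The tokenizer normally prepends the BOS token, so it is not added here.
    return f"[INST] {instruction} [/INST]"
```

Grouping evidence by category keeps the prompt compact, which matters when a full sandbox report has to fit inside the 8192-token context.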

### 1. General Cybersecurity Knowledge (vs. Closed & Open Models)

| Benchmark                  | Fathom unified-v2 | GPT-4 (ref) | GPT-3.5 (ref) | Base Mixtral-8x7B | Llama-2-70B (ref) |
|----------------------------|-------------------|-------------|---------------|-------------------|-------------------|
| **CyberMetric-80**         | **91.25%**        | ~87%        | ~67%          | 82.5%             | ~57%              |
| MMLU Computer Security     | **79.0%**         | ~82%        | ~65%          | n/a               | ~54%              |
| MMLU Security Studies      | **64.0%**         | ~74%        | ~60%          | n/a               | ~48%              |
| TruthfulQA MC1             | **65.0%**         | n/a         | n/a           | n/a               | n/a               |

**Visual bar comparison (CyberMetric-80):**

```
Fathom unified-v2     ████████████████████ 91.25%
GPT-4                 ██████████████████   ~87%
Base Mixtral          █████████████████    82.5%
GPT-3.5               ██████████████       ~67%
Llama-2-70B           ████████████         ~57%
```

### 2. Expert Adapter Comparison (CyberMetric-80)

| Adapter                  | Score   | Specialty                          |
|--------------------------|---------|------------------------------------|
| `unified-v2`             | **91.25%** | All-domain baseline               |
| `expert-e8-analyst`      | **91.25%** | Analyst Q&A & reporting           |
| `expert-e3-network`      | 90.00%  | Network traffic / C2 analysis     |
| `expert-e4-forensics`    | 90.00%  | Memory & disk forensics           |
| `expert-e6-detection`    | 88.75%  | Detection engineering             |
| `expert-e7-reports`      | 88.75%  | Structured report generation      |
| `expert-e9-cot`          | 87.50%  | Chain-of-thought reasoning        |
| `expert-e2-dynamic`      | 85.00%  | Behavioral / sandbox analysis     |
| `expert-e1-static`       | 83.75%  | Static PE + evasion detection     |
| `expert-e5-threatintel`  | 81.25%  | Threat intel & actor profiling    |
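
The card does not describe how requests are routed to experts, so the following is only a sketch of one simple option: a lookup from a task label to an adapter name, falling back to `unified-v2`. The adapter names come from the table above; the task labels and the `select_adapter` helper are hypothetical.

```python
# Adapter names are taken from the table above; the task labels are hypothetical.
ADAPTER_BY_TASK = {
    "static": "expert-e1-static",
    "dynamic": "expert-e2-dynamic",
    "network": "expert-e3-network",
    "forensics": "expert-e4-forensics",
    "threatintel": "expert-e5-threatintel",
    "detection": "expert-e6-detection",
    "reports": "expert-e7-reports",
    "analyst": "expert-e8-analyst",
    "cot": "expert-e9-cot",
}
DEFAULT_ADAPTER = "unified-v2"


def select_adapter(task: str) -> str:
    """Return the expert adapter for a task label, falling back to unified-v2."""
    return ADAPTER_BY_TASK.get(task.strip().lower(), DEFAULT_ADAPTER)
```

With PEFT, the returned name could then be passed to `model.set_adapter(...)`, provided an adapter has already been loaded under that name.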

### 3. Core Contribution: Real ATT&CK Mapping Accuracy

**Progression table** (same model weights, only input pipeline improved):

| Configuration                          | Exact F1  | Parent F1 | Parent F1 gain vs. naive |
|----------------------------------------|-----------|-----------|--------------------------|
| Raw API list (naive)                   | 0.083     | 0.095     | baseline                 |
| Structured prompt (manual)             | 0.370     | 0.429     | +0.334                   |
| Real Fathom evidence layer             | 0.534     | 0.508     | +0.413                   |
| **Real pipeline + full context fix**   | **0.868** | **0.841** | **+0.746**               |

**Because the model weights are identical in every row, the gain comes from the architecture (evidence extraction + structured prompts) rather than from additional fine-tuning.**

### 4. Real Malware Analysis: CAPE Pipeline (malscore 10/10 samples)

| Sample | Family   | GT T-codes                  | Predicted T-codes                          | Exact F1 | Parent F1 | Family ID |
|--------|----------|-----------------------------|--------------------------------------------|----------|-----------|-----------|
| 12     | Emotet   | T1012, T1071, T1071.004, T1083 | T1012, T1055, T1071, T1071.004, T1083    | 0.889    | 0.857     | 100% conf |
| 15     | Formbook | T1012, T1055, T1071, T1071.004, T1083 | T1003, T1012, T1027.002, T1055, T1059, T1071, T1071.004, T1083, T1497 | 0.714    | 0.667     | 85% conf  |
| 16     | Dridex   | T1012, T1055, T1071, T1071.004, T1083 | T1012, T1055, T1071, T1071.004, T1083    | **1.000**| **1.000** | 68% conf  |
| **Average** |       |                             |                                            | **0.868**| **0.841** | n/a       |
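
The per-sample scores are consistent with a plain set-based F1 over technique IDs, where Parent F1 first collapses each sub-technique to its parent. The card does not spell out the metric definition, so treat this as an assumed reading; the helper names are hypothetical. Under that assumption the sketch reproduces the Sample 12 row.

```python
def f1_score(ground_truth: set[str], predicted: set[str]) -> float:
    """Set-based F1 between ground-truth and predicted technique IDs."""
    true_positives = len(ground_truth & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(ground_truth)
    return 2 * precision * recall / (precision + recall)


def to_parents(t_codes: set[str]) -> set[str]:
    """Collapse sub-techniques to their parent, e.g. T1071.004 -> T1071."""
    return {code.split(".")[0] for code in t_codes}


# Sample 12 (Emotet) from the table above
gt = {"T1012", "T1071", "T1071.004", "T1083"}
pred = {"T1012", "T1055", "T1071", "T1071.004", "T1083"}

exact_f1 = f1_score(gt, pred)
parent_f1 = f1_score(to_parents(gt), to_parents(pred))
print(f"exact={exact_f1:.3f} parent={parent_f1:.3f}")  # exact=0.889 parent=0.857
```

The single extra prediction (T1055) costs precision only, which is why both scores stay high while recall is 1.0.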



### 5. Additional Benchmarks

- **ATT&CK Mapping MCQ (30 handcrafted questions):** 80%
- **MMLU Machine Learning:** 60%
- **MMLU Electrical Engineering:** 64%
- **Rigorous ground-truth F1 (23 test cases):** Exact = 0.184, Parent = 0.344 on synthetic cases; Parent F1 on real CAPE reports = 0.841 after pipeline fixes

### 6. Key Discovery: Mal-API-2019 Analysis

We evaluated Fathom on the public **Mal-API-2019** dataset (Catak & Yazı, arXiv:1905.01999), which contains 7,107 API call sequences from Cuckoo Sandbox.

| Variant                  | Accuracy | Macro F1 |
|--------------------------|----------|----------|
| Raw API sequences        | 12.6%    | 0.030    |
| Filtered behavioral groups | 10.9%  | 0.052    |
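
Macro F1 averages the per-class F1 with equal weight, so a classifier that collapses onto a few families can show a macro F1 far below its accuracy. The toy example below illustrates that gap; the labels and predictions are invented for illustration and are not drawn from Mal-API-2019.

```python
from collections import Counter


def accuracy_and_macro_f1(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Return (accuracy, macro F1) for single-label classification.

    Macro F1 is averaged over the classes that occur in y_true.
    """
    classes = sorted(set(y_true))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1
            fn[truth] += 1

    per_class_f1 = []
    for label in classes:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class_f1.append(2 * tp[label] / denom if denom else 0.0)

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(per_class_f1) / len(classes)


# Toy data: a model that always answers "Trojan" on four balanced classes.
y_true = ["Trojan", "Worm", "Adware", "Backdoor"] * 5
y_pred = ["Trojan"] * len(y_true)
acc, macro = accuracy_and_macro_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} macro_f1={macro:.2f}")  # accuracy=0.25 macro_f1=0.10
```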

### Insight

Raw API sequences alone are insufficient for reliable family classification. The dataset contains heavy loader noise, families share nearly identical behavioral APIs, and the ground-truth labels come from static AV signatures rather than behavioral semantics.

> In contrast, Fathom's full evidence extraction pipeline achieves 0.841 Parent F1 on real CAPEv2 reports. This indicates that structured behavioral evidence plus multi-source context (not raw API text) is the critical enabler for production-grade malware analysis.

---

## How to Use

### Loading the unified model (recommended for most users)

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"
adapter = "umer07/fathom-mixtral"   # unified-v2 at root

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the base model in bf16 and let Accelerate place it on the available devices.
# Mixtral is supported natively by transformers, so trust_remote_code is not needed.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the unified LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(model, adapter, adapter_name="unified-v2")
model.eval()
```
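
Once the model and tokenizer are loaded, a single analysis turn could look like the sketch below. It assumes the tokenizer's built-in chat template and greedy decoding, matching the benchmark settings; the `generate_analysis` helper and its `max_new_tokens` default are hypothetical, and the format of the evidence text is not specified by this card.

```python
def generate_analysis(model, tokenizer, evidence_text: str, max_new_tokens: int = 512) -> str:
    """Run one greedy-decoded analysis turn and return only the newly generated text."""
    messages = [{"role": "user", "content": evidence_text}]

    # apply_chat_template adds the [INST] ... [/INST] wrapper for Mixtral-Instruct.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    # do_sample=False selects greedy decoding.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens so only the model's answer is decoded.
    prompt_length = len(inputs["input_ids"][0])
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate_analysis(model, tokenizer, evidence_text)` with the objects from the loading snippet returns the analysis as a plain string.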


---

## Limitations 

- Sub-technique precision is lower than parent-technique precision (a common pattern for LLMs)
- Family identification improves significantly with KSPN enrichment
- Rare/exotic TTPs (UAC bypass, ICMP C2) have low recall
- Prompt injection and attribution hallucination remain base-model weaknesses (mitigable with system prompt hardening)


---

## Training & Datasets

- **Unified-v2:** 123,912 rows (1 epoch)
- **Experts:** 9 specialized datasets (total > 200k rows after augmentation)
- **Evasive dataset (NEW):** 25,160 obfuscated C++ samples (92 evasion combinations)
- **ThreatIntel upgrade:** 9,532 rows (URLhaus + GTFOBins + MITRE CTI)

---

## Citation

```bibtex
@misc{fathom2026,
  title={Fathom: Expert Cybersecurity Analysis with Mixtral LoRA Adapters},
  author={Umer},
  year={2026},
  howpublished={\url{https://huggingface.co/umer07/fathom-mixtral}},
}
```