---
license: apache-2.0
language: [none]
tags:
  - arm64
  - aarch64
  - malware-detection
  - AV
  - EDR
  - XDR
  - Rust
  - security
  - binary-analysis
pipeline_tag: other
library_name: none
base_model: null
new_version: v0.0.1
model-index:
  - name: WinnCore ARM64 AV Baseline
    results:
      - task:
          type: binary-classification
          name: Malware detection
        dataset:
          name: WinnCore ARM64 Corpus v0
          type: winncore/arm64-corpus
          split: test
        metrics:
          - {name: ROC-AUC,           type: roc-auc,          value: 0.000}
          - {name: Average Precision, type: average_precision, value: 0.000}
          - {name: FPR@TPR=95%,       type: fpr_at_tpr_95,    value: 0.000}
          - {name: FPR@TPR=99%,       type: fpr_at_tpr_99,    value: 0.000}
---

# WinnCore ARM64 AV Baseline

ARM64-first malware detection baseline. Pipeline: Rust extractor → JSONL features → LightGBM → ONNX.

**Important:** No live malware binaries are included in this repository. Only hashes, feature rows, code, and reports are shared.

## Scope & Safety

- **Do not upload samples.** Share only SHA-256 hashes and feature rows.
- Perform training and evaluation in an isolated environment.
- This is an R&D baseline only; it is not intended for production EDR use.
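
Since only SHA-256 hashes are shared, it helps to compute them without loading a whole binary into memory. The following is a minimal sketch using Python's standard `hashlib`; the helper name `sha256_file` and the 1 MiB chunk size are illustrative assumptions, not part of this repository.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks.

    `sha256_file` and `chunk_size` are hypothetical names for illustration.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run this inside the isolated environment where the samples live, and copy only the resulting hex strings out of it.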

## Repository Layout

```
extractor/   # Rust feature extractor (AArch64 disassembly, headers, n-grams)
training/    # Python scripts for training and ONNX export
models/      # Exported models (e.g., gbm_v0.onnx)
datasets/    # Manifests and features (no binaries)
eval/        # Metrics tables, plots, and write-ups
```

## Quick Start

```bash
# Build the extractor
cd extractor && cargo build --release

# Extract features from binaries
./target/release/arm64_extractor /path/to/binaries datasets/features.jsonl

# Train LightGBM and export to ONNX
python training/train_lightgbm.py \
  --features datasets/features.jsonl \
  --out models/gbm_v0.onnx \
  --report eval/report_v0.md
```

## Metrics Protocol

Report the following metrics and optimize for low false positives:

- ROC-AUC
- PR-AUC (Average Precision)
- FPR@TPR=95%
- FPR@TPR=99%

Targets for future improvements: FPR@TPR=95% ≤ 1%, FPR@TPR=99% ≤ 5% on unseen data.
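
To make the FPR@TPR metrics concrete, here is a minimal standard-library sketch. It assumes label `1` marks a malicious sample and `0` a benign one, that higher scores mean more suspicious, and that a sample is flagged when its score is at or above the threshold. The function name `fpr_at_tpr` is hypothetical; the repository's training script may compute these values differently.

```python
def fpr_at_tpr(labels, scores, target_tpr):
    """Return (fpr, threshold) for the highest threshold whose TPR >= target_tpr.

    Assumes labels are 1 (malicious) or 0 (benign); this convention is an
    assumption for illustration.
    """
    if not 0.0 < target_tpr <= 1.0:
        raise ValueError("target_tpr must be in (0, 1]")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Sweep thresholds from the highest score down. TPR and FPR can only
    # grow as the threshold drops, so the first hit has the lowest FPR.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before measuring.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / n_pos >= target_tpr:
            return fp / n_neg, threshold
    raise RuntimeError("unreachable: TPR is 1.0 at the lowest threshold")
```

Calling `fpr_at_tpr(labels, scores, 0.95)` and `fpr_at_tpr(labels, scores, 0.99)` on held-out data gives the two values to compare against the targets above. The returned threshold can be recorded alongside the metric so later runs evaluate at the same operating point.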

## Dataset Manifest (Example)

Located at `datasets/manifest.jsonl`:

```json
{"sha256": "<hash>", "label": 1, "arch": "aarch64", "source": "internal_sandbox", "ts": "2025-11-02"}
```
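
A manifest row can be checked before it is committed. The sketch below assumes the schema implied by the example row: `sha256` is 64 lowercase hex characters, `label` is `0` or `1`, and `ts` is an ISO `YYYY-MM-DD` date. These rules and the name `validate_manifest_row` are assumptions for illustration; adjust them to the actual manifest conventions.

```python
import datetime
import json
import re

SHA256_RE = re.compile(r"^[0-9a-f]{64}$")
REQUIRED_KEYS = ("sha256", "label", "arch", "source", "ts")


def validate_manifest_row(line):
    """Return a list of problems found in one JSONL manifest line."""
    row = json.loads(line)
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in row]

    if not SHA256_RE.match(str(row.get("sha256", ""))):
        errors.append("sha256 must be 64 lowercase hex characters")

    label = row.get("label")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(label, bool) or not isinstance(label, int) or label not in (0, 1):
        errors.append("label must be the integer 0 or 1")

    try:
        datetime.date.fromisoformat(row.get("ts"))
    except (TypeError, ValueError):
        errors.append("ts must be an ISO date such as 2025-11-02")

    return errors
```

An empty list means the row passed every check; anything else can be printed with the line number to locate the bad entry.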

## Reproducibility Tips

- Use small, atomic commits.
- Use Git LFS for large files such as `*.onnx`, `*.pt`, `*.bin`, and `*.pkl`.
- Pin dependency versions in `training/requirements.txt` and `rust-toolchain.toml`.
- Evaluate performance at fixed thresholds, not just AUC scores.
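
Evaluating at a fixed threshold means choosing the cutoff once (for example on a validation split) and then reporting the detection and false-positive rates that the same cutoff yields on test data. A minimal sketch, assuming label `1` is malicious and a sample is flagged when its score is at or above the threshold; `rates_at_threshold` is a hypothetical helper name.

```python
def rates_at_threshold(labels, scores, threshold):
    """Return (TPR, FPR) when samples with score >= threshold are flagged."""
    tp = fp = fn = tn = 0
    for label, score in zip(labels, scores):
        flagged = score >= threshold
        if label == 1:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    # Guard against splits that contain only one class.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr
```

Reporting these two numbers for the same threshold across model versions shows whether a change actually moved the operating point, which an AUC summary alone can hide.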

## Legal

Licensed under Apache-2.0. No redistribution of malware binaries is permitted. Use at your own risk.