WinnCore committed 0feb62f (verified; parent a0db60a): Create README.md

---
license: apache-2.0
language: []
tags: [arm64, aarch64, malware-detection, AV, EDR, XDR, Rust, security, binary-analysis]
pipeline_tag: other
library_name: none
base_model: null
new_version: v0.0.1
model-index:
- name: WinnCore ARM64 AV Baseline
  results:
  - task: {type: binary-classification, name: Malware detection}
    dataset: {name: WinnCore ARM64 Corpus v0, type: winncore/arm64-corpus, split: test}
    metrics:
    - {name: ROC-AUC, type: roc-auc, value: 0.000}
    - {name: Average Precision, type: average_precision, value: 0.000}
    - {name: FPR@TPR=95%, type: fpr_at_tpr_95, value: 0.000}
    - {name: FPR@TPR=99%, type: fpr_at_tpr_99, value: 0.000}
---
# WinnCore ARM64 AV Baseline

An ARM64-first malware detection baseline. A Rust-based extractor turns binaries into JSONL feature rows. A Python script then trains a LightGBM classifier on those rows and exports it to ONNX.

**Important:** This repository contains no live malware binaries. Only hashes, feature rows, code, and reports are shared.

## Scope & Safety

- **Do not upload samples.** Share only SHA-256 hashes and feature rows.
- Run training and evaluation in an isolated environment.
- This is a research-and-development baseline only; it is not intended for production EDR use.

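Only SHA-256 hashes leave the sandbox, so samples are best hashed where they sit. Below is a minimal sketch that uses only the standard library's `hashlib`. The function name `sha256_of_file` and the 1 MiB chunk size are illustrative choices and are not part of this repository.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of a file.

    The file is read in fixed-size chunks so memory use stays flat even for
    large binaries. Only this digest, never the file bytes, is meant to leave
    the isolated environment.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of_file(Path("sample.bin"))` returns a 64-character hex string. You can copy that string into a manifest row without moving the binary.
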
## Repository Layout

```
extractor/   # Rust feature extractor (AArch64 disassembly, headers, n-grams)
training/    # Python scripts for training and ONNX export
models/      # Exported models (e.g., gbm_v0.onnx)
datasets/    # Manifests and features (no binaries)
eval/        # Metrics tables, plots, and write-ups
```

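The extractor comment above mentions n-gram features. The sketch below is a rough illustration of that idea only. It is not the extractor's implementation, which is written in Rust and works from disassembly; a real extractor might use mnemonics rather than raw words. The Python version relies on A64 instructions being a fixed 4 bytes wide and stored little-endian. It splits raw code bytes into 32-bit words and counts word n-grams, hashed into a fixed number of buckets. The function names, `n`, and the bucket count are all assumptions.

```python
import struct
import zlib
from collections import Counter


def instruction_words(code: bytes) -> list[int]:
    """Split raw A64 code bytes into 32-bit little-endian instruction words.

    Trailing bytes that do not form a whole 4-byte word are ignored.
    """
    usable = len(code) - (len(code) % 4)
    return [word for (word,) in struct.iter_unpack("<I", code[:usable])]


def hashed_ngrams(words: list[int], n: int = 2, buckets: int = 4096) -> Counter:
    """Count n-grams of consecutive instruction words, hashed into buckets.

    zlib.crc32 is used instead of hash() so bucket ids are stable across runs.
    """
    counts: Counter = Counter()
    for i in range(len(words) - n + 1):
        key = struct.pack(f"<{n}I", *words[i:i + n])
        counts[zlib.crc32(key) % buckets] += 1
    return counts
```

For example, the sequence NOP, NOP, RET (`0xD503201F`, `0xD503201F`, `0xD65F03C0`) yields two bigrams. Hashing keeps the feature vector a fixed length regardless of how many distinct n-grams appear, at the cost of occasional bucket collisions.
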
## Quick Start

```bash
# Build the extractor (in a subshell, so the working directory stays at the repo root)
(cd extractor && cargo build --release)

# Extract features from binaries
./extractor/target/release/arm64_extractor /path/to/binaries datasets/features.jsonl

# Train LightGBM and export to ONNX
python training/train_lightgbm.py \
  --features datasets/features.jsonl \
  --out models/gbm_v0.onnx \
  --report eval/report_v0.md
```

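This README does not document the schema of `datasets/features.jsonl`. The sketch below shows one way to turn JSONL rows into a feature matrix. It assumes, hypothetically, that each feature row carries a `sha256` key and a `features` dict, and that labels come from the `sha256`/`label` fields of `datasets/manifest.jsonl`. `load_jsonl` and `build_matrix` are illustrative names, not functions from `training/`.

```python
import json
from pathlib import Path


def load_jsonl(path: Path) -> list[dict]:
    """Read a JSONL file: one JSON object per line, blank lines skipped."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def build_matrix(feature_rows, manifest_rows, names=None):
    """Join feature rows to manifest labels by sha256 and build X, y.

    HYPOTHETICAL schema: each feature row is assumed to look like
    {"sha256": "...", "features": {"feature_name": number, ...}}.
    Rows whose hash is not in the manifest are dropped. Pass the `names`
    returned for the training split when building a test split, so both
    use the same column order; absent features are filled with 0.0.
    """
    labels = {m["sha256"]: int(m["label"]) for m in manifest_rows}
    kept = [r for r in feature_rows if r["sha256"] in labels]
    if names is None:
        names = sorted({k for r in kept for k in r["features"]})
    X = [[float(r["features"].get(n, 0.0)) for n in names] for r in kept]
    y = [labels[r["sha256"]] for r in kept]
    hashes = [r["sha256"] for r in kept]  # keep for dedup / group-aware splits
    return X, y, names, hashes
```

The function also returns the hash list. Near-duplicate samples can then be kept on the same side of a train/test split, which matters for the "unseen data" targets below.
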
## Metrics Protocol

Report the following metrics, and tune for a low false-positive rate:

- ROC-AUC
- PR-AUC (Average Precision)
- FPR@TPR=95% (false-positive rate at the threshold that detects 95% of malicious samples)
- FPR@TPR=99%

Targets for future versions: FPR@TPR=95% ≤ 1% and FPR@TPR=99% ≤ 5% on unseen data.

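In practice, scikit-learn's `roc_auc_score`, `average_precision_score`, and `roc_curve` are the usual tools. The dependency-free sketch below spells out the four definitions. It assumes these conventions: label 1 means malicious and 0 means benign, a higher score means more likely malicious, and a sample is flagged when its score is at least the threshold. Samples with equal scores are processed as one group. As a result, ROC-AUC counts ties as half a win, and AP matches the step-wise (non-interpolated) sum of recall gain times precision.

```python
def _class_counts(labels):
    """Return (n_pos, n_neg); labels are 1 = malicious, 0 = benign."""
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one malicious and one benign sample")
    return n_pos, n_neg


def _sweep(labels, scores):
    """Yield (threshold, tp, fp) for each distinct score, highest first.

    A sample counts as flagged when score >= threshold. Samples with equal
    scores are added together, so ties never split across thresholds.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        yield threshold, tp, fp


def roc_auc(labels, scores):
    """Area under the ROC curve via the trapezoid rule (ties count as 1/2)."""
    n_pos, n_neg = _class_counts(labels)
    auc, prev_tpr, prev_fpr = 0.0, 0.0, 0.0
    for _, tp, fp in _sweep(labels, scores):
        tpr, fpr = tp / n_pos, fp / n_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
    return auc


def average_precision(labels, scores):
    """Step-wise AP: sum over thresholds of (recall gain) * precision."""
    n_pos, _ = _class_counts(labels)
    ap, prev_recall = 0.0, 0.0
    for _, tp, fp in _sweep(labels, scores):
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


def fpr_at_tpr(labels, scores, target_tpr):
    """Return (fpr, threshold) at the highest threshold whose TPR >= target_tpr.

    FPR never decreases as the threshold drops, so the first threshold that
    reaches the target also gives the lowest FPR for that target.
    """
    n_pos, n_neg = _class_counts(labels)
    for threshold, tp, fp in _sweep(labels, scores):
        if tp / n_pos >= target_tpr:
            return fp / n_neg, threshold
    raise ValueError("target_tpr must be at most 1.0")
```

For example, with labels `[1, 1, 0, 1, 0, 0]` and scores `[0.9, 0.8, 0.7, 0.6, 0.3, 0.1]`, ROC-AUC is 8/9 and AP is 11/12. FPR@TPR=95% is 1/3, reached at threshold 0.6. The benign sample scored 0.7 must be flagged before the third malicious sample is caught.
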
## Dataset Manifest (Example)

`datasets/manifest.jsonl` holds one JSON object per line, for example:

```json
{"sha256": "<hash>", "label": 1, "arch": "aarch64", "source": "internal_sandbox", "ts": "2025-11-02"}
```

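The manifest is the only link between hashes and labels, so a schema check before training can catch typos early. The sketch below is minimal. Its rules are inferred from the example row above and are assumptions, not a documented spec: a lowercase hex digest, an integer `label` of 0 or 1, an `arch` from an allowed set, and an ISO-format `ts`. The function names are hypothetical.

```python
import json
import re
from datetime import date

SHA256_HEX = re.compile(r"[0-9a-f]{64}")
REQUIRED_KEYS = {"sha256", "label", "arch", "source", "ts"}


def validate_manifest_row(row, allowed_arch=("aarch64",)):
    """Return a list of problems with one manifest row; an empty list means OK.

    ASSUMED rules (inferred from the example row, not a documented spec):
    lowercase hex SHA-256, integer label 0 or 1, known arch, ISO date in ts.
    """
    if not isinstance(row, dict):
        return ["row must be a JSON object"]
    problems = []
    missing = REQUIRED_KEYS - row.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    sha = row.get("sha256")
    if not (isinstance(sha, str) and SHA256_HEX.fullmatch(sha)):
        problems.append("sha256 must be 64 lowercase hex characters")
    label = row.get("label")
    if type(label) is not int or label not in (0, 1):  # also rejects True/False
        problems.append("label must be the integer 0 or 1")
    if row.get("arch") not in allowed_arch:
        problems.append(f"arch must be one of {list(allowed_arch)}")
    try:
        date.fromisoformat(row.get("ts"))
    except (TypeError, ValueError):
        problems.append("ts must be an ISO date such as 2025-11-02")
    return problems


def validate_manifest(lines, allowed_arch=("aarch64",)):
    """Yield (line_number, problems) for each invalid non-empty JSONL line."""
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            yield lineno, [f"invalid JSON: {exc.msg}"]
            continue
        problems = validate_manifest_row(row, allowed_arch)
        if problems:
            yield lineno, problems
```

An open file handle can be passed directly, since iterating it yields lines. For example, `validate_manifest(open("datasets/manifest.jsonl", encoding="utf-8"))` reports each bad line number together with its problems.
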
## Reproducibility Tips

- Use small, atomic commits.
- Track large artifacts such as `*.onnx`, `*.pt`, `*.bin`, and `*.pkl` with Git LFS.
- Pin Python dependency versions in `training/requirements.txt` and the Rust toolchain in `rust-toolchain.toml`; commit `Cargo.lock` to pin crate versions.
- Report performance at fixed decision thresholds, not just threshold-free AUC scores.

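In practice, "fixed thresholds" means picking the operating point on one split and reusing it unchanged on another. Only then does the reported FPR reflect what the detector would do on unseen data. The sketch below is minimal. It assumes a separate validation split, which this README does not describe, and the convention that a sample is flagged when its score is at least the threshold. The function names are hypothetical.

```python
def pick_threshold(labels, scores, target_tpr):
    """Highest threshold whose TPR on this (validation) set is >= target_tpr.

    Labels: 1 = malicious, 0 = benign. A sample is flagged when score >= threshold.
    """
    positives = sorted((s for s, label in zip(scores, labels) if label == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one malicious sample")
    for flagged, score in enumerate(positives, start=1):
        # Lowering the threshold to `score` flags at least `flagged` positives.
        if flagged / len(positives) >= target_tpr:
            return score
    raise ValueError("target_tpr must be at most 1.0")


def rates_at_threshold(labels, scores, threshold):
    """Confusion counts, TPR, and FPR on a held-out split at a frozen threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, label in pairs if label == 1 and s >= threshold)
    fn = sum(1 for s, label in pairs if label == 1 and s < threshold)
    fp = sum(1 for s, label in pairs if label == 0 and s >= threshold)
    tn = sum(1 for s, label in pairs if label == 0 and s < threshold)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("held-out split needs both malicious and benign samples")
    return {"tpr": tp / (tp + fn), "fpr": fp / (fp + tn),
            "tp": tp, "fp": fp, "fn": fn, "tn": tn}
```

The threshold is chosen for a 95% or 99% TPR on validation and then applied as-is to the test split. The test TPR can therefore fall below the target. Reporting that gap, rather than re-tuning on test, is the point of evaluating at a fixed threshold.
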
## Legal

Licensed under Apache-2.0. No redistribution of malware binaries is permitted. Use at your own risk.