---
license: apache-2.0
language: [none]
tags:
- arm64
- aarch64
- malware-detection
- AV
- EDR
- XDR
- Rust
- security
- binary-analysis
pipeline_tag: other
library_name: none
base_model: null
new_version: v0.0.1
model-index:
- name: WinnCore ARM64 AV Baseline
  results:
  - task:
      type: binary-classification
      name: Malware detection
    dataset:
      name: WinnCore ARM64 Corpus v0
      type: winncore/arm64-corpus
      split: test
    metrics:
    - {name: ROC-AUC, type: roc-auc, value: 0.000}
    - {name: Average Precision, type: average_precision, value: 0.000}
    - {name: FPR@TPR=95%, type: fpr_at_tpr_95, value: 0.000}
    - {name: FPR@TPR=99%, type: fpr_at_tpr_99, value: 0.000}
---
# WinnCore ARM64 AV Baseline
ARM64-first malware detection baseline. Rust extractor → JSONL features → LightGBM → ONNX.
**Important:** No live malware binaries are included in this repository. Only hashes, feature rows, code, and reports are shared.
## Scope & Safety
- **Do not upload samples.** Share only SHA-256 hashes and feature rows.
- Perform training and evaluation in an isolated environment.
- This is an R&D baseline only; it is not intended for production EDR use.
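
As a minimal sketch of sharing a hash instead of a sample, the snippet below computes a file's SHA-256 with Python's standard `hashlib`. The function name `sha256_of_file` and the chunk size are illustrative assumptions, not part of this repository.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Hypothetical helper for illustration; chunked reads keep memory use
    flat even for large binaries.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print one "<hash>  <path>" line per file given on the command line.
    for arg in sys.argv[1:]:
        print(f"{sha256_of_file(arg)}  {arg}")
```

Run this inside the isolated environment; only the printed hash needs to leave it.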
## Repository Layout
```
extractor/   # Rust feature extractor (AArch64 disassembly, headers, n-grams)
training/    # Python scripts for training and ONNX export
models/      # Exported models (e.g., gbm_v0.onnx)
datasets/    # Manifests and features (no binaries)
eval/        # Metrics tables, plots, and write-ups
```
## Quick Start
```bash
# Build the extractor (in a subshell, so later commands still run from the repo root)
(cd extractor && cargo build --release)

# Extract features from binaries
./extractor/target/release/arm64_extractor /path/to/binaries datasets/features.jsonl

# Train LightGBM and export to ONNX
python training/train_lightgbm.py \
  --features datasets/features.jsonl \
  --out models/gbm_v0.onnx \
  --report eval/report_v0.md
```
## Metrics Protocol
Report the following metrics and optimize for low false positives:
- ROC-AUC
- PR-AUC (Average Precision)
- FPR@TPR=95%
- FPR@TPR=99%
Targets for future improvements: FPR@TPR=95% ≤ 1%, FPR@TPR=99% ≤ 5% on unseen data.
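
To make the FPR@TPR metrics above concrete, here is a minimal pure-Python sketch. It assumes labels use 1 for malicious and 0 for benign and that a higher score means "more likely malicious"; the function name `fpr_at_tpr` is hypothetical and is not the repository's evaluation code.

```python
def fpr_at_tpr(labels, scores, target_tpr):
    """Lowest false-positive rate among thresholds reaching target_tpr.

    Assumptions (not from the repo): label 1 = malicious, 0 = benign,
    and a sample is flagged when its score >= threshold.
    """
    if not 0.0 < target_tpr <= 1.0:
        raise ValueError("target_tpr must be in (0, 1]")
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Sweep thresholds from the highest score downward. TPR and FPR both
    # only grow during the sweep, so the first threshold that reaches the
    # target TPR also has the lowest FPR.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit all samples tied at this score together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / positives >= target_tpr:
            return fp / negatives
    return 1.0  # Unreachable for valid input; kept as a safe default.
```

Calling `fpr_at_tpr(labels, scores, 0.95)` and `fpr_at_tpr(labels, scores, 0.99)` on held-out data gives the two values to compare against the targets above.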
## Dataset Manifest (Example)
Each line of `datasets/manifest.jsonl` is one JSON object:
```json
{"sha256": "<hash>", "label": 1, "arch": "aarch64", "source": "internal_sandbox", "ts": "2025-11-02"}
```
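
A row like the one above can be sanity-checked before training. The sketch below is one possible validator, under stated assumptions: all five keys are required, `sha256` is 64 lowercase hex characters, and `label` is 0 or 1 (the example only shows `1`, so the meaning of `0` as benign is a guess). The name `validate_manifest_line` is hypothetical.

```python
import json
import re

# Assumed schema, inferred from the single example row in this README.
REQUIRED_KEYS = {"sha256", "label", "arch", "source", "ts"}
SHA256_PATTERN = re.compile(r"[0-9a-f]{64}")


def validate_manifest_line(line):
    """Parse one JSONL manifest row and return it, or raise ValueError."""
    row = json.loads(line)  # JSONDecodeError is a ValueError subclass.
    if not isinstance(row, dict):
        raise ValueError("manifest row must be a JSON object")
    missing = REQUIRED_KEYS - row.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    sha = row["sha256"]
    if not isinstance(sha, str) or not SHA256_PATTERN.fullmatch(sha):
        raise ValueError("sha256 must be 64 lowercase hex characters")
    label = row["label"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(label, bool) or label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    return row
```

Rejecting bad rows early keeps placeholder values such as `<hash>` from leaking into a training run.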
## Reproducibility Tips
- Use small, atomic commits.
- Use Git LFS for large files such as `*.onnx`, `*.pt`, `*.bin`, and `*.pkl`.
- Pin dependency versions in `training/requirements.txt` and `rust-toolchain.toml`.
- Evaluate performance at fixed thresholds, not just AUC scores.
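
The last tip can be illustrated with a short sketch: pick a threshold once (for example on a validation split, which is an assumption here), then report TPR and FPR at that exact value on the test split. The function name `rates_at_threshold` and the "flag when score >= threshold" convention are hypothetical.

```python
def rates_at_threshold(labels, scores, threshold):
    """Return (tpr, fpr) when samples with score >= threshold are flagged.

    Assumes label 1 = malicious and 0 = benign (illustrative convention).
    """
    tp = fp = fn = tn = 0
    for label, score in zip(labels, scores):
        flagged = score >= threshold
        if label == 1 and flagged:
            tp += 1
        elif label == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes so the function never divides by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```

Unlike AUC, these two numbers describe the operating point a deployed detector would actually run at, so a regression in false positives shows up directly.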
## Legal
Licensed under Apache-2.0. No redistribution of malware binaries is permitted. Use at your own risk.