metadata
license: apache-2.0
language: [none]
tags:
  - arm64
  - aarch64
  - malware-detection
  - AV
  - EDR
  - XDR
  - Rust
  - security
  - binary-analysis
pipeline_tag: other
library_name: none
base_model: null
new_version: v0.0.1
model-index:
  - name: WinnCore ARM64 AV Baseline
    results:
      - task:
          type: binary-classification
          name: Malware detection
        dataset:
          name: WinnCore ARM64 Corpus v0
          type: winncore/arm64-corpus
          split: test
        metrics:
          - {name: ROC-AUC, type: roc-auc, value: 0.000}
          - {name: Average Precision, type: average_precision, value: 0.000}
          - {name: FPR@TPR=95%, type: fpr_at_tpr_95, value: 0.000}
          - {name: FPR@TPR=99%, type: fpr_at_tpr_99, value: 0.000}
WinnCore ARM64 AV Baseline
ARM64-first malware detection baseline. Rust extractor → JSONL features → LightGBM → ONNX.
Important: No live malware binaries are included in this repository. Only hashes, feature rows, code, and reports are shared.
Scope & Safety
- Do not upload samples. Share only SHA-256 hashes and feature rows.
- Perform training and evaluation in an isolated environment.
- This is an R&D baseline only—not intended for production EDR use.
Repository Layout
extractor/ # Rust feature extractor (AArch64 disassembly, headers, n-grams)
training/ # Python scripts for training and ONNX export
models/ # Exported models (e.g., gbm_v0.onnx)
datasets/ # Manifests and features (no binaries)
eval/ # Metrics tables, plots, and write-ups
Quick Start
# Build the extractor (the subshell keeps the working directory at the repository root)
(cd extractor && cargo build --release)
# Extract features from binaries
./extractor/target/release/arm64_extractor /path/to/binaries datasets/features.jsonl
# Train LightGBM and export to ONNX
python training/train_lightgbm.py \
  --features datasets/features.jsonl \
  --out models/gbm_v0.onnx \
  --report eval/report_v0.md
Metrics Protocol
Report the following metrics and optimize for low false positives:
- ROC-AUC
- PR-AUC (Average Precision)
- FPR@TPR=95%
- FPR@TPR=99%
Targets for future improvements: FPR@TPR=95% ≤ 1%, FPR@TPR=99% ≤ 5% on unseen data.
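As a rough illustration of how FPR@TPR could be computed, here is a minimal standard-library sketch. It assumes labels use 1 for malicious and 0 for benign and that a higher score means "more likely malicious"; the function name `fpr_at_tpr` is hypothetical and is not taken from the training scripts in this repository.

```python
from typing import Sequence


def fpr_at_tpr(labels: Sequence[int], scores: Sequence[float], target_tpr: float) -> float:
    """Return the lowest FPR among thresholds whose TPR is at least target_tpr.

    Assumptions: label 1 = malicious, 0 = benign; higher score = more malicious.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Sweep thresholds from the highest score downwards. Samples with equal
    # scores are processed together because no threshold can separate them.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # TPR and FPR only grow as the threshold drops, so the first threshold
        # that reaches the target TPR also has the lowest FPR.
        if tp / n_pos >= target_tpr:
            return fp / n_neg
    return 1.0
```

With only a few hundred benign samples, a 1% FPR target corresponds to a handful of files, so the estimate is noisy; a larger benign test set gives a more stable number.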
Dataset Manifest (Example)
Located at datasets/manifest.jsonl:
{"sha256": "<hash>", "label": 1, "arch": "aarch64", "source": "internal_sandbox", "ts": "2025-11-02"}
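A manifest row like the one above can be sanity-checked before training. The sketch below is illustrative only: the helper name `validate_manifest_line` is hypothetical, the field list is taken from the example row, and it assumes that labels are 0 or 1 and that `ts` is an ISO 8601 date.

```python
import json
import re
from datetime import date
from typing import List

SHA256_RE = re.compile(r"[0-9a-f]{64}")
# Keys taken from the example row; the real schema may contain more fields.
REQUIRED_KEYS = {"sha256", "label", "arch", "source", "ts"}


def validate_manifest_line(line: str) -> List[str]:
    """Return a list of problems found in one manifest row (empty if none)."""
    try:
        row = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(row, dict):
        return ["row is not a JSON object"]

    problems = []
    missing = REQUIRED_KEYS - row.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    sha = row.get("sha256")
    if not isinstance(sha, str) or not SHA256_RE.fullmatch(sha.lower()):
        problems.append("sha256 is not a 64-character hex digest")

    label = row.get("label")
    if type(label) is not int or label not in (0, 1):
        problems.append("label must be the integer 0 or 1")

    # Assumption: ts is an ISO 8601 calendar date such as 2025-11-02.
    try:
        date.fromisoformat(row.get("ts"))
    except (TypeError, ValueError):
        problems.append("ts is not an ISO 8601 date")
    return problems
```

Running this over every line of the manifest and failing on the first non-empty result is one way to keep placeholder values such as `<hash>` out of a training run.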
Reproducibility Tips
- Use small, atomic commits.
- Leverage Git LFS for large files like *.onnx, *.pt, *.bin, *.pkl.
- Pin dependency versions in training/requirements.txt and rust-toolchain.toml.
- Evaluate performance at fixed thresholds, not just AUC scores.
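To make the last tip concrete: AUC summarizes every possible threshold, whereas a detector runs at a single cut-off. One option is to choose the threshold on validation data, freeze it, and report the rates it produces on the test split. The sketch below is a hedged illustration; `rates_at_threshold` is a hypothetical helper, and it assumes label 1 means malicious and that a score at or above the threshold is flagged.

```python
from typing import Dict, Sequence


def rates_at_threshold(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Confusion counts plus TPR and FPR for one fixed decision threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold  # assumption: ">=" means "flag as malicious"
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return {
        "tp": tp,
        "fp": fp,
        "tn": tn,
        "fn": fn,
        # Guard against empty classes so the helper never divides by zero.
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }
```

Recording the threshold value alongside the model file makes it possible to compare runs at the same operating point.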
Legal
Licensed under Apache-2.0. No redistribution of malware binaries is permitted. Use at your own risk.