---
license: apache-2.0
language: [none]
tags:
  - arm64
  - aarch64
  - malware-detection
  - AV
  - EDR
  - XDR
  - Rust
  - security
  - binary-analysis
pipeline_tag: other
library_name: none
base_model: null
new_version: v0.0.1
model-index:
  - name: WinnCore ARM64 AV Baseline
    results:
      - task:
          type: binary-classification
          name: Malware detection
        dataset:
          name: WinnCore ARM64 Corpus v0
          type: winncore/arm64-corpus
          split: test
        metrics:
          - {name: ROC-AUC, type: roc-auc, value: 0.000}
          - {name: Average Precision, type: average_precision, value: 0.000}
          - {name: FPR@TPR=95%, type: fpr_at_tpr_95, value: 0.000}
          - {name: FPR@TPR=99%, type: fpr_at_tpr_99, value: 0.000}
---

# WinnCore ARM64 AV Baseline

ARM64-first malware detection baseline. Rust extractor → JSONL features → LightGBM → ONNX.

**Important:** No live malware binaries are included in this repository. Only hashes, feature rows, code, and reports are shared.

## Scope & Safety

- **Do not upload samples.** Share only SHA-256 hashes and feature rows.
- Perform training and evaluation in an isolated environment.
- This is an R&D baseline only; it is not intended for production EDR use.

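
Since only SHA-256 hashes and metadata may be shared, a manifest row can be produced inside the isolated environment without the binary ever leaving it. The following is a minimal sketch, not part of the repository: the helper names `sha256_of_file`, `manifest_row`, and `append_manifest` are hypothetical, and the field names are assumed to match the example in the Dataset Manifest section.

```python
import hashlib
import json
from datetime import date


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_row(path, label, source, arch="aarch64", ts=None):
    """Build one manifest record (hypothetical helper).

    Only the hash and metadata are recorded; the sample itself is never copied.
    """
    return {
        "sha256": sha256_of_file(path),
        "label": label,      # assumed convention: 1 = malware, 0 = benign
        "arch": arch,
        "source": source,
        "ts": ts or date.today().isoformat(),
    }


def append_manifest(rows, manifest_path):
    """Append records to a JSONL manifest, one JSON object per line."""
    with open(manifest_path, "a", encoding="utf-8") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")
```

A call such as `append_manifest([manifest_row(p, 1, "internal_sandbox")], "datasets/manifest.jsonl")` would add one line per sample. Streaming the file in chunks keeps memory use flat regardless of sample size.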
## Repository Layout

```
extractor/   # Rust feature extractor (AArch64 disassembly, headers, n-grams)
training/    # Python scripts for training and ONNX export
models/      # Exported models (e.g., gbm_v0.onnx)
datasets/    # Manifests and features (no binaries)
eval/        # Metrics tables, plots, and write-ups
```

## Quick Start

```bash
# Build the extractor
cd extractor && cargo build --release

# Extract features from binaries
./target/release/arm64_extractor /path/to/binaries datasets/features.jsonl

# Train LightGBM and export to ONNX
python training/train_lightgbm.py \
  --features datasets/features.jsonl \
  --out models/gbm_v0.onnx \
  --report eval/report_v0.md
```

## Metrics Protocol

Report the following metrics and optimize for low false positives:

- ROC-AUC
- PR-AUC (Average Precision)
- FPR@TPR=95%
- FPR@TPR=99%

Targets for future improvements: FPR@TPR=95% ≤ 1%, FPR@TPR=99% ≤ 5% on unseen data.

## Dataset Manifest (Example)

Located at `datasets/manifest.jsonl`:

```json
{"sha256": "", "label": 1, "arch": "aarch64", "source": "internal_sandbox", "ts": "2025-11-02"}
```

## Reproducibility Tips

- Use small, atomic commits.
- Leverage Git LFS for large files like `*.onnx`, `*.pt`, `*.bin`, `*.pkl`.
- Pin dependency versions in `training/requirements.txt` and `rust-toolchain.toml`.
- Evaluate performance at fixed thresholds, not just AUC scores.

## Legal

Licensed under Apache-2.0. No redistribution of malware binaries is permitted. Use at your own risk.
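
As a supplement to the Metrics Protocol section, the operating-point metrics FPR@TPR=95% and FPR@TPR=99% can be computed with a simple threshold sweep. The following is a minimal pure-Python sketch, not the repository's evaluation code: the function name `fpr_at_tpr` is hypothetical, and it assumes label 1 means malware and that higher scores mean more malicious.

```python
def fpr_at_tpr(labels, scores, target_tpr):
    """Return (fpr, threshold) at the highest threshold whose TPR >= target_tpr.

    A sample is flagged when score >= threshold. Because FPR never decreases
    as the threshold is lowered, the first threshold that reaches the target
    TPR also gives the lowest achievable FPR at that TPR.
    """
    if not 0.0 < target_tpr <= 1.0:
        raise ValueError("target_tpr must be in (0, 1]")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Sweep from the highest score downward; tied scores are handled together.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / n_pos >= target_tpr:
            return fp / n_neg, threshold
    # Not reached: at the lowest threshold every positive is flagged (TPR = 1).
    return 1.0, pairs[-1][0]


# Toy scores for illustration only; these are not results from any model.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
toy_fpr, toy_threshold = fpr_at_tpr(toy_labels, toy_scores, 0.95)
```

The returned threshold is what would be fixed for later evaluation runs, in line with the tip to evaluate at fixed thresholds rather than AUC alone.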