---
license: apache-2.0
language: [none]
tags:
  - arm64
  - aarch64
  - malware-detection
  - AV
  - EDR
  - XDR
  - Rust
  - security
  - binary-analysis
pipeline_tag: other
library_name: none
base_model: null
new_version: v0.0.1
model-index:
  - name: WinnCore ARM64 AV Baseline
    results:
      - task:
          type: binary-classification
          name: Malware detection
        dataset:
          name: WinnCore ARM64 Corpus v0
          type: winncore/arm64-corpus
          split: test
        metrics:
          - {name: ROC-AUC, type: roc-auc, value: 0.000}
          - {name: Average Precision, type: average_precision, value: 0.000}
          - {name: FPR@TPR=95%, type: fpr_at_tpr_95, value: 0.000}
          - {name: FPR@TPR=99%, type: fpr_at_tpr_99, value: 0.000}
---

WinnCore ARM64 AV Baseline

ARM64-first malware detection baseline. Rust extractor → JSONL features → LightGBM → ONNX.

Important: No live malware binaries are included in this repository. Only hashes, feature rows, code, and reports are shared.

Scope & Safety

  • Do not upload samples. Share only SHA-256 hashes and feature rows.
  • Perform training and evaluation in an isolated environment.
  • This is an R&D baseline only; it is not intended for production EDR use.
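
The hash-only sharing rule above needs a way to compute the SHA-256 of a sample without copying it anywhere. The following is a minimal sketch, not part of this repository; the function name `sha256_of_file` and the 1 MiB chunk size are illustrative assumptions.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Hypothetical helper: the name and the default chunk size are assumptions.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Stream the file so large samples are never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The file is only opened for reading and hashed; it is never parsed or executed. Run it inside the isolated environment and share only the resulting hex string.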

Repository Layout

extractor/   # Rust feature extractor (AArch64 disassembly, headers, n-grams)
training/    # Python scripts for training and ONNX export
models/      # Exported models (e.g., gbm_v0.onnx)
datasets/    # Manifests and features (no binaries)
eval/        # Metrics tables, plots, and write-ups

Quick Start

# Build the extractor (the subshell keeps the working directory at the repo root)
(cd extractor && cargo build --release)

# Extract features from binaries
./extractor/target/release/arm64_extractor /path/to/binaries datasets/features.jsonl

# Train LightGBM and export to ONNX
python training/train_lightgbm.py \
  --features datasets/features.jsonl \
  --out models/gbm_v0.onnx \
  --report eval/report_v0.md

Metrics Protocol

Report the following metrics and optimize for low false positives:

  • ROC-AUC
  • PR-AUC (Average Precision)
  • FPR@TPR=95%
  • FPR@TPR=99%

Targets for future improvements: FPR@TPR=95% ≤ 1%, FPR@TPR=99% ≤ 5% on unseen data.
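
FPR@TPR is less standard than ROC-AUC, so a small reference computation may help when comparing reports. The sketch below is not the repository's evaluation code. It assumes labels are 0 (benign) or 1 (malicious), that higher scores mean more malicious, and that a sample is flagged when its score is at or above the threshold. The function name `fpr_at_tpr` is hypothetical.

```python
def fpr_at_tpr(labels, scores, target_tpr):
    """Lowest false positive rate among thresholds with TPR >= target_tpr.

    labels: sequence of 0 (benign) / 1 (malicious), an assumed convention.
    scores: sequence of floats, higher means more likely malicious.
    """
    if not 0.0 < target_tpr <= 1.0:
        raise ValueError("target_tpr must be in (0, 1]")
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Sweep thresholds from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score so ties share one threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # FPR only grows as the threshold drops, so the first hit is the minimum.
        if tp / positives >= target_tpr:
            return fp / negatives


labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]
print(fpr_at_tpr(labels, scores, 0.75))  # 0.0
print(fpr_at_tpr(labels, scores, 1.0))   # 0.25
```

With only four positives, TPR moves in steps of 25%, so a 95% or 99% target is only meaningful on a test split with enough malicious samples to resolve those levels.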

Dataset Manifest (Example)

Located at datasets/manifest.jsonl:

{"sha256": "<hash>", "label": 1, "arch": "aarch64", "source": "internal_sandbox", "ts": "2025-11-02"}
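
Checking manifest rows before training catches malformed hashes and labels early. The sketch below is one possible check, not the repository's tooling. It assumes `sha256` is 64 lowercase hex characters, `label` is 0 (benign) or 1 (malicious), and `arch` is always `aarch64`; the example row only shows label 1, so the 0/1 convention is an assumption. The function name `validate_manifest_row` is hypothetical.

```python
import json
import re

# Assumed format: 64 lowercase hexadecimal characters.
SHA256_RE = re.compile(r"[0-9a-f]{64}")


def validate_manifest_row(line):
    """Parse one JSONL manifest line and return the row, or raise ValueError."""
    row = json.loads(line)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(row, dict):
        raise ValueError("manifest row must be a JSON object")
    sha = row.get("sha256")
    if not isinstance(sha, str) or not SHA256_RE.fullmatch(sha):
        raise ValueError("sha256 must be 64 lowercase hex characters")
    label = row.get("label")
    # Reject JSON booleans explicitly, since True == 1 in Python.
    if isinstance(label, bool) or label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    if row.get("arch") != "aarch64":
        raise ValueError("arch must be 'aarch64'")
    return row


example = json.dumps({
    "sha256": "a" * 64,  # placeholder digest for illustration only
    "label": 1,
    "arch": "aarch64",
    "source": "internal_sandbox",
    "ts": "2025-11-02",
})
print(validate_manifest_row(example)["label"])  # 1
```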

Reproducibility Tips

  • Use small, atomic commits.
  • Use Git LFS for large files such as *.onnx, *.pt, *.bin, and *.pkl.
  • Pin dependency versions in training/requirements.txt and rust-toolchain.toml.
  • Evaluate performance at fixed thresholds, not just AUC scores.
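
To illustrate the last tip, the sketch below reports TPR and FPR at one fixed decision threshold. It assumes labels are 0 (benign) or 1 (malicious) and that a sample is flagged when its score is at or above the threshold. The function name `rates_at_threshold` is hypothetical and not part of this repository.

```python
def rates_at_threshold(labels, scores, threshold):
    """Return (tpr, fpr) when flagging samples with score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(labels, scores):
        flagged = score >= threshold
        if label == 1:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    # Guard against empty classes instead of dividing by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]
print(rates_at_threshold(labels, scores, 0.5))  # (0.75, 0.25)
```

A common pattern is to choose the threshold on a validation split and then apply it unchanged to the test split, so the reported rates reflect the operating point a deployment would actually use.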

Legal

Licensed under Apache-2.0. No redistribution of malware binaries is permitted. Use at your own risk.
