---
license: cc-by-4.0
pipeline_tag: tabular-classification
tags:
- malware
- cybersecurity
- pe-files
- binary-classification
- tabular-data
- threat-intelligence
- digital-forensics
- reverse-engineering
- incident-response
- security-telemetry
- ai-security
- security-ml
- mitre-attack
- mitre-mbc
- windows
- executable-files
- static-analysis
- behavioral-analysis
- classification
- anomaly-detection
- intrusion-detection
- explainable-ai
- model-evaluation
- benchmarking
- training
- evaluation
- research
- education
- teaching
- quantized
- tflite
- edge-inference
model_name: AURA Q1
---

# AURA Q1

**AURA Q1** is the free, quantized Windows release of the AURA malware classification model family.

It is designed for efficient inference on structured telemetry extracted from **Windows PE files**, with a focus on lightweight deployment, fast scoring, and reproducible preprocessing.

This repository contains a quantized model artifact intended for **inference only**.

## What this model does

AURA Q1 performs **tabular binary classification** for Windows executable analysis workflows. It is intended for use in research, education, prototyping, and defensive experimentation where users want a compact model that can score extracted static PE features.

Depending on your surrounding pipeline, the model can support workflows involving:

- malware / benign classification
- triage prioritization
- bulk telemetry scoring
- offline experimentation
- edge or constrained deployment scenarios

## Model format

This release is distributed as a **quantized TensorFlow Lite model**.

Primary characteristics:

- compact deployment format
- lower memory footprint than a full-precision model
- suitable for portable and embedded inference scenarios
- optimized for inference speed and distribution simplicity

## Full version availability

**AURA Q1** is the free quantized Windows release of AURA for lightweight local and edge inference.

If you want to use the full version of **AURA** in a broader analysis workflow, including **Windows, Linux, and Android classification**, it is available through **Traceix**.

**Traceix:** https://traceix.com

Traceix is operated by **PCEF (Perkins Cybersecurity Educational Fund)**, a 501(c)(3) nonprofit that provides free cybersecurity education, tools, and training.

**Learn more about PCEF:** https://perkinsfund.org

## Input schema

AURA Q1 expects **30 numeric input features** in a fixed order. The preprocessing configuration attached to this release defines the exact feature list and normalization parameters for the Windows model.

### Feature order

```text
1. MajorImageVersion
2. MajorOperatingSystemVersion
3. MajorSubsystemVersion
4. ImageBase
5. MinorLinkerVersion
6. CheckSum
7. BaseOfData
8. SectionsMaxEntropy
9. MajorLinkerVersion
10. DllCharacteristics
11. SizeOfStackReserve
12. LoadConfigurationSize
13. ResourcesMinSize
14. Subsystem
15. SizeOfCode
16. SectionsMeanVirtualsize
17. Machine
18. SizeOfImage
19. AddressOfEntryPoint
20. Characteristics
21. SizeOfOptionalHeader
22. ResourcesMaxSize
23. ResourcesMaxEntropy
24. ImportsNb
25. SectionsMaxRawsize
26. ExportNb
27. ImportsNbDLL
28. ResourcesMinEntropy
29. SectionMaxVirtualsize
30. SectionsMeanRawsize
```

These features and their normalization metadata are defined in the provided preprocessing file.

## Preprocessing

Inputs must be preprocessed exactly as defined by the release preprocessing configuration.

This model uses **per-feature min-max scaling** with a target **feature range of `[0, 1]`** across all 30 input dimensions. The preprocessing metadata includes:

- feature names
- per-feature `scale`
- per-feature `min`
- original `data_min`
- original `data_max`
- feature range
- number of expected input features

The preprocessing config explicitly states:

- `n_features_in = 30`
- `feature_range = [0, 1]`

### Important

You should not reorder features, omit features, or substitute alternative telemetry fields without retraining or validating compatibility. Inference quality depends on preserving the exact training-time feature contract.
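
The metadata fields listed above resemble the fitted attributes of scikit-learn's `MinMaxScaler` (`scale_`, `min_`, `data_min_`, `data_max_`). Assuming that convention holds, which this card does not confirm, and assuming the JSON keys are named `features`, `scale`, `min`, `data_min`, `data_max`, `feature_range`, and `n_features_in`, a loader could sanity-check the config before use. The `validate_preprocess_config` helper below is hypothetical and not part of the release.

```python
import numpy as np


def validate_preprocess_config(cfg: dict, expected_features: int = 30) -> None:
    """Raise ValueError if the config violates the documented feature contract.

    Key names and the MinMaxScaler-style relationship between the fields are
    assumptions; adjust them to match the actual preprocess.json.
    """
    n = cfg["n_features_in"]
    if n != expected_features:
        raise ValueError(f"Expected {expected_features} features, config says {n}")

    lo, hi = cfg["feature_range"]
    if (lo, hi) != (0, 1):
        raise ValueError(f"Expected feature_range [0, 1], got {[lo, hi]}")

    # Every per-feature array must have exactly one entry per feature.
    for key in ("features", "scale", "min", "data_min", "data_max"):
        if len(cfg[key]) != n:
            raise ValueError(f"'{key}' has {len(cfg[key])} entries, expected {n}")

    data_min = np.asarray(cfg["data_min"], dtype=np.float64)
    data_max = np.asarray(cfg["data_max"], dtype=np.float64)
    span = data_max - data_min
    span[span == 0.0] = 1.0  # constant features: avoid division by zero

    # Assumed relationship: scale = (hi - lo) / (data_max - data_min)
    #                       min   = lo - data_min * scale
    expected_scale = (hi - lo) / span
    expected_min = lo - data_min * expected_scale
    if not np.allclose(cfg["scale"], expected_scale, rtol=1e-5, atol=0.0):
        raise ValueError("'scale' is inconsistent with data_min/data_max")
    if not np.allclose(cfg["min"], expected_min, rtol=1e-5, atol=1e-8):
        raise ValueError("'min' is inconsistent with data_min/data_max")
```

Calling this once right after `json.load` turns a silently wrong config (for example, a truncated `scale` array) into an immediate error instead of a misleading score.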

## Example preprocessing flow

At inference time, the expected workflow is:

1. Extract the 30 raw features from the analyzed Windows PE sample.
2. Arrange them in the exact order listed above.
3. Apply the released min-max normalization parameters.
4. Feed the normalized vector into the quantized TFLite model.
5. Interpret the output score according to your downstream thresholding policy.

## Quick Python runner

Below is a minimal Python example that loads `model.tflite` and `preprocess.json`, applies the released min-max scaling, and runs inference on a single feature vector.

```python
import json

import numpy as np
import tensorflow as tf


def load_preprocess(path="preprocess.json"):
    """Load the feature order and min-max parameters from the release config."""
    with open(path, "r", encoding="utf-8") as f:
        cfg = json.load(f)

    features = cfg["features"]
    scale = np.array(cfg["scale"], dtype=np.float32)
    min_offset = np.array(cfg["min"], dtype=np.float32)
    return features, scale, min_offset


def preprocess_features(raw_features: dict, feature_order, scale, min_offset):
    """Order, scale, and clip one raw feature dict into a [1, 30] float32 array."""
    missing = [name for name in feature_order if name not in raw_features]
    if missing:
        raise ValueError(f"Missing features: {missing}")

    x = np.array([raw_features[name] for name in feature_order], dtype=np.float32)

    # Min-max scaling: x_scaled = x * scale + min
    x = x * scale + min_offset
    # Clamp values that fall outside the training range to [0, 1]
    x = np.clip(x, 0.0, 1.0)

    # Shape: [1, 30]
    return np.expand_dims(x, axis=0).astype(np.float32)


def run_inference(model_path="model.tflite", preprocess_path="preprocess.json", raw_features=None):
    """Run the TFLite model on one sample and return the (dequantized) output."""
    if raw_features is None:
        raise ValueError("raw_features must be provided")

    feature_order, scale, min_offset = load_preprocess(preprocess_path)
    x = preprocess_features(raw_features, feature_order, scale, min_offset)

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    input_index = input_details[0]["index"]
    output_index = output_details[0]["index"]
    input_dtype = input_details[0]["dtype"]
    output_dtype = output_details[0]["dtype"]

    # Quantize input if the model expects integer tensors
    if np.issubdtype(input_dtype, np.integer):
        q_scale, q_zero_point = input_details[0]["quantization"]
        if q_scale == 0:
            raise ValueError("Invalid input quantization scale")
        info = np.iinfo(input_dtype)
        x_q = np.round(x / q_scale + q_zero_point)
        # Clip before casting so out-of-range values cannot wrap around
        x_in = np.clip(x_q, info.min, info.max).astype(input_dtype)
    else:
        x_in = x.astype(input_dtype)

    interpreter.set_tensor(input_index, x_in)
    interpreter.invoke()
    y = interpreter.get_tensor(output_index)

    # Dequantize output if needed
    if np.issubdtype(output_dtype, np.integer):
        q_scale, q_zero_point = output_details[0]["quantization"]
        if q_scale != 0:
            y = (y.astype(np.float32) - q_zero_point) * q_scale

    return y


if __name__ == "__main__":
    sample = {
        "MajorImageVersion": 0,
        "MajorOperatingSystemVersion": 6,
        "MajorSubsystemVersion": 6,
        "ImageBase": 4194304,
        "MinorLinkerVersion": 25,
        "CheckSum": 0,
        "BaseOfData": 24576,
        "SectionsMaxEntropy": 6.12,
        "MajorLinkerVersion": 14,
        "DllCharacteristics": 34112,
        "SizeOfStackReserve": 1048576,
        "LoadConfigurationSize": 160,
        "ResourcesMinSize": 48,
        "Subsystem": 3,
        "SizeOfCode": 28160,
        "SectionsMeanVirtualsize": 8192,
        "Machine": 34404,
        "SizeOfImage": 126976,
        "AddressOfEntryPoint": 5272,
        "Characteristics": 258,
        "SizeOfOptionalHeader": 240,
        "ResourcesMaxSize": 4096,
        "ResourcesMaxEntropy": 4.21,
        "ImportsNb": 12,
        "SectionsMaxRawsize": 28672,
        "ExportNb": 0,
        "ImportsNbDLL": 3,
        "ResourcesMinEntropy": 1.37,
        "SectionMaxVirtualsize": 32768,
        "SectionsMeanRawsize": 6144,
    }

    output = run_inference(
        model_path="model.tflite",
        preprocess_path="preprocess.json",
        raw_features=sample,
    )
    print("Model output:", output)
```
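
For bulk telemetry scoring, the same scaling can be applied to many samples at once. The sketch below is a hypothetical `preprocess_batch` helper, not part of the release; it takes the feature order, `scale`, and `min` values as arguments (for example, as returned by `load_preprocess`) and returns an `[N, F]` matrix. Whether the TFLite model accepts a batch dimension larger than 1 depends on how it was exported, which this card does not state; if it does not, the rows can be fed one at a time.

```python
import numpy as np


def preprocess_batch(samples, feature_order, scale, min_offset):
    """Scale a list of raw feature dicts into a float32 matrix of shape [N, F]."""
    rows = []
    for i, sample in enumerate(samples):
        missing = [name for name in feature_order if name not in sample]
        if missing:
            raise ValueError(f"Sample {i} is missing features: {missing}")
        # Build each row in the contract order, ignoring any extra keys.
        rows.append([sample[name] for name in feature_order])

    # reshape keeps the [N, F] shape even when the sample list is empty.
    x = np.asarray(rows, dtype=np.float64).reshape(-1, len(feature_order))

    # Same min-max transform as the single-sample path: x * scale + min.
    x = x * np.asarray(scale, dtype=np.float64) + np.asarray(min_offset, dtype=np.float64)

    # Clamp to the [0, 1] feature range and cast to the model's float type.
    return np.clip(x, 0.0, 1.0).astype(np.float32)
```

The scaling is done in float64 before the final cast so that very large raw values multiplied by very small scale factors lose as little precision as possible.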

## Intended use

AURA Q1 is intended for:

- defensive security research
- malware analysis experimentation
- academic and educational use
- benchmarking tabular security ML pipelines
- lightweight inference deployments

## Out-of-scope use

This release is **not** intended to be used as:

- a standalone malware verdict engine without analyst oversight
- a replacement for sandboxing, reverse engineering, or signature-based detection
- a guarantee of maliciousness or benignness
- a production enforcement control without independent validation

## Limitations

Users should evaluate the model carefully in their own environment. Key limitations include:

- performance depends heavily on feature extraction quality
- distribution shift can reduce reliability
- adversarial adaptation is possible
- score calibration may not transfer across datasets
- quantization can introduce small accuracy differences relative to full-precision variants
- security telemetry definitions may vary across tooling stacks
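
The quantization point in the list above can be made concrete with a round trip through affine integer quantization. This is a minimal sketch with illustrative parameters (a scale of 1/255 and a zero point of -128 for `int8`); the real values belong to the model's tensors and are not stated in this card.

```python
import numpy as np


def quantize(x, q_scale, q_zero_point, dtype=np.int8):
    """Map float values to integers: q = round(x / scale) + zero_point."""
    info = np.iinfo(dtype)
    q = np.round(x / q_scale) + q_zero_point
    # Clip so values outside the representable range saturate instead of wrapping.
    return np.clip(q, info.min, info.max).astype(dtype)


def dequantize(q, q_scale, q_zero_point):
    """Map integers back to floats: x = (q - zero_point) * scale."""
    return (q.astype(np.float32) - q_zero_point) * q_scale


# Illustrative parameters only; read the real ones from the interpreter.
Q_SCALE, Q_ZERO_POINT = 1.0 / 255.0, -128

x = np.linspace(0.0, 1.0, 11, dtype=np.float32)
x_roundtrip = dequantize(quantize(x, Q_SCALE, Q_ZERO_POINT), Q_SCALE, Q_ZERO_POINT)
max_error = float(np.max(np.abs(x - x_roundtrip)))
print(f"max round-trip error: {max_error:.6f}")
```

For in-range values the round-trip error per value is bounded by roughly half the quantization scale, which is one source of the small differences from a full-precision model.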

## Bias, risk, and security considerations

Security ML systems can produce both false positives and false negatives. AURA Q1 should be used as a **decision-support signal**, not as a sole source of truth.

Potential risks include:

- benign software being flagged incorrectly
- malicious software evading classification
- degraded performance on underrepresented file families
- misuse in overly automated blocking pipelines

Human review and layered security controls are recommended.

## Reproducibility notes

To reproduce inference correctly, use:

- the exact feature order released here
- the exact normalization metadata in `preprocess.json`
- the quantized TFLite model artifact included in the repository

Any mismatch between extracted telemetry and the expected schema may invalidate outputs.

## Output

This is a classification model that returns a prediction score or class output depending on the runtime wrapper used around the TFLite artifact.

If you package this model inside a larger application, you should document your own:

- output tensor interpretation
- class mapping
- threshold policy
- confidence handling
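
One way to document a threshold policy is to encode it explicitly. The sketch below assumes the output is a single score in `[0, 1]` where higher means more likely malicious, which this card does not confirm. The names `triage` and `threshold_for_fpr` are hypothetical, and the default thresholds `0.3` and `0.7` are placeholders rather than recommended values.

```python
import numpy as np


def triage(score: float, benign_below: float = 0.3, malicious_at: float = 0.7) -> str:
    """Map a score to 'benign', 'review', or 'malicious'.

    Assumes higher scores mean more likely malicious. Scores between the two
    thresholds are routed to an analyst instead of being auto-labelled.
    """
    if not 0.0 <= benign_below <= malicious_at <= 1.0:
        raise ValueError("Thresholds must satisfy 0 <= benign_below <= malicious_at <= 1")
    if score >= malicious_at:
        return "malicious"
    if score < benign_below:
        return "benign"
    return "review"


def threshold_for_fpr(benign_scores, target_fpr: float) -> float:
    """Pick the (1 - target_fpr) quantile of scores from known-benign samples.

    Roughly target_fpr of those benign samples will score above the result.
    """
    scores = np.asarray(benign_scores, dtype=np.float64)
    return float(np.quantile(scores, 1.0 - target_fpr))
```

Deriving `malicious_at` from scores on your own known-benign validation set, rather than reusing a value from another dataset, addresses the calibration caveat in the limitations: a threshold tuned elsewhere may not give the same false-positive rate on local data.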

## Citation

If you use AURA Q1 in research, evaluation, teaching, or derivative work, please cite this repository and retain attribution under the repository license.

## License

This model card is released under:

**CC-BY-4.0**

Please review the repository license terms before redistribution or derivative use.

## Disclaimer

AURA Q1 is provided for research, educational, and defensive purposes. It is offered as-is, without warranty, and should be validated thoroughly before any operational use.