Update README.md

README.md +76 -78

@@ -5,7 +5,6 @@ tags:
 - malware
 - cybersecurity
 - pe-files
-- elf-files
 - binary-classification
 - tabular-data
 - threat-intelligence
@@ -18,7 +17,6 @@ tags:
 - mitre-attack
 - mitre-mbc
 - windows
-- linux
 - executable-files
 - static-analysis
 - behavioral-analysis
@@ -41,15 +39,15 @@ model_name: AURA Q1
 # AURA Q1
-**AURA Q1** is the free, quantized release of the AURA malware classification model family.
-It is designed for efficient inference on structured telemetry extracted from executable artifacts and mobile packages, with a focus on lightweight deployment, fast scoring, and reproducible preprocessing.
 This repository contains a quantized model artifact intended for **inference only**.
 ## What this model does
-AURA Q1 performs **tabular binary classification** for security-oriented file analysis workflows. It is intended for use in research, education, prototyping, and defensive experimentation where users want a compact model that can score extracted static features.
 Depending on your surrounding pipeline, the model can support workflows involving:
@@ -72,9 +70,9 @@ Primary characteristics:
 ## Full version availability
-**AURA Q1** is the free quantized release of AURA for lightweight local and edge inference.
-If you want to use the full version of **AURA** in a broader analysis workflow, it is available through **Traceix**:
 **Traceix:** https://traceix.com
@@ -84,44 +82,44 @@ Traceix is operated by **PCEF (Perkins Cybersecurity Educational Fund)**, a 501(
 ## Input schema
-AURA Q1 expects **30 numeric input features** in a fixed order. The preprocessing configuration attached to this release defines the exact feature list and normalization parameters.
 ### Feature order
 ```text
-1.  CertificatesNb
-2.  CertificatesMinEntropy
-3.  CertificatesMeanSize
-4.  CertificatesMaxEntropy
-5.  CertificatesMinSize
-6.  CertificatesMeanEntropy
-7.  CertificatesMaxSize
-8.  Providers
-9.  FilesMaxSize
-10. DexFilesMaxSize
-11. FilesNb
-12. APKSize
-13. ResourcesNb
-14. DexFilesNb
-15. FilesMinEntropy
-16. DexFilesMeanSize
-17. Services
-18. FilesMeanSize
-19. FilesMinSize
-20. Permissions
-21. Receivers
-22. Activities
-23. ResourcesMinEntropy
-24. ResourcesMeanSize
-25. ResourcesMaxEntropy
-26. ResourcesMaxSize
-27. ResourcesMeanEntropy
-28. DexFilesMinEntropy
-29. DexFilesMaxEntropy
-30. DexFilesMinSize
-````
-These features and their normalization metadata are defined in the provided preprocessing file.
 ## Preprocessing
@@ -129,18 +127,18 @@ Inputs must be preprocessed exactly as defined by the release preprocessing conf
 This model uses **per-feature min-max scaling** with a target **feature range of `[0, 1]`** across all 30 input dimensions. The preprocessing metadata includes:
-* feature names
-* per-feature `scale`
-* per-feature `min`
-* original `data_min`
-* original `data_max`
-* feature range
-* number of expected input features
 The preprocessing config explicitly states:
-* `n_features_in = 30`
-* `feature_range = [0, 1]`
 ### Important
@@ -150,7 +148,7 @@ You should not reorder features, omit features, or substitute alternative teleme
 At inference time, the expected workflow is:
-1. Extract the 30 raw features from the analyzed sample.
 2. Arrange them in the exact order listed above.
 3. Apply the released min-max normalization parameters.
 4. Feed the normalized vector into the quantized TFLite model.
@@ -160,31 +158,31 @@ At inference time, the expected workflow is:
 AURA Q1 is intended for:
-* defensive security research
-* malware analysis experimentation
-* academic and educational use
-* benchmarking tabular security ML pipelines
-* lightweight inference deployments
 ## Out-of-scope use
 This release is **not** intended to be used as:
-* a standalone malware verdict engine without analyst oversight
-* a replacement for sandboxing, reverse engineering, or signature-based detection
-* a guarantee of maliciousness or benignness
-* a production enforcement control without independent validation
 ## Limitations
 Users should evaluate the model carefully in their own environment. Key limitations include:
-* performance depends heavily on feature extraction quality
-* distribution shift can reduce reliability
-* adversarial adaptation is possible
-* score calibration may not transfer across datasets
-* quantization can introduce small accuracy differences relative to full-precision variants
-* security telemetry definitions may vary across tooling stacks
 ## Bias, risk, and security considerations
@@ -192,10 +190,10 @@ Security ML systems can produce both false positives and false negatives. AURA Q
 Potential risks include:
-* benign software being flagged incorrectly
-* malicious software evading classification
-* degraded performance on underrepresented file families
-* misuse in overly automated blocking pipelines
 Human review and layered security controls are recommended.
@@ -203,11 +201,11 @@ Human review and layered security controls are recommended.
 To reproduce inference correctly, use:
-* the exact feature order released here
-* the exact normalization metadata in `preprocess.json`
-* the quantized TFLite model artifact included in the repository
-Any mismatch between extracted telemetry and the expected schema may invalidate outputs.
 ## Output
@@ -215,10 +213,10 @@ This is a classification model that returns a prediction score or class output d
 You should document your own:
-* output tensor interpretation
-* class mapping
-* threshold policy
-* confidence handling
 if you package this model inside a larger application.

 - malware
 - cybersecurity
 - pe-files
 - binary-classification
 - tabular-data
 - threat-intelligence
 - mitre-attack
 - mitre-mbc
 - windows
 - executable-files
 - static-analysis
 - behavioral-analysis
 # AURA Q1
+**AURA Q1** is the free, quantized Windows release of the AURA malware classification model family.
+It is designed for efficient inference on structured telemetry extracted from **Windows PE files**, with a focus on lightweight deployment, fast scoring, and reproducible preprocessing.
 This repository contains a quantized model artifact intended for **inference only**.
 ## What this model does
+AURA Q1 performs **tabular binary classification** for Windows executable analysis workflows. It is intended for use in research, education, prototyping, and defensive experimentation where users want a compact model that can score extracted static PE features.
 Depending on your surrounding pipeline, the model can support workflows involving:
 ## Full version availability
+**AURA Q1** is the free quantized Windows release of AURA for lightweight local and edge inference.
+If you want to use the full version of **AURA** in a broader analysis workflow, including **Windows, Linux, and Android classification**, it is available through **Traceix**.
 **Traceix:** https://traceix.com
 ## Input schema
+AURA Q1 expects **30 numeric input features** in a fixed order. The preprocessing configuration attached to this release defines the exact feature list and normalization parameters for the Windows model.
 ### Feature order
 ```text
+1.  MajorImageVersion
+2.  MajorOperatingSystemVersion
+3.  MajorSubsystemVersion
+4.  ImageBase
+5.  MinorLinkerVersion
+6.  CheckSum
+7.  BaseOfData
+8.  SectionsMaxEntropy
+9.  MajorLinkerVersion
+10. DllCharacteristics
+11. SizeOfStackReserve
+12. LoadConfigurationSize
+13. ResourcesMinSize
+14. Subsystem
+15. SizeOfCode
+16. SectionsMeanVirtualsize
+17. Machine
+18. SizeOfImage
+19. AddressOfEntryPoint
+20. Characteristics
+21. SizeOfOptionalHeader
+22. ResourcesMaxSize
+23. ResourcesMaxEntropy
+24. ImportsNb
+25. SectionsMaxRawsize
+26. ExportNb
+27. ImportsNbDLL
+28. ResourcesMinEntropy
+29. SectionMaxVirtualsize
+30. SectionsMeanRawsize
+```
+These features and their normalization metadata are defined in the provided preprocessing file.
 ## Preprocessing
 This model uses **per-feature min-max scaling** with a target **feature range of `[0, 1]`** across all 30 input dimensions. The preprocessing metadata includes:
+- feature names
+- per-feature `scale`
+- per-feature `min`
+- original `data_min`
+- original `data_max`
+- feature range
+- number of expected input features
 The preprocessing config explicitly states:
+- `n_features_in = 30`
+- `feature_range = [0, 1]`
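
The per-feature `scale` and `min` values listed above can be applied as a simple affine transform. The following is a minimal sketch, not the release's own code: it assumes the metadata follows the common min-max convention `x_scaled = x * scale + min`, where `scale = (hi - lo) / (data_max - data_min)` and `min = lo - data_min * scale` for a feature range `[lo, hi]`. The function names are hypothetical, and how the values are laid out inside `preprocess.json` is not shown in this README, so loading them is left to the caller.

```python
from typing import List, Sequence, Tuple


def derive_scale_and_min(
    data_min: Sequence[float],
    data_max: Sequence[float],
    feature_range: Tuple[float, float] = (0.0, 1.0),
) -> Tuple[List[float], List[float]]:
    """Recompute per-feature scale and offset from the original data bounds.

    Useful as a sanity check against the stored `scale` and `min` values.
    """
    lo, hi = feature_range
    scale: List[float] = []
    offset: List[float] = []
    for dmin, dmax in zip(data_min, data_max):
        span = dmax - dmin
        if span == 0:
            # Assumption of this sketch: a constant feature gets span 1
            # so that the division below is defined.
            span = 1.0
        s = (hi - lo) / span
        scale.append(s)
        offset.append(lo - dmin * s)
    return scale, offset


def min_max_transform(
    raw: Sequence[float],
    scale: Sequence[float],
    offset: Sequence[float],
) -> List[float]:
    """Apply x * scale + offset to each feature, in order."""
    if not (len(raw) == len(scale) == len(offset)):
        raise ValueError("raw, scale and offset must have the same length")
    return [x * s + m for x, s, m in zip(raw, scale, offset)]
```

Values outside the original `data_min` and `data_max` bounds will map outside `[0, 1]` under this transform. Whether to clip them is a pipeline decision that this sketch does not make.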
 ### Important
 At inference time, the expected workflow is:
+1. Extract the 30 raw features from the analyzed Windows PE sample.
 2. Arrange them in the exact order listed above.
 3. Apply the released min-max normalization parameters.
 4. Feed the normalized vector into the quantized TFLite model.
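
Step 2 of the workflow above (fixed feature order) can be enforced in code so that a missing or misnamed feature fails loudly instead of silently shifting the vector. This is a minimal sketch under stated assumptions: the feature names are copied from the list in this README, while the `to_ordered_vector` helper and the dictionary-based input format are hypothetical and depend on your extractor.

```python
from typing import List, Mapping

# Feature names in the order listed in this README.
FEATURE_ORDER = [
    "MajorImageVersion", "MajorOperatingSystemVersion", "MajorSubsystemVersion",
    "ImageBase", "MinorLinkerVersion", "CheckSum", "BaseOfData",
    "SectionsMaxEntropy", "MajorLinkerVersion", "DllCharacteristics",
    "SizeOfStackReserve", "LoadConfigurationSize", "ResourcesMinSize",
    "Subsystem", "SizeOfCode", "SectionsMeanVirtualsize", "Machine",
    "SizeOfImage", "AddressOfEntryPoint", "Characteristics",
    "SizeOfOptionalHeader", "ResourcesMaxSize", "ResourcesMaxEntropy",
    "ImportsNb", "SectionsMaxRawsize", "ExportNb", "ImportsNbDLL",
    "ResourcesMinEntropy", "SectionMaxVirtualsize", "SectionsMeanRawsize",
]


def to_ordered_vector(features: Mapping[str, float]) -> List[float]:
    """Return the raw feature values in the released order.

    Raises ValueError if any expected feature is missing or any
    unexpected key is present.
    """
    missing = [name for name in FEATURE_ORDER if name not in features]
    unexpected = sorted(set(features) - set(FEATURE_ORDER))
    if missing or unexpected:
        raise ValueError(
            f"schema mismatch: missing={missing}, unexpected={unexpected}"
        )
    return [float(features[name]) for name in FEATURE_ORDER]
```

The resulting list is still unnormalized. It would then go through the released min-max parameters before being passed to the TFLite interpreter.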
 AURA Q1 is intended for:
+- defensive security research
+- malware analysis experimentation
+- academic and educational use
+- benchmarking tabular security ML pipelines
+- lightweight inference deployments
 ## Out-of-scope use
 This release is **not** intended to be used as:
+- a standalone malware verdict engine without analyst oversight
+- a replacement for sandboxing, reverse engineering, or signature-based detection
+- a guarantee of maliciousness or benignness
+- a production enforcement control without independent validation
 ## Limitations
 Users should evaluate the model carefully in their own environment. Key limitations include:
+- performance depends heavily on feature extraction quality
+- distribution shift can reduce reliability
+- adversarial adaptation is possible
+- score calibration may not transfer across datasets
+- quantization can introduce small accuracy differences relative to full-precision variants
+- security telemetry definitions may vary across tooling stacks
 ## Bias, risk, and security considerations
 Potential risks include:
+- benign software being flagged incorrectly
+- malicious software evading classification
+- degraded performance on underrepresented file families
+- misuse in overly automated blocking pipelines
 Human review and layered security controls are recommended.
 To reproduce inference correctly, use:
+- the exact feature order released here
+- the exact normalization metadata in `preprocess.json`
+- the quantized TFLite model artifact included in the repository
+Any mismatch between extracted telemetry and the expected schema may invalidate outputs.
 ## Output
 You should document your own:
+- output tensor interpretation
+- class mapping
+- threshold policy
+- confidence handling
 if you package this model inside a larger application.
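
One way to make the class mapping and threshold policy explicit is to encode them as a small, versionable object. The sketch below is illustrative only: it assumes the model output has already been converted to a score in `[0, 1]` where higher means more likely malicious, which this README does not confirm. The `ThresholdPolicy` class, its cut-off values, and the label strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdPolicy:
    """Documented mapping from a score in [0, 1] to a triage label."""

    malicious_at: float = 0.8  # hypothetical cut-off, tune on your own data
    benign_below: float = 0.2  # hypothetical cut-off, tune on your own data

    def __post_init__(self) -> None:
        if not 0.0 <= self.benign_below <= self.malicious_at <= 1.0:
            raise ValueError("require 0 <= benign_below <= malicious_at <= 1")

    def decide(self, score: float) -> str:
        """Return 'malicious', 'benign', or 'needs_review' for a score."""
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        if score >= self.malicious_at:
            return "malicious"
        if score < self.benign_below:
            return "benign"
        # Scores between the two cut-offs are routed to an analyst.
        return "needs_review"
```

Keeping an explicit middle band is one way to leave uncertain scores to human review, in line with the recommendation above, rather than forcing every sample into a binary verdict.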