Tabular Classification
LiteRT
malware
cybersecurity
pe-files
binary-classification
tabular-data
threat-intelligence
digital-forensics
reverse-engineering
incident-response
security-telemetry
ai-security
security-ml
mitre-attack
mitre-mbc
windows
executable-files
static-analysis
behavioral-analysis
classification
anomaly-detection
intrusion-detection
explainable-ai
model-evaluation
benchmarking
training
evaluation
research
education
teaching
quantized
edge-inference
Update README.md
README.md
CHANGED
|
@@ -5,7 +5,6 @@ tags:
|
|
| 5 |
- malware
|
| 6 |
- cybersecurity
|
| 7 |
- pe-files
|
| 8 |
-
- elf-files
|
| 9 |
- binary-classification
|
| 10 |
- tabular-data
|
| 11 |
- threat-intelligence
|
|
@@ -18,7 +17,6 @@ tags:
|
|
| 18 |
- mitre-attack
|
| 19 |
- mitre-mbc
|
| 20 |
- windows
|
| 21 |
-
- linux
|
| 22 |
- executable-files
|
| 23 |
- static-analysis
|
| 24 |
- behavioral-analysis
|
|
@@ -41,15 +39,15 @@ model_name: AURA Q1
|
|
| 41 |
|
| 42 |
# AURA Q1
|
| 43 |
|
| 44 |
-
**AURA Q1** is the free, quantized release of the AURA malware classification model family.
|
| 45 |
|
| 46 |
-
It is designed for efficient inference on structured telemetry extracted from
|
| 47 |
|
| 48 |
This repository contains a quantized model artifact intended for **inference only**.
|
| 49 |
|
| 50 |
## What this model does
|
| 51 |
|
| 52 |
-
AURA Q1 performs **tabular binary classification** for
|
| 53 |
|
| 54 |
Depending on your surrounding pipeline, the model can support workflows involving:
|
| 55 |
|
|
@@ -72,9 +70,9 @@ Primary characteristics:
|
|
| 72 |
|
| 73 |
## Full version availability
|
| 74 |
|
| 75 |
-
**AURA Q1** is the free quantized release of AURA for lightweight local and edge inference.
|
| 76 |
|
| 77 |
-
If you want to use the full version of **AURA** in a broader analysis workflow, it is available through **Traceix**
|
| 78 |
|
| 79 |
**Traceix:** https://traceix.com
|
| 80 |
|
|
@@ -84,44 +82,44 @@ Traceix is operated by **PCEF (Perkins Cybersecurity Educational Fund)**, a 501(
|
|
| 84 |
|
| 85 |
## Input schema
|
| 86 |
|
| 87 |
-
AURA Q1 expects **30 numeric input features** in a fixed order. The preprocessing configuration attached to this release defines the exact feature list and normalization parameters.
|
| 88 |
|
| 89 |
### Feature order
|
| 90 |
|
| 91 |
```text
|
| 92 |
-
1.
|
| 93 |
-
2.
|
| 94 |
-
3.
|
| 95 |
-
4.
|
| 96 |
-
5.
|
| 97 |
-
6.
|
| 98 |
-
7.
|
| 99 |
-
8.
|
| 100 |
-
9.
|
| 101 |
-
10.
|
| 102 |
-
11.
|
| 103 |
-
12.
|
| 104 |
-
13.
|
| 105 |
-
14.
|
| 106 |
-
15.
|
| 107 |
-
16.
|
| 108 |
-
17.
|
| 109 |
-
18.
|
| 110 |
-
19.
|
| 111 |
-
20.
|
| 112 |
-
21.
|
| 113 |
-
22.
|
| 114 |
-
23.
|
| 115 |
-
24.
|
| 116 |
-
25.
|
| 117 |
-
26.
|
| 118 |
-
27.
|
| 119 |
-
28.
|
| 120 |
-
29.
|
| 121 |
-
30.
|
| 122 |
-
```
|
| 123 |
-
|
| 124 |
-
These features and their normalization metadata are defined in the provided preprocessing file.
|
| 125 |
|
| 126 |
## Preprocessing
|
| 127 |
|
|
@@ -129,18 +127,18 @@ Inputs must be preprocessed exactly as defined by the release preprocessing conf
|
|
| 129 |
|
| 130 |
This model uses **per-feature min-max scaling** with a target **feature range of `[0, 1]`** across all 30 input dimensions. The preprocessing metadata includes:
|
| 131 |
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
|
| 139 |
|
| 140 |
The preprocessing config explicitly states:
|
| 141 |
|
| 142 |
-
|
| 143 |
-
|
| 144 |
|
| 145 |
### Important
|
| 146 |
|
|
@@ -150,7 +148,7 @@ You should not reorder features, omit features, or substitute alternative teleme
|
|
| 150 |
|
| 151 |
At inference time, the expected workflow is:
|
| 152 |
|
| 153 |
-
1. Extract the 30 raw features from the analyzed sample.
|
| 154 |
2. Arrange them in the exact order listed above.
|
| 155 |
3. Apply the released min-max normalization parameters.
|
| 156 |
4. Feed the normalized vector into the quantized TFLite model.
|
|
@@ -160,31 +158,31 @@ At inference time, the expected workflow is:
|
|
| 160 |
|
| 161 |
AURA Q1 is intended for:
|
| 162 |
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
|
| 169 |
## Out-of-scope use
|
| 170 |
|
| 171 |
This release is **not** intended to be used as:
|
| 172 |
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
|
| 178 |
## Limitations
|
| 179 |
|
| 180 |
Users should evaluate the model carefully in their own environment. Key limitations include:
|
| 181 |
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
|
| 189 |
## Bias, risk, and security considerations
|
| 190 |
|
|
@@ -192,10 +190,10 @@ Security ML systems can produce both false positives and false negatives. AURA Q
|
|
| 192 |
|
| 193 |
Potential risks include:
|
| 194 |
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
|
| 200 |
Human review and layered security controls are recommended.
|
| 201 |
|
|
@@ -203,11 +201,11 @@ Human review and layered security controls are recommended.
|
|
| 203 |
|
| 204 |
To reproduce inference correctly, use:
|
| 205 |
|
| 206 |
-
|
| 207 |
-
|
| 208 |
-
|
| 209 |
|
| 210 |
-
Any mismatch between extracted telemetry and the expected schema may invalidate outputs.
|
| 211 |
|
| 212 |
## Output
|
| 213 |
|
|
@@ -215,10 +213,10 @@ This is a classification model that returns a prediction score or class output d
|
|
| 215 |
|
| 216 |
You should document your own:
|
| 217 |
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
|
| 223 |
if you package this model inside a larger application.
|
| 224 |
|
|
|
|
| 5 |
- malware
|
| 6 |
- cybersecurity
|
| 7 |
- pe-files
|
|
|
|
| 8 |
- binary-classification
|
| 9 |
- tabular-data
|
| 10 |
- threat-intelligence
|
|
|
|
| 17 |
- mitre-attack
|
| 18 |
- mitre-mbc
|
| 19 |
- windows
|
|
|
|
| 20 |
- executable-files
|
| 21 |
- static-analysis
|
| 22 |
- behavioral-analysis
|
|
|
|
| 39 |
|
| 40 |
# AURA Q1
|
| 41 |
|
| 42 |
+
**AURA Q1** is the free, quantized Windows release of the AURA malware classification model family.
|
| 43 |
|
| 44 |
+
It is designed for efficient inference on structured telemetry extracted from **Windows PE files**, with a focus on lightweight deployment, fast scoring, and reproducible preprocessing.
|
| 45 |
|
| 46 |
This repository contains a quantized model artifact intended for **inference only**.
|
| 47 |
|
| 48 |
## What this model does
|
| 49 |
|
| 50 |
+
AURA Q1 performs **tabular binary classification** for Windows executable analysis workflows. It is intended for use in research, education, prototyping, and defensive experimentation where users want a compact model that can score extracted static PE features.
|
| 51 |
|
| 52 |
Depending on your surrounding pipeline, the model can support workflows involving:
|
| 53 |
|
|
|
|
| 70 |
|
| 71 |
## Full version availability
|
| 72 |
|
| 73 |
+
**AURA Q1** is the free quantized Windows release of AURA for lightweight local and edge inference.
|
| 74 |
|
| 75 |
+
If you want to use the full version of **AURA** in a broader analysis workflow, including **Windows, Linux, and Android classification**, it is available through **Traceix**.
|
| 76 |
|
| 77 |
**Traceix:** https://traceix.com
|
| 78 |
|
|
|
|
| 82 |
|
| 83 |
## Input schema
|
| 84 |
|
| 85 |
+
AURA Q1 expects **30 numeric input features** in a fixed order. The preprocessing configuration attached to this release defines the exact feature list and normalization parameters for the Windows model.
|
| 86 |
|
| 87 |
### Feature order
|
| 88 |
|
| 89 |
```text
|
| 90 |
+
1. MajorImageVersion
|
| 91 |
+
2. MajorOperatingSystemVersion
|
| 92 |
+
3. MajorSubsystemVersion
|
| 93 |
+
4. ImageBase
|
| 94 |
+
5. MinorLinkerVersion
|
| 95 |
+
6. CheckSum
|
| 96 |
+
7. BaseOfData
|
| 97 |
+
8. SectionsMaxEntropy
|
| 98 |
+
9. MajorLinkerVersion
|
| 99 |
+
10. DllCharacteristics
|
| 100 |
+
11. SizeOfStackReserve
|
| 101 |
+
12. LoadConfigurationSize
|
| 102 |
+
13. ResourcesMinSize
|
| 103 |
+
14. Subsystem
|
| 104 |
+
15. SizeOfCode
|
| 105 |
+
16. SectionsMeanVirtualsize
|
| 106 |
+
17. Machine
|
| 107 |
+
18. SizeOfImage
|
| 108 |
+
19. AddressOfEntryPoint
|
| 109 |
+
20. Characteristics
|
| 110 |
+
21. SizeOfOptionalHeader
|
| 111 |
+
22. ResourcesMaxSize
|
| 112 |
+
23. ResourcesMaxEntropy
|
| 113 |
+
24. ImportsNb
|
| 114 |
+
25. SectionsMaxRawsize
|
| 115 |
+
26. ExportNb
|
| 116 |
+
27. ImportsNbDLL
|
| 117 |
+
28. ResourcesMinEntropy
|
| 118 |
+
29. SectionMaxVirtualsize
|
| 119 |
+
30. SectionsMeanRawsize
|
| 120 |
+
```
|
| 121 |
+
|
| 122 |
+
These features and their normalization metadata are defined in the provided preprocessing file.
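
Because the order is fixed, it can help to build the input vector from named values rather than by position. The following is a minimal sketch, not part of the release: the feature names are copied from the list above, while `to_ordered_vector` and the dict-based input are hypothetical conveniences.

```python
# Feature names copied verbatim from the list above, in model input order.
FEATURE_ORDER = [
    "MajorImageVersion", "MajorOperatingSystemVersion", "MajorSubsystemVersion",
    "ImageBase", "MinorLinkerVersion", "CheckSum", "BaseOfData",
    "SectionsMaxEntropy", "MajorLinkerVersion", "DllCharacteristics",
    "SizeOfStackReserve", "LoadConfigurationSize", "ResourcesMinSize",
    "Subsystem", "SizeOfCode", "SectionsMeanVirtualsize", "Machine",
    "SizeOfImage", "AddressOfEntryPoint", "Characteristics",
    "SizeOfOptionalHeader", "ResourcesMaxSize", "ResourcesMaxEntropy",
    "ImportsNb", "SectionsMaxRawsize", "ExportNb", "ImportsNbDLL",
    "ResourcesMinEntropy", "SectionMaxVirtualsize", "SectionsMeanRawsize",
]


def to_ordered_vector(features):
    """Return the 30 raw values as floats in model order (hypothetical helper)."""
    missing = [name for name in FEATURE_ORDER if name not in features]
    extra = sorted(set(features) - set(FEATURE_ORDER))
    if missing or extra:
        # Refuse to guess: a missing or unknown feature means the schema differs.
        raise ValueError(f"schema mismatch: missing={missing}, unexpected={extra}")
    return [float(features[name]) for name in FEATURE_ORDER]
```

Raising on missing or unexpected keys, instead of filling defaults, keeps a schema mismatch from silently producing a score.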
|
| 123 |
|
| 124 |
## Preprocessing
|
| 125 |
|
|
|
|
| 127 |
|
| 128 |
This model uses **per-feature min-max scaling** with a target **feature range of `[0, 1]`** across all 30 input dimensions. The preprocessing metadata includes:
|
| 129 |
|
| 130 |
+
- feature names
|
| 131 |
+
- per-feature `scale`
|
| 132 |
+
- per-feature `min`
|
| 133 |
+
- original `data_min`
|
| 134 |
+
- original `data_max`
|
| 135 |
+
- feature range
|
| 136 |
+
- number of expected input features
|
| 137 |
|
| 138 |
The preprocessing config explicitly states:
|
| 139 |
|
| 140 |
+
- `n_features_in = 30`
|
| 141 |
+
- `feature_range = [0, 1]`
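
The field names above (`scale`, `min`, `data_min`, `data_max`, `feature_range`) match the attributes of scikit-learn's `MinMaxScaler`, whose transform is `x * scale + min` per feature. Assuming the release metadata follows that convention, and assuming the JSON keys are literally `"scale"` and `"min"` (neither is confirmed here), the scaling step could be sketched as:

```python
def min_max_scale(raw, meta):
    """Scale raw features with x * scale + min (the MinMaxScaler transform).

    `meta` is the parsed preprocessing config; the key names used below
    are assumptions and should be checked against the actual file.
    """
    scale, offset = meta["scale"], meta["min"]
    if not (len(raw) == len(scale) == len(offset)):
        raise ValueError("raw vector and scaler arrays differ in length")
    return [float(x) * s + m for x, s, m in zip(raw, scale, offset)]
```

This sketch does not clip: a raw value outside the recorded `data_min` to `data_max` range will scale to a value outside `[0, 1]`. Whether clipping is appropriate is not stated in this release.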
|
| 142 |
|
| 143 |
### Important
|
| 144 |
|
|
|
|
| 148 |
|
| 149 |
At inference time, the expected workflow is:
|
| 150 |
|
| 151 |
+
1. Extract the 30 raw features from the analyzed Windows PE sample.
|
| 152 |
2. Arrange them in the exact order listed above.
|
| 153 |
3. Apply the released min-max normalization parameters.
|
| 154 |
4. Feed the normalized vector into the quantized TFLite model.
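
This README does not state whether the model's input and output tensors are float or integer typed. With the TFLite Python interpreter, `get_input_details()` and `get_output_details()` report each tensor's dtype and a `quantization` pair of `(scale, zero_point)`. If the tensors turn out to be integer typed, step 4 needs an affine conversion, `real = scale * (q - zero_point)`. A minimal sketch of that conversion, assuming int8 bounds by default and using hypothetical helper names:

```python
def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers via q = round(x / scale) + zero_point, clamped.

    The default bounds are for int8; pass qmin=0, qmax=255 for uint8.
    """
    if scale <= 0:
        raise ValueError("scale must be positive; a zero scale means unquantized")
    out = []
    for x in values:
        q = int(round(x / scale)) + zero_point
        out.append(max(qmin, min(qmax, q)))
    return out


def dequantize(values, scale, zero_point):
    """Map integer tensor values back to floats via scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in values]
```

If the interpreter reports float32 tensors, no conversion is needed and the normalized vector can be passed directly.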
|
|
|
|
| 158 |
|
| 159 |
AURA Q1 is intended for:
|
| 160 |
|
| 161 |
+
- defensive security research
|
| 162 |
+
- malware analysis experimentation
|
| 163 |
+
- academic and educational use
|
| 164 |
+
- benchmarking tabular security ML pipelines
|
| 165 |
+
- lightweight inference deployments
|
| 166 |
|
| 167 |
## Out-of-scope use
|
| 168 |
|
| 169 |
This release is **not** intended to be used as:
|
| 170 |
|
| 171 |
+
- a standalone malware verdict engine without analyst oversight
|
| 172 |
+
- a replacement for sandboxing, reverse engineering, or signature-based detection
|
| 173 |
+
- a guarantee of maliciousness or benignness
|
| 174 |
+
- a production enforcement control without independent validation
|
| 175 |
|
| 176 |
## Limitations
|
| 177 |
|
| 178 |
Users should evaluate the model carefully in their own environment. Key limitations include:
|
| 179 |
|
| 180 |
+
- performance depends heavily on feature extraction quality
|
| 181 |
+
- distribution shift can reduce reliability
|
| 182 |
+
- adversarial adaptation is possible
|
| 183 |
+
- score calibration may not transfer across datasets
|
| 184 |
+
- quantization can introduce small accuracy differences relative to full-precision variants
|
| 185 |
+
- security telemetry definitions may vary across tooling stacks
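
One inexpensive signal for distribution shift or an extractor mismatch is a raw feature falling outside the range recorded in the preprocessing metadata. The sketch below assumes `data_min` and `data_max` are per-feature lists in the released order; `out_of_range_features` is a hypothetical helper, not part of the release.

```python
def out_of_range_features(raw, names, data_min, data_max):
    """Return (name, value, low, high) for values outside the recorded range."""
    if not (len(raw) == len(names) == len(data_min) == len(data_max)):
        raise ValueError("all inputs must have the same length")
    flagged = []
    for name, x, low, high in zip(names, raw, data_min, data_max):
        if x < low or x > high:
            flagged.append((name, x, low, high))
    return flagged
```

A flagged feature does not prove the score is wrong, but samples with many flags are reasonable candidates for manual review.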
|
| 186 |
|
| 187 |
## Bias, risk, and security considerations
|
| 188 |
|
|
|
|
| 190 |
|
| 191 |
Potential risks include:
|
| 192 |
|
| 193 |
+
- benign software being flagged incorrectly
|
| 194 |
+
- malicious software evading classification
|
| 195 |
+
- degraded performance on underrepresented file families
|
| 196 |
+
- misuse in overly automated blocking pipelines
|
| 197 |
|
| 198 |
Human review and layered security controls are recommended.
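
To quantify the first two risks on your own labeled data, false positive and false negative rates can be computed at a chosen threshold. This is a minimal sketch under stated assumptions: label `1` means malicious and `0` means benign, scores are higher for malicious samples, and the `0.5` default threshold is a placeholder. None of these is specified by this release.

```python
def error_rates(labels, scores, threshold=0.5):
    """Return (false_positive_rate, false_negative_rate) at the threshold.

    Assumes label 1 = malicious, 0 = benign, and score >= threshold
    predicts malicious.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    fp = fn = tp = tn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        elif predicted == 1:
            tp += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr
```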
|
| 199 |
|
|
|
|
| 201 |
|
| 202 |
To reproduce inference correctly, use:
|
| 203 |
|
| 204 |
+
- the exact feature order released here
|
| 205 |
+
- the exact normalization metadata in `preprocess.json`
|
| 206 |
+
- the quantized TFLite model artifact included in the repository
|
| 207 |
|
| 208 |
+
Any mismatch between extracted telemetry and the expected schema may invalidate outputs.
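
A schema check before inference can catch such mismatches early. The sketch below assumes `preprocess.json` parses to a dict with the keys `n_features_in`, `feature_names`, `feature_range`, `scale`, `min`, `data_min`, and `data_max`, and that `scale` and `min` relate to `data_min` and `data_max` the way scikit-learn's `MinMaxScaler` defines them. Both the key names and that relationship are assumptions to verify against the actual file.

```python
import math


def check_preprocess_metadata(meta, expected_names):
    """Return a list of problems found in the metadata; empty means consistent."""
    n = len(expected_names)
    problems = []
    if meta.get("n_features_in") != n:
        problems.append(f"n_features_in is {meta.get('n_features_in')}, expected {n}")
    if list(meta.get("feature_names", [])) != list(expected_names):
        problems.append("feature names or order differ from the expected schema")
    for key in ("scale", "min", "data_min", "data_max"):
        if len(meta.get(key, [])) != n:
            problems.append(f"{key} does not have {n} entries")
    if problems:
        return problems
    lo, hi = meta["feature_range"]
    for i, name in enumerate(expected_names):
        span = meta["data_max"][i] - meta["data_min"][i]
        # A constant column has zero span; MinMaxScaler treats that span as 1.
        scale = (hi - lo) / (span if span != 0 else 1.0)
        offset = lo - meta["data_min"][i] * scale
        if not math.isclose(meta["scale"][i], scale, rel_tol=1e-6, abs_tol=1e-12):
            problems.append(f"{name}: scale disagrees with data_min/data_max")
        if not math.isclose(meta["min"][i], offset, rel_tol=1e-6, abs_tol=1e-9):
            problems.append(f"{name}: min disagrees with data_min/data_max")
    return problems
```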
|
| 209 |
|
| 210 |
## Output
|
| 211 |
|
|
|
|
| 213 |
|
| 214 |
You should document your own:
|
| 215 |
|
| 216 |
+
- output tensor interpretation
|
| 217 |
+
- class mapping
|
| 218 |
+
- threshold policy
|
| 219 |
+
- confidence handling
|
| 220 |
|
| 221 |
if you package this model inside a larger application.
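
As one way to make a threshold policy explicit and documented, the sketch below maps a score to a label with a middle band routed to an analyst. Everything in it is illustrative: it assumes a score in `[0, 1]` where higher means more likely malicious, and the cut-offs `0.3` and `0.7` are placeholders, not values recommended by this release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdPolicy:
    """Hypothetical two-threshold policy with a human-review band."""

    benign_below: float = 0.3           # placeholder cut-off
    malicious_at_or_above: float = 0.7  # placeholder cut-off

    def decide(self, score):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score {score} is outside [0, 1]")
        if score >= self.malicious_at_or_above:
            return "malicious"
        if score < self.benign_below:
            return "benign"
        return "needs_review"
```

Keeping the cut-offs in a single immutable object makes the policy easy to record alongside the model version it was tuned for.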
|
| 222 |
|