PerkinsFund committed on
Commit
2a3b9dd
·
verified ·
1 Parent(s): 8703d87

Update README.md

Files changed (1)
  1. README.md +76 -78
README.md CHANGED
@@ -5,7 +5,6 @@ tags:
5
  - malware
6
  - cybersecurity
7
  - pe-files
8
- - elf-files
9
  - binary-classification
10
  - tabular-data
11
  - threat-intelligence
@@ -18,7 +17,6 @@ tags:
18
  - mitre-attack
19
  - mitre-mbc
20
  - windows
21
- - linux
22
  - executable-files
23
  - static-analysis
24
  - behavioral-analysis
@@ -41,15 +39,15 @@ model_name: AURA Q1
41
 
42
  # AURA Q1
43
 
44
- **AURA Q1** is the free, quantized release of the AURA malware classification model family.
45
 
46
- It is designed for efficient inference on structured telemetry extracted from executable artifacts and mobile packages, with a focus on lightweight deployment, fast scoring, and reproducible preprocessing.
47
 
48
  This repository contains a quantized model artifact intended for **inference only**.
49
 
50
  ## What this model does
51
 
52
- AURA Q1 performs **tabular binary classification** for security-oriented file analysis workflows. It is intended for use in research, education, prototyping, and defensive experimentation where users want a compact model that can score extracted static features.
53
 
54
  Depending on your surrounding pipeline, the model can support workflows involving:
55
 
@@ -72,9 +70,9 @@ Primary characteristics:
72
 
73
  ## Full version availability
74
 
75
- **AURA Q1** is the free quantized release of AURA for lightweight local and edge inference.
76
 
77
- If you want to use the full version of **AURA** in a broader analysis workflow, it is available through **Traceix**:
78
 
79
  **Traceix:** https://traceix.com
80
 
@@ -84,44 +82,44 @@ Traceix is operated by **PCEF (Perkins Cybersecurity Educational Fund)**, a 501(
84
 
85
  ## Input schema
86
 
87
- AURA Q1 expects **30 numeric input features** in a fixed order. The preprocessing configuration attached to this release defines the exact feature list and normalization parameters.
88
 
89
  ### Feature order
90
 
91
  ```text
92
- 1. CertificatesNb
93
- 2. CertificatesMinEntropy
94
- 3. CertificatesMeanSize
95
- 4. CertificatesMaxEntropy
96
- 5. CertificatesMinSize
97
- 6. CertificatesMeanEntropy
98
- 7. CertificatesMaxSize
99
- 8. Providers
100
- 9. FilesMaxSize
101
- 10. DexFilesMaxSize
102
- 11. FilesNb
103
- 12. APKSize
104
- 13. ResourcesNb
105
- 14. DexFilesNb
106
- 15. FilesMinEntropy
107
- 16. DexFilesMeanSize
108
- 17. Services
109
- 18. FilesMeanSize
110
- 19. FilesMinSize
111
- 20. Permissions
112
- 21. Receivers
113
- 22. Activities
114
- 23. ResourcesMinEntropy
115
- 24. ResourcesMeanSize
116
- 25. ResourcesMaxEntropy
117
- 26. ResourcesMaxSize
118
- 27. ResourcesMeanEntropy
119
- 28. DexFilesMinEntropy
120
- 29. DexFilesMaxEntropy
121
- 30. DexFilesMinSize
122
- ````
123
-
124
- These features and their normalization metadata are defined in the provided preprocessing file.
125
 
126
  ## Preprocessing
127
 
@@ -129,18 +127,18 @@ Inputs must be preprocessed exactly as defined by the release preprocessing conf
129
 
130
  This model uses **per-feature min-max scaling** with a target **feature range of `[0, 1]`** across all 30 input dimensions. The preprocessing metadata includes:
131
 
132
- * feature names
133
- * per-feature `scale`
134
- * per-feature `min`
135
- * original `data_min`
136
- * original `data_max`
137
- * feature range
138
- * number of expected input features
139
 
140
  The preprocessing config explicitly states:
141
 
142
- * `n_features_in = 30`
143
- * `feature_range = [0, 1]`
144
 
145
  ### Important
146
 
@@ -150,7 +148,7 @@ You should not reorder features, omit features, or substitute alternative teleme
150
 
151
  At inference time, the expected workflow is:
152
 
153
- 1. Extract the 30 raw features from the analyzed sample.
154
  2. Arrange them in the exact order listed above.
155
  3. Apply the released min-max normalization parameters.
156
  4. Feed the normalized vector into the quantized TFLite model.
@@ -160,31 +158,31 @@ At inference time, the expected workflow is:
160
 
161
  AURA Q1 is intended for:
162
 
163
- * defensive security research
164
- * malware analysis experimentation
165
- * academic and educational use
166
- * benchmarking tabular security ML pipelines
167
- * lightweight inference deployments
168
 
169
  ## Out-of-scope use
170
 
171
  This release is **not** intended to be used as:
172
 
173
- * a standalone malware verdict engine without analyst oversight
174
- * a replacement for sandboxing, reverse engineering, or signature-based detection
175
- * a guarantee of maliciousness or benignness
176
- * a production enforcement control without independent validation
177
 
178
  ## Limitations
179
 
180
  Users should evaluate the model carefully in their own environment. Key limitations include:
181
 
182
- * performance depends heavily on feature extraction quality
183
- * distribution shift can reduce reliability
184
- * adversarial adaptation is possible
185
- * score calibration may not transfer across datasets
186
- * quantization can introduce small accuracy differences relative to full-precision variants
187
- * security telemetry definitions may vary across tooling stacks
188
 
189
  ## Bias, risk, and security considerations
190
 
@@ -192,10 +190,10 @@ Security ML systems can produce both false positives and false negatives. AURA Q
192
 
193
  Potential risks include:
194
 
195
- * benign software being flagged incorrectly
196
- * malicious software evading classification
197
- * degraded performance on underrepresented file families
198
- * misuse in overly automated blocking pipelines
199
 
200
  Human review and layered security controls are recommended.
201
 
@@ -203,11 +201,11 @@ Human review and layered security controls are recommended.
203
 
204
  To reproduce inference correctly, use:
205
 
206
- * the exact feature order released here
207
- * the exact normalization metadata in `preprocess.json`
208
- * the quantized TFLite model artifact included in the repository
209
 
210
- Any mismatch between extracted telemetry and the expected schema may invalidate outputs.
211
 
212
  ## Output
213
 
@@ -215,10 +213,10 @@ This is a classification model that returns a prediction score or class output d
215
 
216
  You should document your own:
217
 
218
- * output tensor interpretation
219
- * class mapping
220
- * threshold policy
221
- * confidence handling
222
 
223
  if you package this model inside a larger application.
224
 
 
5
  - malware
6
  - cybersecurity
7
  - pe-files
 
8
  - binary-classification
9
  - tabular-data
10
  - threat-intelligence
 
17
  - mitre-attack
18
  - mitre-mbc
19
  - windows
 
20
  - executable-files
21
  - static-analysis
22
  - behavioral-analysis
 
39
 
40
  # AURA Q1
41
 
42
+ **AURA Q1** is the free, quantized Windows release of the AURA malware classification model family.
43
 
44
+ It is designed for efficient inference on structured telemetry extracted from **Windows PE files**, with a focus on lightweight deployment, fast scoring, and reproducible preprocessing.
45
 
46
  This repository contains a quantized model artifact intended for **inference only**.
47
 
48
  ## What this model does
49
 
50
+ AURA Q1 performs **tabular binary classification** for Windows executable analysis workflows. It is intended for use in research, education, prototyping, and defensive experimentation where users want a compact model that can score extracted static PE features.
51
 
52
  Depending on your surrounding pipeline, the model can support workflows involving:
53
 
 
70
 
71
  ## Full version availability
72
 
73
+ **AURA Q1** is the free quantized Windows release of AURA for lightweight local and edge inference.
74
 
75
+ If you want to use the full version of **AURA** in a broader analysis workflow, including **Windows, Linux, and Android classification**, it is available through **Traceix**.
76
 
77
  **Traceix:** https://traceix.com
78
 
 
82
 
83
  ## Input schema
84
 
85
+ AURA Q1 expects **30 numeric input features** in a fixed order. The preprocessing configuration attached to this release defines the exact feature list and normalization parameters for the Windows model.
86
 
87
  ### Feature order
88
 
89
  ```text
90
+ 1. MajorImageVersion
91
+ 2. MajorOperatingSystemVersion
92
+ 3. MajorSubsystemVersion
93
+ 4. ImageBase
94
+ 5. MinorLinkerVersion
95
+ 6. CheckSum
96
+ 7. BaseOfData
97
+ 8. SectionsMaxEntropy
98
+ 9. MajorLinkerVersion
99
+ 10. DllCharacteristics
100
+ 11. SizeOfStackReserve
101
+ 12. LoadConfigurationSize
102
+ 13. ResourcesMinSize
103
+ 14. Subsystem
104
+ 15. SizeOfCode
105
+ 16. SectionsMeanVirtualsize
106
+ 17. Machine
107
+ 18. SizeOfImage
108
+ 19. AddressOfEntryPoint
109
+ 20. Characteristics
110
+ 21. SizeOfOptionalHeader
111
+ 22. ResourcesMaxSize
112
+ 23. ResourcesMaxEntropy
113
+ 24. ImportsNb
114
+ 25. SectionsMaxRawsize
115
+ 26. ExportNb
116
+ 27. ImportsNbDLL
117
+ 28. ResourcesMinEntropy
118
+ 29. SectionMaxVirtualsize
119
+ 30. SectionsMeanRawsize
120
+ ```
121
+
122
+ These features and their normalization metadata are defined in the provided preprocessing file.
123
 
124
  ## Preprocessing
125
 
 
127
 
128
  This model uses **per-feature min-max scaling** with a target **feature range of `[0, 1]`** across all 30 input dimensions. The preprocessing metadata includes:
129
 
130
+ - feature names
131
+ - per-feature `scale`
132
+ - per-feature `min`
133
+ - original `data_min`
134
+ - original `data_max`
135
+ - feature range
136
+ - number of expected input features
137
 
138
  The preprocessing config explicitly states:
139
 
140
+ - `n_features_in = 30`
141
+ - `feature_range = [0, 1]`
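
The metadata fields listed above (`scale`, `min`, `data_min`, `data_max`) resemble the attributes exposed by scikit-learn's `MinMaxScaler`. Assuming that convention holds for this release, the transform is `x * scale + min`, where `scale` and `min` can be derived from `data_min`, `data_max`, and the feature range. The sketch below is illustrative only: both function names are hypothetical, and the layout of the preprocessing file is not shown in this diff, so loading it is left out.

```python
from typing import List, Sequence, Tuple


def derive_scale_and_min(
    data_min: Sequence[float],
    data_max: Sequence[float],
    feature_range: Tuple[float, float] = (0.0, 1.0),
) -> Tuple[List[float], List[float]]:
    """Derive per-feature `scale` and `min` from the fitted data range.

    Assumes the MinMaxScaler-style convention:
        scale = (hi - lo) / (data_max - data_min)
        min   = lo - data_min * scale
    """
    lo, hi = feature_range
    scale: List[float] = []
    offset: List[float] = []
    for dmin, dmax in zip(data_min, data_max):
        span = dmax - dmin
        if span == 0:
            span = 1.0  # constant feature: avoid division by zero
        s = (hi - lo) / span
        scale.append(s)
        offset.append(lo - dmin * s)
    return scale, offset


def min_max_transform(
    raw: Sequence[float],
    scale: Sequence[float],
    min_: Sequence[float],
) -> List[float]:
    """Apply x * scale + min per feature; all inputs must have equal length."""
    if not (len(raw) == len(scale) == len(min_)):
        raise ValueError("raw, scale, and min_ must have the same length")
    return [x * s + m for x, s, m in zip(raw, scale, min_)]
```

Under this convention, raw values outside the fitted `data_min`/`data_max` range map outside `[0, 1]`. Whether such values should be clipped is not stated here, so the sketch leaves them unchanged.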
142
 
143
  ### Important
144
 
 
148
 
149
  At inference time, the expected workflow is:
150
 
151
+ 1. Extract the 30 raw features from the analyzed Windows PE sample.
152
  2. Arrange them in the exact order listed above.
153
  3. Apply the released min-max normalization parameters.
154
  4. Feed the normalized vector into the quantized TFLite model.
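
Step 2 is easy to get wrong when the extractor returns features in its own order. A minimal sketch of that step follows, assuming the extractor yields a mapping keyed by the feature names given under "Feature order". `build_feature_vector` is a hypothetical helper, not part of the release.

```python
import math
from typing import List, Mapping

# Order copied from the "Feature order" list in this README.
FEATURE_ORDER = (
    "MajorImageVersion", "MajorOperatingSystemVersion", "MajorSubsystemVersion",
    "ImageBase", "MinorLinkerVersion", "CheckSum", "BaseOfData",
    "SectionsMaxEntropy", "MajorLinkerVersion", "DllCharacteristics",
    "SizeOfStackReserve", "LoadConfigurationSize", "ResourcesMinSize",
    "Subsystem", "SizeOfCode", "SectionsMeanVirtualsize", "Machine",
    "SizeOfImage", "AddressOfEntryPoint", "Characteristics",
    "SizeOfOptionalHeader", "ResourcesMaxSize", "ResourcesMaxEntropy",
    "ImportsNb", "SectionsMaxRawsize", "ExportNb", "ImportsNbDLL",
    "ResourcesMinEntropy", "SectionMaxVirtualsize", "SectionsMeanRawsize",
)


def build_feature_vector(features: Mapping[str, float]) -> List[float]:
    """Return the raw features as a list in the released order.

    Raises KeyError if any expected feature is missing and ValueError
    if a value is NaN or infinite. Extra keys are ignored.
    """
    missing = [name for name in FEATURE_ORDER if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    vector = [float(features[name]) for name in FEATURE_ORDER]
    for name, value in zip(FEATURE_ORDER, vector):
        if not math.isfinite(value):
            raise ValueError(f"non-finite value for {name}: {value}")
    return vector
```

The resulting list is still unnormalized. It would be passed through the released min-max parameters (step 3) before reaching the model.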
 
158
 
159
  AURA Q1 is intended for:
160
 
161
+ - defensive security research
162
+ - malware analysis experimentation
163
+ - academic and educational use
164
+ - benchmarking tabular security ML pipelines
165
+ - lightweight inference deployments
166
 
167
  ## Out-of-scope use
168
 
169
  This release is **not** intended to be used as:
170
 
171
+ - a standalone malware verdict engine without analyst oversight
172
+ - a replacement for sandboxing, reverse engineering, or signature-based detection
173
+ - a guarantee of maliciousness or benignness
174
+ - a production enforcement control without independent validation
175
 
176
  ## Limitations
177
 
178
  Users should evaluate the model carefully in their own environment. Key limitations include:
179
 
180
+ - performance depends heavily on feature extraction quality
181
+ - distribution shift can reduce reliability
182
+ - adversarial adaptation is possible
183
+ - score calibration may not transfer across datasets
184
+ - quantization can introduce small accuracy differences relative to full-precision variants
185
+ - security telemetry definitions may vary across tooling stacks
186
 
187
  ## Bias, risk, and security considerations
188
 
 
190
 
191
  Potential risks include:
192
 
193
+ - benign software being flagged incorrectly
194
+ - malicious software evading classification
195
+ - degraded performance on underrepresented file families
196
+ - misuse in overly automated blocking pipelines
197
 
198
  Human review and layered security controls are recommended.
199
 
 
201
 
202
  To reproduce inference correctly, use:
203
 
204
+ - the exact feature order released here
205
+ - the exact normalization metadata in `preprocess.json`
206
+ - the quantized TFLite model artifact included in the repository
207
 
208
+ Any mismatch between extracted telemetry and the expected schema may invalidate outputs.
209
 
210
  ## Output
211
 
 
213
 
214
  You should document your own:
215
 
216
+ - output tensor interpretation
217
+ - class mapping
218
+ - threshold policy
219
+ - confidence handling
220
 
221
  if you package this model inside a larger application.
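
One way to make those four items explicit is to keep them in a single small object next to the model. The sketch below is hypothetical. It assumes the model output has already been converted to a score in `[0, 1]` where higher means the positive class, which this README does not confirm. The labels, threshold, and review margin are placeholders to be replaced after your own validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScorePolicy:
    """Documents how a score is turned into a decision (all values are placeholders)."""

    positive_label: str = "malicious"  # assumed class mapping; verify against your model
    negative_label: str = "benign"
    threshold: float = 0.5             # placeholder; choose on your own validation data
    review_margin: float = 0.1         # scores this close to the threshold go to an analyst

    def decide(self, score: float) -> str:
        """Map a score in [0, 1] to a label or to 'needs_review'."""
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        if abs(score - self.threshold) < self.review_margin:
            return "needs_review"
        if score >= self.threshold:
            return self.positive_label
        return self.negative_label
```

Routing borderline scores to review rather than forcing a label is in line with the recommendation above for human review and layered controls.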
222