Commit d29d7c8 by ConstantQJ (verified) · Parent: 9354822

Upload folder using huggingface_hub

Files changed (4):
1. README.md (+229 -0)
2. model_edge.onnx (+3 -0)
3. model_edge_3class.onnx (+3 -0)
4. model_mobile.onnx (+3 -0)

README.md ADDED
@@ -0,0 +1,229 @@
---
license: apache-2.0
language:
- en
- de
- fr
- es
- zh
- ja
library_name: onnxruntime
pipeline_tag: text-classification
tags:
- sentiment-analysis
- edge-ai
- tinyml
- knowledge-distillation
- onnx
- int8
- quantized
- microcontroller
- nlp
datasets:
- glue
- sst2
metrics:
- accuracy
- f1
model-index:
- name: aure-edge-sentiment
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    dataset:
      type: glue
      name: SST-2
      split: validation
    metrics:
    - type: accuracy
      value: 83.03
    - type: f1
      value: 0.830
---
# Aure Edge: 1.46 MB Sentiment Analysis for Edge Devices

A **288x compressed** sentiment classifier distilled from BERT. It runs on microcontrollers, mobile devices, and edge hardware with **0.14 ms inference latency**.

| Metric | Value |
|--------|-------|
| **Accuracy** | 83.03% (SST-2) |
| **F1** | 0.830 |
| **Model Size** | 1.46 MB (INT8 quantized) |
| **Parameters** | 383,618 |
| **Inference** | 0.14 ms (ONNX Runtime, CPU) |
| **Compression** | 288x vs. BERT teacher (420 MB) |
| **Teacher Accuracy** | 92.32% |
## Quick Start

For production use, install the Aure SDK (`pip install aure`); it bundles the tokenizer (simple whitespace splitting plus a vocabulary lookup) and the ONNX Runtime session:

```python
from aure import Aure

model = Aure("edge")
result = model.predict("I love this product!")
print(result)  # SentimentResult(label='positive', score=0.91)
```
### Standalone ONNX Inference (No SDK Dependencies)

```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model_edge.onnx")

# Input: token IDs as an int64 array, shape [batch_size, seq_length]
# Max sequence length: 128
# Vocabulary: pruned to 10,907 tokens (from BERT's 30,522)
input_ids = np.array([[101, 1045, 2293, 2023, 3185, 999, 102]], dtype=np.int64)

logits = session.run(None, {"input_ids": input_ids})[0]

# Numerically stable softmax over the two classes
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

labels = ["negative", "positive"]
pred = labels[np.argmax(probs[0])]
confidence = float(probs[0].max())
print(f"{pred} ({confidence:.1%})")
```
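The tokenization scheme is described as whitespace splitting plus a vocabulary lookup. A minimal sketch of that scheme, assuming BERT-style special-token IDs (`[CLS]`=101, `[SEP]`=102, `[UNK]`=100) and a plain `dict` vocabulary; both are assumptions, not the SDK's actual format:

```python
def tokenize(text, vocab, max_len=128):
    """Whitespace + vocabulary-lookup tokenizer (illustrative sketch)."""
    ids = [101]  # [CLS]
    for word in text.lower().split():
        word = word.strip(".,!?")          # crude punctuation stripping
        ids.append(vocab.get(word, 100))   # [UNK] for out-of-vocab words
    ids = ids[: max_len - 1]               # leave room for [SEP]
    ids.append(102)                        # [SEP]
    return ids

# Toy vocabulary with BERT-compatible IDs for a few words
vocab = {"i": 1045, "love": 2293, "this": 2023, "movie": 3185}
print(tokenize("I love this movie!", vocab))
# -> [101, 1045, 2293, 2023, 3185, 102]
```

In a real deployment the pruned 10,907-token vocabulary shipped with the model would replace the toy `vocab` above.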
## Architecture

**NanoCNN** is a compact convolutional architecture optimized for sub-2 MB deployment:

- **Embedding**: 32-dimensional, pruned vocabulary (10,907 tokens)
- **Convolutions**: 4 parallel Conv1d banks (filter sizes 2, 3, 4, 5), 64 filters each
- **Compression**: linear bottleneck (256 → 16)
- **Classifier**: 16 → 48 → 2 (with dropout 0.3)
- **Quantization**: INT8 (post-training, ONNX)
## Distillation Pipeline

Distilled from a BERT-base-uncased teacher through systematic experimentation:

```
BERT Teacher (92.32%, 420 MB)
  → Knowledge Distillation (T=6.39, α=0.69)
  → NanoCNN Student (83.03%, 1.46 MB)
```

Key distillation parameters (optimized via Optuna, 20 trials):
- Temperature: 6.39
- Distillation weight (α): 0.69
- Learning rate: 2e-3
- Epochs: 30
- Batch size: 128
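With those tuned values, the training objective can be sketched as a standard Hinton-style soft-target loss; the exact formulation used in training is an assumption:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=6.39, alpha=0.69):
    """Standard KD objective with the tuned temperature and weight above."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 to keep gradient magnitudes comparable across T
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on the gold labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Dummy batch of 8 examples, 2 classes
loss = distillation_loss(
    torch.randn(8, 2), torch.randn(8, 2), torch.randint(0, 2, (8,))
)
print(float(loss))
```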
## Ablation Results

We tested multiple compression approaches; linear projection consistently won.

### Teacher Compression (on BERT)

| Method | Accuracy | Params |
|--------|----------|--------|
| **Linear** | **92.32%** | 49K |
| Graph Laplacian | 92.20% | 639K |
| MLP (2-layer) | 92.09% | 213K |

### Student Compression (NanoCNN)

| Method | FP32 Accuracy | INT8 Accuracy | Size |
|--------|--------------|---------------|------|
| **Linear** | 82.04% | **83.03%** | **1.46 MB** |
| MLP | 81.54% | 82.11% | 1.47 MB |
| Spectral | 81.15% | 82.00% | 1.48 MB |

### Architecture Comparison

| Model | Accuracy | Size | Compression |
|-------|----------|------|-------------|
| BERT Teacher | 92.32% | 420 MB | 1x |
| CNN Large | 83.94% | 31.8 MB | 13x |
| CNN TinyML | 83.14% | 3.4 MB | 124x |
| **NanoCNN INT8** | **83.03%** | **1.46 MB** | **288x** |
| Tiny Transformer | 80.16% | 6.4 MB | 66x |

The transformer student performs worse despite having 4x more parameters, confirming that CNN inductive biases are better suited to small-scale text classification.
## Multilingual Support

The Aure SDK supports six languages. Non-English models are downloaded on first use:

```python
from aure import Aure

# German
model = Aure("edge", lang="de")
model.predict("Das ist wunderbar!")  # "That is wonderful!" -> positive

# Japanese
model = Aure("edge", lang="ja")
model.predict("素晴らしい映画でした")  # "It was a wonderful movie" -> positive

# French, Spanish, and Chinese are also supported
```

Supported: `en`, `de`, `fr`, `es`, `zh`, `ja`
## Model Variants

| Variant | File | Size | Accuracy | Use Case |
|---------|------|------|----------|----------|
| **Edge** (this model) | `model_edge.onnx` | 1.46 MB | 83.03% | MCUs, wearables, IoT |
| Edge 3-Class | `model_edge_3class.onnx` | 1.47 MB | ~82% | Positive/neutral/negative classification |
| Mobile | `model_mobile.onnx` | 4.0 MB | 83% | Mobile apps, Raspberry Pi |
## Hardware Targets

Tested on:
- **NVIDIA Jetson Nano**: 0.08 ms inference
- **Raspberry Pi 4**: 0.9 ms inference
- **x86 CPU** (i7): 0.14 ms inference
- **ARM Cortex-M7** (STM32H7): target <10 ms (ONNX Micro Runtime)
## Training Details

- **Dataset**: SST-2 (Stanford Sentiment Treebank, binary), 67,349 train / 872 validation
- **Teacher**: BERT-base-uncased + linear compression head, fine-tuned for 12 epochs
- **Hardware**: NVIDIA RTX 4090 Laptop GPU (16 GB), Windows 11
- **Framework**: PyTorch 2.x → ONNX export → INT8 quantization
- **Reproducibility**: 5-seed evaluation with standard deviations reported
## Negative Results (Published for Transparency)

1. **Graph Laplacian spectral compression provides no benefit** over linear projection at either the teacher or the student level.
2. **Progressive distillation** (BERT → DistilBERT → student) does not improve student quality over direct distillation.
3. **Transformer students perform worse than CNN students** at sub-2 MB scale despite using 4x more parameters.
## Citation

```bibtex
@misc{constantone2026aure,
  title={Aure: Pareto-Optimal Knowledge Distillation for Sub-2MB Sentiment Classification},
  author={ConstantOne AI},
  year={2026},
  url={https://huggingface.co/ConstantQJ/aure-edge-sentiment}
}
```

## License

Apache 2.0: free to use in commercial and non-commercial projects.

## Links

- [ConstantOne AI](https://constantone.ai)
- [API Documentation](https://constantone.ai/docs.html)
- [Technical Report](https://constantone.ai/math.html)
model_edge.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e521ffa720d22fdc6073b3c0ce4ea600cf15601fdd1e6f6e249334ced0fa424f
size 1542385

model_edge_3class.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:53c8e99b8d766219dd8e49917f98003e08fddb246452eea547bfd5f7566f5a16
size 1541498

model_mobile.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:373156649e2d061ddf1f8a7b0b07fbb5a87e8f1f5555ad2ce86c2381fe281fbf
size 4186084