---
license: apache-2.0
language:
- en
- de
- fr
- es
- zh
- ja
library_name: onnxruntime
pipeline_tag: text-classification
tags:
- sentiment-analysis
- edge-ai
- tinyml
- knowledge-distillation
- onnx
- int8
- quantized
- microcontroller
- nlp
datasets:
- glue
- sst2
metrics:
- accuracy
- f1
model-index:
- name: constant-edge-0.5
results:
- task:
type: text-classification
name: Sentiment Analysis
dataset:
type: glue
name: SST-2
split: validation
metrics:
- type: accuracy
value: 83.03
- type: f1
value: 0.830
---
# Constant Edge 0.5 – 1.46 MB Sentiment Analysis for Edge Devices
A **288x compressed** sentiment classifier distilled from BERT. Runs on microcontrollers, mobile devices, and edge hardware with **0.14ms inference latency**.
| Metric | Value |
|--------|-------|
| **Accuracy** | 83.03% (SST-2) |
| **F1** | 0.830 |
| **Model Size** | 1.46 MB (INT8 quantized) |
| **Parameters** | 383,618 |
| **Inference** | 0.14ms (ONNX Runtime, CPU) |
| **Compression** | 288x vs. BERT teacher (420 MB) |
| **Teacher Accuracy** | 92.32% |
## Quick Start
```python
# Recommended: the Aure SDK handles model download and tokenization
# (simple whitespace split + vocabulary lookup) internally.
# pip install aure
from aure import Aure

model = Aure("edge")
result = model.predict("I love this product!")
print(result)  # SentimentResult(label='positive', score=0.91)
```
### Standalone ONNX Inference (No Dependencies)
```python
import onnxruntime as ort
import numpy as np
session = ort.InferenceSession("model_edge.onnx")
# Input: token IDs as int64 array, shape [batch_size, seq_length]
# Max sequence length: 128
# Vocabulary: pruned to 10,907 tokens (from BERT's 30,522)
input_ids = np.array([[101, 1045, 2293, 2023, 3185, 999, 102]], dtype=np.int64)
logits = session.run(None, {"input_ids": input_ids})[0]
# Softmax
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)
labels = ["negative", "positive"]
pred = labels[np.argmax(probs[0])]
confidence = float(probs[0].max())
print(f"{pred} ({confidence:.1%})")
```
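The card describes tokenization as a simple whitespace split plus vocabulary lookup over the pruned 10,907-token vocabulary. A minimal sketch of that encoding step is below; the `encode` helper and the toy vocabulary are illustrative assumptions (the real vocabulary file ships with the model), while the special-token ids follow the BERT convention visible in the `input_ids` example above (101 = `[CLS]`, 102 = `[SEP]`).

```python
# Hypothetical encoder sketch: whitespace split + vocab lookup, BERT-style
# special tokens. The real pruned vocabulary is distributed with the model.
def encode(text, vocab, max_len=128, cls_id=101, sep_id=102, unk_id=100):
    ids = [vocab.get(tok, unk_id) for tok in text.lower().split()]
    return [cls_id] + ids[: max_len - 2] + [sep_id]

# Toy vocabulary reusing BERT token ids from the example above.
toy_vocab = {"i": 1045, "love": 2293, "this": 2023, "movie": 3185, "!": 999}
print(encode("I love this movie !", toy_vocab))
# [101, 1045, 2293, 2023, 3185, 999, 102]
```

The output matches the hard-coded `input_ids` in the standalone example, so the two snippets can be chained.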
## Architecture
**NanoCNN** – a compact convolutional architecture optimized for sub-2MB deployment:
- **Embedding**: 32-dimensional, pruned vocabulary (10,907 tokens)
- **Convolutions**: 4 parallel Conv1d banks (filter sizes 2, 3, 4, 5), 64 filters each
- **Compression**: Linear bottleneck (256 → 16)
- **Classifier**: 16 → 48 → 2 (with dropout 0.3)
- **Quantization**: INT8 (post-training, ONNX)
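A back-of-the-envelope parameter count from the components listed above nearly reproduces the reported total; the small remainder presumably sits in layers not itemized here (e.g. normalization), so treat this as a sanity-check sketch rather than the exact accounting.

```python
# Rough FP32 parameter count for NanoCNN from the architecture description.
vocab_size, emb_dim = 10_907, 32
embedding = vocab_size * emb_dim                           # 349,024
# 4 parallel Conv1d banks (kernel sizes 2-5), 64 filters each, plus biases.
convs = sum(emb_dim * 64 * k + 64 for k in (2, 3, 4, 5))   # 28,928
bottleneck = 256 * 16 + 16                                 # 4 x 64 concat -> 16
classifier = (16 * 48 + 48) + (48 * 2 + 2)                 # 16 -> 48 -> 2
total = embedding + convs + bottleneck + classifier
print(f"{total:,}")  # 382,978 -- within ~0.2% of the reported 383,618
```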
## Distillation Pipeline
Distilled from a BERT-base-uncased teacher through systematic experimentation:
```
BERT Teacher (92.32%, 420 MB)
    → Knowledge Distillation (T=6.39, α=0.69)
    → NanoCNN Student (83.03%, 1.46 MB)
```
Key distillation parameters (optimized via Optuna, 20 trials):
- Temperature: 6.39
- Distillation weight (α): 0.69
- Learning rate: 2e-3
- Epochs: 30
- Batch size: 128
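With these hyperparameters, the training objective is presumably the standard Hinton-style distillation loss: a temperature-softened KL term weighted by α, plus a hard-label cross-entropy term weighted by 1−α. The exact weighting convention used in training is an assumption; the sketch below uses the common formulation with the T² gradient-scale correction.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the class axis.
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def kd_loss(student_logits, teacher_logits, labels, T=6.39, alpha=0.69):
    # Soft targets: KL(teacher || student) at temperature T, scaled by T^2
    # so soft- and hard-loss gradients stay on a comparable scale.
    t_log = log_softmax(teacher_logits / T)
    s_log = log_softmax(student_logits / T)
    kl = (np.exp(t_log) * (t_log - s_log)).sum(axis=1).mean()
    # Hard targets: cross-entropy on the unscaled student logits.
    ce = -log_softmax(student_logits)[np.arange(len(labels)), labels].mean()
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

When student and teacher logits agree, the KL term vanishes and only the (1−α)-weighted cross-entropy remains, which is a quick way to sanity-check an implementation.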
## Ablation Results
We tested multiple compression approaches. Linear projection consistently won:
### Teacher Compression (on BERT)
| Method | Accuracy | Params |
|--------|----------|--------|
| **Linear** | **92.32%** | 49K |
| Graph Laplacian | 92.20% | 639K |
| MLP (2-layer) | 92.09% | 213K |
### Student Compression (NanoCNN)
| Method | FP32 Accuracy | INT8 Accuracy | Size |
|--------|--------------|---------------|------|
| **Linear** | 82.04% | **83.03%** | **1.46 MB** |
| MLP | 81.54% | 82.11% | 1.47 MB |
| Spectral | 81.15% | 82.00% | 1.48 MB |
### Architecture Comparison
| Model | Accuracy | Size | Compression |
|-------|----------|------|-------------|
| BERT Teacher | 92.32% | 420 MB | 1x |
| CNN Large | 83.94% | 31.8 MB | 13x |
| CNN TinyML | 83.14% | 3.4 MB | 124x |
| **NanoCNN INT8** | **83.03%** | **1.46 MB** | **288x** |
| Tiny Transformer | 80.16% | 6.4 MB | 66x |
The transformer student performs worse despite 4x more parameters, confirming CNN inductive biases are better suited to small-scale text classification.
## Multilingual Support
The Aure SDK supports 6 languages. Non-English models are downloaded on first use:
```python
from aure import Aure
# German
model = Aure("edge", lang="de")
model.predict("Das ist wunderbar!")  # positive ("That is wonderful!")
# Japanese
model = Aure("edge", lang="ja")
model.predict("素晴らしい映画です！")  # positive ("It's a wonderful movie!")
# French, Spanish, Chinese also supported
```
Supported: `en`, `de`, `fr`, `es`, `zh`, `ja`
## Model Variants
| Variant | File | Size | Accuracy | Use Case |
|---------|------|------|----------|----------|
| **Edge** (this model) | `model_edge.onnx` | 1.46 MB | 83.03% | MCUs, wearables, IoT |
| Edge 3-Class | `model_edge_3class.onnx` | 1.47 MB | ~82% | Pos/neutral/neg classification |
| Mobile | `model_mobile.onnx` | 4.0 MB | 83% | Mobile apps, Raspberry Pi |
## Hardware Targets
Tested on:
- **NVIDIA Jetson Nano** – 0.08ms inference
- **Raspberry Pi 4** – 0.9ms inference
- **x86 CPU** (i7) – 0.14ms inference
- **ARM Cortex-M7** (STM32H7) – target <10ms (ONNX Micro Runtime)
## Training Details
- **Dataset**: SST-2 (Stanford Sentiment Treebank, binary), 67,349 train / 872 validation
- **Teacher**: BERT-base-uncased + linear compression head, fine-tuned 12 epochs
- **Hardware**: NVIDIA RTX 4090 Laptop GPU (16 GB), Windows 11
- **Framework**: PyTorch 2.x → ONNX export → INT8 quantization
- **Reproducibility**: 5-seed evaluation with standard deviations reported
## Negative Results (Published for Transparency)
1. **Graph Laplacian spectral compression provides no benefit** over linear projection at either teacher or student level
2. **Progressive distillation** (BERT → DistilBERT → Student) does not improve student quality vs. direct distillation
3. **Transformer students perform worse than CNN students** at sub-2MB scale despite using 4x more parameters
## Citation
```bibtex
@misc{constantone2026aure,
title={Aure: Pareto-Optimal Knowledge Distillation for Sub-2MB Sentiment Classification},
author={ConstantOne AI},
year={2026},
url={https://huggingface.co/ConstantQJ/constant-edge-0.5}
}
```
## License
Apache 2.0 – use freely in commercial and non-commercial projects.
## Links
- [ConstantOne AI](https://constantone.ai)
- [API Documentation](https://constantone.ai/docs.html)
- [Technical Report](https://constantone.ai/math.html)