---
license: apache-2.0
language:
  - en
  - de
  - fr
  - es
  - zh
  - ja
library_name: onnxruntime
pipeline_tag: text-classification
tags:
  - sentiment-analysis
  - edge-ai
  - tinyml
  - knowledge-distillation
  - onnx
  - int8
  - quantized
  - microcontroller
  - nlp
datasets:
  - glue
  - sst2
metrics:
  - accuracy
  - f1
model-index:
  - name: constant-edge-0.5
    results:
      - task:
          type: text-classification
          name: Sentiment Analysis
        dataset:
          type: glue
          name: SST-2
          split: validation
        metrics:
          - type: accuracy
            value: 83.03
          - type: f1
            value: 0.830
---
|
|
|
|
|
# Constant Edge 0.5: 1.46 MB Sentiment Analysis for Edge Devices
|
|
|
|
|
A **288x compressed** sentiment classifier distilled from BERT. Runs on microcontrollers, mobile devices, and edge hardware, with **0.14 ms inference latency** on a desktop CPU.
|
|
|
|
|
| Metric | Value |
|--------|-------|
| **Accuracy** | 83.03% (SST-2) |
| **F1** | 0.830 |
| **Model Size** | 1.46 MB (INT8 quantized) |
| **Parameters** | 383,618 |
| **Inference** | 0.14 ms (ONNX Runtime, CPU) |
| **Compression** | 288x vs. BERT teacher (420 MB) |
| **Teacher Accuracy** | 92.32% |
|
|
|
|
|
## Quick Start |
|
|
|
|
|
```python
# Recommended path: the Aure SDK handles tokenization and inference.
# pip install aure
from aure import Aure

model = Aure("edge")
result = model.predict("I love this product!")
print(result)  # SentimentResult(label='positive', score=0.91)
```
|
|
|
|
|
### Standalone ONNX Inference (No SDK Required)
|
|
|
|
|
```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model_edge.onnx")

# Input: token IDs as int64 array, shape [batch_size, seq_length]
# Max sequence length: 128
# Vocabulary: pruned to 10,907 tokens (from BERT's 30,522)
input_ids = np.array([[101, 1045, 2293, 2023, 3185, 999, 102]], dtype=np.int64)

logits = session.run(None, {"input_ids": input_ids})[0]

# Numerically stable softmax over the two class logits
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

labels = ["negative", "positive"]
pred = labels[np.argmax(probs[0])]
confidence = float(probs[0].max())
print(f"{pred} ({confidence:.1%})")
```
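Token IDs come from a simple whitespace split plus a lookup in the pruned vocabulary. A minimal sketch of such a tokenizer is shown below; the stand-in vocabulary and the BERT-style special-token IDs (`[CLS]`=101, `[SEP]`=102, `[UNK]`=100, `[PAD]`=0) are illustrative assumptions, not part of the released artifacts:

```python
import numpy as np

# BERT-style special-token IDs (assumed; the released vocab defines the real ones)
CLS_ID, SEP_ID, UNK_ID, PAD_ID = 101, 102, 100, 0
MAX_LEN = 128

# Tiny stand-in vocabulary; the real model ships a pruned 10,907-token vocab
vocab = {"i": 1045, "love": 2293, "this": 2023, "movie": 3185, "!": 999}

def encode(text, vocab, max_len=MAX_LEN):
    """Whitespace split + vocabulary lookup, padded/truncated to max_len."""
    tokens = text.lower().split()
    ids = [CLS_ID] + [vocab.get(t, UNK_ID) for t in tokens[: max_len - 2]] + [SEP_ID]
    ids += [PAD_ID] * (max_len - len(ids))
    return np.array([ids], dtype=np.int64)

input_ids = encode("I love this movie !", vocab)
print(input_ids.shape)  # (1, 128)
```

The resulting array can be fed directly to `session.run` as the `input_ids` feed.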
|
|
|
|
|
## Architecture |
|
|
|
|
|
**NanoCNN**: a compact convolutional architecture optimized for sub-2 MB deployment:

- **Embedding**: 32-dimensional, pruned vocabulary (10,907 tokens)
- **Convolutions**: 4 parallel Conv1d banks (filter sizes 2, 3, 4, 5), 64 filters each
- **Compression**: Linear bottleneck (256 → 16)
- **Classifier**: 16 → 48 → 2 (with dropout 0.3)
- **Quantization**: INT8 (post-training, ONNX)
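A PyTorch sketch reconstructed from the bullets above; layer ordering, ReLU activations, and global max pooling are assumptions (this sketch lands within ~0.2% of the reported 383,618 parameters, so minor details likely differ):

```python
import torch
import torch.nn as nn

class NanoCNN(nn.Module):
    """Sketch: 32-d embeddings, 4 parallel Conv1d banks (filter sizes 2-5,
    64 filters each), a 256 -> 16 bottleneck, and a 16 -> 48 -> 2 head."""

    def __init__(self, vocab_size=10907, embed_dim=32, num_filters=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in (2, 3, 4, 5)
        )
        self.bottleneck = nn.Linear(4 * num_filters, 16)  # 256 -> 16
        self.classifier = nn.Sequential(
            nn.Linear(16, 48), nn.ReLU(), nn.Dropout(0.3), nn.Linear(48, 2)
        )

    def forward(self, input_ids):
        x = self.embed(input_ids).transpose(1, 2)        # [B, 32, L]
        # Global max-pool each conv bank, concatenate to a 256-d vector
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        z = self.bottleneck(torch.cat(pooled, dim=1))    # [B, 16]
        return self.classifier(z)                        # [B, 2] logits

model = NanoCNN()
logits = model(torch.randint(0, 10907, (1, 7)))
print(tuple(logits.shape))  # (1, 2)
```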
|
|
|
|
|
## Distillation Pipeline |
|
|
|
|
|
Distilled from a BERT-base-uncased teacher through systematic experimentation: |
|
|
|
|
|
```
BERT Teacher (92.32%, 420 MB)
  → Knowledge Distillation (T=6.39, α=0.69)
  → NanoCNN Student (83.03%, 1.46 MB)
```
|
|
|
|
|
Key distillation parameters (optimized via Optuna, 20 trials):

- Temperature: 6.39
- Distillation weight (α): 0.69
- Learning rate: 2e-3
- Epochs: 30
- Batch size: 128
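These hyperparameters plug into a Hinton-style distillation objective: a temperature-softened KL term weighted by α plus a hard cross-entropy term weighted by 1−α. The card does not publish the exact loss code, so the NumPy sketch below is the conventional formulation:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled, numerically stable softmax."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=6.39, alpha=0.69):
    labels = np.asarray(labels)
    # Soft term: KL(teacher || student) at temperature T, scaled by T^2
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)) * T * T
    # Hard term: cross-entropy against ground-truth labels
    p = softmax(student_logits)
    hard = -np.mean(np.log(p[np.arange(len(labels)), labels]))
    return alpha * soft + (1 - alpha) * hard

student = np.array([[2.0, 0.5]])
teacher = np.array([[1.8, 0.2]])
loss = distillation_loss(student, teacher, np.array([0]))
```

With α=0 and T=1 this reduces to plain cross-entropy training, which is the baseline the distilled student is compared against.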
|
|
|
|
|
## Ablation Results |
|
|
|
|
|
We tested multiple compression approaches. Linear projection consistently won: |
|
|
|
|
|
### Teacher Compression (on BERT) |
|
|
|
|
|
| Method | Accuracy | Params |
|--------|----------|--------|
| **Linear** | **92.32%** | 49K |
| Graph Laplacian | 92.20% | 639K |
| MLP (2-layer) | 92.09% | 213K |
|
|
|
|
|
### Student Compression (NanoCNN) |
|
|
|
|
|
| Method | FP32 Accuracy | INT8 Accuracy | Size |
|--------|---------------|---------------|------|
| **Linear** | 82.04% | **83.03%** | **1.46 MB** |
| MLP | 81.54% | 82.11% | 1.47 MB |
| Spectral | 81.15% | 82.00% | 1.48 MB |
|
|
|
|
|
### Architecture Comparison |
|
|
|
|
|
| Model | Accuracy | Size | Compression |
|-------|----------|------|-------------|
| BERT Teacher | 92.32% | 420 MB | 1x |
| CNN Large | 83.94% | 31.8 MB | 13x |
| CNN TinyML | 83.14% | 3.4 MB | 124x |
| **NanoCNN INT8** | **83.03%** | **1.46 MB** | **288x** |
| Tiny Transformer | 80.16% | 6.4 MB | 66x |
|
|
|
|
|
The transformer student performs worse despite having roughly 4x more parameters, suggesting that CNN inductive biases are better suited to small-scale text classification.
|
|
|
|
|
## Multilingual Support |
|
|
|
|
|
The Aure SDK supports 6 languages. Non-English models are downloaded on first use: |
|
|
|
|
|
```python
from aure import Aure

# German
model = Aure("edge", lang="de")
model.predict("Das ist wunderbar!")  # positive

# Japanese
model = Aure("edge", lang="ja")
model.predict("素晴らしい映画です")  # positive

# French, Spanish, Chinese also supported
```
|
|
|
|
|
Supported: `en`, `de`, `fr`, `es`, `zh`, `ja` |
|
|
|
|
|
## Model Variants |
|
|
|
|
|
| Variant | File | Size | Accuracy | Use Case |
|---------|------|------|----------|----------|
| **Edge** (this model) | `model_edge.onnx` | 1.46 MB | 83.03% | MCUs, wearables, IoT |
| Edge 3-Class | `model_edge_3class.onnx` | 1.47 MB | ~82% | Pos/neutral/neg classification |
| Mobile | `model_mobile.onnx` | 4.0 MB | 83% | Mobile apps, Raspberry Pi |
|
|
|
|
|
## Hardware Targets |
|
|
|
|
|
Tested on:

- **NVIDIA Jetson Nano**: 0.08 ms inference
- **Raspberry Pi 4**: 0.9 ms inference
- **x86 CPU** (i7): 0.14 ms inference
- **ARM Cortex-M7** (STM32H7): target <10 ms (ONNX Micro Runtime)
|
|
|
|
|
## Training Details |
|
|
|
|
|
- **Dataset**: SST-2 (Stanford Sentiment Treebank, binary), 67,349 train / 872 validation |
|
|
- **Teacher**: BERT-base-uncased + linear compression head, fine-tuned 12 epochs |
|
|
- **Hardware**: NVIDIA RTX 4090 Laptop GPU (16 GB), Windows 11 |
|
|
- **Framework**: PyTorch 2.x → ONNX export → INT8 quantization
|
|
- **Reproducibility**: 5-seed evaluation with standard deviations reported |
|
|
|
|
|
## Negative Results (Published for Transparency) |
|
|
|
|
|
1. **Graph Laplacian spectral compression provides no benefit** over linear projection at either teacher or student level |
|
|
2. **Progressive distillation** (BERT → DistilBERT → Student) does not improve student quality vs. direct distillation
|
|
3. **Transformer students perform worse than CNN students** at sub-2MB scale despite using 4x more parameters |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex
@misc{constantone2026aure,
  title={Aure: Pareto-Optimal Knowledge Distillation for Sub-2MB Sentiment Classification},
  author={ConstantOne AI},
  year={2026},
  url={https://huggingface.co/ConstantQJ/constant-edge-0.5}
}
```
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0: use freely in commercial and non-commercial projects.
|
|
|
|
|
## Links |
|
|
|
|
|
- [ConstantOne AI](https://constantone.ai) |
|
|
- [API Documentation](https://constantone.ai/docs.html) |
|
|
- [Technical Report](https://constantone.ai/math.html) |
|
|
|