---
license: apache-2.0
language:
- en
- de
- fr
- es
- zh
- ja
library_name: onnxruntime
pipeline_tag: text-classification
tags:
- sentiment-analysis
- edge-ai
- tinyml
- knowledge-distillation
- onnx
- int8
- quantized
- microcontroller
- nlp
datasets:
- glue
- sst2
metrics:
- accuracy
- f1
model-index:
- name: constant-edge-0.5
results:
- task:
type: text-classification
name: Sentiment Analysis
dataset:
type: glue
name: SST-2
split: validation
metrics:
- type: accuracy
value: 83.03
- type: f1
value: 0.830
---
# Constant Edge 0.5 β€” 1.46 MB Sentiment Analysis for Edge Devices
A **288x compressed** sentiment classifier distilled from BERT. Runs on microcontrollers, mobile devices, and edge hardware with **0.14ms inference latency**.
| Metric | Value |
|--------|-------|
| **Accuracy** | 83.03% (SST-2) |
| **F1** | 0.830 |
| **Model Size** | 1.46 MB (INT8 quantized) |
| **Parameters** | 383,618 |
| **Inference** | 0.14ms (ONNX Runtime, CPU) |
| **Compression** | 288x vs. BERT teacher (420 MB) |
| **Teacher Accuracy** | 92.32% |
## Quick Start
```python
# Recommended: the Aure SDK handles tokenization and inference
# pip install aure
from aure import Aure

model = Aure("edge")
result = model.predict("I love this product!")
print(result)  # SentimentResult(label='positive', score=0.91)
```
### Standalone ONNX Inference (No Dependencies)
```python
import onnxruntime as ort
import numpy as np
session = ort.InferenceSession("model_edge.onnx")
# Input: token IDs as int64 array, shape [batch_size, seq_length]
# Max sequence length: 128
# Vocabulary: pruned to 10,907 tokens (from BERT's 30,522)
input_ids = np.array([[101, 1045, 2293, 2023, 3185, 999, 102]], dtype=np.int64)  # BERT-style IDs for "[CLS] i love this movie ! [SEP]"
logits = session.run(None, {"input_ids": input_ids})[0]
# Softmax
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)
labels = ["negative", "positive"]
pred = labels[np.argmax(probs[0])]
confidence = float(probs[0].max())
print(f"{pred} ({confidence:.1%})")
```
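The standalone path above assumes you already have token IDs. A minimal sketch of the whitespace tokenization and vocabulary lookup the card describes might look like the following. The `encode` helper is illustrative, not the SDK's actual tokenizer, and the special-token IDs shown follow BERT conventions; the pruned vocabulary file may map tokens differently:

```python
import numpy as np

def encode(text, vocab, max_len=128):
    """Whitespace tokenization + vocabulary lookup (illustrative sketch).

    vocab: dict mapping token -> integer ID. Assumes BERT-style special
    tokens ([CLS]=101, [SEP]=102, [UNK]=100); the real pruned vocab may differ.
    """
    ids = [vocab.get("[CLS]", 101)]
    for tok in text.lower().split():
        ids.append(vocab.get(tok, vocab.get("[UNK]", 100)))
    ids.append(vocab.get("[SEP]", 102))
    # Truncate to the model's max sequence length and add a batch dimension
    return np.array([ids[:max_len]], dtype=np.int64)
```

The resulting array can be fed directly as `input_ids` to the ONNX session above.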
## Architecture
**NanoCNN** β€” a compact convolutional architecture optimized for sub-2MB deployment:
- **Embedding**: 32-dimensional, pruned vocabulary (10,907 tokens)
- **Convolutions**: 4 parallel Conv1d banks (filter sizes 2, 3, 4, 5), 64 filters each
- **Compression**: Linear bottleneck (256 β†’ 16)
- **Classifier**: 16 β†’ 48 β†’ 2 (with dropout 0.3)
- **Quantization**: INT8 (post-training, ONNX)
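The bullet list above maps to roughly the following PyTorch module. This is an illustrative reconstruction from the stated dimensions, not the released training code; the pooling choice, activation placement, and layer names are assumptions:

```python
import torch
import torch.nn as nn

class NanoCNN(nn.Module):
    """Illustrative reconstruction of the architecture described above."""

    def __init__(self, vocab_size=10_907, emb_dim=32, num_filters=64, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Four parallel Conv1d banks over the embedded sequence
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, kernel_size=k) for k in (2, 3, 4, 5)
        )
        self.bottleneck = nn.Linear(4 * num_filters, 16)  # 256 -> 16
        self.classifier = nn.Sequential(
            nn.Linear(16, 48), nn.ReLU(), nn.Dropout(0.3), nn.Linear(48, num_classes)
        )

    def forward(self, input_ids):
        x = self.embedding(input_ids).transpose(1, 2)  # [batch, emb_dim, seq_len]
        # Max-pool each conv bank over time, then concatenate (assumed pooling)
        feats = torch.cat(
            [conv(x).relu().max(dim=2).values for conv in self.convs], dim=1
        )
        return self.classifier(self.bottleneck(feats))
```

With these dimensions the parameter count lands near the 383,618 reported above, the embedding table accounting for the large majority.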
## Distillation Pipeline
Distilled from a BERT-base-uncased teacher through systematic experimentation:
```
BERT Teacher (92.32%, 420 MB)
β†’ Knowledge Distillation (T=6.39, Ξ±=0.69)
β†’ NanoCNN Student (83.03%, 1.46 MB)
```
Key distillation parameters (optimized via Optuna, 20 trials):
- Temperature: 6.39
- Distillation weight (Ξ±): 0.69
- Learning rate: 2e-3
- Epochs: 30
- Batch size: 128
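The temperature and α above combine in a standard Hinton-style distillation objective. A minimal NumPy sketch follows; the T² scaling and the KL direction are the usual convention and an assumption about this project's exact loss:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=6.39, alpha=0.69):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, labels)."""
    # Soft targets: teacher and student distributions at temperature T
    p_t = softmax(teacher_logits / T)
    log_p_s = np.log(softmax(student_logits / T))
    kl = (p_t * (np.log(p_t) - log_p_s)).sum(axis=-1).mean()
    # Hard-label cross-entropy at T = 1
    log_p = np.log(softmax(student_logits))
    ce = -log_p[np.arange(len(labels)), labels].mean()
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

The T² factor keeps the gradient magnitude of the soft-target term roughly constant as the temperature changes.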
## Ablation Results
We tested multiple compression approaches. Linear projection consistently won:
### Teacher Compression (on BERT)
| Method | Accuracy | Params |
|--------|----------|--------|
| **Linear** | **92.32%** | 49K |
| Graph Laplacian | 92.20% | 639K |
| MLP (2-layer) | 92.09% | 213K |
### Student Compression (NanoCNN)
| Method | FP32 Accuracy | INT8 Accuracy | Size |
|--------|--------------|---------------|------|
| **Linear** | 82.04% | **83.03%** | **1.46 MB** |
| MLP | 81.54% | 82.11% | 1.47 MB |
| Spectral | 81.15% | 82.00% | 1.48 MB |
### Architecture Comparison
| Model | Accuracy | Size | Compression |
|-------|----------|------|-------------|
| BERT Teacher | 92.32% | 420 MB | 1x |
| CNN Large | 83.94% | 31.8 MB | 13x |
| CNN TinyML | 83.14% | 3.4 MB | 124x |
| **NanoCNN INT8** | **83.03%** | **1.46 MB** | **288x** |
| Tiny Transformer | 80.16% | 6.4 MB | 66x |
The transformer student performs worse despite having 4x more parameters, suggesting that CNN inductive biases are better suited to small-scale text classification.
## Multilingual Support
The Aure SDK supports 6 languages. Non-English models are downloaded on first use:
```python
from aure import Aure
# German
model = Aure("edge", lang="de")
model.predict("Das ist wunderbar!") # positive
# Japanese
model = Aure("edge", lang="ja")
model.predict("η΄ ζ™΄γ‚‰γ—γ„ζ˜ η”»γ§γ—γŸ") # positive
# French, Spanish, Chinese also supported
```
Supported: `en`, `de`, `fr`, `es`, `zh`, `ja`
## Model Variants
| Variant | File | Size | Accuracy | Use Case |
|---------|------|------|----------|----------|
| **Edge** (this model) | `model_edge.onnx` | 1.46 MB | 83.03% | MCUs, wearables, IoT |
| Edge 3-Class | `model_edge_3class.onnx` | 1.47 MB | ~82% | Positive/neutral/negative classification |
| Mobile | `model_mobile.onnx` | 4.0 MB | 83% | Mobile apps, Raspberry Pi |
## Hardware Targets
Tested on:
- **NVIDIA Jetson Nano** β€” 0.08ms inference
- **Raspberry Pi 4** β€” 0.9ms inference
- **x86 CPU** (i7) β€” 0.14ms inference
- **ARM Cortex-M7** (STM32H7) β€” target <10ms (ONNX Micro Runtime)
## Training Details
- **Dataset**: SST-2 (Stanford Sentiment Treebank, binary), 67,349 train / 872 validation
- **Teacher**: BERT-base-uncased + linear compression head, fine-tuned 12 epochs
- **Hardware**: NVIDIA RTX 4090 Laptop GPU (16 GB), Windows 11
- **Framework**: PyTorch 2.x β†’ ONNX export β†’ INT8 quantization
- **Reproducibility**: 5-seed evaluation with standard deviations reported
## Negative Results (Published for Transparency)
1. **Graph Laplacian spectral compression provides no benefit** over linear projection at either teacher or student level
2. **Progressive distillation** (BERT β†’ DistilBERT β†’ Student) does not improve student quality vs. direct distillation
3. **Transformer students perform worse than CNN students** at sub-2MB scale despite using 4x more parameters
## Citation
```bibtex
@misc{constantone2026aure,
title={Aure: Pareto-Optimal Knowledge Distillation for Sub-2MB Sentiment Classification},
author={ConstantOne AI},
year={2026},
url={https://huggingface.co/ConstantQJ/constant-edge-0.5}
}
```
## License
Apache 2.0 β€” use freely in commercial and non-commercial projects.
## Links
- [ConstantOne AI](https://constantone.ai)
- [API Documentation](https://constantone.ai/docs.html)
- [Technical Report](https://constantone.ai/math.html)