---
license: apache-2.0
language:
- en
- de
- fr
- es
- zh
- ja
library_name: onnxruntime
pipeline_tag: text-classification
tags:
- sentiment-analysis
- edge-ai
- tinyml
- knowledge-distillation
- onnx
- int8
- quantized
- microcontroller
- nlp
datasets:
- glue
- sst2
metrics:
- accuracy
- f1
model-index:
- name: constant-edge-0.5
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    dataset:
      type: glue
      name: SST-2
      split: validation
    metrics:
    - type: accuracy
      value: 83.03
    - type: f1
      value: 0.830
---

# Constant Edge 0.5 — 1.46 MB Sentiment Analysis for Edge Devices

A **288x compressed** sentiment classifier distilled from BERT. Runs on microcontrollers, mobile devices, and edge hardware with **0.14ms inference latency**.

| Metric | Value |
|--------|-------|
| **Accuracy** | 83.03% (SST-2) |
| **F1** | 0.830 |
| **Model Size** | 1.46 MB (INT8 quantized) |
| **Parameters** | 383,618 |
| **Inference** | 0.14ms (ONNX Runtime, CPU) |
| **Compression** | 288x vs. BERT teacher (420 MB) |
| **Teacher Accuracy** | 92.32% |

## Quick Start

The Aure SDK handles tokenization (simple whitespace splitting + vocabulary lookup) and inference for you:

```python
# For production use: pip install aure
from aure import Aure

model = Aure("edge")
result = model.predict("I love this product!")
print(result)  # SentimentResult(label='positive', score=0.91)
```

### Standalone ONNX Inference (ONNX Runtime Only)

```python
import onnxruntime as ort
import numpy as np

# Load model
session = ort.InferenceSession("model_edge.onnx")

# Input: token IDs as int64 array, shape [batch_size, seq_length]
# Max sequence length: 128
# Vocabulary: pruned to 10,907 tokens (from BERT's 30,522)
input_ids = np.array([[101, 1045, 2293, 2023, 3185, 999, 102]], dtype=np.int64)

logits = session.run(None, {"input_ids": input_ids})[0]

# Softmax
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

labels = ["negative",
"positive"] pred = labels[np.argmax(probs[0])] confidence = float(probs[0].max()) print(f"{pred} ({confidence:.1%})") ``` ## Architecture **NanoCNN** — a compact convolutional architecture optimized for sub-2MB deployment: - **Embedding**: 32-dimensional, pruned vocabulary (10,907 tokens) - **Convolutions**: 4 parallel Conv1d banks (filter sizes 2, 3, 4, 5), 64 filters each - **Compression**: Linear bottleneck (256 → 16) - **Classifier**: 16 → 48 → 2 (with dropout 0.3) - **Quantization**: INT8 (post-training, ONNX) ## Distillation Pipeline Distilled from a BERT-base-uncased teacher through systematic experimentation: ``` BERT Teacher (92.32%, 420 MB) → Knowledge Distillation (T=6.39, α=0.69) → NanoCNN Student (83.03%, 1.46 MB) ``` Key distillation parameters (optimized via Optuna, 20 trials): - Temperature: 6.39 - Distillation weight (α): 0.69 - Learning rate: 2e-3 - Epochs: 30 - Batch size: 128 ## Ablation Results We tested multiple compression approaches. Linear projection consistently won: ### Teacher Compression (on BERT) | Method | Accuracy | Params | |--------|----------|--------| | **Linear** | **92.32%** | 49K | | Graph Laplacian | 92.20% | 639K | | MLP (2-layer) | 92.09% | 213K | ### Student Compression (NanoCNN) | Method | FP32 Accuracy | INT8 Accuracy | Size | |--------|--------------|---------------|------| | **Linear** | 82.04% | **83.03%** | **1.46 MB** | | MLP | 81.54% | 82.11% | 1.47 MB | | Spectral | 81.15% | 82.00% | 1.48 MB | ### Architecture Comparison | Model | Accuracy | Size | Compression | |-------|----------|------|-------------| | BERT Teacher | 92.32% | 420 MB | 1x | | CNN Large | 83.94% | 31.8 MB | 13x | | CNN TinyML | 83.14% | 3.4 MB | 124x | | **NanoCNN INT8** | **83.03%** | **1.46 MB** | **288x** | | Tiny Transformer | 80.16% | 6.4 MB | 66x | The transformer student performs worse despite 4x more parameters, confirming CNN inductive biases are better suited to small-scale text classification. 
## Multilingual Support

The Aure SDK supports 6 languages. Non-English models are downloaded on first use:

```python
from aure import Aure

# German
model = Aure("edge", lang="de")
model.predict("Das ist wunderbar!")  # positive

# Japanese
model = Aure("edge", lang="ja")
model.predict("素晴らしい映画でした")  # positive

# French, Spanish, Chinese also supported
```

Supported: `en`, `de`, `fr`, `es`, `zh`, `ja`

## Model Variants

| Variant | File | Size | Accuracy | Use Case |
|---------|------|------|----------|----------|
| **Edge** (this model) | `model_edge.onnx` | 1.46 MB | 83.03% | MCUs, wearables, IoT |
| Edge 3-Class | `model_edge_3class.onnx` | 1.47 MB | ~82% | Pos/neutral/neg classification |
| Mobile | `model_mobile.onnx` | 4.0 MB | 83% | Mobile apps, Raspberry Pi |

## Hardware Targets

Tested on:

- **NVIDIA Jetson Nano** — 0.08ms inference
- **Raspberry Pi 4** — 0.9ms inference
- **x86 CPU** (i7) — 0.14ms inference
- **ARM Cortex-M7** (STM32H7) — target <10ms (ONNX Micro Runtime)

## Training Details

- **Dataset**: SST-2 (Stanford Sentiment Treebank, binary), 67,349 train / 872 validation
- **Teacher**: BERT-base-uncased + linear compression head, fine-tuned 12 epochs
- **Hardware**: NVIDIA RTX 4090 Laptop GPU (16 GB), Windows 11
- **Framework**: PyTorch 2.x → ONNX export → INT8 quantization
- **Reproducibility**: 5-seed evaluation with standard deviations reported

## Negative Results (Published for Transparency)

1. **Graph Laplacian spectral compression provides no benefit** over linear projection at either teacher or student level
2. **Progressive distillation** (BERT → DistilBERT → Student) does not improve student quality vs. direct distillation
3.
   **Transformer students perform worse than CNN students** at sub-2MB scale despite using 4x more parameters

## Citation

```bibtex
@misc{constantone2026aure,
  title={Aure: Pareto-Optimal Knowledge Distillation for Sub-2MB Sentiment Classification},
  author={ConstantOne AI},
  year={2026},
  url={https://huggingface.co/ConstantQJ/constant-edge-0.5}
}
```

## License

Apache 2.0 — use freely in commercial and non-commercial projects.

## Links

- [ConstantOne AI](https://constantone.ai)
- [API Documentation](https://constantone.ai/docs.html)
- [Technical Report](https://constantone.ai/math.html)
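A note on producing `input_ids` for the standalone ONNX example: Quick Start describes tokenization as simple whitespace splitting plus vocabulary lookup. Below is a hedged sketch of that scheme. The `[CLS]`/`[SEP]`/`[UNK]` conventions, lowercasing, and the exact ID mapping are assumptions inherited from BERT-base-uncased; the pruned vocabulary shipped with this model may handle punctuation differently (the inference example maps `!` to its own ID, which plain whitespace splitting would not produce).

```python
import numpy as np

def encode(text, vocab, max_len=128):
    """Whitespace tokenization + vocabulary lookup (sketch, not the Aure tokenizer).

    `vocab` maps token string -> integer ID and is assumed to contain the
    special tokens [CLS], [SEP], and [UNK]; out-of-vocabulary words fall
    back to [UNK]. Returns an int64 array of shape [1, seq_length]
    matching the model's `input_ids` input.
    """
    tokens = text.lower().split()
    ids = [vocab["[CLS]"]]
    ids += [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    ids.append(vocab["[SEP]"])
    return np.array([ids[:max_len]], dtype=np.int64)
```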