---
license: apache-2.0
language:
- en
- de
- fr
- es
- zh
- ja
library_name: onnxruntime
pipeline_tag: text-classification
tags:
- sentiment-analysis
- edge-ai
- tinyml
- knowledge-distillation
- onnx
- int8
- quantized
- microcontroller
- nlp
datasets:
- glue
- sst2
metrics:
- accuracy
- f1
model-index:
- name: constant-edge-0.5
results:
- task:
type: text-classification
name: Sentiment Analysis
dataset:
type: glue
name: SST-2
split: validation
metrics:
- type: accuracy
value: 83.03
- type: f1
value: 0.830
---
# Constant Edge 0.5 – 1.46 MB Sentiment Analysis for Edge Devices
A **288x compressed** sentiment classifier distilled from BERT. Runs on microcontrollers, mobile devices, and edge hardware with **0.14ms inference latency**.
| Metric | Value |
|--------|-------|
| **Accuracy** | 83.03% (SST-2) |
| **F1** | 0.830 |
| **Model Size** | 1.46 MB (INT8 quantized) |
| **Parameters** | 383,618 |
| **Inference** | 0.14ms (ONNX Runtime, CPU) |
| **Compression** | 288x vs. BERT teacher (420 MB) |
| **Teacher Accuracy** | 92.32% |
## Quick Start
```python
# Recommended: the Aure SDK handles model download and tokenization
# (simple whitespace split + vocabulary lookup) internally.
# pip install aure
from aure import Aure

model = Aure("edge")
result = model.predict("I love this product!")
print(result)  # SentimentResult(label='positive', score=0.91)
```
### Standalone ONNX Inference (No Dependencies)
```python
import onnxruntime as ort
import numpy as np
session = ort.InferenceSession("model_edge.onnx")
# Input: token IDs as int64 array, shape [batch_size, seq_length]
# Max sequence length: 128
# Vocabulary: pruned to 10,907 tokens (from BERT's 30,522)
input_ids = np.array([[101, 1045, 2293, 2023, 3185, 999, 102]], dtype=np.int64)
logits = session.run(None, {"input_ids": input_ids})[0]
# Softmax
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)
labels = ["negative", "positive"]
pred = labels[np.argmax(probs[0])]
confidence = float(probs[0].max())
print(f"{pred} ({confidence:.1%})")
```
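The card describes tokenization as a simple whitespace split plus vocabulary lookup over the pruned 10,907-token vocabulary. A minimal sketch of that encoding step is below; the `encode` helper and the toy vocabulary are illustrative assumptions (the real vocabulary file ships with the model), while the special-token ids follow the BERT convention visible in the `input_ids` example above (101 = `[CLS]`, 102 = `[SEP]`).

```python
# Hypothetical encoder sketch: whitespace split + vocab lookup, BERT-style
# special tokens. The real pruned vocabulary is distributed with the model.
def encode(text, vocab, max_len=128, cls_id=101, sep_id=102, unk_id=100):
    ids = [vocab.get(tok, unk_id) for tok in text.lower().split()]
    return [cls_id] + ids[: max_len - 2] + [sep_id]

# Toy vocabulary reusing BERT token ids from the example above.
toy_vocab = {"i": 1045, "love": 2293, "this": 2023, "movie": 3185, "!": 999}
print(encode("I love this movie !", toy_vocab))
# [101, 1045, 2293, 2023, 3185, 999, 102]
```

The output matches the hard-coded `input_ids` in the standalone example, so the two snippets can be chained.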
## Architecture
**NanoCNN** – a compact convolutional architecture optimized for sub-2MB deployment:
- **Embedding**: 32-dimensional, pruned vocabulary (10,907 tokens)
- **Convolutions**: 4 parallel Conv1d banks (filter sizes 2, 3, 4, 5), 64 filters each
- **Compression**: Linear bottleneck (256 → 16)
- **Classifier**: 16 → 48 → 2 (with dropout 0.3)
- **Quantization**: INT8 (post-training, ONNX)
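A back-of-the-envelope parameter count from the components listed above nearly reproduces the reported total; the small remainder presumably sits in layers not itemized here (e.g. normalization), so treat this as a sanity-check sketch rather than the exact accounting.

```python
# Rough FP32 parameter count for NanoCNN from the architecture description.
vocab_size, emb_dim = 10_907, 32
embedding = vocab_size * emb_dim                           # 349,024
# 4 parallel Conv1d banks (kernel sizes 2-5), 64 filters each, plus biases.
convs = sum(emb_dim * 64 * k + 64 for k in (2, 3, 4, 5))   # 28,928
bottleneck = 256 * 16 + 16                                 # 4 x 64 concat -> 16
classifier = (16 * 48 + 48) + (48 * 2 + 2)                 # 16 -> 48 -> 2
total = embedding + convs + bottleneck + classifier
print(f"{total:,}")  # 382,978 -- within ~0.2% of the reported 383,618
```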
## Distillation Pipeline
Distilled from a BERT-base-uncased teacher through systematic experimentation:
```
BERT Teacher (92.32%, 420 MB)
    → Knowledge Distillation (T=6.39, α=0.69)
    → NanoCNN Student (83.03%, 1.46 MB)
```
Key distillation parameters (optimized via Optuna, 20 trials):
- Temperature: 6.39
- Distillation weight (α): 0.69
- Learning rate: 2e-3
- Epochs: 30
- Batch size: 128
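With these hyperparameters, the training objective is presumably the standard Hinton-style distillation loss: a temperature-softened KL term weighted by α, plus a hard-label cross-entropy term weighted by 1−α. The exact weighting convention used in training is an assumption; the sketch below uses the common formulation with the T² gradient-scale correction.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the class axis.
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def kd_loss(student_logits, teacher_logits, labels, T=6.39, alpha=0.69):
    # Soft targets: KL(teacher || student) at temperature T, scaled by T^2
    # so soft- and hard-loss gradients stay on a comparable scale.
    t_log = log_softmax(teacher_logits / T)
    s_log = log_softmax(student_logits / T)
    kl = (np.exp(t_log) * (t_log - s_log)).sum(axis=1).mean()
    # Hard targets: cross-entropy on the unscaled student logits.
    ce = -log_softmax(student_logits)[np.arange(len(labels)), labels].mean()
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

When student and teacher logits agree, the KL term vanishes and only the (1−α)-weighted cross-entropy remains, which is a quick way to sanity-check an implementation.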
## Ablation Results
We tested multiple compression approaches. Linear projection consistently won:
### Teacher Compression (on BERT)
| Method | Accuracy | Params |
|--------|----------|--------|
| **Linear** | **92.32%** | 49K |
| Graph Laplacian | 92.20% | 639K |
| MLP (2-layer) | 92.09% | 213K |
### Student Compression (NanoCNN)
| Method | FP32 Accuracy | INT8 Accuracy | Size |
|--------|--------------|---------------|------|
| **Linear** | 82.04% | **83.03%** | **1.46 MB** |
| MLP | 81.54% | 82.11% | 1.47 MB |
| Spectral | 81.15% | 82.00% | 1.48 MB |
### Architecture Comparison
| Model | Accuracy | Size | Compression |
|-------|----------|------|-------------|
| BERT Teacher | 92.32% | 420 MB | 1x |
| CNN Large | 83.94% | 31.8 MB | 13x |
| CNN TinyML | 83.14% | 3.4 MB | 124x |
| **NanoCNN INT8** | **83.03%** | **1.46 MB** | **288x** |
| Tiny Transformer | 80.16% | 6.4 MB | 66x |
The transformer student performs worse despite 4x more parameters, confirming CNN inductive biases are better suited to small-scale text classification.
## Multilingual Support
The Aure SDK supports 6 languages. Non-English models are downloaded on first use:
```python
from aure import Aure
# German
model = Aure("edge", lang="de")
model.predict("Das ist wunderbar!")  # positive ("That is wonderful!")
# Japanese
model = Aure("edge", lang="ja")
model.predict("素晴らしい映画です！")  # positive ("It's a wonderful movie!")
# French, Spanish, Chinese also supported
```
Supported: `en`, `de`, `fr`, `es`, `zh`, `ja`
## Model Variants
| Variant | File | Size | Accuracy | Use Case |
|---------|------|------|----------|----------|
| **Edge** (this model) | `model_edge.onnx` | 1.46 MB | 83.03% | MCUs, wearables, IoT |
| Edge 3-Class | `model_edge_3class.onnx` | 1.47 MB | ~82% | Pos/neutral/neg classification |
| Mobile | `model_mobile.onnx` | 4.0 MB | 83% | Mobile apps, Raspberry Pi |
## Hardware Targets
Tested on:
- **NVIDIA Jetson Nano** – 0.08ms inference
- **Raspberry Pi 4** – 0.9ms inference
- **x86 CPU** (i7) – 0.14ms inference
- **ARM Cortex-M7** (STM32H7) – target <10ms (ONNX Micro Runtime)
## Training Details
- **Dataset**: SST-2 (Stanford Sentiment Treebank, binary), 67,349 train / 872 validation
- **Teacher**: BERT-base-uncased + linear compression head, fine-tuned 12 epochs
- **Hardware**: NVIDIA RTX 4090 Laptop GPU (16 GB), Windows 11
- **Framework**: PyTorch 2.x → ONNX export → INT8 quantization
- **Reproducibility**: 5-seed evaluation with standard deviations reported
## Negative Results (Published for Transparency)
1. **Graph Laplacian spectral compression provides no benefit** over linear projection at either teacher or student level
2. **Progressive distillation** (BERT → DistilBERT → Student) does not improve student quality vs. direct distillation
3. **Transformer students perform worse than CNN students** at sub-2MB scale despite using 4x more parameters
## Citation
```bibtex
@misc{constantone2026aure,
title={Aure: Pareto-Optimal Knowledge Distillation for Sub-2MB Sentiment Classification},
author={ConstantOne AI},
year={2026},
url={https://huggingface.co/ConstantQJ/constant-edge-0.5}
}
```
## License
Apache 2.0 – use freely in commercial and non-commercial projects.
## Links
- [ConstantOne AI](https://constantone.ai)
- [API Documentation](https://constantone.ai/docs.html)
- [Technical Report](https://constantone.ai/math.html)