---
license: apache-2.0
language:
  - en
  - de
  - fr
  - es
  - zh
  - ja
library_name: onnxruntime
pipeline_tag: text-classification
tags:
  - sentiment-analysis
  - edge-ai
  - tinyml
  - knowledge-distillation
  - onnx
  - int8
  - quantized
  - microcontroller
  - nlp
datasets:
  - glue
  - sst2
metrics:
  - accuracy
  - f1
model-index:
  - name: constant-edge-0.5
    results:
      - task:
          type: text-classification
          name: Sentiment Analysis
        dataset:
          type: glue
          name: SST-2
          split: validation
        metrics:
          - type: accuracy
            value: 83.03
          - type: f1
            value: 0.83
---

# Constant Edge 0.5: 1.46 MB Sentiment Analysis for Edge Devices

A 288x-compressed sentiment classifier distilled from BERT. Runs on microcontrollers, mobile devices, and edge hardware with 0.14 ms CPU inference latency.

| Metric | Value |
|---|---|
| Accuracy | 83.03% (SST-2) |
| F1 | 0.830 |
| Model Size | 1.46 MB (INT8 quantized) |
| Parameters | 383,618 |
| Inference | 0.14 ms (ONNX Runtime, CPU) |
| Compression | 288x vs. BERT teacher (420 MB) |
| Teacher Accuracy | 92.32% |

## Quick Start

The fastest path is the Aure SDK (`pip install aure`), which handles tokenization (simple whitespace splitting plus a vocabulary lookup) and model loading for you:

```python
from aure import Aure

model = Aure("edge")
result = model.predict("I love this product!")
print(result)  # SentimentResult(label='positive', score=0.91)
```

## Standalone ONNX Inference (No SDK Required)

Only `onnxruntime` and NumPy are needed to run the model directly:

```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model_edge.onnx")

# Input: token IDs as an int64 array of shape [batch_size, seq_length]
# Max sequence length: 128
# Vocabulary: pruned to 10,907 tokens (from BERT's 30,522)
input_ids = np.array([[101, 1045, 2293, 2023, 3185, 999, 102]], dtype=np.int64)

logits = session.run(None, {"input_ids": input_ids})[0]

# Softmax over the two class logits
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

labels = ["negative", "positive"]
pred = labels[np.argmax(probs[0])]
confidence = float(probs[0].max())
print(f"{pred} ({confidence:.1%})")
```
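The hard-coded IDs above follow BERT conventions (101 = `[CLS]`, 102 = `[SEP]`). A minimal whitespace-plus-vocabulary-lookup encoder could look like the sketch below; the `vocab` dict is a hypothetical stand-in for the model's pruned vocabulary file, and `unk_id=100` (BERT's `[UNK]`) is an assumption:

```python
import numpy as np

def encode(text, vocab, max_len=128, cls_id=101, sep_id=102, unk_id=100):
    """Whitespace tokenization + vocabulary lookup with BERT-style special tokens."""
    ids = [vocab.get(tok, unk_id) for tok in text.lower().split()]
    ids = [cls_id] + ids[: max_len - 2] + [sep_id]  # truncate, then add [CLS]/[SEP]
    return np.array([ids], dtype=np.int64)

# Tiny illustrative vocab; real IDs come from the model's pruned vocab file
vocab = {"i": 1045, "love": 2293, "this": 2023, "movie": 3185, "!": 999}
print(encode("I love this movie !", vocab).tolist())
# [[101, 1045, 2293, 2023, 3185, 999, 102]]
```

The returned array matches the `input_ids` shape and dtype expected by the ONNX session above.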

## Architecture

NanoCNN is a compact convolutional architecture optimized for sub-2 MB deployment:

- Embedding: 32-dimensional, pruned vocabulary (10,907 tokens)
- Convolutions: 4 parallel Conv1d banks (filter sizes 2, 3, 4, 5), 64 filters each
- Compression: linear bottleneck (256 → 16)
- Classifier: 16 → 48 → 2 (with dropout 0.3)
- Quantization: INT8 (post-training, ONNX)
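Read literally, those bullets translate into a PyTorch sketch like the following. Layer names, ReLU activations, and max-over-time pooling are assumptions (the released ONNX graph is authoritative), so the sketch's parameter count lands near, but not exactly on, the reported 383,618:

```python
import torch
import torch.nn as nn

class NanoCNN(nn.Module):
    """Sketch of the NanoCNN topology described in this card (not the released weights)."""

    def __init__(self, vocab_size=10907, embed_dim=32, n_filters=64,
                 kernel_sizes=(2, 3, 4, 5), bottleneck=16, hidden=48, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # 4 parallel Conv1d banks, 64 filters each -> 256 features after pooling
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, n_filters, k) for k in kernel_sizes
        )
        self.bottleneck = nn.Linear(n_filters * len(kernel_sizes), bottleneck)
        self.classifier = nn.Sequential(
            nn.Linear(bottleneck, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, input_ids):
        x = self.embedding(input_ids).transpose(1, 2)   # [B, 32, T]
        pooled = [torch.relu(c(x)).amax(dim=2) for c in self.convs]  # max over time
        features = torch.cat(pooled, dim=1)             # [B, 256]
        return self.classifier(torch.relu(self.bottleneck(features)))

model = NanoCNN().eval()
print(sum(p.numel() for p in model.parameters()))  # ~383k parameters
```

Note that the input must be at least 5 tokens long so the widest convolution (kernel size 5) has something to slide over.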

## Distillation Pipeline

Distilled from a BERT-base-uncased teacher through systematic experimentation:

```
BERT Teacher (92.32%, 420 MB)
    → Knowledge Distillation (T=6.39, α=0.69)
        → NanoCNN Student (83.03%, 1.46 MB)
```

Key distillation parameters (optimized via Optuna, 20 trials):

- Temperature: 6.39
- Distillation weight (α): 0.69
- Learning rate: 2e-3
- Epochs: 30
- Batch size: 128
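The card does not spell out the training objective, but a standard Hinton-style distillation loss with these hyperparameters would combine a temperature-scaled KL term with hard-label cross-entropy. Treating α as the weight on the soft-target term (and scaling the KL term by T²) is an assumption:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=6.39, alpha=0.69):
    # Soft-target term: KL between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy against the SST-2 labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

With `alpha=0.0` the loss reduces to plain cross-entropy, which makes the mixing weight easy to sanity-check.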

## Ablation Results

We tested multiple compression approaches; linear projection consistently won.

### Teacher Compression (on BERT)

| Method | Accuracy | Params |
|---|---|---|
| Linear | 92.32% | 49K |
| Graph Laplacian | 92.20% | 639K |
| MLP (2-layer) | 92.09% | 213K |

### Student Compression (NanoCNN)

| Method | FP32 Accuracy | INT8 Accuracy | Size |
|---|---|---|---|
| Linear | 82.04% | 83.03% | 1.46 MB |
| MLP | 81.54% | 82.11% | 1.47 MB |
| Spectral | 81.15% | 82.00% | 1.48 MB |

## Architecture Comparison

| Model | Accuracy | Size | Compression |
|---|---|---|---|
| BERT Teacher | 92.32% | 420 MB | 1x |
| CNN Large | 83.94% | 31.8 MB | 13x |
| CNN TinyML | 83.14% | 3.4 MB | 124x |
| NanoCNN INT8 | 83.03% | 1.46 MB | 288x |
| Tiny Transformer | 80.16% | 6.4 MB | 66x |

The transformer student performs worse despite 4x more parameters, confirming CNN inductive biases are better suited to small-scale text classification.
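The compression column is simply the 420 MB teacher size divided by each student's size, rounded to the nearest integer:

```python
teacher_mb = 420
student_mb = {
    "CNN Large": 31.8,
    "CNN TinyML": 3.4,
    "NanoCNN INT8": 1.46,
    "Tiny Transformer": 6.4,
}
ratios = {name: round(teacher_mb / mb) for name, mb in student_mb.items()}
print(ratios)
# {'CNN Large': 13, 'CNN TinyML': 124, 'NanoCNN INT8': 288, 'Tiny Transformer': 66}
```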

## Multilingual Support

The Aure SDK supports six languages. Non-English models are downloaded on first use:

```python
from aure import Aure

# German
model = Aure("edge", lang="de")
model.predict("Das ist wunderbar!")  # positive ("That is wonderful!")

# Japanese
model = Aure("edge", lang="ja")
model.predict("素晴らしい映画でした")  # positive ("It was a wonderful movie")

# French, Spanish, and Chinese are also supported
```

Supported: en, de, fr, es, zh, ja

## Model Variants

| Variant | File | Size | Accuracy | Use Case |
|---|---|---|---|---|
| Edge (this model) | `model_edge.onnx` | 1.46 MB | 83.03% | MCUs, wearables, IoT |
| Edge 3-Class | `model_edge_3class.onnx` | 1.47 MB | ~82% | Positive/neutral/negative classification |
| Mobile | `model_mobile.onnx` | 4.0 MB | 83% | Mobile apps, Raspberry Pi |

## Hardware Targets

Tested on:

- NVIDIA Jetson Nano: 0.08 ms inference
- Raspberry Pi 4: 0.9 ms inference
- x86 CPU (i7): 0.14 ms inference
- ARM Cortex-M7 (STM32H7): target <10 ms (ONNX Micro Runtime)

## Training Details

- Dataset: SST-2 (Stanford Sentiment Treebank, binary), 67,349 train / 872 validation
- Teacher: BERT-base-uncased + linear compression head, fine-tuned for 12 epochs
- Hardware: NVIDIA RTX 4090 Laptop GPU (16 GB), Windows 11
- Framework: PyTorch 2.x → ONNX export → INT8 quantization
- Reproducibility: 5-seed evaluation with standard deviations reported

## Negative Results (Published for Transparency)

1. Graph Laplacian spectral compression provides no benefit over linear projection at either the teacher or the student level.
2. Progressive distillation (BERT → DistilBERT → Student) does not improve student quality over direct distillation.
3. Transformer students perform worse than CNN students at sub-2 MB scale despite using 4x more parameters.

## Citation

```bibtex
@misc{constantone2026aure,
  title={Aure: Pareto-Optimal Knowledge Distillation for Sub-2MB Sentiment Classification},
  author={ConstantOne AI},
  year={2026},
  url={https://huggingface.co/ConstantQJ/constant-edge-0.5}
}
```

## License

Apache 2.0: free to use in commercial and non-commercial projects.

## Links