Commit d29d7c8 by ConstantQJ (verified) · Parent: 9354822

Upload folder using huggingface_hub

Files changed (4):
1. README.md (+229 -0)
2. model_edge.onnx (+3 -0)
3. model_edge_3class.onnx (+3 -0)
4. model_mobile.onnx (+3 -0)

README.md ADDED
@@ -0,0 +1,229 @@
---
license: apache-2.0
language:
- en
- de
- fr
- es
- zh
- ja
library_name: onnxruntime
pipeline_tag: text-classification
tags:
- sentiment-analysis
- edge-ai
- tinyml
- knowledge-distillation
- onnx
- int8
- quantized
- microcontroller
- nlp
datasets:
- glue
- sst2
metrics:
- accuracy
- f1
model-index:
- name: aure-edge-sentiment
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    dataset:
      type: glue
      name: SST-2
      split: validation
    metrics:
    - type: accuracy
      value: 83.03
    - type: f1
      value: 0.830
---
# Aure Edge: 1.46 MB Sentiment Analysis for Edge Devices

A **288x compressed** sentiment classifier distilled from BERT. It runs on microcontrollers, mobile devices, and edge hardware with **0.14 ms inference latency**.

| Metric | Value |
|--------|-------|
| **Accuracy** | 83.03% (SST-2) |
| **F1** | 0.830 |
| **Model Size** | 1.46 MB (INT8 quantized) |
| **Parameters** | 383,618 |
| **Inference** | 0.14 ms (ONNX Runtime, CPU) |
| **Compression** | 288x vs. BERT teacher (420 MB) |
| **Teacher Accuracy** | 92.32% |
## Quick Start

For production use, install the Aure SDK (`pip install aure`); it bundles the tokenizer (simple whitespace splitting plus a vocabulary lookup) and the ONNX Runtime session:

```python
from aure import Aure

model = Aure("edge")
result = model.predict("I love this product!")
print(result)  # SentimentResult(label='positive', score=0.91)
```
### Standalone ONNX Inference (No SDK Dependencies)

```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model_edge.onnx")

# Input: token IDs as an int64 array, shape [batch_size, seq_length]
# Max sequence length: 128
# Vocabulary: pruned to 10,907 tokens (from BERT's 30,522)
input_ids = np.array([[101, 1045, 2293, 2023, 3185, 999, 102]], dtype=np.int64)

logits = session.run(None, {"input_ids": input_ids})[0]

# Numerically stable softmax over the two classes
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

labels = ["negative", "positive"]
pred = labels[np.argmax(probs[0])]
confidence = float(probs[0].max())
print(f"{pred} ({confidence:.1%})")
```
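The tokenization scheme is described as whitespace splitting plus a vocabulary lookup. A minimal sketch of that scheme, assuming BERT-style special-token IDs (`[CLS]`=101, `[SEP]`=102, `[UNK]`=100) and a plain `dict` vocabulary; both are assumptions, not the SDK's actual format:

```python
def tokenize(text, vocab, max_len=128):
    """Whitespace + vocabulary-lookup tokenizer (illustrative sketch)."""
    ids = [101]  # [CLS]
    for word in text.lower().split():
        word = word.strip(".,!?")          # crude punctuation stripping
        ids.append(vocab.get(word, 100))   # [UNK] for out-of-vocab words
    ids = ids[: max_len - 1]               # leave room for [SEP]
    ids.append(102)                        # [SEP]
    return ids

# Toy vocabulary with BERT-compatible IDs for a few words
vocab = {"i": 1045, "love": 2293, "this": 2023, "movie": 3185}
print(tokenize("I love this movie!", vocab))
# -> [101, 1045, 2293, 2023, 3185, 102]
```

In a real deployment the pruned 10,907-token vocabulary shipped with the model would replace the toy `vocab` above.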
## Architecture

**NanoCNN** is a compact convolutional architecture optimized for sub-2 MB deployment:

- **Embedding**: 32-dimensional, pruned vocabulary (10,907 tokens)
- **Convolutions**: 4 parallel Conv1d banks (filter sizes 2, 3, 4, 5), 64 filters each
- **Compression**: linear bottleneck (256 → 16)
- **Classifier**: 16 → 48 → 2 (with dropout 0.3)
- **Quantization**: INT8 (post-training, ONNX)
## Distillation Pipeline

Distilled from a BERT-base-uncased teacher through systematic experimentation:

```
BERT Teacher (92.32%, 420 MB)
  → Knowledge Distillation (T=6.39, α=0.69)
  → NanoCNN Student (83.03%, 1.46 MB)
```

Key distillation parameters (optimized via Optuna, 20 trials):
- Temperature: 6.39
- Distillation weight (α): 0.69
- Learning rate: 2e-3
- Epochs: 30
- Batch size: 128
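With those tuned values, the training objective can be sketched as a standard Hinton-style soft-target loss; the exact formulation used in training is an assumption:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=6.39, alpha=0.69):
    """Standard KD objective with the tuned temperature and weight above."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 to keep gradient magnitudes comparable across T
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on the gold labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Dummy batch of 8 examples, 2 classes
loss = distillation_loss(
    torch.randn(8, 2), torch.randn(8, 2), torch.randint(0, 2, (8,))
)
print(float(loss))
```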
## Ablation Results

We tested multiple compression approaches; linear projection consistently won.

### Teacher Compression (on BERT)

| Method | Accuracy | Params |
|--------|----------|--------|
| **Linear** | **92.32%** | 49K |
| Graph Laplacian | 92.20% | 639K |
| MLP (2-layer) | 92.09% | 213K |

### Student Compression (NanoCNN)

| Method | FP32 Accuracy | INT8 Accuracy | Size |
|--------|--------------|---------------|------|
| **Linear** | 82.04% | **83.03%** | **1.46 MB** |
| MLP | 81.54% | 82.11% | 1.47 MB |
| Spectral | 81.15% | 82.00% | 1.48 MB |

### Architecture Comparison

| Model | Accuracy | Size | Compression |
|-------|----------|------|-------------|
| BERT Teacher | 92.32% | 420 MB | 1x |
| CNN Large | 83.94% | 31.8 MB | 13x |
| CNN TinyML | 83.14% | 3.4 MB | 124x |
| **NanoCNN INT8** | **83.03%** | **1.46 MB** | **288x** |
| Tiny Transformer | 80.16% | 6.4 MB | 66x |

The transformer student performs worse despite having 4x more parameters, confirming that CNN inductive biases are better suited to small-scale text classification.
## Multilingual Support

The Aure SDK supports six languages. Non-English models are downloaded on first use:

```python
from aure import Aure

# German
model = Aure("edge", lang="de")
model.predict("Das ist wunderbar!")  # "That is wonderful!" -> positive

# Japanese
model = Aure("edge", lang="ja")
model.predict("素晴らしい映画でした")  # "It was a wonderful movie" -> positive

# French, Spanish, and Chinese are also supported
```

Supported: `en`, `de`, `fr`, `es`, `zh`, `ja`
## Model Variants

| Variant | File | Size | Accuracy | Use Case |
|---------|------|------|----------|----------|
| **Edge** (this model) | `model_edge.onnx` | 1.46 MB | 83.03% | MCUs, wearables, IoT |
| Edge 3-Class | `model_edge_3class.onnx` | 1.47 MB | ~82% | Positive/neutral/negative classification |
| Mobile | `model_mobile.onnx` | 4.0 MB | 83% | Mobile apps, Raspberry Pi |
## Hardware Targets

Tested on:
- **NVIDIA Jetson Nano**: 0.08 ms inference
- **Raspberry Pi 4**: 0.9 ms inference
- **x86 CPU** (i7): 0.14 ms inference
- **ARM Cortex-M7** (STM32H7): target <10 ms (ONNX Micro Runtime)
## Training Details

- **Dataset**: SST-2 (Stanford Sentiment Treebank, binary), 67,349 train / 872 validation
- **Teacher**: BERT-base-uncased + linear compression head, fine-tuned for 12 epochs
- **Hardware**: NVIDIA RTX 4090 Laptop GPU (16 GB), Windows 11
- **Framework**: PyTorch 2.x → ONNX export → INT8 quantization
- **Reproducibility**: 5-seed evaluation with standard deviations reported
## Negative Results (Published for Transparency)

1. **Graph Laplacian spectral compression provides no benefit** over linear projection at either the teacher or the student level.
2. **Progressive distillation** (BERT → DistilBERT → student) does not improve student quality over direct distillation.
3. **Transformer students perform worse than CNN students** at sub-2 MB scale despite using 4x more parameters.
## Citation

```bibtex
@misc{constantone2026aure,
  title={Aure: Pareto-Optimal Knowledge Distillation for Sub-2MB Sentiment Classification},
  author={ConstantOne AI},
  year={2026},
  url={https://huggingface.co/ConstantQJ/aure-edge-sentiment}
}
```

## License

Apache 2.0: free to use in commercial and non-commercial projects.

## Links

- [ConstantOne AI](https://constantone.ai)
- [API Documentation](https://constantone.ai/docs.html)
- [Technical Report](https://constantone.ai/math.html)
model_edge.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e521ffa720d22fdc6073b3c0ce4ea600cf15601fdd1e6f6e249334ced0fa424f
size 1542385

model_edge_3class.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:53c8e99b8d766219dd8e49917f98003e08fddb246452eea547bfd5f7566f5a16
size 1541498

model_mobile.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:373156649e2d061ddf1f8a7b0b07fbb5a87e8f1f5555ad2ce86c2381fe281fbf
size 4186084