---
license: apache-2.0
language:
  - en
  - de
  - fr
  - es
  - zh
  - ja
library_name: onnxruntime
pipeline_tag: text-classification
tags:
  - sentiment-analysis
  - edge-ai
  - tinyml
  - knowledge-distillation
  - onnx
  - int8
  - quantized
  - microcontroller
  - nlp
datasets:
  - glue
  - sst2
metrics:
  - accuracy
  - f1
model-index:
  - name: constant-edge-0.5
    results:
      - task:
          type: text-classification
          name: Sentiment Analysis
        dataset:
          type: glue
          name: SST-2
          split: validation
        metrics:
          - type: accuracy
            value: 83.03
          - type: f1
            value: 0.830
---
|
|
|
|
|
# Constant Edge 0.5: 1.46 MB Sentiment Analysis for Edge Devices
|
|
|
|
|
A **288x compressed** sentiment classifier distilled from BERT. Runs on microcontrollers, mobile devices, and edge hardware, with **0.14 ms inference latency** on a desktop CPU.
|
|
|
|
|
| Metric | Value |
|--------|-------|
| **Accuracy** | 83.03% (SST-2) |
| **F1** | 0.830 |
| **Model Size** | 1.46 MB (INT8 quantized) |
| **Parameters** | 383,618 |
| **Inference** | 0.14 ms (ONNX Runtime, CPU) |
| **Compression** | 288x vs. BERT teacher (420 MB) |
| **Teacher Accuracy** | 92.32% |
|
|
|
|
|
## Quick Start |
|
|
|
|
|
```python
# Recommended path: the Aure SDK handles tokenization and inference.
# pip install aure
from aure import Aure

model = Aure("edge")
result = model.predict("I love this product!")
print(result)  # SentimentResult(label='positive', score=0.91)
```
|
|
|
|
|
### Standalone ONNX Inference (No SDK Required)
|
|
|
|
|
```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model_edge.onnx")

# Input: token IDs as int64 array, shape [batch_size, seq_length]
# Max sequence length: 128
# Vocabulary: pruned to 10,907 tokens (from BERT's 30,522)
input_ids = np.array([[101, 1045, 2293, 2023, 3185, 999, 102]], dtype=np.int64)

logits = session.run(None, {"input_ids": input_ids})[0]

# Numerically stable softmax over the two class logits
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

labels = ["negative", "positive"]
pred = labels[np.argmax(probs[0])]
confidence = float(probs[0].max())
print(f"{pred} ({confidence:.1%})")
```
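Token IDs come from a simple whitespace split plus a lookup in the pruned vocabulary. A minimal sketch of such a tokenizer is shown below; the stand-in vocabulary and the BERT-style special-token IDs (`[CLS]`=101, `[SEP]`=102, `[UNK]`=100, `[PAD]`=0) are illustrative assumptions, not part of the released artifacts:

```python
import numpy as np

# BERT-style special-token IDs (assumed; the released vocab defines the real ones)
CLS_ID, SEP_ID, UNK_ID, PAD_ID = 101, 102, 100, 0
MAX_LEN = 128

# Tiny stand-in vocabulary; the real model ships a pruned 10,907-token vocab
vocab = {"i": 1045, "love": 2293, "this": 2023, "movie": 3185, "!": 999}

def encode(text, vocab, max_len=MAX_LEN):
    """Whitespace split + vocabulary lookup, padded/truncated to max_len."""
    tokens = text.lower().split()
    ids = [CLS_ID] + [vocab.get(t, UNK_ID) for t in tokens[: max_len - 2]] + [SEP_ID]
    ids += [PAD_ID] * (max_len - len(ids))
    return np.array([ids], dtype=np.int64)

input_ids = encode("I love this movie !", vocab)
print(input_ids.shape)  # (1, 128)
```

The resulting array can be fed directly to `session.run` as the `input_ids` feed.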
|
|
|
|
|
## Architecture |
|
|
|
|
|
**NanoCNN**: a compact convolutional architecture optimized for sub-2 MB deployment:

- **Embedding**: 32-dimensional, pruned vocabulary (10,907 tokens)
- **Convolutions**: 4 parallel Conv1d banks (filter sizes 2, 3, 4, 5), 64 filters each
- **Compression**: Linear bottleneck (256 → 16)
- **Classifier**: 16 → 48 → 2 (with dropout 0.3)
- **Quantization**: INT8 (post-training, ONNX)
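A PyTorch sketch reconstructed from the bullets above; layer ordering, ReLU activations, and global max pooling are assumptions (this sketch lands within ~0.2% of the reported 383,618 parameters, so minor details likely differ):

```python
import torch
import torch.nn as nn

class NanoCNN(nn.Module):
    """Sketch: 32-d embeddings, 4 parallel Conv1d banks (filter sizes 2-5,
    64 filters each), a 256 -> 16 bottleneck, and a 16 -> 48 -> 2 head."""

    def __init__(self, vocab_size=10907, embed_dim=32, num_filters=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in (2, 3, 4, 5)
        )
        self.bottleneck = nn.Linear(4 * num_filters, 16)  # 256 -> 16
        self.classifier = nn.Sequential(
            nn.Linear(16, 48), nn.ReLU(), nn.Dropout(0.3), nn.Linear(48, 2)
        )

    def forward(self, input_ids):
        x = self.embed(input_ids).transpose(1, 2)        # [B, 32, L]
        # Global max-pool each conv bank, concatenate to a 256-d vector
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        z = self.bottleneck(torch.cat(pooled, dim=1))    # [B, 16]
        return self.classifier(z)                        # [B, 2] logits

model = NanoCNN()
logits = model(torch.randint(0, 10907, (1, 7)))
print(tuple(logits.shape))  # (1, 2)
```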
|
|
|
|
|
## Distillation Pipeline |
|
|
|
|
|
Distilled from a BERT-base-uncased teacher through systematic experimentation: |
|
|
|
|
|
```
BERT Teacher (92.32%, 420 MB)
  → Knowledge Distillation (T=6.39, α=0.69)
  → NanoCNN Student (83.03%, 1.46 MB)
```
|
|
|
|
|
Key distillation parameters (optimized via Optuna, 20 trials):

- Temperature: 6.39
- Distillation weight (α): 0.69
- Learning rate: 2e-3
- Epochs: 30
- Batch size: 128
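These hyperparameters plug into a Hinton-style distillation objective: a temperature-softened KL term weighted by α plus a hard cross-entropy term weighted by 1−α. The card does not publish the exact loss code, so the NumPy sketch below is the conventional formulation:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled, numerically stable softmax."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=6.39, alpha=0.69):
    labels = np.asarray(labels)
    # Soft term: KL(teacher || student) at temperature T, scaled by T^2
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)) * T * T
    # Hard term: cross-entropy against ground-truth labels
    p = softmax(student_logits)
    hard = -np.mean(np.log(p[np.arange(len(labels)), labels]))
    return alpha * soft + (1 - alpha) * hard

student = np.array([[2.0, 0.5]])
teacher = np.array([[1.8, 0.2]])
loss = distillation_loss(student, teacher, np.array([0]))
```

With α=0 and T=1 this reduces to plain cross-entropy training, which is the baseline the distilled student is compared against.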
|
|
|
|
|
## Ablation Results |
|
|
|
|
|
We tested multiple compression approaches. Linear projection consistently won: |
|
|
|
|
|
### Teacher Compression (on BERT) |
|
|
|
|
|
| Method | Accuracy | Params |
|--------|----------|--------|
| **Linear** | **92.32%** | 49K |
| Graph Laplacian | 92.20% | 639K |
| MLP (2-layer) | 92.09% | 213K |
|
|
|
|
|
### Student Compression (NanoCNN) |
|
|
|
|
|
| Method | FP32 Accuracy | INT8 Accuracy | Size |
|--------|---------------|---------------|------|
| **Linear** | 82.04% | **83.03%** | **1.46 MB** |
| MLP | 81.54% | 82.11% | 1.47 MB |
| Spectral | 81.15% | 82.00% | 1.48 MB |
|
|
|
|
|
### Architecture Comparison |
|
|
|
|
|
| Model | Accuracy | Size | Compression |
|-------|----------|------|-------------|
| BERT Teacher | 92.32% | 420 MB | 1x |
| CNN Large | 83.94% | 31.8 MB | 13x |
| CNN TinyML | 83.14% | 3.4 MB | 124x |
| **NanoCNN INT8** | **83.03%** | **1.46 MB** | **288x** |
| Tiny Transformer | 80.16% | 6.4 MB | 66x |
|
|
|
|
|
The transformer student performs worse despite having roughly 4x more parameters, suggesting that CNN inductive biases are better suited to small-scale text classification.
|
|
|
|
|
## Multilingual Support |
|
|
|
|
|
The Aure SDK supports 6 languages. Non-English models are downloaded on first use: |
|
|
|
|
|
```python
from aure import Aure

# German
model = Aure("edge", lang="de")
model.predict("Das ist wunderbar!")  # positive

# Japanese
model = Aure("edge", lang="ja")
model.predict("素晴らしい映画です")  # positive

# French, Spanish, Chinese also supported
```
|
|
|
|
|
Supported: `en`, `de`, `fr`, `es`, `zh`, `ja` |
|
|
|
|
|
## Model Variants |
|
|
|
|
|
| Variant | File | Size | Accuracy | Use Case |
|---------|------|------|----------|----------|
| **Edge** (this model) | `model_edge.onnx` | 1.46 MB | 83.03% | MCUs, wearables, IoT |
| Edge 3-Class | `model_edge_3class.onnx` | 1.47 MB | ~82% | Pos/neutral/neg classification |
| Mobile | `model_mobile.onnx` | 4.0 MB | 83% | Mobile apps, Raspberry Pi |
|
|
|
|
|
## Hardware Targets |
|
|
|
|
|
Tested on:

- **NVIDIA Jetson Nano**: 0.08 ms inference
- **Raspberry Pi 4**: 0.9 ms inference
- **x86 CPU** (i7): 0.14 ms inference
- **ARM Cortex-M7** (STM32H7): target <10 ms (ONNX Micro Runtime)
|
|
|
|
|
## Training Details |
|
|
|
|
|
- **Dataset**: SST-2 (Stanford Sentiment Treebank, binary), 67,349 train / 872 validation |
|
|
- **Teacher**: BERT-base-uncased + linear compression head, fine-tuned 12 epochs |
|
|
- **Hardware**: NVIDIA RTX 4090 Laptop GPU (16 GB), Windows 11 |
|
|
- **Framework**: PyTorch 2.x → ONNX export → INT8 quantization
|
|
- **Reproducibility**: 5-seed evaluation with standard deviations reported |
|
|
|
|
|
## Negative Results (Published for Transparency) |
|
|
|
|
|
1. **Graph Laplacian spectral compression provides no benefit** over linear projection at either teacher or student level |
|
|
2. **Progressive distillation** (BERT → DistilBERT → Student) does not improve student quality vs. direct distillation
|
|
3. **Transformer students perform worse than CNN students** at sub-2MB scale despite using 4x more parameters |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex
@misc{constantone2026aure,
  title={Aure: Pareto-Optimal Knowledge Distillation for Sub-2MB Sentiment Classification},
  author={ConstantOne AI},
  year={2026},
  url={https://huggingface.co/ConstantQJ/constant-edge-0.5}
}
```
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0: use freely in commercial and non-commercial projects.
|
|
|
|
|
## Links |
|
|
|
|
|
- [ConstantOne AI](https://constantone.ai) |
|
|
- [API Documentation](https://constantone.ai/docs.html) |
|
|
- [Technical Report](https://constantone.ai/math.html) |
|
|
|