	---
	license: apache-2.0
	library_name: numpy
	tags:
	- tabular-classification
	- tiny-model
	- edge-ai
	- no-gpu
	- numpy
	- real-time
	- ecg
	- eeg
	- seizure-detection
	- activity-recognition
	- medical-ai
	- biosignal
	- analytic-gradients
	datasets:
	- shayanfazeli/heartbeat
	- birdy654/eeg-brainwave-dataset-feeling-emotions
	- robikscube/eye-state-classification-eeg-dataset
	- harunshimanto/epileptic-seizure-recognition
	- uciml/human-activity-recognition-with-smartphones
	metrics:
	- accuracy
	- f1
	- roc_auc
	model-index:
	- name: KestrelNet / GoshawkNet Benchmark Suite
	  results:
	  - task:
	      type: tabular-classification
	      name: ECG Arrhythmia Detection
	    dataset:
	      type: shayanfazeli/heartbeat
	      name: MIT-BIH Arrhythmia
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.972
	    - name: Macro F1
	      type: f1
	      value: 0.853
	  - task:
	      type: tabular-classification
	      name: EEG Emotion Recognition
	    dataset:
	      type: birdy654/eeg-brainwave-dataset-feeling-emotions
	      name: EEG Brainwave Emotions
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.991
	    - name: Macro F1
	      type: f1
	      value: 0.991
	  - task:
	      type: tabular-classification
	      name: EEG Eye State Detection
	    dataset:
	      type: robikscube/eye-state-classification-eeg-dataset
	      name: EEG Eye State (UCI)
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.942
	    - name: AUC-ROC
	      type: roc_auc
	      value: 0.986
	  - task:
	      type: tabular-classification
	      name: Epileptic Seizure Detection
	    dataset:
	      type: harunshimanto/epileptic-seizure-recognition
	      name: Bonn University EEG
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.971
	    - name: AUC-ROC
	      type: roc_auc
	      value: 0.988
	  - task:
	      type: tabular-classification
	      name: Human Activity Recognition
	    dataset:
	      type: uciml/human-activity-recognition-with-smartphones
	      name: UCI HAR Smartphones
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.949
	    - name: Macro F1
	      type: f1
	      value: 0.949
	pipeline_tag: tabular-classification
	---

	# KestrelNet / GoshawkNet — Benchmark Suite

	Here's what a tiny model can do.

	Five public datasets. Five domains. All under 164K parameters. All CPU-only. All pure NumPy: no PyTorch, no TensorFlow, no GPU. Four of the five results are verified on Kaggle with live scoring; the seizure result is the exception.

	## Results

	| Dataset | Domain | Task | Accuracy | F1 / AUC | Params | Size | Latency |
	|---|---|---|---|---|---|---|---|
	| [MIT-BIH Arrhythmia](https://kaggle.com/datasets/shayanfazeli/heartbeat) | Cardiology | 5-class ECG | 97.2% | F1 0.853 | 12,756 | 50 KB | 56 μs |
	| [EEG Brainwave Emotions](https://kaggle.com/datasets/birdy654/eeg-brainwave-dataset-feeling-emotions) | Neuroscience | 3-class EEG | 99.1% | F1 0.991 | 163,788 | 640 KB | 1.3 ms |
	| [EEG Eye State](https://kaggle.com/datasets/robikscube/eye-state-classification-eeg-dataset) | Neuroscience | Binary EEG | 94.2% | AUC 0.986 | 1,576 | 6 KB | 17 μs |
	| [Epileptic Seizure](https://kaggle.com/datasets/harunshimanto/epileptic-seizure-recognition) | Neurology | Binary EEG | 97.1% | AUC 0.988 | 12,072 | 47 KB | not reported |
	| [HAR Smartphones](https://kaggle.com/datasets/uciml/human-activity-recognition-with-smartphones) | Wearables | 6-class IMU | 94.9% | F1 0.949 | 15,416 | 60 KB | 70 μs |

	Total model storage for all five: 803 KB.

	For context, a single encoder layer of BERT-base has roughly 7 million parameters. Our five models combined have 205,608.
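
The totals above can be checked with a few lines of arithmetic. The sketch below assumes each parameter occupies 4 bytes (float32) and that 1 KB means 1024 bytes. Both are assumptions, since the README does not say how the Size column was computed, but they reproduce the published figures.

```python
# Parameter counts copied from the Results table, keyed by repository folder.
PARAMS = {
    "ecg-heartbeat": 12_756,
    "eeg-emotions": 163_788,
    "eye-state": 1_576,
    "seizure-prediction": 12_072,
    "har-smartphones": 15_416,
}

BYTES_PER_PARAM = 4  # assumption: float32 weights


def size_kb(n_params: int) -> int:
    """Size in KB (1 KB = 1024 bytes), rounded to the nearest integer."""
    return round(n_params * BYTES_PER_PARAM / 1024)


sizes = {name: size_kb(n) for name, n in PARAMS.items()}
total_params = sum(PARAMS.values())
total_kb = sum(sizes.values())

print(sizes)
print(total_params, total_kb)
```

Under these assumptions the per-model sizes come out to 50, 640, 6, 47, and 60 KB, matching the table.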

	## How Small Is Small?

	| Dataset | Typical CNN/LSTM | Ours | How much smaller |
	|---|---|---|---|
	| ECG Heartbeat | 500K – 2M params | 12,756 | 40–160x |
	| EEG Emotions | 1M+ params | 163,788 | 6x |
	| EEG Eye State | 100K+ params | 1,576 | 63x |
	| Seizure Detection | 200K+ params | 12,072 | 17x |
	| HAR Smartphones | 200K – 1M params | 15,416 | 13–65x |

	## Two Model Families

	We ship two architectures, named after raptors — bird size matches model size, hunting style matches classification style.

	### KestrelNet (Standard FC)

	The kestrel is the smallest falcon. It hovers perfectly still, then strikes with precision. KestrelNet is a standard fully-connected network with ReLU activations. Minimal parameters, maximum accuracy.

	```
	Input → Dense(hidden₁, ReLU) → Dense(hidden₂, ReLU) → Dense(classes, Softmax)
	```
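
As an illustration of the diagram above, here is a minimal NumPy forward pass for a two-hidden-layer fully-connected classifier. It is a sketch, not the code in `inference.py`: the function names, the He-style initialization, and the hidden sizes (16, 8) are assumptions chosen for the example (the sizes are borrowed from the GoshawkNet table, since KestrelNet's are not listed here).

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def init_mlp(n_in, hidden, n_classes, seed=0):
    """Random (W, b) pairs for Input -> hidden[0] -> hidden[1] -> classes."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, n_classes]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
        for a, b in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """x has shape (batch, n_in); returns probabilities of shape (batch, n_classes)."""
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)  # hidden Dense + ReLU
    W, b = params[-1]
    return softmax(h @ W + b)  # output Dense + Softmax


params = init_mlp(n_in=187, hidden=(16, 8), n_classes=5)
proba = forward(params, np.random.default_rng(1).normal(size=(4, 187)))
print(proba.shape, proba.sum(axis=1))
```

Each row of `proba` is a probability distribution over the classes, so the row sums are 1 up to floating-point error.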

	### GoshawkNet (Multivector Products)

	The goshawk is a larger raptor that hunts in complex terrain, reading patterns others miss. GoshawkNet replaces standard dot products with multivector products, giving each neuron native access to rotations, reflections, and scaling in a single operation. More parameters, but captures geometric structure in the data that FC nets need many more layers to approximate.
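
To make the idea of a multivector product concrete: Cl(0,2) is isomorphic to the quaternions, so its product can be written as the Hamilton product. The sketch below implements that product in NumPy and uses it to rotate a vector. How GoshawkNet wires this product into its layers is not described in this README, so the function names and the `[w, x, y, z]` storage layout are assumptions for illustration only.

```python
import numpy as np


def hamilton(p, q):
    """Hamilton product of quaternions stored as (..., 4) arrays [w, x, y, z]."""
    pw, px, py, pz = np.moveaxis(p, -1, 0)
    qw, qx, qy, qz = np.moveaxis(q, -1, 0)
    return np.stack(
        [
            pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw,
        ],
        axis=-1,
    )


def conjugate(q):
    """Negate the three imaginary components."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])


# A unit quaternion encoding a 90-degree rotation about the z axis.
half = np.pi / 4
rot = np.array([np.cos(half), 0.0, 0.0, np.sin(half)])

# Rotate the x unit vector (embedded as a pure quaternion) via q v q*.
v = np.array([0.0, 1.0, 0.0, 0.0])
rotated = hamilton(hamilton(rot, v), conjugate(rot))
print(np.round(rotated, 6))  # approximately the y unit vector, [0, 0, 1, 0]
```

If `rot` is not normalized, the same expression also scales the result by the squared norm of `rot`, which is one way rotation and scaling can be carried by a single learned weight.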

	Best model per dataset:

	| Dataset | Best Model | Architecture |
	|---|---|---|
	| ECG Heartbeat | GoshawkNet Cl(0,2) | Quaternion, [16, 8] hidden |
	| EEG Emotions | GoshawkNet Cl(0,2) | Quaternion, [16, 8] hidden |
	| EEG Eye State | GoshawkNet Cl(0,2) | Quaternion, [16, 8] hidden |
	| Seizure Detection | GoshawkNet Cl(0,2) | Quaternion, [16, 8] hidden |
	| HAR Smartphones | GoshawkNet Cl(0,2) | Quaternion, [16, 8] hidden |

	Quaternion algebra (Cl(0,2), dimension 4) consistently wins across all five domains.
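
The published parameter counts are consistent with a simple counting rule: take a fully-connected network with layers input → 16 → 8 → classes, and give every weight and bias 4 components (one quaternion). This is an inferred decomposition that happens to reproduce all five numbers, not a description of the author's implementation; the function name and the treatment of binary tasks as 2 output units are assumptions.

```python
def goshawk_param_count(n_in, n_classes, hidden=(16, 8), dim=4):
    """Count parameters assuming every weight and bias has `dim` components."""
    sizes = [n_in, *hidden, n_classes]
    units = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        units += fan_in * fan_out + fan_out  # weights plus biases for this layer
    return dim * units


# (input features, classes) taken from the Per-Dataset Details section.
datasets = {
    "ECG Heartbeat": (187, 5),
    "EEG Emotions": (2548, 3),
    "EEG Eye State": (14, 2),
    "Seizure Detection": (178, 2),
    "HAR Smartphones": (228, 6),
}

for name, (n_in, n_classes) in datasets.items():
    print(f"{name}: {goshawk_param_count(n_in, n_classes):,}")
```

Under this rule the first layer dominates: it contributes 64 parameters per input feature, which is why the 2,548-feature EEG Emotions model is more than ten times larger than the others.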

	## Per-Dataset Details

	### ECG Heartbeat — MIT-BIH Arrhythmia Database

	- Samples: 87,554 train / 21,892 test
	- Features: 187 time-series values per heartbeat
	- Classes: Normal (N), Supraventricular (S), Ventricular (V), Fusion (F), Unknown (Q)
	- Best model: GoshawkNet Cl(0,2) [16,8] — 97.2% accuracy, 12,756 params
	- Kaggle notebook: [samareddy94/gnaninet-ecg-benchmark](https://www.kaggle.com/code/samareddy94/gnaninet-ecg-benchmark)

	| Class | Accuracy |
	|---|---|
	| Normal (N) | 99.2% |
	| Supraventricular (S) | 64.6% |
	| Ventricular (V) | 90.9% |
	| Fusion (F) | 63.0% |
	| Unknown (Q) | 95.9% |
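
The gap between 97.2% overall accuracy and a macro F1 of 0.853 is typical of imbalanced data: the common class dominates accuracy, while macro F1 weights every class equally. The sketch below shows the computation on toy labels (not the model's predictions). It assumes the per-class accuracy in the table means the fraction of each class's samples classified correctly, i.e. recall; the function name is hypothetical.

```python
import numpy as np


def per_class_metrics(y_true, y_pred, n_classes):
    """Return overall accuracy, per-class recall, and macro F1."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)  # rows: true class, columns: predicted

    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)    # true samples per class
    predicted = cm.sum(axis=0)  # predictions per class

    recall = tp / np.maximum(support, 1)
    precision = tp / np.maximum(predicted, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)

    return tp.sum() / cm.sum(), recall, f1.mean()


# Toy labels: 90 samples of a common class 0, 10 of a rare class 1.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.array([0] * 90 + [1] * 6 + [0] * 4)  # 4 rare samples missed

acc, recall, macro_f1 = per_class_metrics(y_true, y_pred, n_classes=2)
print(f"accuracy={acc:.3f} recall={recall} macro_f1={macro_f1:.3f}")
```

In this toy case accuracy is 0.96 even though the rare class is only recalled 60% of the time, and the macro F1 drops to about 0.864.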

	### EEG Brainwave Emotions

	- Samples: 2,132 (1,707 train / 425 test)
	- Features: 2,548 EEG features (channel means + FFT)
	- Classes: Negative, Neutral, Positive
	- Best model: GoshawkNet Cl(0,2) [16,8] — 99.1% accuracy, 163,788 params
	- Kaggle notebook: [samareddy94/99-eeg-emotion-detection-164k-params-no-gpu](https://www.kaggle.com/code/samareddy94/99-eeg-emotion-detection-164k-params-no-gpu)

	| Class | Accuracy |
	|---|---|
	| Negative | 99.3% |
	| Neutral | 100.0% |
	| Positive | 97.9% |

	### EEG Eye State — UCI / Roesler

	- Samples: 14,980 (11,985 train / 2,995 test)
	- Features: 14 EEG channels (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4)
	- Classes: Eyes Open, Eyes Closed
	- Best model: GoshawkNet Cl(0,2) [16,8] — 94.2% accuracy, 1,576 params
	- Kaggle notebook: [samareddy94/gnaninet-eeg-eyestate-benchmark](https://www.kaggle.com/code/samareddy94/gnaninet-eeg-eyestate-benchmark)

	The smallest model in the suite: 1,576 parameters, 6 KB. At 17 μs per inference, it runs at roughly 60,000 inferences/sec on CPU.
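
Latency and throughput figures like these can be reproduced with a simple timing loop. The sketch below is one way to do it, not the procedure used for the table: the helper name, the warmup count, and the stand-in "model" (a single 14 → 2 matrix product) are all assumptions.

```python
import time

import numpy as np


def measure_latency_us(predict, x, n_warmup=100, n_runs=10_000):
    """Mean wall-clock time of predict(x) in microseconds."""
    for _ in range(n_warmup):
        predict(x)  # warm caches before timing
    t0 = time.perf_counter()
    for _ in range(n_runs):
        predict(x)
    return (time.perf_counter() - t0) / n_runs * 1e6


# Stand-in for a tiny model: one 14 -> 2 matrix product (hypothetical).
rng = np.random.default_rng(0)
W = rng.normal(size=(14, 2))
x = rng.normal(size=14)

latency_us = measure_latency_us(lambda v: v @ W, x, n_runs=2_000)
print(f"{latency_us:.2f} us per call, about {1e6 / latency_us:,.0f} calls/sec")

# Converting the table's 17 us latency into throughput:
print(round(1e6 / 17))  # 58824 inferences/sec
```

Measured numbers depend heavily on the CPU, the NumPy build, and whether inputs are scored one at a time or in batches.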

	### Epileptic Seizure Recognition — Bonn University

	- Samples: 11,500 (9,200 train / 2,300 test)
	- Features: 178 EEG time-series values
	- Classes: Seizure vs Non-seizure (binary)
	- Best model: GoshawkNet Cl(0,2) [16,8] — 97.1% accuracy, AUC 0.988, 12,072 params

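	An AUC of 0.988 means that a randomly chosen seizure sample receives a higher score than a randomly chosen non-seizure sample 98.8% of the time. Ranking quality matters for screening, where the decision threshold is tuned to trade missed seizures against false alarms.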
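
The ranking interpretation of AUC can be computed directly by comparing every positive score against every negative score. The sketch below does this on toy scores (not model output); the function name is hypothetical.

```python
import numpy as np


def auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive against every negative
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


# Toy scores: one positive (0.35) is ranked below one negative (0.4).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1]
print(auc_pairwise(y_true, scores))
```

Here 11 of the 12 positive/negative pairs are ordered correctly, so the AUC is 11/12. The pairwise matrix grows with the product of the class sizes, so a rank-based formula is preferable for large test sets.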

	### HAR Smartphones — UCI Activity Recognition

	- Samples: 7,352 train / 2,947 test (official split)
	- Features: 228 triaxial accelerometer + gyroscope features
	- Classes: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, Laying
	- Best model: GoshawkNet Cl(0,2) [16,8] — 95.7% local / 94.9% Kaggle live, 15,416 params
	- Kaggle notebook: [samareddy94/gnaninet-har-benchmark](https://www.kaggle.com/code/samareddy94/gnaninet-har-benchmark)

	| Class | Accuracy |
	|---|---|
	| Walking | 99.0% |
	| Walking Upstairs | 90.7% |
	| Walking Downstairs | 96.4% |
	| Sitting | 91.9% |
	| Standing | 95.7% |
	| Laying | 99.8% |

	## Training Details

	All models are trained with the same recipe; where a range is given, the value varies by dataset:

	- Optimizer: Adam (lr=0.001, β₁=0.9, β₂=0.999)
	- LR Schedule: Warmup-cosine (10-epoch warmup)
	- Early stopping: Patience 30–40 on validation loss
	- Batch size: 64–128
	- L2 regularization: λ = 1e-5 to 1e-4
	- Gradient clipping: 5.0
	- Normalization: Z-score, fit on training set only
	- Backpropagation: Analytic (hand-derived gradients, no autograd)
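
Several items in the list above are short enough to sketch in NumPy. The code below shows one plausible reading of the schedule, normalization, and clipping settings. The details are assumptions, since the list does not specify them: linear warmup, cosine decay to zero, clipping by global L2 norm (rather than by value), and all function names.

```python
import numpy as np


def warmup_cosine_lr(epoch, total_epochs, base_lr=1e-3, warmup_epochs=10):
    """Linear warmup to base_lr, then cosine decay towards zero."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + np.cos(np.pi * progress))


def fit_zscore(X_train, eps=1e-8):
    """Compute mean and std on the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0) + eps
    return mean, std


def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their joint L2 norm is <= max_norm."""
    total = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]


rng = np.random.default_rng(0)
X_train, X_test = rng.normal(5, 2, (100, 3)), rng.normal(5, 2, (20, 3))

# The test set reuses the training statistics, so no test information leaks in.
mean, std = fit_zscore(X_train)
X_train_n, X_test_n = (X_train - mean) / std, (X_test - mean) / std

print([float(warmup_cosine_lr(e, 100)) for e in (0, 9, 10, 55, 99)])
```

With a 100-epoch budget (a hypothetical value), the learning rate climbs from 1e-4 to 1e-3 over the first 10 epochs and then decays along a half cosine.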

	Training is fast — all five models train in under 10 minutes total on a laptop CPU.
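
To show what analytic backpropagation with Adam looks like in practice, here is a minimal sketch for a single softmax layer with an L2 penalty, including a finite-difference check of the hand-derived gradient. It illustrates the technique only; it is not the quaternion network's gradient code, and the function names and toy data are assumptions.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def loss_and_grads(W, b, X, y, l2=1e-4):
    """Mean cross-entropy plus L2 penalty, with hand-derived gradients."""
    n = X.shape[0]
    p = softmax(X @ W + b)
    loss = -np.log(p[np.arange(n), y]).mean() + 0.5 * l2 * (W ** 2).sum()

    # d(loss)/d(logits) = (p - onehot(y)) / n for softmax + cross-entropy.
    dz = p.copy()
    dz[np.arange(n), y] -= 1.0
    dz /= n
    dW = X.T @ dz + l2 * W
    db = dz.sum(axis=0)
    return loss, dW, db


def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)  # bias correction, t counts from 1
    v_hat = v / (1 - b2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 6)), rng.integers(0, 3, size=32)
W, b = rng.normal(scale=0.1, size=(6, 3)), np.zeros(3)

# Finite-difference check of one weight against the analytic gradient.
loss, dW, db = loss_and_grads(W, b, X, y)
h = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[2, 1] += h
W_minus[2, 1] -= h
numeric = (loss_and_grads(W_plus, b, X, y)[0] - loss_and_grads(W_minus, b, X, y)[0]) / (2 * h)
print(abs(numeric - dW[2, 1]))  # should be tiny if the derivation is right

W_new, m, v = adam_step(W, dW, np.zeros_like(W), np.zeros_like(W), t=1)
```

A gradient check like this is the usual safeguard when gradients are derived by hand instead of produced by autograd.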

	## Repository Structure

	```
	├── ecg-heartbeat/
	│   ├── weights.txt     # GoshawkNet Cl(0,2) [16,8], 97.2% accuracy
	│   └── results.json    # Full benchmark comparison (4 models)
	├── eeg-emotions/
	│   ├── weights.txt     # GoshawkNet Cl(0,2) [16,8], 99.1% accuracy
	│   └── results.json
	├── eye-state/
	│   ├── weights.txt     # GoshawkNet Cl(0,2) [16,8], 94.2% accuracy
	│   └── results.json
	├── seizure-prediction/
	│   ├── weights.txt     # GoshawkNet Cl(0,2) [16,8], 97.1% accuracy
	│   └── results.json
	├── har-smartphones/
	│   ├── weights.txt     # GoshawkNet Cl(0,2) [16,8], 94.9% accuracy
	│   └── results.json
	└── inference.py        # Self-contained inference loader (no dependencies beyond NumPy)
	```

	## Quick Start

	```python
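	import numpy as np
	from inference import load_model

	# Load any model by its folder name
	model = load_model("ecg-heartbeat")

	# Smoke test with a random 187-value input; real inputs are heartbeat feature vectors
	x = np.random.randn(187)
	proba = model.predict_proba(x)
	print(proba)  # 5-class probabilities, e.g. [0.92, 0.01, 0.05, 0.01, 0.01]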
	```
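
A probability vector like the one printed above is usually turned into a label with `argmax`. The helper below is a small sketch of that step. The class order is an assumption (it follows the listing in the ECG section and may not match the order used by `inference.py`), and `top_label` is a hypothetical name.

```python
import numpy as np

# Assumed class order: the order in which the ECG section lists the classes.
ECG_CLASSES = [
    "Normal (N)",
    "Supraventricular (S)",
    "Ventricular (V)",
    "Fusion (F)",
    "Unknown (Q)",
]


def top_label(proba, class_names):
    """Return the most likely class name and its probability."""
    proba = np.asarray(proba, dtype=float)
    idx = int(np.argmax(proba))
    return class_names[idx], float(proba[idx])


# Example probabilities copied from the Quick Start comment.
label, confidence = top_label([0.92, 0.01, 0.05, 0.01, 0.01], ECG_CLASSES)
print(label, confidence)
```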

	## Intended Use

	- Clinical screening research: Prototype pre-filter for ECG/EEG analysis before specialist review (not validated for clinical use; see Limitations)
	- Edge deployment: Wearables, IoT sensors, embedded devices — no GPU, no cloud
	- Ensemble first stage: Fast, tiny model screens easy cases; complex model handles the rest
	- Research baseline: Reproducible benchmarks on public datasets with minimal compute
	- Education: Complete from-scratch neural network with analytic gradients
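
The ensemble first stage idea from the list above can be sketched as a confidence-gated cascade: the tiny model answers when its top probability clears a threshold, and everything else is passed to a larger model. The function name, the 0.95 threshold, and both stand-in models are hypothetical; a real threshold would be tuned on validation data.

```python
import numpy as np


def cascade_predict(x, tiny_predict_proba, large_predict_proba, threshold=0.95):
    """Use the tiny model when it is confident, otherwise defer to the large one.

    Returns (predicted class index, name of the stage that answered).
    """
    proba = np.asarray(tiny_predict_proba(x))
    if proba.max() >= threshold:
        return int(proba.argmax()), "tiny"
    return int(np.asarray(large_predict_proba(x)).argmax()), "large"


# Hypothetical stand-ins for the two models, for illustration only.
def tiny(x):
    return [0.97, 0.03] if x.sum() > 0 else [0.55, 0.45]


def large(x):
    return [0.2, 0.8]


print(cascade_predict(np.array([1.0, 2.0]), tiny, large))    # tiny model is confident
print(cascade_predict(np.array([-1.0, -2.0]), tiny, large))  # deferred to the large model
```

The fraction of samples that reach the second stage determines the average cost, so the threshold trades compute against accuracy.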

	## Limitations

	- Models are trained on tabular/flattened features, not raw waveforms
	- Per-class accuracy varies: rare classes score much lower (ECG Fusion 63.0%, ECG Supraventricular 64.6%)
	- No sequence modeling — each sample is classified independently
	- Medical models are NOT validated for clinical use — research benchmarks only

	## Kaggle Verification

	All results except seizure prediction have been verified with live Kaggle notebook scoring:

	| Dataset | Kaggle Notebook |
	|---|---|
	| ECG Heartbeat | [samareddy94/gnaninet-ecg-benchmark](https://www.kaggle.com/code/samareddy94/gnaninet-ecg-benchmark) |
	| EEG Emotions | [samareddy94/99-eeg-emotion-detection-164k-params-no-gpu](https://www.kaggle.com/code/samareddy94/99-eeg-emotion-detection-164k-params-no-gpu) |
	| EEG Eye State | [samareddy94/gnaninet-eeg-eyestate-benchmark](https://www.kaggle.com/code/samareddy94/gnaninet-eeg-eyestate-benchmark) |
	| HAR Smartphones | [samareddy94/gnaninet-har-benchmark](https://www.kaggle.com/code/samareddy94/gnaninet-har-benchmark) |

	## Citation

	```bibtex
	@misc{kestrelnet-benchmarks-2026,
	title={KestrelNet/GoshawkNet: Tiny Neural Classifiers for Biosignal and Sensor Data},
	author={Sama Reddy},
	year={2026},
	url={https://huggingface.co/reddysama/kestrelnet-benchmarks}
	}
	```

	---

	<p align="center">
	<em>No PyTorch. No TensorFlow. No GPU. Just NumPy and math.</em><br>
	<a href="https://huggingface.co/reddysama/gnaninet-fraud-classifier">Fraud Classifier</a> ·
	<a href="https://huggingface.co/spaces/reddysama/gnaninet-fraud-classifier">Live Demo</a> ·
	<a href="https://naninet.ai">Website</a>
	</p>