---
license: apache-2.0
library_name: numpy
tags:
- fraud-detection
- tabular-classification
- tiny-model
- edge-ai
- no-gpu
- numpy
- real-time
- explainable-ai
- analytic-gradients
datasets:
- custom
metrics:
- accuracy
- latency
model-index:
- name: KestrelNet Fraud Classifier
  results:
  - task:
      type: tabular-classification
      name: Fraud Detection
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.916
    - name: Inference Latency
      type: latency
      value: 0.005ms
    - name: Parameters
      type: params
      value: 1059
pipeline_tag: tabular-classification
---

# KestrelNet: 1,059-Parameter Fraud Classifier

A fully-connected neural network for real-time transaction fraud detection. Built from scratch with **pure NumPy**: no PyTorch, no TensorFlow, no ONNX runtime. The entire model is about 8.3 KB.

## Why This Exists

Most fraud detection models are overbuilt. We wanted to find the floor: what's the smallest model that still works? Turns out, **1,059 parameters** are enough for 91.6% accuracy with roughly 5 μs per inference on commodity CPU hardware.

## Performance

| Metric | Value |
|---|---|
| Accuracy | 91.6% |
| Parameters | 1,059 |
| Model size | 8.3 KB |
| Inference latency | ~5 μs (CPU) |
| Throughput | ~190,000 inferences/sec |
| Dependencies | NumPy only |

For context, a single GPT-2 attention head has more parameters than this entire model: in GPT-2 small, one head's query, key, and value projections alone hold 768 × 64 × 3 = 147,456 weights.

## Architecture

```
Input (14 features) → Dense(32, ReLU) → Dense(16, ReLU) → Dense(3, Softmax)
```

Three layers. No batch norm, no attention, no residual connections. Just matrix multiplies, ReLU, and a final softmax.
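
The 1,059 parameters and 8.3 KB size in the Performance table follow from this layout: each Dense layer holds an (inputs × outputs) weight matrix plus one bias per output. The sketch below rebuilds the layout in NumPy with random stand-in weights, assuming row-vector inputs of shape `(batch, 14)` and float64 storage (the card does not name a dtype, but 8 bytes per parameter is what reproduces 8.3 KB). `init_params` and `forward` are illustrative names, not the `kestrelnet` package API.

```python
import numpy as np

LAYER_SIZES = [14, 32, 16, 3]  # input -> Dense(32, ReLU) -> Dense(16, ReLU) -> Dense(3, softmax)

def init_params(rng):
    """Random stand-in weights (He-style scaling); not the trained KestrelNet weights."""
    return [(rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in), np.zeros(n_out))
            for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]

def forward(params, x):
    """x: (batch, 14) normalized features -> (batch, 3) class probabilities."""
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)              # ReLU on the two hidden layers
    h = h - h.max(axis=1, keepdims=True)        # stabilize the softmax
    e = np.exp(h)
    return e / e.sum(axis=1, keepdims=True)

params = init_params(np.random.default_rng(0))
n_params = sum(W.size + b.size for W, b in params)
size_bytes = sum(W.nbytes + b.nbytes for W, b in params)
print(n_params)    # 1059 = (14*32 + 32) + (32*16 + 16) + (16*3 + 3) = 480 + 528 + 51
print(size_bytes)  # 8472 bytes in float64, about 8.3 KiB
```

Stacking several transactions as rows of `x` scores them with one matrix multiply per layer instead of one Python call each.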

**Training** uses analytic backpropagation: full gradient computation without autograd. Every partial derivative is derived by hand and implemented directly. This makes the training loop ~10x faster than equivalent PyTorch code for models of this size.
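
To make "derived by hand" concrete, here is a sketch of one forward and backward pass for this architecture. The card does not publish its training code, so the loss (mean softmax cross-entropy), the array shapes, and the function name `forward_backward` are assumptions. Two identities do most of the work: the gradient of mean cross-entropy with respect to the softmax logits is `(p - y) / batch`, and a ReLU passes gradient only where its input was positive. A central finite-difference check on one weight confirms the hand-derived gradient.

```python
import numpy as np

def forward_backward(params, x, y):
    """Dense(32, ReLU) -> Dense(16, ReLU) -> Dense(3, softmax) with cross-entropy loss.

    params: [(W1, b1), (W2, b2), (W3, b3)]; x: (batch, 14); y: (batch, 3) one-hot.
    Returns (mean loss, [(dW1, db1), (dW2, db2), (dW3, db3)]).
    """
    (W1, b1), (W2, b2), (W3, b3) = params
    n = x.shape[0]

    # Forward pass, keeping pre-activations for the backward pass
    z1 = x @ W1 + b1
    a1 = np.maximum(z1, 0.0)
    z2 = a1 @ W2 + b2
    a2 = np.maximum(z2, 0.0)
    z3 = a2 @ W3 + b3
    z3 = z3 - z3.max(axis=1, keepdims=True)   # softmax is shift-invariant
    log_p = z3 - np.log(np.exp(z3).sum(axis=1, keepdims=True))
    p = np.exp(log_p)
    loss = -np.sum(y * log_p) / n

    # Backward pass: hand-derived partial derivatives, layer by layer
    dz3 = (p - y) / n                 # d(loss)/d(logits) for softmax + cross-entropy
    dW3, db3 = a2.T @ dz3, dz3.sum(axis=0)
    dz2 = (dz3 @ W3.T) * (z2 > 0)     # ReLU passes gradient only where z2 > 0
    dW2, db2 = a1.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (z1 > 0)
    dW1, db1 = x.T @ dz1, dz1.sum(axis=0)
    return loss, [(dW1, db1), (dW2, db2), (dW3, db3)]

# Random stand-in weights and data, then a central finite-difference check on one weight
rng = np.random.default_rng(0)
sizes = [14, 32, 16, 3]
params = [(rng.standard_normal((i, o)) * 0.5, rng.standard_normal(o) * 0.1)
          for i, o in zip(sizes[:-1], sizes[1:])]
x = rng.standard_normal((8, 14))
y = np.eye(3)[rng.integers(0, 3, size=8)]

loss, grads = forward_backward(params, x, y)
W1, eps = params[0][0], 1e-6
W1[2, 5] += eps
loss_plus, _ = forward_backward(params, x, y)
W1[2, 5] -= 2 * eps
loss_minus, _ = forward_backward(params, x, y)
W1[2, 5] += eps                       # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)
print(numeric, grads[0][0][2, 5])     # the two values should agree closely
```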

### GullNet Variant

We also offer a **GullNet** variant that replaces standard dot products with multivector products, giving the network native access to rotations, reflections, and scaling in a single operation. This is useful when feature interactions have geometric structure. The GullNet variant has more parameters but can capture complex feature relationships that fully-connected nets miss.
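
GullNet's exact layer is not described in this card, so the following is only a generic illustration of the underlying idea, using the 2D geometric (Clifford) algebra Cl(2,0) as an assumed example. A multivector is stored as four coefficients `[scalar, e1, e2, e12]`, and one geometric product of two vectors yields both their dot product (scalar part) and their wedge product (bivector part). Multiplying a vector by `s·(cos θ + sin θ e12)` rotates it by θ and scales it by `s` in a single product, and the sandwich `u v u` with a unit vector `u` reflects `v` across the line along `u`.

```python
import numpy as np

def geometric_product(a, b):
    """Geometric product in Cl(2,0); multivectors are [scalar, e1, e2, e12].

    Uses e1*e1 = e2*e2 = 1, e1*e2 = -e2*e1 = e12, and e12*e12 = -1.
    """
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return np.array([
        a0 * b0 + a1 * b1 + a2 * b2 - a3 * b3,  # scalar part
        a0 * b1 + a1 * b0 - a2 * b3 + a3 * b2,  # e1 part
        a0 * b2 + a2 * b0 + a1 * b3 - a3 * b1,  # e2 part
        a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1,  # e12 (bivector) part
    ])

def vector(x, y):
    return np.array([0.0, x, y, 0.0])

# Vector times vector = dot product (scalar) + wedge product (bivector)
uv = geometric_product(vector(1, 2), vector(3, 4))
print(uv)  # scalar 11 (= 1*3 + 2*4), e12 -2 (= 1*4 - 2*3)

# Rotate by 90 degrees and scale by 2 in one product
theta, s = np.pi / 2, 2.0
rot_scale = np.array([s * np.cos(theta), 0.0, 0.0, s * np.sin(theta)])
print(geometric_product(vector(1, 0), rot_scale))  # approximately [0, 0, 2, 0]

# Reflect v = (0.5, 1) across the x-axis via the sandwich u v u with u = e1
u = vector(1, 0)
print(geometric_product(geometric_product(u, vector(0.5, 1.0)), u))  # [0, 0.5, -1, 0]
```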

## Input Features

The model expects a 14-dimensional normalized feature vector:

| Index | Feature | Normalization |
|---|---|---|
| 0 | `amount_vs_avg` | Transaction amount / 90-day average |
| 1–2 | `hour_sin`, `hour_cos` | Cyclical encoding of transaction hour |
| 3–4 | `day_sin`, `day_cos` | Cyclical encoding of day of week |
| 5 | `location_delta` | Standard deviations from usual location |
| 6 | `velocity_1h` | Transactions in past hour / 10, clipped |
| 7 | `velocity_24h` | Transactions in past 24h / 30, clipped |
| 8 | `merchant_risk` | Merchant category risk score in [0, 1] |
| 9 | `international` | Cross-border transaction (0/1) |
| 10 | `card_present` | Physical card used (0/1) |
| 11 | `device_match` | Known device (0/1) |
| 12 | `account_age_norm` | Account age / 3650 days |
| 13 | `prev_fraud_score` | Historical fraud rate in [0, 1] |
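
The normalization column can be turned into a small preprocessing function. This is a sketch under stated assumptions: `build_features` and its argument names are hypothetical, the hour is taken in [0, 24) and the day of week in [0, 7), the cyclical encodings use a full 2π period, and "clipped" is read as clipping to [0, 1]. These choices should be checked against how the model was actually trained.

```python
import math

def clip01(v):
    """Assumed clipping range for the velocity features: [0, 1]."""
    return min(max(v, 0.0), 1.0)

def build_features(amount, avg_amount_90d, hour, day_of_week, location_delta,
                   txns_past_hour, txns_past_24h, merchant_risk, international,
                   card_present, device_match, account_age_days, prev_fraud_score):
    """Hypothetical helper: return the 14 features in the order of the table above."""
    hour_angle = 2 * math.pi * hour / 24          # assumed 24-hour period
    day_angle = 2 * math.pi * day_of_week / 7     # assumed 7-day period
    return [
        amount / avg_amount_90d,                     # 0 amount_vs_avg (assumes avg > 0)
        math.sin(hour_angle), math.cos(hour_angle),  # 1-2 hour_sin, hour_cos
        math.sin(day_angle), math.cos(day_angle),    # 3-4 day_sin, day_cos
        location_delta,                              # 5 std deviations from usual location
        clip01(txns_past_hour / 10),                 # 6 velocity_1h
        clip01(txns_past_24h / 30),                  # 7 velocity_24h
        merchant_risk,                               # 8 already in [0, 1]
        float(international),                        # 9 cross-border flag
        float(card_present),                         # 10 physical card flag
        float(device_match),                         # 11 known-device flag
        account_age_days / 3650,                     # 12 account_age_norm
        prev_fraud_score,                            # 13 already in [0, 1]
    ]

features = build_features(amount=120.0, avg_amount_90d=100.0, hour=14, day_of_week=2,
                          location_delta=0.1, txns_past_hour=1, txns_past_24h=3,
                          merchant_risk=0.05, international=False, card_present=True,
                          device_match=True, account_age_days=365, prev_fraud_score=0.0)
print(len(features))  # 14
```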

## Output

Three-class softmax: `[legitimate, review, fraudulent]`

Threshold modes control the decision boundary:
- **Standard**: balanced precision/recall
- **Conservative**: flags more transactions (fewer false negatives)
- **Strict**: flags fewer transactions (fewer false positives)
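
One way to implement the three modes is to leave the softmax output unchanged and move two cutoffs: one on the fraud probability and one on the combined review-plus-fraud probability. The cutoff values below are placeholders chosen only to show the direction of each mode (lower cutoffs flag more, higher cutoffs flag fewer); they are not KestrelNet's shipped thresholds, which the card does not list, and `decide` is a hypothetical helper.

```python
from typing import Sequence

# Placeholder cutoffs (hypothetical): (fraud_cutoff, review_cutoff)
THRESHOLDS = {
    "standard": (0.50, 0.50),
    "conservative": (0.30, 0.20),  # lower cutoffs: more transactions flagged
    "strict": (0.70, 0.80),        # higher cutoffs: fewer transactions flagged
}

def decide(probs: Sequence[float], mode: str = "standard") -> str:
    """Map softmax output [p_legitimate, p_review, p_fraudulent] to a decision."""
    _, p_review, p_fraud = probs
    fraud_cutoff, review_cutoff = THRESHOLDS[mode]
    if p_fraud >= fraud_cutoff:
        return "fraudulent"
    if p_review + p_fraud >= review_cutoff:
        return "review"
    return "legitimate"

for probs in ([0.60, 0.25, 0.15], [0.10, 0.25, 0.65]):
    print(probs, {mode: decide(probs, mode) for mode in THRESHOLDS})
# [0.6, 0.25, 0.15] -> standard: legitimate, conservative: review, strict: legitimate
# [0.1, 0.25, 0.65] -> standard: fraudulent, conservative: fraudulent, strict: review
```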

## Benchmarks: Public Datasets

KestrelNet and GoshawkNet were evaluated on public Kaggle datasets; all public-dataset results are independently reproducible from the linked sources. The fraud row uses proprietary data and is included for reference.

| Dataset | Task | Accuracy | F1 / AUC | Params | Latency | Source |
|---|---|---|---|---|---|---|
| **ECG Heartbeat** (MIT-BIH) | 5-class arrhythmia | 97.2% | F1 0.853 | 12,756 | 56 μs | [shayanfazeli/heartbeat](https://kaggle.com/datasets/shayanfazeli/heartbeat) |
| **EEG Emotions** | 3-class sentiment | 99.1% | F1 0.991 | 163,788 | 1.3 ms | [birdy654/eeg-brainwave-dataset-feeling-emotions](https://kaggle.com/datasets/birdy654/eeg-brainwave-dataset-feeling-emotions) |
| **EEG Eye State** | Binary open/closed | 94.2% | AUC 0.986 | 1,576 | 17 μs | [robikscube/eye-state-classification-eeg-dataset](https://kaggle.com/datasets/robikscube/eye-state-classification-eeg-dataset) |
| **Seizure Prediction** (Bonn) | Binary seizure | 97.1% | AUC 0.988 | 12,072 | n/a | [harunshimanto/epileptic-seizure-recognition](https://kaggle.com/datasets/harunshimanto/epileptic-seizure-recognition) |
| **HAR Smartphones** (UCI) | 6-class activity | 94.9% | F1 0.949 | 15,416 | 70 μs | [uciml/human-activity-recognition-with-smartphones](https://kaggle.com/datasets/uciml/human-activity-recognition-with-smartphones) |
| **Fraud Detection** | 3-class fraud | 91.6% | n/a | 1,059 | 5 μs | Proprietary |

All benchmarks run on CPU. No GPU required. Pure NumPy inference.

### Parameter Efficiency

For comparison, parameter counts of typical CNN/LSTM models on these datasets:

| Dataset | Typical CNN/LSTM | KestrelNet/GoshawkNet | Reduction |
|---|---|---|---|
| ECG Heartbeat | 500K–2M params | 12,756 | **~40–160x smaller** |
| EEG Emotions | 1M+ params | 163,788 | **6x+ smaller** |
| EEG Eye State | 100K+ params | 1,576 | **63x+ smaller** |
| HAR Smartphones | 200K–1M params | 15,416 | **13–65x smaller** |
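
The Reduction column is the typical parameter count divided by the KestrelNet/GoshawkNet count. This short check recomputes it from the table's own figures, taking the lower bound for the open-ended "1M+" and "100K+" entries; the ECG range comes out at about 39–157x, which rounds to the 40–160x shown in the table. `TABLE` and `reduction` are illustrative names.

```python
# (typical_low, typical_high, ours) from the table above; None marks an open-ended range
TABLE = {
    "ECG Heartbeat": (500_000, 2_000_000, 12_756),
    "EEG Emotions": (1_000_000, None, 163_788),
    "EEG Eye State": (100_000, None, 1_576),
    "HAR Smartphones": (200_000, 1_000_000, 15_416),
}

def reduction(typical_low, typical_high, ours):
    """Return (low, high) size-reduction factors; high is None for open-ended ranges."""
    low = typical_low / ours
    high = typical_high / ours if typical_high is not None else None
    return low, high

for name, row in TABLE.items():
    low, high = reduction(*row)
    print(f"{name}: {low:.0f}x" + (f" to {high:.0f}x" if high is not None else "+"))
# ECG Heartbeat: 39x to 157x
# EEG Emotions: 6x+
# EEG Eye State: 63x+
# HAR Smartphones: 13x to 65x
```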
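
## Quick Start

```python
import numpy as np
from kestrelnet import KestrelNet

model = KestrelNet.from_pretrained("kestrelnet/fraud-classifier")
h, d = 2 * np.pi * 14 / 24, 2 * np.pi * 2 / 7  # cyclical angles for 14:00 and day index 2
features = [
    1.2, np.sin(h), np.cos(h), np.sin(d), np.cos(d),  # amount_vs_avg, hour_sin/cos, day_sin/cos
    0.1, min(1 / 10, 1.0), min(3 / 30, 1.0), 0.05,    # location_delta, velocity_1h, velocity_24h, merchant_risk
    0.0, 1.0, 1.0, 365 / 3650, 0.0,                   # international, card_present, device_match, account_age_norm, prev_fraud_score
]
scores = model.predict(features)  # the 14 normalized features, in the Input Features order
# illustrative output: {'legitimate': 0.983, 'review': 0.017, 'fraudulent': 0.000}
```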

## Intended Use

- Real-time fraud screening for payment processors
- Pre-filter before heavier ML models (ensemble first stage)
- Edge deployment where no GPU is available
- Educational reference for from-scratch neural networks
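
For the ensemble first-stage use, a common pattern is a cascade: the tiny model scores every transaction, and only the ones it does not confidently clear are passed to a heavier model. A minimal sketch, where `cascade`, the stub models, and the 0.9 clearance cutoff are hypothetical stand-ins rather than part of any published KestrelNet API:

```python
from typing import Callable, Dict, List, Sequence

Scores = Dict[str, float]

def cascade(transactions: Sequence[Sequence[float]],
            fast_model: Callable[[Sequence[float]], Scores],
            heavy_model: Callable[[Sequence[float]], Scores],
            clear_cutoff: float = 0.9) -> List[Scores]:
    """Score with the fast model; escalate anything not confidently legitimate."""
    results = []
    for features in transactions:
        scores = fast_model(features)
        if scores["legitimate"] < clear_cutoff:   # not confidently legitimate
            scores = heavy_model(features)         # second, more expensive stage
        results.append(scores)
    return results

def fast_stub(features):
    # Hypothetical stand-in: confident only when amount_vs_avg (index 0) is small
    p = 0.95 if features[0] < 2.0 else 0.40
    return {"legitimate": p, "review": (1 - p) / 2, "fraudulent": (1 - p) / 2}

heavy_calls = []
def heavy_stub(features):
    # Hypothetical stand-in for a larger second-stage model
    heavy_calls.append(features)
    return {"legitimate": 0.10, "review": 0.30, "fraudulent": 0.60}

out = cascade([[1.2] + [0.0] * 13, [5.0] + [0.0] * 13], fast_stub, heavy_stub)
print(len(heavy_calls))  # 1: only the second transaction was escalated
```

The cutoff trades second-stage load against the risk of the first stage waving fraud through; in practice it would be tuned on held-out data.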

## Limitations

- Trained on synthetic/proprietary data: accuracy on your distribution will vary
- 14 fixed features: cannot ingest raw transaction logs directly
- No sequence modeling: treats each transaction independently
- Small capacity means it cannot memorize complex fraud patterns

## How to Cite

```bibtex
@misc{kestrelnet2026,
  title={KestrelNet: Sub-Kilobyte Neural Fraud Classifier},
  author={KestrelNet Team},
  year={2026},
  url={https://huggingface.co/kestrelnet/fraud-classifier}
}
```