---
license: apache-2.0
library_name: numpy
tags:
- fraud-detection
- tabular-classification
- tiny-model
- edge-ai
- no-gpu
- numpy
- real-time
- explainable-ai
- analytic-gradients
datasets:
- custom
metrics:
- accuracy
- latency
model-index:
- name: KestrelNet Fraud Classifier
  results:
  - task:
      type: tabular-classification
      name: Fraud Detection
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.916
    - name: Inference Latency
      type: latency
      value: 0.005ms
    - name: Parameters
      type: params
      value: 1059
pipeline_tag: tabular-classification
---

# KestrelNet: 1,059-Parameter Fraud Classifier

A fully-connected neural network for real-time transaction fraud detection. Built from scratch with **pure NumPy**: no PyTorch, no TensorFlow, no ONNX runtime. The entire architecture fits in a single tweet, and the weights take 8.3 KB.

## Why This Exists

Most fraud detection models are overbuilt. We wanted to find the floor: what's the smallest model that still works? Turns out, **1,059 parameters** gets you to 91.6% accuracy with roughly 5 μs inference on commodity hardware.

## Performance

| Metric | Value |
|---|---|
| Accuracy | 91.6% |
| Parameters | 1,059 |
| Model size | 8.3 KB |
| Inference latency | ~5 μs (CPU) |
| Throughput | ~190,000 inferences/sec |
| Dependencies | NumPy only |

For context, a single GPT-2 attention head has more parameters than this entire model.

## Architecture

```
Input (14 features) → Dense(32, ReLU) → Dense(16, ReLU) → Dense(3, Softmax)
```

Three layers. No batch norm, no attention, no residual connections. Just matrix multiplies, ReLU, and a final softmax. Counting weights and biases per layer gives 480 + 528 + 51 = 1,059 parameters.

**Training** uses analytic backpropagation: full gradient computation without autograd. Every partial derivative is derived by hand and implemented directly. This makes the training loop ~10x faster than equivalent PyTorch code for models this size.

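
As a rough illustration of the architecture and the hand-derived gradients described above, the sketch below builds a randomly initialized 14→32→16→3 network in NumPy, counts its parameters, and compares one analytic gradient against a finite-difference estimate. The function names (`init_params`, `forward`, `loss_fn`, `backward`), the He-style initialization, and the cross-entropy loss are assumptions for illustration; the model card does not specify the loss or initialization, and this is not the released training code.

```python
import numpy as np

LAYER_SIZES = [14, 32, 16, 3]  # layer widths taken from the architecture diagram


def init_params(rng):
    """Random weights and zero biases (He-style scaling is an assumption)."""
    params = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        W = rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)
        params.append((W, np.zeros(n_out)))
    return params


def forward(params, x):
    """Return class probabilities plus the layer inputs needed for backprop."""
    acts = [x]
    for W, b in params[:-1]:
        acts.append(np.maximum(acts[-1] @ W + b, 0.0))  # Dense + ReLU
    W, b = params[-1]
    logits = acts[-1] @ W + b
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True), acts


def loss_fn(params, x, y):
    """Mean cross-entropy for integer labels y (assumed loss)."""
    probs, _ = forward(params, x)
    return -np.mean(np.log(probs[np.arange(len(y)), y]))


def backward(params, x, y):
    """Hand-derived gradients of the mean cross-entropy loss."""
    probs, acts = forward(params, x)
    delta = probs.copy()
    delta[np.arange(len(y)), y] -= 1.0  # d(loss)/d(logits) for softmax + CE
    delta /= len(y)
    grads = []
    for i in reversed(range(len(params))):
        W, _ = params[i]
        grads.append((acts[i].T @ delta, delta.sum(axis=0)))
        if i > 0:
            delta = (delta @ W.T) * (acts[i] > 0)  # ReLU derivative
    return grads[::-1]


rng = np.random.default_rng(0)
params = init_params(rng)
n_params = sum(W.size + b.size for W, b in params)
print(n_params)  # 1059

# Central finite-difference check on one weight of the first layer.
x = rng.standard_normal((8, 14))
y = rng.integers(0, 3, size=8)
grads = backward(params, x, y)
eps = 1e-6
W0 = params[0][0]
W0[0, 0] += eps
loss_plus = loss_fn(params, x, y)
W0[0, 0] -= 2 * eps
loss_minus = loss_fn(params, x, y)
W0[0, 0] += eps  # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)
print(abs(numeric - grads[0][0][0, 0]))  # should be very small
```

A gradient check like this is a common way to validate hand-written backpropagation before trusting it in a training loop.
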
### GullNet Variant

We also offer a **GullNet** variant that replaces standard dot products with multivector products, giving the network native access to rotations, reflections, and scaling in a single operation. This is useful when feature interactions have geometric structure. The GullNet variant has more parameters but can capture complex feature relationships that fully-connected nets miss.

## Input Features

The model expects a 14-dimensional normalized feature vector:

| Index | Feature | Definition and normalization |
|---|---|---|
| 0 | `amount_vs_avg` | Transaction amount / 90-day average |
| 1-2 | `hour_sin`, `hour_cos` | Cyclical encoding of transaction hour |
| 3-4 | `day_sin`, `day_cos` | Cyclical encoding of day of week |
| 5 | `location_delta` | Standard deviations from usual location |
| 6 | `velocity_1h` | Transactions in past hour / 10, clipped |
| 7 | `velocity_24h` | Transactions in past 24 h / 30, clipped |
| 8 | `merchant_risk` | Merchant category risk score in [0, 1] |
| 9 | `international` | Cross-border transaction (0/1) |
| 10 | `card_present` | Physical card used (0/1) |
| 11 | `device_match` | Known device (0/1) |
| 12 | `account_age_norm` | Account age / 3650 days |
| 13 | `prev_fraud_score` | Historical fraud rate in [0, 1] |

## Output

Three-class softmax: `[legitimate, review, fraudulent]`

Threshold modes control the decision boundary:

- **Standard**: balanced precision/recall
- **Conservative**: flags more transactions (fewer false negatives)
- **Strict**: flags fewer transactions (fewer false positives)

## Benchmarks: Public Datasets

KestrelNet and GoshawkNet were evaluated on public Kaggle datasets. All results are independently reproducible.

| Dataset | Task | Accuracy | F1 / AUC | Params | Latency | Source |
|---|---|---|---|---|---|---|
| **ECG Heartbeat** (MIT-BIH) | 5-class arrhythmia | 97.2% | F1 0.853 | 12,756 | 56 μs | [shayanfazeli/heartbeat](https://kaggle.com/datasets/shayanfazeli/heartbeat) |
| **EEG Emotions** | 3-class sentiment | 99.1% | F1 0.991 | 163,788 | 1.3 ms | [birdy654/eeg-brainwave-dataset-feeling-emotions](https://kaggle.com/datasets/birdy654/eeg-brainwave-dataset-feeling-emotions) |
| **EEG Eye State** | Binary open/closed | 94.2% | AUC 0.986 | 1,576 | 17 μs | [robikscube/eye-state-classification-eeg-dataset](https://kaggle.com/datasets/robikscube/eye-state-classification-eeg-dataset) |
| **Seizure Prediction** (Bonn) | Binary seizure | 97.1% | AUC 0.988 | 12,072 | not reported | [harunshimanto/epileptic-seizure-recognition](https://kaggle.com/datasets/harunshimanto/epileptic-seizure-recognition) |
| **HAR Smartphones** (UCI) | 6-class activity | 94.9% | F1 0.949 | 15,416 | 70 μs | [uciml/human-activity-recognition-with-smartphones](https://kaggle.com/datasets/uciml/human-activity-recognition-with-smartphones) |
| **Fraud Detection** | 3-class fraud | 91.6% | not reported | 1,059 | 5 μs | Proprietary |

All benchmarks run on CPU. No GPU required. Pure NumPy inference.

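
For readers who want to sanity-check latency figures of this kind on their own hardware, here is a minimal timing sketch. It uses randomly initialized weights with the fraud model's layer shapes (14→32→16→3), not the released checkpoint, and the `predict` and `time_inference` helpers are hypothetical names. Measured numbers will depend on the CPU, the NumPy build, and the batch size.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
# Random stand-in weights with the fraud model's shapes (float64 is an assumption).
shapes = [(14, 32), (32, 16), (16, 3)]
weights = [(rng.standard_normal(s) * 0.1, np.zeros(s[1])) for s in shapes]


def predict(x):
    """Single-sample forward pass: two ReLU layers followed by a softmax."""
    for W, b in weights[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = weights[-1]
    z = x @ W + b
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def time_inference(n_calls=10_000):
    """Return mean seconds per single-sample call, after a short warm-up."""
    x = rng.standard_normal(14)
    for _ in range(100):  # warm-up calls are not timed
        predict(x)
    start = time.perf_counter()
    for _ in range(n_calls):
        predict(x)
    return (time.perf_counter() - start) / n_calls


per_call = time_inference()
print(f"{per_call * 1e6:.1f} us per inference, {1 / per_call:,.0f} inferences/sec")
```

Timing one sample at a time measures Python call overhead as much as arithmetic, so batching many transactions into one matrix multiply will usually lower the per-sample cost.
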
### Parameter Efficiency

For comparison, typical models on these datasets:

| Dataset | Typical CNN/LSTM | KestrelNet/GoshawkNet | Reduction |
|---|---|---|---|
| ECG Heartbeat | 500K–2M params | 12,756 | **40–160x smaller** |
| EEG Emotions | 1M+ params | 163,788 | **6x smaller** |
| EEG Eye State | 100K+ params | 1,576 | **63x smaller** |
| HAR Smartphones | 200K–1M params | 15,416 | **13–65x smaller** |

## Quick Start

```python
import numpy as np
from kestrelnet import KestrelNet

model = KestrelNet.from_pretrained("kestrelnet/fraud-classifier")

# Note: 12 values are passed here, while the feature table lists 14 inputs.
# This presumably relies on predict() expanding hour and day of week into
# their sin/cos pairs; that reading is an assumption, not confirmed here.
scores = model.predict([1.2, 14, 2, 0.1, 1, 3, 0.05, False, True, True, 365, 0.0])
# {'legitimate': 0.983, 'review': 0.017, 'fraudulent': 0.000}
```

## Intended Use

- Real-time fraud screening for payment processors
- Pre-filter before heavier ML models (ensemble first stage)
- Edge deployment where GPU is unavailable
- Educational reference for from-scratch neural networks

## Limitations

- Trained on synthetic/proprietary data, so accuracy on your distribution will vary
- 14 fixed features: cannot ingest raw transaction logs directly
- No sequence modeling: treats each transaction independently
- Small capacity means it cannot memorize complex fraud patterns

## How to Cite

```bibtex
@misc{kestrelnet2026,
  title={KestrelNet: Sub-Kilobyte Neural Fraud Classifier},
  author={KestrelNet Team},
  year={2026},
  url={https://huggingface.co/kestrelnet/fraud-classifier}
}
```