---
language:
- en
license: mit
library_name: smart-knn
pipeline_tag: tabular-classification
model_name: SmartKNN v2
metrics:
- accuracy
- f1
- r_squared
- mse
tags:
- knn
- nearest-neighbors
- tabular
- classification
- regression
- cpu
- low-latency
- ann
- distance-weighted
- interpretable
- production-ready
---

# SmartKNN v2

**SmartKNN v2** is a high-performance, CPU-first nearest-neighbors model designed for **low-latency production inference** on real-world tabular data.

It delivers **accuracy competitive with gradient-boosted models** while maintaining **sub-millisecond single-prediction latency (p95)** on CPU-only systems.

SmartKNN v2 is part of the **SmartEco** ecosystem.

---

## Model Details

- **Model type:** Distance-weighted K-Nearest Neighbors
- **Tasks:** Classification, Regression
- **Backend:** Adaptive (brute-force + ANN)
- **Hardware:** CPU-only (GPU not required)
- **Focus:** Low latency, interpretability, production readiness

Unlike classical KNN, SmartKNN v2 learns per-feature importance weights, adapts its execution strategy to the dataset size, and uses optimized distance kernels for fast inference.
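The learned feature weights plug straight into the distance computation. A minimal sketch of a weighted squared-Euclidean kernel (NumPy; the weight values here are illustrative, and SmartKNN's actual weight-learning procedure is not shown):

```python
import numpy as np

def weighted_sq_distances(X, q, w):
    """Squared Euclidean distance from query q to each row of X,
    scaling each feature's contribution by its importance weight w."""
    diff = X - q                 # shape: (n_samples, n_features)
    return (diff * diff) @ w     # weighted sum of squared differences

# Toy data: feature 2 is weighted 10x more important than feature 1.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w = np.array([1.0, 10.0])
d = weighted_sq_distances(X, np.array([0.0, 0.0]), w)
# A mismatch on the heavily weighted feature pushes that
# neighbor much farther away (d[2] > d[1]).
```

Because the kernel is a single vectorized matrix expression, the brute-force path can evaluate it over every stored row per query, which is part of why CPU-only inference stays fast on small data.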

---

## What’s New in v2

- Full classification support restored
- ANN backend introduced for scalable neighbor search
- Automatic backend selection (small → brute force, large → ANN)
- Distance-weighted voting for improved accuracy
- Interpretable neighbor-influence statistics
- Foundation for adaptive-K strategies
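The automatic small → brute, large → ANN switch can be pictured as a simple size-threshold rule (a sketch only: the threshold value and backend names are assumptions, not the library's real API):

```python
def select_backend(n_samples: int, threshold: int = 50_000) -> str:
    """Choose a neighbor-search backend from dataset size.

    Brute force is exact and has no index-build cost, so it wins on
    small data; an ANN index amortizes its build cost on large data.
    """
    return "brute" if n_samples <= threshold else "ann"
```

In practice the crossover point depends on dimensionality and hardware, so a fixed threshold like this is only a first approximation of what an adaptive selector does.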

---

## Architecture Overview

- Feature Weighting
- Backend Selector
- Brute Backend (small datasets)
- ANN Backend (large datasets)
- Distance Kernel
- Weighted Voting
- Prediction

This hybrid architecture ensures consistently low latency across dataset sizes.
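The Weighted Voting stage above can be sketched as inverse-distance voting over the retrieved neighbors (a minimal illustration, not SmartKNN's exact weighting scheme):

```python
import numpy as np

def distance_weighted_vote(neighbor_labels, neighbor_dists, eps=1e-12):
    """Pick the class whose neighbors carry the largest total
    inverse-distance weight; closer neighbors count more."""
    weights = 1.0 / (neighbor_dists + eps)
    scores = {}
    for label, wgt in zip(neighbor_labels, weights):
        scores[label] = scores.get(label, 0.0) + wgt
    return max(scores, key=scores.get)

# Two distant class-0 neighbors vs one very close class-1 neighbor:
labels = np.array([0, 0, 1])
dists = np.array([2.0, 2.0, 0.1])
pred = distance_weighted_vote(labels, dists)  # the close neighbor outvotes the majority
```

Unweighted majority voting would return class 0 here; distance weighting lets the single nearby neighbor dominate, which is the accuracy improvement the v2 changelog refers to.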

---

## Performance (Internal Evaluation)

> Public benchmarks will be released soon.

From internal testing on real-world tabular datasets:

- Accuracy comparable to XGBoost / LightGBM / CatBoost
- Single-prediction latency:
  - Median: sub-millisecond
  - p95: consistently low on CPU
- Predictable batch-inference scaling

SmartKNN v2 has **not yet reached its performance ceiling**. Future releases will further optimize speed and accuracy.
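Median and p95 single-prediction latency figures like those above can be reproduced with a plain timing loop. A sketch using a stand-in predictor (any single-row callable can take the place of `predict`):

```python
import time
import numpy as np

def latency_percentiles_ms(predict, x, n_runs=1000, warmup=50):
    """Time repeated single-row predictions; return (p50, p95) in ms."""
    for _ in range(warmup):                  # warm caches before measuring
        predict(x)
    samples = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        predict(x)
        samples.append((time.perf_counter() - t0) * 1e3)
    return float(np.percentile(samples, 50)), float(np.percentile(samples, 95))

# Stand-in predictor; swap in a real single-row model call.
p50, p95 = latency_percentiles_ms(lambda row: row.sum(), np.zeros(32))
```

Reporting p95 alongside the median, as the internal evaluation does, captures tail latency rather than just the typical case.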

---

## Limitations

- Not designed for unstructured data (text, images)
- ANN backend focuses on CPU efficiency, not GPU acceleration
- Best suited for tabular datasets

---

## Future Work

- Adaptive-K accuracy optimization
- Kernel-level speed improvements
- Custom ANN backend

---

## Links

- Website: https://thatipamula-jashwanth.github.io/SmartEco/
- Source Code: https://github.com/thatipamula-jashwanth/smart-knn