---
license: mit
language:
- en
library_name: smart-knn
pipeline_tag: tabular-classification
model_name: SmartKNN v2
metrics:
- accuracy
- f1
- r_squared
- mse
tags:
- knn
- nearest-neighbors
- tabular
- classification
- regression
- cpu
- low-latency
- ann
- distance-weighted
- interpretable
- production-ready
---
# SmartKNN v2
**SmartKNN v2** is a high-performance, CPU-first nearest-neighbors model designed for **low-latency production inference** on real-world tabular data.
It delivers **accuracy competitive with gradient-boosted models** while maintaining **sub-millisecond single-prediction latency (p95)** on CPU-only systems.
SmartKNN v2 is part of the **SmartEco** ecosystem.
---
## Model Details
- **Model type:** Distance-weighted K-Nearest Neighbors
- **Tasks:** Classification, Regression
- **Backend:** Adaptive (Brute-force + ANN)
- **Hardware:** CPU-only (GPU not required)
- **Focus:** Low latency, interpretability, production readiness
Unlike classical KNN, SmartKNN v2 learns feature importance, adapts execution strategy based on data size, and uses optimized distance kernels for fast inference.
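The learned-feature-importance idea can be sketched as a distance kernel that scales each feature by a weight before measuring proximity. This is a conceptual illustration, not the smart-knn API; the function name and the fact that weights are passed in directly (rather than learned) are assumptions for brevity.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, feature_weights, k=3):
    """Regress x against its k nearest neighbors under a
    feature-weighted Euclidean distance with inverse-distance voting.
    (Illustrative sketch, not SmartKNN v2 internals.)"""
    diffs = (X_train - x) * feature_weights        # scale each feature by its weight
    dists = np.sqrt((diffs ** 2).sum(axis=1))      # weighted Euclidean distance
    idx = np.argsort(dists)[:k]                    # indices of k nearest neighbors
    w = 1.0 / (dists[idx] + 1e-12)                 # closer neighbors vote harder
    return float(np.dot(w, y_train[idx]) / w.sum())

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 5.0]])
y = np.array([0.0, 1.0, 10.0])
# Down-weighting the noisy second feature makes [0, 5] an effective
# near-neighbor of the query, pulling the prediction toward y = 0.
pred = weighted_knn_predict(X, y, np.array([0.0, 0.1]),
                            feature_weights=np.array([1.0, 0.01]), k=2)
```

With uniform weights the same query would be dominated by different neighbors; the weight vector is what lets the model suppress uninformative features.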
---
## What’s New in v2
- Full classification support restored
- ANN backend introduced for scalable neighbor search
- Automatic backend selection (small → brute, large → ANN)
- Distance-weighted voting for improved accuracy
- Interpretable neighbor influence statistics
- Foundation for adaptive-K strategies
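Distance-weighted voting, listed above, can be shown in a few lines. The sketch below is an assumption-level illustration (the function name is invented, not part of smart-knn): each neighbor votes for its class with weight 1/distance, so a single very close neighbor can outvote several distant ones.

```python
from collections import defaultdict

def weighted_vote(neighbors):
    """neighbors: list of (distance, label) pairs for the k nearest points.
    Returns the label with the largest inverse-distance vote mass."""
    scores = defaultdict(float)
    for dist, label in neighbors:
        scores[label] += 1.0 / (dist + 1e-12)   # closer -> larger vote
    return max(scores, key=scores.get)

# Two 'b' neighbors outnumber one 'a', but the 'a' neighbor is far closer,
# so distance weighting flips the plain-majority result.
label = weighted_vote([(0.1, "a"), (2.0, "b"), (2.5, "b")])
```

This is the mechanism behind the accuracy gain over unweighted majority voting: tie-like neighborhoods are resolved by proximity instead of raw counts.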
---
## Architecture Overview
- Feature Weighting
- Backend Selector
- Brute Backend (small datasets)
- ANN Backend (large datasets)
- Distance Kernel
- Weighted Voting
- Prediction
This hybrid architecture ensures consistent low latency across dataset sizes.
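The backend-selector step above amounts to a size-based dispatch: exact brute-force search while the training set is small, approximate nearest-neighbor (ANN) search once it grows. The threshold value and function name below are illustrative assumptions, not SmartKNN v2 internals.

```python
def choose_backend(n_samples, threshold=50_000):
    """Pick an execution strategy from the training-set size.
    Below the threshold, exact brute-force search is both fast and
    exact; above it, an ANN index keeps latency flat. (Sketch only;
    the real selector and threshold are implementation details.)"""
    return "brute" if n_samples <= threshold else "ann"
```

Because the dispatch happens once at fit time, per-query latency stays predictable regardless of which backend is active.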
---
## Performance (Internal Evaluation)
> Public benchmarks will be released soon.
From internal testing on real-world tabular datasets:
- Accuracy comparable to XGBoost / LightGBM / CatBoost
- Single-prediction latency:
- Median: sub-millisecond
- p95: consistently low on CPU
- Predictable batch inference scaling
SmartKNN v2 has **not yet reached its performance ceiling**. Future releases will further optimize speed and accuracy.
---
## Limitations
- Not designed for unstructured data (text, images)
- ANN backend focuses on CPU efficiency, not GPU acceleration
- Best suited for tabular datasets
---
## Future Work
- Adaptive-K accuracy optimization
- Kernel-level speed improvements
- Custom ANN backend
## Links
- Website: https://thatipamula-jashwanth.github.io/SmartEco/
- Source Code: https://github.com/thatipamula-jashwanth/smart-knn