---
license: mit
language:
- en
metrics:
- r_squared
- accuracy
- mae
- mse
- f1
- recall
tags:
- machine-learning
- algorithms
- tabular-data
- knn
- python
- weighted-knn
- data-science
- preprocessing
---

SmartKNN is a weighted and interpretable extension of classical K-Nearest Neighbours (KNN), designed for real-world tabular machine learning. It automatically learns feature importance, filters weak features, handles missing values, normalizes inputs internally, and consistently achieves higher accuracy and robustness than classical KNN, all while maintaining a simple scikit-learn-style API.

# Model Details
## Model Description

SmartKNN improves classical KNN by learning feature weights and applying a weighted Euclidean distance for neighbour selection. It performs normalization, NaN/Inf cleaning, median imputation, outlier clipping, and feature filtering internally, and it exposes feature importance for transparency and explainability.
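The weighted metric described above can be sketched in a few lines. This is a minimal illustration of a weighted Euclidean distance, not SmartKNN's internal implementation; the function name is hypothetical.

```python
import numpy as np

def weighted_euclidean(x, y, w):
    # d(x, y) = sqrt(sum_j w_j * (x_j - y_j)^2)
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))

# With uniform weights this reduces to the ordinary Euclidean distance:
d = weighted_euclidean([0.0, 0.0], [3.0, 4.0], [1.0, 1.0])  # 5.0
```

A feature with weight 0 drops out of the metric entirely, which is how learned weights can double as a feature filter.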
Developed by: Jashwanth Thatipamula
Model type: Weighted KNN for tabular ML
License: MIT
Language(s): Not language-dependent (numerical tabular ML)
Finetuned from model: Not applicable (original algorithm)
## Model Sources

Repository: https://github.com/thatipamula-jashwanth/smart-knn
Paper (DOI): https://doi.org/10.5281/zenodo.17713746
Demo: Coming soon
# Uses
## Direct Use

• Regression on tabular datasets
• Classification on tabular datasets
• Interpretable ML where feature importance matters
• Real-world ML pipelines with missing values and noisy features
## Downstream Use

• Research on distance-metric learning
• Explainable ML baselines
• AutoML components for tabular data
## Out-of-Scope Use

• NLP, image, or audio modelling
• Deep learning / GPU models
• Raw categorical datasets without encoding
# Bias, Risks, and Limitations

• Instance-based prediction can be slower than tree-based models on large datasets
• Poor performance on categorical-only datasets without encoding
• Requires storing the full training set for inference
## Recommendations

Users should numerically encode categorical features before fitting SmartKNN.
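For example, one-hot encoding with pandas produces a fully numeric frame that SmartKNN can consume (the frame and column names here are purely illustrative):

```python
import pandas as pd

# Illustrative frame with one categorical column.
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 2.0, 3.0]})

# One-hot encode the categorical column so every feature is numeric.
X = pd.get_dummies(df, columns=["color"], dtype=float)
print(X.columns.tolist())  # ['size', 'color_blue', 'color_red']
```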

# How to Get Started with the Model
Install from PyPI:

```bash
pip install smart-knn
```

```python
import pandas as pd
from smart_knn import SmartKNN

df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]

model = SmartKNN(k=5)
model.fit(X, y)

sample = X.iloc[0]
pred = model.predict(sample)
print(pred)
```

# Training Details
## Training Data

SmartKNN is not pretrained and does not ship with training data; users train it on their own dataset.
## Preprocessing

Performed automatically:
• Normalization
• NaN / Inf cleaning
• Median imputation
• Outlier clipping
• Feature filtering via learned weights
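As a rough sketch of what a pipeline like this does, the steps above can be chained in NumPy. The function, its order of operations, and the 3-sigma clipping threshold are illustrative assumptions, not SmartKNN's actual code.

```python
import numpy as np

def clean_tabular(X, clip_sigma=3.0):
    """Illustrative cleaning pipeline: Inf -> NaN, median imputation,
    sigma-based outlier clipping, then z-score normalization."""
    X = np.array(X, dtype=float)
    X[~np.isfinite(X)] = np.nan                 # NaN / Inf cleaning
    med = np.nanmedian(X, axis=0)
    X = np.where(np.isnan(X), med, X)           # median imputation
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    X = np.clip(X, mu - clip_sigma * sd, mu + clip_sigma * sd)  # outlier clipping
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)       # z-score normalization

cleaned = clean_tabular([[1.0, np.inf], [2.0, 3.0], [np.nan, 4.0]])
```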
## Training Hyperparameters

• k: number of neighbours
• weight_threshold: features with learned importance below this value are dropped
# Evaluation

## Testing Data

Evaluated across 35 regression and 20 classification public tabular datasets.
## Metrics

Regression: R², MSE
Classification: Accuracy
## Results

• Regression: SmartKNN outperformed classical KNN on more than 90% of datasets
• Classification: SmartKNN beat classical KNN on 60% of datasets
## Summary

SmartKNN delivers higher accuracy, greater robustness to noise, and better interpretability than classical KNN while preserving its simplicity.
# Environmental Impact

SmartKNN requires no GPU and has minimal energy usage.

Hardware Type: CPU
Hours used: Minimal
Carbon Emitted: Negligible
# Technical Specifications

## Model Architecture and Objective

• Instance-based learner
• Weighted Euclidean distance metric
• Learned feature weights (MSE + MI + Random Forest)
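One plausible way to blend mutual-information and random-forest signals into per-feature weights is sketched below with scikit-learn. SmartKNN's actual recipe (including its MSE-based term and normalization) is not specified here, so treat this as an assumption-laden illustration on toy data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression

# Toy data: 5 features, only 2 of them informative.
X, y = make_regression(n_samples=200, n_features=5, n_informative=2, random_state=0)

mi = mutual_info_regression(X, y, random_state=0)           # mutual-information signal
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Sum the two non-negative signals, then normalize to a weight vector.
w = mi / (mi.sum() + 1e-12) + rf.feature_importances_
w = w / w.sum()
```

The resulting `w` could then plug directly into a weighted Euclidean metric for neighbour selection.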
## Compute Infrastructure

• Runs efficiently on CPU systems
• Implemented using NumPy
# Citation

```bibtex
@software{smartknn2025,
  author    = {Jashwanth Thatipamula},
  title     = {SmartKNN: An Interpretable Weighted Distance Framework for K-Nearest Neighbours},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17713746},
  url       = {https://doi.org/10.5281/zenodo.17713746}
}
```
# Model Card Authors

Jashwanth Thatipamula

## Model Card Contact

Contact via GitHub issues: https://github.com/thatipamula-jashwanth/smart-knn