---
license: mit
language:
- en
metrics:
- r_squared
- accuracy
- mae
- mse
- f1
- recall
tags:
- machine-learning
- algorithms
- tabular-data
- knn
- python
- weighted-knn
- data-science
- preprocessing
---

# SmartKNN

SmartKNN is a weighted, interpretable extension of classical K-Nearest Neighbours (KNN), designed for real-world tabular machine learning. It automatically learns feature importance, filters weak features, handles missing values, and normalizes inputs internally, and it consistently achieves higher accuracy and robustness than classical KNN while maintaining a simple scikit-learn-style API.

# Model Details

## Model Description

SmartKNN improves classical KNN by learning feature weights and applying a weighted Euclidean distance for neighbour selection. It performs normalization, NaN/Inf cleaning, median imputation, outlier clipping, and feature filtering internally, and it exposes the learned feature importance for transparency and explainability.

- Developed by: Jashwanth Thatipamula
- Model type: Weighted KNN for tabular ML
- License: MIT
- Language(s): Not language-dependent (numerical tabular ML)
- Finetuned from model: Not applicable (original algorithm)

## Model Sources

- Repository: https://github.com/thatipamula-jashwanth/smart-knn
- Paper (DOI): https://doi.org/10.5281/zenodo.17713746
- Demo: Coming soon

# Uses

## Direct Use

- Regression on tabular datasets
- Classification on tabular datasets
- Interpretable ML where feature importance matters
- Real-world ML pipelines with missing values and noisy features

## Downstream Use

- Research on distance-metric learning
- Explainable ML baselines
- AutoML components for tabular data

## Out-of-Scope Use

- NLP, image, or audio modelling
- Deep learning / GPU models
- Raw categorical datasets without encoding

# Bias, Risks, and Limitations

- Instance-based prediction can be slower than tree-based models on large datasets
- Performs poorly on categorical-only datasets unless the features are encoded
- Requires storing the full training set for inference

## Recommendations

Users should numerically encode categorical features before fitting SmartKNN.

# How to Get Started with the Model

```bash
pip install smart-knn
```

```python
import pandas as pd
from smart_knn import SmartKNN

# Load a tabular dataset and separate features from the target.
df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]

# Fit SmartKNN; preprocessing (normalization, imputation, clipping,
# feature filtering) happens internally.
model = SmartKNN(k=5)
model.fit(X, y)

# Predict for a single sample.
sample = X.iloc[0]
pred = model.predict(sample)
print(pred)
```

# Training Details

## Training Data

SmartKNN is not pretrained and does not ship with training data; users train it on their own dataset.

## Preprocessing

Performed automatically:

- Normalization
- NaN / Inf cleaning
- Median imputation
- Outlier clipping
- Feature filtering via learned weights

## Training Hyperparameters

- `k`: number of neighbours
- `weight_threshold`: features whose learned importance falls below this value are dropped

# Evaluation

## Testing Data

SmartKNN was evaluated on 55 public tabular datasets: 35 regression and 20 classification.

## Metrics

- Regression: R², MSE
- Classification: Accuracy

## Results

- Regression: SmartKNN outperformed classical KNN on more than 90% of the datasets
- Classification: SmartKNN beat classical KNN on 60% of the datasets

## Summary

SmartKNN delivers higher accuracy, greater robustness to noise, and better interpretability than classical KNN while preserving its simplicity.
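To reproduce this kind of head-to-head comparison on a single dataset, a minimal sketch follows. It assumes the SmartKNN API from the quick-start above (in particular, per-row prediction) and uses scikit-learn's `KNeighborsRegressor` as the classical baseline; the file name `data.csv` and the `target` column are placeholders.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score
from smart_knn import SmartKNN

# Placeholder dataset; substitute your own file and target column.
df = pd.read_csv("data.csv")
X, y = df.drop("target", axis=1), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# SmartKNN normalizes, imputes, and weights features internally.
smart = SmartKNN(k=5)
smart.fit(X_train, y_train)
# The quick-start example predicts one row at a time, so we follow suit.
smart_preds = [smart.predict(X_test.iloc[i]) for i in range(len(X_test))]

# Classical KNN baseline on the raw features.
baseline = KNeighborsRegressor(n_neighbors=5)
baseline.fit(X_train, y_train)
base_preds = baseline.predict(X_test)

print("SmartKNN R^2:     ", r2_score(y_test, smart_preds))
print("Classical KNN R^2:", r2_score(y_test, base_preds))
```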
# Environmental Impact

SmartKNN requires no GPU and has minimal energy usage.

- Hardware Type: CPU
- Hours used: Minimal
- Carbon Emitted: Negligible

# Technical Specifications

## Model Architecture and Objective

- Instance-based learner
- Weighted Euclidean distance metric (illustrated in the appendix at the end of this card)
- Learned feature weights (MSE + MI + Random Forest)

## Compute Infrastructure

- Runs efficiently on CPU systems
- Implemented using NumPy

# Citation

```bibtex
@software{smartknn2025,
  author    = {Jashwanth Thatipamula},
  title     = {SmartKNN: An Interpretable Weighted Distance Framework for K-Nearest Neighbours},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17713746},
  url       = {https://doi.org/10.5281/zenodo.17713746}
}
```

# Model Card Authors

Jashwanth Thatipamula

# Model Card Contact

Contact via GitHub issues: https://github.com/thatipamula-jashwanth/smart-knn
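# Appendix: Weighted Distance Illustration

The weighted Euclidean distance referenced under Technical Specifications can be illustrated with a short sketch. This is not SmartKNN's internal implementation; `w` merely stands in for the learned, threshold-filtered feature weights, applied after the library's internal normalization.

```python
import numpy as np

def weighted_euclidean(a, b, w):
    """Illustrative weighted Euclidean distance: sqrt(sum_j w_j * (a_j - b_j)^2).

    Not SmartKNN's internal code; `w` plays the role of the learned
    feature weights used for neighbour selection.
    """
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(np.asarray(w, dtype=float) * diff ** 2)))

# Example: feature 0 dominates the distance because of its higher weight.
print(weighted_euclidean([1.0, 2.0], [3.0, 2.5], w=[0.9, 0.1]))
```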