---
license: mit
language:
- en
metrics:
- r_squared
- accuracy
- mae
- mse
- f1
- recall
tags:
- machine-learning
- algorithms
- tabular-data
- knn
- python
- weighted-knn
- data-science
- preprocessing
---
SmartKNN is a weighted and interpretable extension of classical K-Nearest Neighbours (KNN), designed for real-world tabular machine learning. It automatically learns feature importance, filters weak features, handles missing values, normalizes inputs internally, and consistently achieves higher accuracy and robustness than classical KNN — while maintaining a simple scikit-learn-style API.
# Model Details

## Model Description
SmartKNN improves classical KNN by learning feature weights and applying a weighted Euclidean distance for neighbour selection. It performs normalization, NaN/Inf cleaning, median imputation, outlier clipping, and feature filtering internally. It exposes feature importance for transparency and explainability.
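The exact weighting scheme is internal to SmartKNN, but the core idea of a weighted Euclidean distance can be sketched as follows (the weight values here are hypothetical, not learned by the library):

```python
import numpy as np

def weighted_euclidean(x, y, w):
    """Distance between two feature vectors, with each squared
    per-feature difference scaled by that feature's weight."""
    diff = x - y
    return np.sqrt(np.sum(w * diff ** 2))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 5.0])
w = np.array([0.5, 1.0, 0.25])  # hypothetical learned feature weights
print(weighted_euclidean(a, b, w))  # sqrt(0.5*1 + 0 + 0.25*4) ≈ 1.2247
```

Features with near-zero weight contribute almost nothing to the distance, which is what makes neighbour selection robust to noisy or irrelevant columns.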
- **Developed by:** Jashwanth Thatipamula
- **Model type:** Weighted KNN for tabular ML
- **License:** MIT
- **Language(s):** Not language-dependent (numerical tabular ML)
- **Finetuned from model:** Not applicable (original algorithm)
## Model Sources

- **Repository:** https://github.com/thatipamula-jashwanth/smart-knn
- **Paper (DOI):** https://doi.org/10.5281/zenodo.17713746
- **Demo:** Coming soon
# Uses
## Direct Use

- Regression on tabular datasets
- Classification on tabular datasets
- Interpretable ML where feature importance matters
- Real-world ML pipelines with missing values and noisy features

## Downstream Use

- Research on distance-metric learning
- Explainable ML baselines
- AutoML components for tabular data

## Out-of-Scope Use

- NLP, image, or audio modelling
- Deep learning / GPU models
- Raw categorical datasets without encoding
# Bias, Risks, and Limitations
- Instance-based prediction can be slower than tree-based models on large datasets
- Low performance on categorical-only datasets without encoding
- Requires storing the full training set for inference

## Recommendations

Users should numerically encode categorical features before fitting SmartKNN.
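One common way to do this is one-hot encoding with pandas (the column names below are illustrative, not from any SmartKNN dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red"],   # categorical feature
    "size": [1.0, 2.5, 3.0],           # already numeric
})

# One-hot encode the categorical column so every feature is numeric
encoded = pd.get_dummies(df, columns=["color"], dtype=float)
print(encoded.columns.tolist())  # ['size', 'color_blue', 'color_red']
```

The resulting all-numeric frame can then be passed to `SmartKNN.fit` as shown in the quickstart.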
# How to Get Started with the Model

```bash
pip install smart-knn
```

```python
import pandas as pd
from smart_knn import SmartKNN

df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]

model = SmartKNN(k=5)
model.fit(X, y)

sample = X.iloc[0]
pred = model.predict(sample)
print(pred)
```
# Training Details
## Training Data

SmartKNN is not pretrained and does not ship with training data; users train it on their own datasets.
## Preprocessing

Performed automatically:

- Normalization
- NaN / Inf cleaning
- Median imputation
- Outlier clipping
- Feature filtering via learned weights
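SmartKNN performs these steps internally; as a rough illustration of what such a pipeline does (the percentile bounds and z-score normalization here are assumptions, not SmartKNN's actual choices), consider:

```python
import numpy as np

def preprocess(X):
    """Sketch of a cleaning pipeline: treat Inf as missing,
    median-impute, clip outliers, then z-score normalize."""
    X = np.where(np.isinf(X), np.nan, X.astype(float))
    med = np.nanmedian(X, axis=0)
    X = np.where(np.isnan(X), med, X)            # median imputation
    lo, hi = np.percentile(X, [1, 99], axis=0)
    X = np.clip(X, lo, hi)                       # outlier clipping
    mu, sd = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / np.where(sd == 0, 1, sd)   # normalization

X = np.array([[1.0, np.inf], [2.0, 5.0], [np.nan, 7.0]])
Xp = preprocess(X)
print(np.isfinite(Xp).all())  # True: no NaN or Inf remains
```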
## Training Hyperparameters

- `k`: number of neighbours used for prediction
- `weight_threshold`: features whose learned importance falls below this value are dropped
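The thresholding step amounts to a boolean mask over the feature columns. A minimal sketch, with hypothetical weight values:

```python
import numpy as np

# Hypothetical learned importances for four features
weights = np.array([0.42, 0.05, 0.31, 0.22])
weight_threshold = 0.1

keep = weights >= weight_threshold      # drop features below the threshold
X = np.random.rand(10, 4)
X_filtered = X[:, keep]
print(X_filtered.shape)  # (10, 3): the 0.05 feature was dropped
```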
# Evaluation
## Testing Data

Evaluated on 35 public regression datasets and 20 public classification datasets, all tabular.
## Metrics

- Regression: R², MSE
- Classification: Accuracy
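These metrics have standard definitions and can be computed directly with NumPy (the example values below are illustrative, not taken from the evaluation):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

def mse(y_true, y_pred):
    """Mean squared error."""
    return np.mean((y_true - y_pred) ** 2)

def accuracy(y_true, y_pred):
    """Fraction of exact label matches."""
    return np.mean(y_true == y_pred)

y = np.array([3.0, 5.0, 7.0])
p = np.array([2.5, 5.0, 7.5])
print(r_squared(y, p), mse(y, p))  # 0.9375 0.1666...
```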
## Results

- Regression: SmartKNN outperformed classical KNN on more than 90% of datasets
- Classification: SmartKNN beat classical KNN on 60% of datasets
## Summary
SmartKNN delivers higher accuracy, greater robustness to noise, and better interpretability than classical KNN while preserving its simplicity.
# Environmental Impact
SmartKNN requires no GPU and has minimal energy usage.
- **Hardware Type:** CPU
- **Hours used:** Minimal
- **Carbon Emitted:** Negligible
# Technical Specifications
## Model Architecture and Objective

- Instance-based learner
- Weighted Euclidean distance metric
- Learned feature weights (MSE + MI + Random Forest)
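The precise way SmartKNN combines MSE, mutual-information, and random-forest signals into feature weights is internal to the library; a simple sketch using scikit-learn (an assumed dependency for this illustration) might average normalized mutual-information scores with random-forest importances:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Mutual-information scores, normalized to sum to 1
mi = mutual_info_regression(X, y, random_state=0)
mi = mi / mi.sum()

# Random-forest importances already sum to 1
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Hypothetical combination: a plain average of the two signals
weights = (mi + rf.feature_importances_) / 2
print(weights.sum())  # ~1.0
```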
## Compute Infrastructure

- Runs efficiently on CPU-only systems
- Implemented using NumPy
# Citation

```bibtex
@software{smartknn2025,
  author    = {Jashwanth Thatipamula},
  title     = {SmartKNN: An Interpretable Weighted Distance Framework for K-Nearest Neighbours},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17713746},
  url       = {https://doi.org/10.5281/zenodo.17713746}
}
```
# Model Card Authors
Jashwanth Thatipamula
# Model Card Contact

Contact via GitHub issues: https://github.com/thatipamula-jashwanth/smart-knn