---
license: mit
language:
- en
metrics:
- r_squared
- accuracy
- mae
- mse
- f1
- recall
tags:
- machine-learning
- algorithms
- tabular-data
- knn
- python
- weighted-knn
- data-science
- preprocessing
---


SmartKNN is a weighted and interpretable extension of classical K-Nearest Neighbours (KNN), designed for real-world tabular machine learning. It learns feature importance automatically, filters weak features, handles missing values, and normalizes inputs internally, all behind a simple scikit-learn-style API. In the author's benchmarks it outperformed classical KNN on more than 90% of regression datasets and on 60% of classification datasets.


# Model Details


Model Description
SmartKNN improves classical KNN by learning feature weights and applying a weighted Euclidean distance for neighbour selection. It performs normalization, NaN/Inf cleaning, median imputation, outlier clipping, and feature filtering internally. It exposes feature importance for transparency and explainability.

Developed by: Jashwanth Thatipamula  
Model type: Weighted KNN for tabular ML  
License: MIT  
Language(s): Not language-dependent (numerical tabular ML)  
Finetuned from model: Not applicable (original algorithm)

Model Sources
Repository: https://github.com/thatipamula-jashwanth/smart-knn  
Paper (DOI): https://doi.org/10.5281/zenodo.17713746  
Demo: Coming soon


# Uses


Direct Use
• Regression on tabular datasets  
• Classification on tabular datasets  
• Interpretable ML where feature importance matters  
• Real-world ML pipelines with missing values and noisy features

Downstream Use
• Research on distance-metric learning  
• Explainable ML baselines  
• AutoML components for tabular data

Out-of-Scope Use
• NLP, image or audio modelling  
• Deep learning / GPU models  
• Raw categorical datasets without encoding


# Bias, Risks, and Limitations

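• Prediction is instance-based: each query is compared against the stored training set, so inference can be slower than tree-based models on large datasets  
• Performs poorly on categorical-only datasets unless the features are encoded numerically  
• Requires storing the full training set for inference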

Recommendations
Users should numerically encode categorical features before fitting SmartKNN.
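
As one way to follow this recommendation, the sketch below one-hot encodes a categorical column with pandas before fitting. The column names (`city`, `income`) and the toy values are illustrative assumptions and are not part of SmartKNN; ordinal or target encoding may suit some datasets better.

```python
import pandas as pd

# Toy frame with one categorical and one numeric column (illustrative values).
df = pd.DataFrame({
    "city": ["Paris", "Lima", "Paris", "Oslo"],
    "income": [52.0, 31.5, 48.0, 60.2],
})

# One-hot encode the categorical column; dtype=float keeps every column numeric.
X_encoded = pd.get_dummies(df, columns=["city"], dtype=float)

print(X_encoded.columns.tolist())
```

The resulting frame contains only numeric columns and can be passed to `fit` in place of the raw data.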


# How to Get Started with the Model


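Install the package:

    pip install smart-knn

Then fit a model and predict:

    import pandas as pd
    from smart_knn import SmartKNN

    # Load a tabular dataset; "target" is the label column
    df = pd.read_csv("data.csv")
    X = df.drop("target", axis=1)
    y = df["target"]

    # Fit SmartKNN with 5 neighbours
    model = SmartKNN(k=5)
    model.fit(X, y)

    # Predict for the first row
    sample = X.iloc[0]
    pred = model.predict(sample)
    print(pred)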


# Training Details


Training Data
SmartKNN is not pretrained and does not ship with training data; users train on their own dataset.

Preprocessing
Performed automatically:  
• Normalization  
• NaN / Inf cleaning  
• Median imputation  
• Outlier clipping  
• Feature filtering via learned weights
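
The cleaning steps listed above can be sketched in NumPy roughly as follows. This is a minimal illustration and not SmartKNN's internal code: the function name `preprocess`, the order of the steps, the 1st/99th percentile clipping bounds, and the use of z-score normalization are all assumptions.

```python
import numpy as np

def preprocess(X, clip_percentiles=(1.0, 99.0)):
    """Clean, impute, clip and standardize a 2-D float array (sketch only)."""
    X = np.array(X, dtype=float)               # copy so the caller's data is untouched
    X[~np.isfinite(X)] = np.nan                # NaN / Inf cleaning: treat Inf as missing
    medians = np.nanmedian(X, axis=0)          # per-feature median, ignoring NaN
    X = np.where(np.isnan(X), medians, X)      # median imputation
    lo, hi = np.percentile(X, clip_percentiles, axis=0)
    X = np.clip(X, lo, hi)                     # outlier clipping (assumed percentile bounds)
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0                        # avoid dividing by zero for constant features
    return (X - mean) / std                    # z-score normalization (assumed scheme)

raw = np.array([[1.0, np.nan], [2.0, 10.0], [np.inf, 12.0], [3.0, 11.0]])
clean = preprocess(raw)
print(clean.shape)
```

In a real pipeline the medians, clipping bounds, mean, and standard deviation would be computed on the training set and reused at prediction time; the sketch recomputes them on every call for brevity.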

Training Hyperparameters
• k = number of nearest neighbours used for each prediction  
• weight_threshold = minimum learned importance; features whose weight falls below it are dropped
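
To illustrate what a threshold on learned weights does, here is a small NumPy sketch. The helper `filter_features`, the example weights, and the renormalization of the surviving weights are hypothetical choices for illustration; SmartKNN's own filtering logic may differ.

```python
import numpy as np

def filter_features(weights, weight_threshold):
    """Return a boolean keep-mask and the renormalized weights of kept features."""
    weights = np.asarray(weights, dtype=float)
    keep = weights >= weight_threshold       # features below the threshold are dropped
    kept = weights[keep]
    return keep, kept / kept.sum()           # assumed: rescale kept weights to sum to 1

weights = np.array([0.50, 0.30, 0.15, 0.05])  # hypothetical learned importances
keep, kept_weights = filter_features(weights, weight_threshold=0.10)
print(keep, kept_weights)
```

A threshold higher than every weight would drop all features, so a practical implementation needs a guard for that case.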


# Evaluation


Testing Data
Evaluated on 35 regression and 20 classification public tabular datasets.

Metrics
Regression: R², MSE  
Classification: Accuracy
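
For reference, the three metrics follow their standard definitions and can be computed with NumPy as sketched below. The function names and the sample values are illustrative only; they are not taken from the SmartKNN benchmark.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def accuracy(y_true, y_pred):
    """Fraction of labels predicted exactly."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

print(mse([3, 5, 7, 9], [2, 5, 8, 9]))
print(r_squared([3, 5, 7, 9], [2, 5, 8, 9]))
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))
```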

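Results
• Regression: SmartKNN outperformed classical KNN on more than 90% of the datasets  
• Classification: SmartKNN outperformed classical KNN on 60% of the datasets  

Summary
On these benchmarks SmartKNN was more accurate than classical KNN on most datasets, with a wider margin in regression than in classification. Feature weighting and filtering also make it more robust to noisy features and more interpretable, while it keeps the simplicity of classical KNN.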


# Environmental Impact


SmartKNN requires no GPU and uses little energy.

Hardware Type: CPU  
Hours used: Minimal  
Carbon Emitted: Negligible

# Technical Specifications


Model Architecture and Objective
• Instance-based learner  
• Weighted Euclidean distance metric  
• Feature weights learned from three signals: MSE, mutual information (MI), and Random Forest importance
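
A weighted Euclidean distance is commonly written as d(x, z) = sqrt(sum_j w_j (x_j - z_j)^2). The NumPy sketch below uses that form for a regression prediction and shows how the weights change which neighbours are chosen. The function `weighted_knn_predict`, the toy data, and the hand-set weights are assumptions for illustration; they are not SmartKNN's implementation or its learned weights.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, weights, k=5):
    """Predict a regression target as the mean of the k nearest neighbours."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    diff = X_train - np.asarray(x_query, dtype=float)
    # Weighted Euclidean distance: each squared difference is scaled by its weight.
    dist = np.sqrt(np.sum(np.asarray(weights, dtype=float) * diff ** 2, axis=1))
    nearest = np.argsort(dist)[:k]           # indices of the k closest training rows
    return float(y_train[nearest].mean())

# Feature 0 tracks the target; feature 1 is noise (toy data).
X_train = [[0.0, 5.0], [0.1, -5.0], [1.0, 0.0], [1.1, 0.2]]
y_train = [0.0, 0.0, 10.0, 10.0]
query = [0.05, 0.1]

weighted = weighted_knn_predict(X_train, y_train, query, weights=[1.0, 0.0], k=2)
unweighted = weighted_knn_predict(X_train, y_train, query, weights=[1.0, 1.0], k=2)
print(weighted, unweighted)
```

With the noisy feature weighted to zero, the two rows that match the query on the informative feature are selected; with equal weights, the noise dominates the distance and different neighbours are picked.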

Compute Infrastructure
• Runs efficiently on CPU systems  
• Implemented using NumPy


# Citation


    @software{smartknn2025,
      author       = {Jashwanth Thatipamula},
      title        = {SmartKNN: An Interpretable Weighted Distance Framework for K-Nearest Neighbours},
      year         = {2025},
      publisher    = {Zenodo},
      doi          = {10.5281/zenodo.17713746},
      url          = {https://doi.org/10.5281/zenodo.17713746}
    }


# Model Card Authors

Jashwanth Thatipamula

Model Card Contact
Contact via GitHub issues: https://github.com/thatipamula-jashwanth/smart-knn