---
title: SimpleClean
emoji: 🧹
colorFrom: yellow
colorTo: pink
sdk: docker
app_port: 8501
tags:
- streamlit
- data-cleaning
- preprocessing
- imputation
- encoding
pinned: false
short_description: Clean your data interactively, no code required.
---

# SimpleClean

Interactive Streamlit dashboard to clean and preprocess your datasets: handle missing values, encode categories, scale features, remove duplicates.

## Author

Eduardo Nacimiento García

📧 enacimie@ull.edu.es

📜 Apache 2.0 License

## Features

- Upload CSV or use built-in demo dataset
- Data quality report: missing values, duplicates, data types
- Interactive cleaning:
  - 🧹 Remove duplicate rows
  - 🩹 Impute missing values (Mean, Median, Mode, Constant, KNN)
  - 🔠 Encode categorical variables (Label Encoding, One-Hot Encoding)
  - 📏 Scale numeric variables (StandardScaler, MinMaxScaler)
- Visualize missing data with Plotly
- Download cleaned dataset as CSV
- Reset to original anytime

## Demo Dataset

Includes sample data with:

- Numeric columns: Age, Income, Satisfaction
- Categorical columns: City, Gender, Has_Children
- Intentional missing values and duplicates

## Deployment

Ready for [Hugging Face Spaces](https://huggingface.co/spaces) (free tier).

> ⚠️ Uses `sdk: docker`, so a `Dockerfile` must be included.

## Requirements

- Python 3.8+
- Streamlit, pandas, numpy, scikit-learn, plotly

---

💡 Tip: Clean step-by-step → preview changes → download when ready!
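
For readers curious what the cleaning steps listed under Features amount to in code, here is a minimal sketch. It is not taken from the app's source: the `clean` function, the sample values, and the choice of median/mode imputation are assumptions made for illustration. It uses pandas only and computes min-max scaling directly, to stay self-contained, whereas the app lists scikit-learn's `MinMaxScaler` and `StandardScaler`.

```python
import pandas as pd

# Hypothetical sample data, loosely modelled on the demo dataset's columns.
df = pd.DataFrame(
    {
        "Age": [25.0, 32.0, None, 32.0, 41.0],
        "Income": [30000.0, 52000.0, 47000.0, 52000.0, None],
        "City": ["Madrid", "Sevilla", None, "Sevilla", "Madrid"],
    }
)


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Illustrative pipeline: dedupe, impute, scale, then one-hot encode."""
    # 1. Remove exact duplicate rows.
    out = frame.drop_duplicates().reset_index(drop=True)

    numeric_cols = out.select_dtypes(include="number").columns
    categorical_cols = out.columns.difference(numeric_cols)

    # 2. Impute missing values: median for numeric, mode for categorical.
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    for col in categorical_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])

    # 3. Min-max scale numeric columns to the range [0, 1].
    #    A constant column has range 0; dividing by 1 instead avoids NaN.
    col_min = out[numeric_cols].min()
    col_range = (out[numeric_cols].max() - col_min).replace(0, 1)
    out[numeric_cols] = (out[numeric_cols] - col_min) / col_range

    # 4. One-hot encode categorical columns as 0/1 integer columns.
    return pd.get_dummies(out, columns=list(categorical_cols), dtype=int)


cleaned = clean(df)
print(cleaned)
```

The order matters in this sketch: duplicates are dropped before imputation so repeated rows do not skew the median or mode, and scaling runs after imputation so the filled values are scaled along with the rest.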