---
title: SimpleClean
emoji: 🧹
colorFrom: yellow
colorTo: pink
sdk: docker
app_port: 8501
tags:
  - streamlit
  - data-cleaning
  - preprocessing
  - imputation
  - encoding
pinned: false
short_description: Clean your data interactively, no code required.
---

# SimpleClean

An interactive Streamlit dashboard for cleaning and preprocessing datasets: handle missing values, encode categorical variables, scale numeric features, and remove duplicate rows.

## Author
Eduardo Nacimiento García  
📧 enacimie@ull.edu.es  
📜 Apache 2.0 License

## Features
- Upload CSV or use built-in demo dataset
- Data quality report: missing values, duplicates, data types
- Interactive cleaning:
  - 🧹 Remove duplicate rows
  - 🩹 Impute missing values (Mean, Median, Mode, Constant, KNN)
  - 🔠 Encode categorical variables (Label Encoding, One-Hot Encoding)
  - 📏 Scale numeric variables (StandardScaler, MinMaxScaler)
- Visualize missing data with Plotly
- Download cleaned dataset as CSV
- Reset to original anytime
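
For readers who want to see what these steps amount to outside the dashboard, here is a minimal sketch using pandas and scikit-learn. It is not the app's source code: the helper names `quality_report` and `clean`, the sample values, and the choice of median imputation, `StandardScaler`, and one-hot encoding as defaults are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.preprocessing import StandardScaler


def quality_report(df: pd.DataFrame) -> dict:
    """Hypothetical helper: missing values, duplicate rows, and dtypes."""
    return {
        "missing": df.isna().sum().to_dict(),
        "duplicates": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


def clean(df: pd.DataFrame, use_knn: bool = False) -> pd.DataFrame:
    """Hypothetical helper: dedupe, impute, scale, then one-hot encode."""
    out = df.drop_duplicates().reset_index(drop=True)
    num_cols = out.select_dtypes(include="number").columns
    cat_cols = out.columns.difference(num_cols)

    # Numeric imputation: median by default, KNN as an alternative.
    imputer = KNNImputer(n_neighbors=2) if use_knn else SimpleImputer(strategy="median")
    out[num_cols] = imputer.fit_transform(out[num_cols])

    # Categorical imputation with the mode (assumes each column has a value).
    for col in cat_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])

    # Scale numeric columns to zero mean and unit variance.
    out[num_cols] = StandardScaler().fit_transform(out[num_cols])

    # One-hot encode the categorical columns.
    return pd.get_dummies(out, columns=list(cat_cols))


# Made-up sample with one duplicate row and one gap per column.
df = pd.DataFrame({
    "Age": [25.0, 32.0, np.nan, 32.0, 41.0],
    "Income": [30000.0, 52000.0, 47000.0, 52000.0, np.nan],
    "City": ["Madrid", "Tenerife", None, "Tenerife", "Madrid"],
})
report = quality_report(df)
cleaned = clean(df)
cleaned_knn = clean(df, use_knn=True)
```

Duplicates are dropped first so that repeated rows do not skew the imputed medians or the scaling statistics, which mirrors the step-by-step order suggested in the tip at the end of this README.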

## Demo Dataset
Includes sample data with:
- Numeric columns: Age, Income, Satisfaction
- Categorical columns: City, Gender, Has_Children
- Intentional missing values and duplicates
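
A dataset with this shape can be approximated as follows. This is a sketch under stated assumptions, not the app's actual generator: the function name `make_demo_data`, the value ranges, the category labels, the 10% missing rate, and the five duplicated rows are all hypothetical.

```python
import numpy as np
import pandas as pd


def make_demo_data(n_rows: int = 100, seed: int = 0) -> pd.DataFrame:
    """Hypothetical generator for a demo frame with gaps and duplicates."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Age": rng.integers(18, 80, n_rows).astype(float),
        "Income": rng.normal(40000, 12000, n_rows).round(2),
        "Satisfaction": rng.integers(1, 6, n_rows).astype(float),
        "City": rng.choice(["Madrid", "Barcelona", "Sevilla"], n_rows),
        "Gender": rng.choice(["F", "M"], n_rows),
        "Has_Children": rng.choice(["Yes", "No"], n_rows),
    })

    # Blank out exactly 10% of the rows in one numeric and one categorical column.
    for col in ["Age", "City"]:
        idx = rng.choice(n_rows, size=n_rows // 10, replace=False)
        df.loc[idx, col] = np.nan

    # Append copies of the first five rows as intentional duplicates.
    return pd.concat([df, df.head(5)], ignore_index=True)


demo = make_demo_data()
```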

## Deployment
Ready for [Hugging Face Spaces](https://huggingface.co/spaces) (free tier).

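> ⚠️ This Space uses `sdk: docker`, so the repository must include a `Dockerfile` that serves the app on port 8501 (the `app_port` set in the front matter).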

## Requirements
- Python 3.8+
- Streamlit, pandas, numpy, scikit-learn, plotly

---

💡 Tip: Clean step-by-step → preview changes → download when ready!