	---
	title: SimpleClean
	emoji: 🧹
	colorFrom: yellow
	colorTo: pink
	sdk: docker
	app_port: 8501
	tags:
	- streamlit
	- data-cleaning
	- preprocessing
	- imputation
	- encoding
	pinned: false
	short_description: Clean your data interactively — no code required.
	---

	# SimpleClean

	An interactive Streamlit dashboard for cleaning and preprocessing datasets: handle missing values, encode categorical variables, scale numeric features, and remove duplicate rows.

	## Author
	- Eduardo Nacimiento García
	- 📧 enacimie@ull.edu.es
	- 📜 Apache 2.0 License

	## Features
	- Upload CSV or use built-in demo dataset
	- Data quality report: missing values, duplicates, data types
	- Interactive cleaning:
	  - 🧹 Remove duplicate rows
	  - 🩹 Impute missing values (Mean, Median, Mode, Constant, KNN)
	  - 🔠 Encode categorical variables (Label Encoding, One-Hot Encoding)
	  - 📏 Scale numeric variables (StandardScaler, MinMaxScaler)
	- Visualize missing data with Plotly
	- Download cleaned dataset as CSV
	- Reset to original anytime
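
The cleaning steps listed above map onto standard pandas and scikit-learn calls. The following is a minimal sketch of that mapping, not the app's actual code: the sample values, the `cleaned` variable, the step order, and the choice of KNN, Label Encoding, and StandardScaler are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny illustrative frame; the values are made up for this sketch.
df = pd.DataFrame(
    {
        "Age": [25.0, 32.0, np.nan, 32.0, 41.0],
        "Income": [30000.0, 52000.0, 47000.0, 52000.0, np.nan],
        "City": ["Madrid", "Sevilla", "Madrid", "Sevilla", None],
    }
)

# 1. Remove duplicate rows (the second and fourth rows are identical).
cleaned = df.drop_duplicates().reset_index(drop=True)

# 2. Impute missing values: KNN for numeric columns, mode for the categorical one.
numeric_cols = ["Age", "Income"]
cleaned[numeric_cols] = KNNImputer(n_neighbors=2).fit_transform(cleaned[numeric_cols])
cleaned["City"] = cleaned["City"].fillna(cleaned["City"].mode().iloc[0])

# 3. Encode the categorical column as integer labels.
cleaned["City"] = LabelEncoder().fit_transform(cleaned["City"])

# 4. Scale numeric columns to zero mean and unit variance.
cleaned[numeric_cols] = StandardScaler().fit_transform(cleaned[numeric_cols])

print(cleaned)
```

Swapping `StandardScaler` for `MinMaxScaler` rescales each column to the range 0 to 1 instead, and `pd.get_dummies(cleaned, columns=["City"])` applied before step 3 would give One-Hot Encoding in place of integer labels.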

	## Demo Dataset
	The built-in demo dataset includes:
	- Numeric columns: Age, Income, Satisfaction
	- Categorical columns: City, Gender, Has_Children
	- Intentional missing values and duplicates
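
A dataset with this shape can be generated in a few lines, and the same frame is enough to reproduce the quality report (missing values, duplicates, data types). This is a hedged sketch only: the column names come from the list above, while the row count, value ranges, category labels, and the `demo`, `report`, and `n_duplicates` names are hypothetical and may differ from the app's real demo data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is reproducible
n = 50  # assumed row count

demo = pd.DataFrame(
    {
        "Age": rng.integers(18, 80, size=n).astype(float),
        "Income": rng.normal(35000, 8000, size=n).round(2),
        "Satisfaction": rng.integers(1, 11, size=n).astype(float),
        "City": rng.choice(["City A", "City B", "City C"], size=n),
        "Gender": rng.choice(["F", "M"], size=n),
        "Has_Children": rng.choice(["Yes", "No"], size=n),
    }
)

# Blank out five random cells in a few columns to simulate missing values.
for col in ["Age", "Income", "City"]:
    demo.loc[rng.choice(n, size=5, replace=False), col] = np.nan

# Append copies of the first three rows to simulate duplicates.
demo = pd.concat([demo, demo.head(3)], ignore_index=True)

# Quality report: missing values per column, data types, and duplicate count.
report = pd.DataFrame(
    {"missing": demo.isna().sum(), "dtype": demo.dtypes.astype(str)}
)
n_duplicates = int(demo.duplicated().sum())

print(report)
print("duplicate rows:", n_duplicates)
```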

	## Deployment
	Ready to deploy on [Hugging Face Spaces](https://huggingface.co/spaces) (free tier).

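	> ⚠️ This Space uses `sdk: docker`, so the repository must include a `Dockerfile` that serves the app on port 8501 (the `app_port` set in the front matter).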

	## Requirements
	- Python 3.8+
	- Streamlit, pandas, numpy, scikit-learn, plotly

	---

	💡 Tip: Clean step-by-step → preview changes → download when ready!