---
title: EDA Explorer
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.16.0
python_version: "3.10"
app_file: app.py
pinned: false
---

# EDA Explorer: AI-Powered Data Analysis CLI

A lightweight CLI tool that automates exploratory data analysis (EDA) with intelligent insights, feature importance detection, and data quality checks.

It is designed to mimic how an **AI Data Analyst** approaches real-world datasets.

---

## Key Highlights

- One-command analysis → `analyze <dataset>`
- Auto target detection for ML-based insights
- Feature importance (no manual setup)
- Smart data warnings (missing values, ID columns, constant columns)
- Correlation & outlier detection
- Auto report generation (.txt)
- Efficient handling of large datasets (Parquet + sampling)
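
The data warnings listed above (missing values, ID columns, constants) could be produced by checks along the following lines. This is a minimal standard-library sketch, not the tool's actual implementation: the function name `data_warnings`, the 50% missing-value threshold, and the "all values unique" ID heuristic are assumptions made for illustration.

```python
from typing import Any


def data_warnings(rows: list[dict[str, Any]], missing_threshold: float = 0.5) -> list[str]:
    """Return warnings for a table given as a list of row dicts (hypothetical helper)."""
    warnings: list[str] = []
    if not rows:
        return warnings
    n = len(rows)
    # Collect column names in first-seen order.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    for col in columns:
        present = [row.get(col) for row in rows if row.get(col) is not None]
        missing_ratio = 1 - len(present) / n
        if missing_ratio >= missing_threshold:
            warnings.append(f"{col}: {missing_ratio:.0%} missing values")
        distinct = set(present)
        if len(present) >= 2 and len(distinct) == 1:
            # Every observed value is identical, so the column carries no signal.
            warnings.append(f"{col}: constant column")
        elif (
            n > 1
            and len(present) == n
            and len(distinct) == n
            and all(isinstance(v, (int, str)) and not isinstance(v, bool) for v in present)
        ):
            # Assumed heuristic: a fully populated int/str column with all-unique values.
            warnings.append(f"{col}: likely ID column")
    return warnings


example_rows = [
    {"customer_id": 1, "income": None, "country": "US", "age": 34},
    {"customer_id": 2, "income": 52000, "country": "US", "age": 41},
    {"customer_id": 3, "income": None, "country": "US", "age": 34},
    {"customer_id": 4, "income": None, "country": "US", "age": 29},
]
for message in data_warnings(example_rows):
    print(message)
```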

---

## Demo

Full demo: https://github.com/user-attachments/assets/7dff8329-71e8-4bca-ad01-404e75df8314

---

## Example Output

Top Correlations:
- age ↔ income: 0.72
- tenure ↔ balance: 0.65

Data Warnings:
- customer_id → likely ID column
- income → 52% missing values

Feature Importance:
- age: 0.41 (strong signal)
- tenure: 0.32 (strong signal)
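
A "Top Correlations" list like the one above can be obtained by computing the Pearson coefficient for every pair of numeric columns and ranking pairs by absolute value. The sketch below uses only the standard library; the names `pearson` and `top_correlations` are hypothetical, and the tool itself may well rely on pandas for this step instead.

```python
from itertools import combinations
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def top_correlations(columns: dict[str, list[float]], k: int = 2) -> list[tuple[str, str, float]]:
    """Return the k column pairs with the largest absolute correlation."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    pairs.sort(key=lambda pair: abs(pair[2]), reverse=True)
    return pairs[:k]


# Made-up numbers, for illustration only.
demo = {
    "age": [22, 35, 47, 51, 63],
    "income": [21000, 40000, 52000, 58000, 61000],
    "tenure": [1, 7, 3, 9, 4],
}
for a, b, r in top_correlations(demo):
    print(f"{a} <-> {b}: {r:.2f}")
```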

---

## What Makes It Stand Out

- Automatically identifies **useful vs. irrelevant features**
- No manual preprocessing required
- Mimics real-world **data analyst reasoning**
- Built using a **modular agent-based system**

---

## Performance

- Parquet-based storage for faster I/O
- Sampling strategy for large datasets
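
The README does not say which sampling strategy is used. As one plausible option, reservoir sampling draws a fixed-size uniform sample from a stream of rows without loading the whole dataset into memory. The function name `reservoir_sample` and its parameters are assumptions for this sketch; with a pandas DataFrame already in memory, `DataFrame.sample` would be the more direct route.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> list[T]:
    """Uniformly sample up to k items from an iterable of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample: list[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


# Sample 5 row indices out of a simulated 1,000,000-row stream.
print(reservoir_sample(range(1_000_000), k=5, seed=42))
```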

---

## System Design

- Command handler
- Dataset registry
- Modular agents (AnalysisAgent, etc.)
- Logger integration
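
The components above could fit together roughly as follows. Only the name `AnalysisAgent` comes from this README; every other class, method, and file path in the sketch is hypothetical and is meant to show the shape of the design, not the project's real code.

```python
import logging
from typing import Optional

logger = logging.getLogger("eda_explorer")  # assumed logger name


class DatasetRegistry:
    """Maps short dataset names to file paths."""

    def __init__(self) -> None:
        self._paths: dict[str, str] = {}

    def register(self, name: str, path: str) -> None:
        self._paths[name.lower()] = path

    def resolve(self, name: str) -> Optional[str]:
        return self._paths.get(name.lower())


class AnalysisAgent:
    """Placeholder agent; a real one would load the data and run the EDA steps."""

    def run(self, path: str) -> str:
        logger.info("analyzing %s", path)
        return f"analysis report for {path}"


class CommandHandler:
    """Parses '<command> <dataset>' and dispatches to the matching agent."""

    def __init__(self, registry: DatasetRegistry, agents: dict[str, AnalysisAgent]) -> None:
        self.registry = registry
        self.agents = agents

    def handle(self, line: str) -> str:
        parts = line.split()
        if len(parts) != 2:
            return "usage: <command> <dataset>"
        command, dataset = parts
        agent = self.agents.get(command)
        if agent is None:
            return f"unknown command: {command}"
        path = self.registry.resolve(dataset)
        if path is None:
            return f"unknown dataset: {dataset}"
        return agent.run(path)


registry = DatasetRegistry()
registry.register("titanic", "data/titanic.parquet")  # hypothetical path
handler = CommandHandler(registry, {"analyze": AnalysisAgent()})
print(handler.handle("analyze titanic"))
```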

---

## Datasets

- Titanic
- Customer Churn
- Credit Card Fraud

---

## Tech Stack

- Python
- Pandas, NumPy
- Scikit-learn
- Parquet

---

## Future Enhancements

- RAG-based EDA advisor
- SQL query assistant
- Model training pipeline