---
title: EDA Explorer
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.16.0
python_version: "3.10"
app_file: app.py
pinned: false
---
# EDA Explorer: AI-Powered Data Analysis CLI
A lightweight CLI tool that automates exploratory data analysis (EDA): it generates insights, detects important features, and runs data quality checks.
It is designed to mimic how an **AI Data Analyst** would work through a real-world dataset.
---
## Key Highlights
- One-command analysis: `analyze <dataset>`
- Automatic target detection for ML-based insights
- Feature importance with no manual setup
- Data quality warnings (missing values, ID-like columns, constant columns)
- Correlation and outlier detection
- Automatic report generation (`.txt`)
- Efficient handling of large datasets (Parquet storage plus sampling)
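The data quality warnings listed above could be implemented roughly as follows. This is a minimal pandas sketch, not the tool's actual code: the function name `data_warnings` and the thresholds (30% missing values, 95% distinct values for an ID-like column) are assumptions chosen for illustration.

```python
import pandas as pd


def data_warnings(
    df: pd.DataFrame,
    missing_threshold: float = 0.3,  # assumed threshold, not the tool's documented setting
    id_uniqueness: float = 0.95,     # assumed threshold, not the tool's documented setting
) -> list:
    """Flag columns with many missing values, ID-like columns, and constant columns."""
    warnings = []
    n_rows = len(df)
    for col in df.columns:
        series = df[col]

        # Share of missing values in this column.
        missing_frac = series.isna().mean()
        if missing_frac >= missing_threshold:
            warnings.append(f"{col}: {missing_frac:.0%} missing values")

        n_unique = series.nunique(dropna=True)
        if n_unique <= 1:
            # Zero or one distinct value carries no information for analysis.
            warnings.append(f"{col}: constant column")
        elif (
            n_rows > 0
            and n_unique / n_rows >= id_uniqueness
            and not pd.api.types.is_float_dtype(series)
        ):
            # Nearly every row has its own non-float value: probably an identifier.
            warnings.append(f"{col}: likely ID column")
    return warnings
```

Called on a small frame with an integer `customer_id`, a half-empty `income` column, and a single-valued `country` column, it returns one warning for each of the three checks. Restricting the ID check to non-float columns avoids flagging continuous measurements, which are often unique per row.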
---
## Demo
Full demo: https://github.com/user-attachments/assets/7dff8329-71e8-4bca-ad01-404e75df8314
---
## Example Output
Top Correlations:
- age vs income: 0.72
- tenure vs balance: 0.65

Data Warnings:
- customer_id: likely ID column
- income: 52% missing values

Feature Importance:
- age: 0.41 (strong signal)
- tenure: 0.32 (strong signal)
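A correlation summary like the one above can be sketched with pandas: compute the Pearson correlation matrix of the numeric columns, keep each column pair once, and rank pairs by absolute coefficient. The function names `top_correlations` and `correlation_section` and the exact output format are illustrative assumptions, not the tool's real API.

```python
from itertools import combinations

import pandas as pd


def top_correlations(df: pd.DataFrame, k: int = 5) -> list:
    """Return up to k (col_a, col_b, r) tuples, strongest absolute Pearson r first."""
    corr = df.select_dtypes(include="number").corr()
    pairs = []
    # combinations() visits each unordered pair once and skips the diagonal.
    for a, b in combinations(corr.columns, 2):
        r = corr.loc[a, b]
        if pd.notna(r):  # constant columns produce NaN correlations
            pairs.append((a, b, float(r)))
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return [(a, b, round(r, 2)) for a, b, r in pairs[:k]]


def correlation_section(pairs: list) -> str:
    """Render (col_a, col_b, r) tuples as a plain-text report section."""
    lines = ["Top Correlations:"]
    lines += [f"- {a} vs {b}: {r:.2f}" for a, b, r in pairs]
    return "\n".join(lines)
```

Writing `correlation_section(top_correlations(df))` to a `.txt` file (for example with `pathlib.Path.write_text`) would produce one section of a plain-text report. Ranking by absolute value keeps strong negative correlations in the list instead of pushing them to the bottom.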
---
## What Makes It Stand Out
- Automatically separates **useful features from irrelevant ones**
- No manual preprocessing required
- Mimics the reasoning of a real-world **data analyst**
- Built on a **modular, agent-based architecture**
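Automatic target detection and feature importance could work along these lines: guess the target from its column name, then fit a scikit-learn random forest and read its impurity-based `feature_importances_`. The README does not say which model or detection rule the tool uses; the name-matching heuristic, the `TARGET_HINTS` tuple, and both function names are hypothetical, and the sketch assumes a classification target.

```python
from typing import Optional

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical name hints, chosen to resemble common target columns in
# public versions of the bundled datasets (e.g. "Survived", "Churn", "Class").
TARGET_HINTS = ("target", "label", "survived", "churn", "class", "is_fraud")


def detect_target(df: pd.DataFrame) -> Optional[str]:
    """Return the first column whose lowercase name matches a target hint, else None."""
    for col in df.columns:
        if str(col).lower() in TARGET_HINTS:
            return col
    return None


def feature_importance(df: pd.DataFrame, target: str, seed: int = 0) -> pd.Series:
    """Fit a random forest on the numeric features and return impurity-based importances."""
    X = df.drop(columns=[target]).select_dtypes(include="number")
    X = X.fillna(X.median())  # median imputation so the forest can be fitted
    y = df[target]
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)
```

Impurity-based importances can favor high-cardinality features; `sklearn.inspection.permutation_importance` is a common alternative when that bias matters.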
---
## Performance
- Parquet-based storage for faster I/O
- Sampling strategy for large datasets
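The two performance techniques could be combined as below: convert a CSV to Parquet once and reuse the cached copy, then cap the number of rows passed to the analysis. The cache directory, the 100,000-row cap, and the function names are assumptions; `to_parquet` and `read_parquet` need `pyarrow` or `fastparquet` installed.

```python
from pathlib import Path

import pandas as pd


def load_with_parquet_cache(csv_path: str, cache_dir: str = ".cache") -> pd.DataFrame:
    """Read a CSV once and reuse a Parquet copy on later runs (cache layout is assumed)."""
    csv_file = Path(csv_path)
    parquet_file = Path(cache_dir) / f"{csv_file.stem}.parquet"
    # Reuse the cache only if it is at least as new as the source CSV.
    if parquet_file.exists() and parquet_file.stat().st_mtime >= csv_file.stat().st_mtime:
        return pd.read_parquet(parquet_file)
    df = pd.read_csv(csv_file)
    parquet_file.parent.mkdir(parents=True, exist_ok=True)
    df.to_parquet(parquet_file, index=False)
    return df


def sample_for_analysis(df: pd.DataFrame, max_rows: int = 100_000, seed: int = 42) -> pd.DataFrame:
    """Return df as-is when it is small, else a reproducible uniform sample of max_rows rows."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)
```

Uniform sampling can leave very few positive rows in a highly imbalanced dataset such as Credit Card Fraud; sampling within each target class (for example with `df.groupby(target).sample(frac=...)`) is one way to preserve the class balance.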
---
## System Design
- Command handler
- Dataset registry
- Modular agents (`AnalysisAgent`, etc.)
- Logger integration
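The components in the system design could fit together like this: a command handler parses `analyze <dataset>`, looks the dataset up in a registry, and hands the loaded data to an agent, logging each run. Only `AnalysisAgent` is named in this README; `DatasetRegistry`, `CommandHandler`, their methods, and the `eda_explorer` logger name are hypothetical, and the agent here returns a placeholder row count instead of a real analysis.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("eda_explorer")  # hypothetical logger name


class DatasetRegistry:
    """Maps dataset names (case-insensitive) to zero-argument loader functions."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, loader: Callable[[], object]) -> None:
        self._loaders[name.lower()] = loader

    def __contains__(self, name: str) -> bool:
        return name.lower() in self._loaders

    def load(self, name: str) -> object:
        return self._loaders[name.lower()]()


class AnalysisAgent:
    """Placeholder agent: a real one would run correlations, warnings, importance, etc."""

    def run(self, data) -> str:
        return f"rows={len(data)}"


class CommandHandler:
    """Parses '<command> <dataset>' and dispatches to the matching agent."""

    def __init__(self, registry: DatasetRegistry, agents: Dict[str, AnalysisAgent]) -> None:
        self._registry = registry
        self._agents = agents

    def handle(self, line: str) -> str:
        parts = line.split()
        if len(parts) != 2:
            return "usage: <command> <dataset>"
        command, dataset = parts
        agent = self._agents.get(command)
        if agent is None:
            return f"unknown command: {command}"
        if dataset not in self._registry:
            return f"unknown dataset: {dataset}"
        logger.info("running %s on %s", command, dataset)
        return agent.run(self._registry.load(dataset))
```

Registering a loader (for example a hypothetical `load_titanic` function via `registry.register("titanic", load_titanic)`) and calling `handler.handle("analyze titanic")` would run the agent on that dataset. Keeping loaders behind a registry means adding a dataset touches one registration call rather than the command parsing.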
---
## Datasets
- Titanic
- Customer Churn
- Credit Card Fraud
---
## Tech Stack
- Python
- Pandas, NumPy
- Scikit-learn
- Parquet
---
## Future Enhancements
- RAG-based EDA advisor
- SQL query assistant
- Model training pipeline