metadata
title: EDA Explorer
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.16.0
python_version: '3.10'
app_file: app.py
pinned: false
EDA Explorer: AI-Powered Data Analysis CLI
A lightweight CLI tool that automates exploratory data analysis (EDA) with automated insights, feature importance detection, and data quality checks.
It is designed to simulate how an AI data analyst works through real-world datasets.
Key Highlights
- One-command analysis: analyze <dataset>
- Automatic target detection for ML-based insights
- Feature importance with no manual setup
- Smart data warnings (missing values, ID columns, constant columns)
- Correlation and outlier detection
- Automatic report generation (.txt)
- Efficient handling of large datasets (Parquet + sampling)
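The correlation and outlier checks listed above could be approximated as follows. This is a minimal sketch assuming pandas, Pearson correlation, and the common 1.5 × IQR rule for outliers; `top_correlations` and `iqr_outlier_counts` are hypothetical names, not the project's actual functions.

```python
import pandas as pd


def top_correlations(df: pd.DataFrame, k: int = 5) -> list[tuple[str, str, float]]:
    """Return the k strongest pairwise Pearson correlations among numeric columns."""
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    pairs = []
    # Walk the upper triangle only, so each pair is reported once and the diagonal is skipped.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if pd.notna(r):  # constant columns produce NaN correlations
                pairs.append((cols[i], cols[j], float(r)))
    # Rank by absolute strength so strong negative correlations are not hidden.
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return [(a, b, round(r, 2)) for a, b, r in pairs[:k]]


def iqr_outlier_counts(df: pd.DataFrame, factor: float = 1.5) -> dict[str, int]:
    """Count values outside [Q1 - factor*IQR, Q3 + factor*IQR] in each numeric column."""
    counts = {}
    for col in df.select_dtypes(include="number").columns:
        s = df[col].dropna()
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        outside = (s < q1 - factor * iqr) | (s > q3 + factor * iqr)
        counts[col] = int(outside.sum())
    return counts
```

Each returned tuple is (column_a, column_b, r), which maps directly onto a "Top Correlations" line. Non-numeric columns are skipped by both helpers.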
Demo
Full demo: https://github.com/user-attachments/assets/7dff8329-71e8-4bca-ad01-404e75df8314
Example Output
Top Correlations:
- age ↔ income: 0.72
- tenure ↔ balance: 0.65
Data Warnings:
- customer_id: likely ID column
- income: 52% missing values
Feature Importance:
- age: 0.41 (strong signal)
- tenure: 0.32 (strong signal)
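The Data Warnings in the example output (ID columns, heavy missingness) could come from simple column-level heuristics. This is a minimal sketch assuming pandas; `data_warnings` is a hypothetical helper, and both the 50% missing threshold and the "name ends with id" rule for ID columns are illustrative assumptions.

```python
import pandas as pd


def data_warnings(df: pd.DataFrame, missing_threshold: float = 0.5) -> list[str]:
    """Hypothetical helper: flag missing-heavy, ID-like, and constant columns."""
    warnings = []
    n_rows = len(df)
    for col in df.columns:
        series = df[col]
        missing_frac = series.isna().mean()
        if missing_frac > missing_threshold:
            warnings.append(f"{col}: {missing_frac:.0%} missing values")
        # Crude heuristic: all values unique plus an ID-like name suggests an identifier.
        if n_rows > 0 and series.nunique() == n_rows and str(col).lower().endswith("id"):
            warnings.append(f"{col}: likely ID column")
        # At most one distinct value (counting NaN) means the column carries no signal.
        if series.nunique(dropna=False) <= 1:
            warnings.append(f"{col}: constant column")
    return warnings
```

The name-based ID check will miss identifiers with other names and can misfire on names such as "paid"; a real tool would likely combine several signals.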
What Makes It Stand Out
- Automatically separates useful features from irrelevant ones
- No manual preprocessing required
- Mimics the reasoning of a real-world data analyst
- Built on a modular, agent-based system
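One way to separate useful from irrelevant features without manual setup is to guess a target column and rank the remaining columns by random-forest importance. This sketch assumes scikit-learn's `RandomForestClassifier` and `RandomForestRegressor`; the `TARGET_HINTS` name list, the last-column fallback, and the 10-unique-values classification cutoff are illustrative assumptions, not the project's documented behavior.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Hypothetical list of common target-column names (assumption, not from the project).
TARGET_HINTS = ("target", "label", "class", "churn", "survived", "exited", "is_fraud")


def detect_target(df: pd.DataFrame):
    """Pick a column whose name looks like a target, else fall back to the last column."""
    for col in df.columns:
        if str(col).lower() in TARGET_HINTS:
            return col
    return df.columns[-1]


def feature_importance(df: pd.DataFrame, target=None, random_state: int = 0):
    """Return (target, importances) with importances sorted from strongest to weakest."""
    target = detect_target(df) if target is None else target
    data = df.dropna(subset=[target])  # rows without a label cannot be used for fitting
    y = data[target]
    features = data.drop(columns=[target])
    # Median-impute numeric gaps, then one-hot encode categoricals (NaN gets its own dummy).
    num_cols = features.select_dtypes(include="number").columns
    features[num_cols] = features[num_cols].fillna(features[num_cols].median())
    X = pd.get_dummies(features, dummy_na=True)
    # Treat non-numeric or low-cardinality targets as classification (assumed cutoff of 10).
    is_classification = not pd.api.types.is_numeric_dtype(y) or y.nunique() <= 10
    model_cls = RandomForestClassifier if is_classification else RandomForestRegressor
    model = model_cls(n_estimators=100, random_state=random_state)
    model.fit(X, y)
    scores = pd.Series(model.feature_importances_, index=X.columns)
    return target, scores.sort_values(ascending=False)
```

Impurity-based importances are cheap but can favor high-cardinality columns; permutation importance is a common, slower alternative.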
Performance
- Parquet-based storage for faster I/O
- Sampling strategy for large datasets
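A Parquet cache plus row sampling might look like the sketch below. It assumes pandas with a Parquet engine (pyarrow or fastparquet) installed; the `.cache` directory, the 100,000-row limit, and both function names are hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_with_parquet_cache(csv_path: str, cache_dir: str = ".cache") -> pd.DataFrame:
    """Read a CSV once, then reuse a Parquet copy on later runs.

    Note: the cache is not invalidated if the CSV changes afterwards.
    """
    parquet_path = Path(cache_dir) / (Path(csv_path).stem + ".parquet")
    if parquet_path.exists():
        return pd.read_parquet(parquet_path)
    df = pd.read_csv(csv_path)
    parquet_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_parquet(parquet_path, index=False)
    return df


def sample_for_analysis(df: pd.DataFrame, max_rows: int = 100_000, random_state: int = 42) -> pd.DataFrame:
    """Return the full frame when it is small, otherwise a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=random_state)
```

Fixing `random_state` keeps repeated runs on the same dataset comparable, which matters when warnings or correlations are computed on a sample.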
System Design
- Command handler
- Dataset registry
- Modular agents (AnalysisAgent, etc.)
- Logger integration
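The components above (command handler, dataset registry, modular agents, logger) could be wired together roughly like this. This is a standard-library-only sketch; `DATASET_REGISTRY`, `register_dataset`, the `toy` dataset, and the `run()` interface are hypothetical, and only the `AnalysisAgent` name comes from this README.

```python
import logging
import shlex
from collections.abc import Callable

logger = logging.getLogger("eda_explorer")  # hypothetical logger name

# Dataset registry: maps a CLI-facing name to a loader function.
DATASET_REGISTRY: dict[str, Callable[[], list[dict]]] = {}


def register_dataset(name: str):
    """Decorator that adds a loader to the registry under the given name."""
    def decorator(loader):
        DATASET_REGISTRY[name] = loader
        return loader
    return decorator


@register_dataset("toy")
def load_toy() -> list[dict]:
    # Stand-in loader; real loaders would read Titanic, Customer Churn, etc.
    return [{"age": 30, "income": 50}, {"age": 40, "income": None}]


class AnalysisAgent:
    """One modular pipeline step; other agents would share the same run() interface."""
    name = "analysis"

    def run(self, rows: list[dict]) -> dict:
        columns = sorted({key for row in rows for key in row})
        missing = {c: sum(row.get(c) is None for row in rows) for c in columns}
        return {"rows": len(rows), "columns": columns, "missing": missing}


AGENTS = [AnalysisAgent()]


def handle_command(line: str) -> dict:
    """Parse 'analyze <dataset>', load the dataset, and run every registered agent."""
    parts = shlex.split(line)
    if len(parts) != 2 or parts[0] != "analyze":
        raise ValueError("usage: analyze <dataset>")
    dataset = parts[1]
    if dataset not in DATASET_REGISTRY:
        raise KeyError(f"unknown dataset: {dataset}")
    logger.info("Loading dataset %s", dataset)
    rows = DATASET_REGISTRY[dataset]()
    results = {}
    for agent in AGENTS:
        logger.info("Running %s agent", agent.name)
        results[agent.name] = agent.run(rows)
    return results
```

Keeping agents behind a shared `run()` method means a new analysis step only has to be appended to `AGENTS`, without touching the command handler.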
Datasets
- Titanic
- Customer Churn
- Credit Card Fraud
Tech Stack
- Python
- Pandas, NumPy
- Scikit-learn
- Parquet
Future Enhancements
- RAG-based EDA advisor
- SQL query assistant
- Model training pipeline