---
title: EDA Explorer
emoji: 📊
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.16.0
python_version: '3.10'
app_file: app.py
pinned: false
---

🚀 EDA Explorer – AI-Powered Data Analysis CLI

A lightweight CLI tool that automates exploratory data analysis (EDA) with intelligent insights, feature importance detection, and data quality checks.

Designed to simulate how an AI data analyst works through real-world datasets.


⚡ Key Highlights

  • 🔍 One-command analysis → analyze <dataset>
  • 🧠 Auto target detection for ML-based insights
  • 📈 Feature importance (no manual setup)
  • ⚠️ Smart data warnings (missing, ID columns, constants)
  • 📊 Correlation & outlier detection
  • 📝 Auto report generation (.txt)
  • ⚡ Efficient handling of large datasets (Parquet + sampling)
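
The README does not say which correlation measure or outlier rule the tool uses. A minimal sketch of the correlation and outlier checks, assuming Pearson correlation ranked by absolute value and Tukey's 1.5 × IQR rule; `top_correlations` and `count_outliers` are hypothetical helpers, not EDA Explorer's own code:

```python
from itertools import combinations

import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 5) -> list[tuple[str, str, float]]:
    """Return up to n numeric column pairs ranked by absolute Pearson correlation.

    Hypothetical helper: the ranking rule is an assumption, not EDA Explorer's code.
    """
    corr = df.select_dtypes(include="number").corr()  # Pearson is the pandas default
    pairs = [
        (a, b, float(corr.loc[a, b]))
        for a, b in combinations(corr.columns, 2)  # each unordered pair once, no diagonal
        if pd.notna(corr.loc[a, b])  # skip pairs involving constant or empty columns
    ]
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return [(a, b, round(r, 2)) for a, b, r in pairs[:n]]


def count_outliers(df: pd.DataFrame, k: float = 1.5) -> dict[str, int]:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each numeric column (Tukey's rule).

    Hypothetical helper: assumes the IQR rule, which the README does not confirm.
    """
    counts = {}
    for col in df.select_dtypes(include="number").columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # NaN compares False on both sides, so missing values are never counted
        mask = (df[col] < q1 - k * iqr) | (df[col] > q3 + k * iqr)
        counts[col] = int(mask.sum())
    return counts
```

Non-numeric columns are skipped by both helpers. With `k=1.5` the check uses Tukey's usual fences; raising `k` to 3.0 (his "far out" fence) makes it more conservative.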

🎬 Demo

👉 Full demo: https://github.com/user-attachments/assets/7dff8329-71e8-4bca-ad01-404e75df8314



📊 Example Output

Top Correlations:

  • age ↔ income: 0.72
  • tenure ↔ balance: 0.65

⚠️ Data Warnings:

  • customer_id → likely ID column
  • income β†’ 52% missing values

📈 Feature Importance:

  • age: 0.41 (strong signal)
  • tenure: 0.32 (strong signal)
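
The README does not document how the Data Warnings shown above are triggered. A minimal sketch with hypothetical rules and thresholds: a non-float column whose values are all distinct is treated as a likely ID, a column with at most one distinct value as constant, and a column whose missing share exceeds `missing_threshold` (0.3 here, an arbitrary choice) as missing-heavy. `data_warnings` is a hypothetical helper:

```python
import pandas as pd


def data_warnings(df: pd.DataFrame, missing_threshold: float = 0.3) -> list[str]:
    """Return warnings about likely ID columns, constant columns and heavy missingness.

    All rules and thresholds here are illustrative assumptions, not EDA Explorer's own.
    """
    messages = []
    n_rows = len(df)
    for col in df.columns:
        series = df[col]
        n_unique = series.nunique(dropna=True)
        missing_ratio = series.isna().mean()
        # Every row holds a distinct non-float value: typical of keys such as customer_id
        if n_rows > 1 and n_unique == n_rows and not pd.api.types.is_float_dtype(series):
            messages.append(f"{col} → likely ID column")
        # At most one distinct value carries no information for analysis or modelling
        if n_unique <= 1:
            messages.append(f"{col} → constant column")
        if missing_ratio > missing_threshold:
            messages.append(f"{col} → {missing_ratio:.0%} missing values")
    return messages
```

Float columns are excluded from the ID rule because continuous measurements are often all distinct without being identifiers.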

🧠 What Makes It Stand Out

  • Automatically identifies useful vs irrelevant features
  • No manual preprocessing required
  • Mimics real-world data analyst reasoning
  • Built using a modular agent-based system

⚡ Performance

  • Parquet-based storage for faster I/O
  • Sampling strategy for large datasets
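
The README does not describe the cache layout or the sampling threshold. A minimal sketch, assuming CSV sources cached next to the original as Parquet (pandas needs `pyarrow` or `fastparquet` installed for this) and a hypothetical `MAX_ROWS` cap above which a fixed-seed random sample is analyzed; `load_dataset` and `sample_for_analysis` are hypothetical helpers:

```python
from pathlib import Path

import pandas as pd

MAX_ROWS = 100_000  # hypothetical cap; the README does not state the real threshold


def load_dataset(csv_path: str) -> pd.DataFrame:
    """Read a CSV once, then reuse a Parquet copy on later runs for faster I/O."""
    parquet_path = Path(csv_path).with_suffix(".parquet")
    if parquet_path.exists():
        return pd.read_parquet(parquet_path)
    df = pd.read_csv(csv_path)
    df.to_parquet(parquet_path, index=False)  # requires pyarrow or fastparquet
    return df


def sample_for_analysis(df: pd.DataFrame, max_rows: int = MAX_ROWS, seed: int = 42) -> pd.DataFrame:
    """Return the full frame if it is small enough, otherwise a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)
```

A fixed `seed` keeps sampled statistics reproducible between runs. This simple cache never notices if the CSV changes after the Parquet copy is written; comparing file modification times would be one way to invalidate it.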

πŸ› οΈ System Design

  • Command handler
  • Dataset registry
  • Modular agents (AnalysisAgent, etc.)
  • Logger integration
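
The README lists the components but not their interfaces. As an illustrative sketch only (every class, method and dataset name below is hypothetical), a command handler could parse `analyze <dataset>`, resolve the name through a registry, hand the DataFrame to an agent, and log each step:

```python
import logging

import pandas as pd

logger = logging.getLogger("eda_explorer")  # hypothetical logger name


class DatasetRegistry:
    """Maps short dataset names to zero-argument loader callables (hypothetical design)."""

    def __init__(self):
        self._loaders = {}

    def register(self, name, loader):
        self._loaders[name] = loader

    def load(self, name) -> pd.DataFrame:
        if name not in self._loaders:
            raise KeyError(f"unknown dataset: {name}")
        return self._loaders[name]()


class AnalysisAgent:
    """One modular agent; other agents could share the same run(df) -> str interface."""

    def run(self, df: pd.DataFrame) -> str:
        missing = df.isna().mean().sort_values(ascending=False)
        lines = [f"Rows: {len(df)}, columns: {df.shape[1]}"]
        lines += [f"{col}: {ratio:.0%} missing" for col, ratio in missing.items() if ratio > 0]
        return "\n".join(lines)


class CommandHandler:
    """Parses a command such as 'analyze titanic' and dispatches it to an agent."""

    def __init__(self, registry: DatasetRegistry, agent: AnalysisAgent):
        self.registry = registry
        self.agent = agent

    def handle(self, command: str) -> str:
        verb, _, arg = command.strip().partition(" ")
        if verb != "analyze" or not arg:
            return "usage: analyze <dataset>"
        logger.info("analyzing %s", arg)
        df = self.registry.load(arg)
        return self.agent.run(df)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    registry = DatasetRegistry()
    # A toy in-memory dataset stands in for a real loader
    registry.register("toy", lambda: pd.DataFrame({"age": [22, None, 35], "fare": [7.25, 71.3, 8.05]}))
    print(CommandHandler(registry, AnalysisAgent()).handle("analyze toy"))
```

Keeping loaders behind a registry means new datasets can be added without touching the command handler, and each agent can be tested on a plain DataFrame.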

📦 Datasets

  • Titanic
  • Customer Churn
  • Credit Card Fraud

πŸ› οΈ Tech Stack

  • Python
  • Pandas, NumPy
  • Scikit-learn
  • Parquet

🚀 Future Enhancements

  • RAG-based EDA advisor
  • SQL query assistant
  • Model training pipeline