---
title: EDA Explorer
emoji: 📊
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.16.0
python_version: "3.10"
app_file: app.py
pinned: false
---

# 🚀 EDA Explorer – AI-Powered Data Analysis CLI

A lightweight CLI tool that automates exploratory data analysis (EDA) with intelligent insights, feature importance detection, and data quality checks.

Designed to simulate how an **AI Data Analyst** works through real-world datasets.

---

## ⚡ Key Highlights

- 🔍 One-command analysis → `analyze `
- 🧠 Auto target detection for ML-based insights
- 📈 Feature importance (no manual setup)
- ⚠️ Smart data warnings (missing values, ID columns, constant columns)
- 📊 Correlation & outlier detection
- 📁 Auto report generation (.txt)
- ⚡ Efficient handling of large datasets (Parquet + sampling)

---

## 🎬 Demo

👉 Full demo: https://github.com/user-attachments/assets/7dff8329-71e8-4bca-ad01-404e75df8314

---

## 📊 Example Output

Top Correlations:

- age ↔ income: 0.72
- tenure ↔ balance: 0.65

⚠️ Data Warnings:

- customer_id → likely ID column
- income → 52% missing values

📈 Feature Importance:

- age: 0.41 (strong signal)
- tenure: 0.32 (strong signal)

---

## 🧠 What Makes It Stand Out

- Automatically separates **useful features from irrelevant ones**
- No manual preprocessing required
- Mimics real-world **data analyst reasoning**
- Built as a **modular agent-based system**

---

## ⚡ Performance

- Parquet-based storage for faster I/O
- Sampling strategy for large datasets

---

## 🛠️ System Design

- Command handler
- Dataset registry
- Modular agents (AnalysisAgent, etc.)
- Logger integration

---

## 📦 Datasets

- Titanic
- Customer Churn
- Credit Card Fraud

---

## 🛠️ Tech Stack

- Python
- Pandas, NumPy
- Scikit-learn
- Parquet

---

## 🚀 Future Enhancements

- RAG-based EDA advisor
- SQL query assistant
- Model training pipeline
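
---

## 🧪 Illustrative Sketch: Data Warnings

The data warnings listed under Key Highlights (missing values, ID columns, constant columns) could be produced by a check along the following lines. This is a minimal sketch with pandas, not the tool's actual implementation: the function name `data_warnings`, the `missing_threshold` parameter, its 50% default, and the "all values unique" ID heuristic are all assumptions made for illustration.

```python
import pandas as pd


def data_warnings(df: pd.DataFrame, missing_threshold: float = 0.5) -> list[str]:
    """Return readable warnings for columns that look problematic.

    Hypothetical helper: the name, threshold, and heuristics are assumptions,
    not the project's real API.
    """
    n_rows = len(df)
    if n_rows == 0:
        return []

    warnings = []
    for col in df.columns:
        series = df[col]
        missing_ratio = series.isna().mean()      # share of null cells
        n_unique = series.nunique(dropna=True)    # distinct non-null values

        if missing_ratio >= missing_threshold:
            warnings.append(f"{col} → {missing_ratio:.0%} missing values")

        if n_unique <= 1:
            # A single distinct value carries no information for analysis.
            warnings.append(f"{col} → constant column")
        elif n_unique == n_rows and not pd.api.types.is_float_dtype(series):
            # Every row has its own non-float value: probably an identifier.
            # This heuristic can over-flag genuine features on tiny datasets.
            warnings.append(f"{col} → likely ID column")

    return warnings


# Small made-up dataset, used only to exercise the checks.
sample = pd.DataFrame(
    {
        "customer_id": [101, 102, 103, 104, 105, 106],
        "age": [23, 35, 41, 35, 29, 60],
        "income": [None, 48000.0, None, None, 39000.0, None],
        "country": ["US"] * 6,
    }
)

for warning in data_warnings(sample):
    print(warning)
# customer_id → likely ID column
# income → 67% missing values
# country → constant column
```

Float columns are excluded from the ID check because continuous measurements are often unique per row without being identifiers. A real implementation would likely tune that rule and the threshold per dataset.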