| Project Overview | |
| AI Data Analyst Pro is a comprehensive, no-code data analysis platform that leverages artificial intelligence to help users extract insights from their data instantly. Built with Streamlit, it combines the power of machine learning, statistical analysis, and interactive visualizations into an intuitive interface that both technical and non-technical users can navigate with ease. | |
| What Makes It Special? | |
| Natural Language Interface: Ask questions about your data in plain English and get instant answers | |
| Zero Configuration: Upload your data and let the AI automatically detect column types, suggest analyses, and generate insights | |
| Enterprise-Grade Analysis: Professional statistical tests, ML algorithms, and visualizations without writing a single line of code | |
| Real-Time Collaboration: Share insights instantly with interactive dashboards | |
| Key Features | |
| 1. Smart Data Upload | |
| Multi-format support: CSV and Excel files with automatic encoding detection (UTF-8 and Latin-1, also known as ISO-8859-1) | |
| Intelligent validation: Real-time data quality checks with instant feedback | |
| Large file handling: Optimized for files up to 200 MB, with sampling options for larger datasets | |
| Smart preview: Interactive data viewer with column statistics and type detection | |
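
The encoding detection described above can be approximated with an ordered fallback. This is a minimal, dependency-free sketch and not the app's actual loader: `read_csv_bytes` and `CANDIDATE_ENCODINGS` are hypothetical names. Latin-1 goes last because it can decode any byte sequence and would otherwise mask valid UTF-8.

```python
import csv
import io

# Hypothetical candidate list; "latin-1" is the same codec as ISO-8859-1.
CANDIDATE_ENCODINGS = ("utf-8", "latin-1")


def read_csv_bytes(raw, encodings=CANDIDATE_ENCODINGS):
    """Decode raw CSV bytes with the first encoding that works.

    Returns (rows, encoding), where rows is a list of dicts keyed by header.
    """
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are invalid in this encoding; try the next
        rows = list(csv.DictReader(io.StringIO(text)))
        return rows, enc
    raise ValueError("none of the candidate encodings could decode the file")


# "café" encoded as Latin-1 is not valid UTF-8, so the fallback is used.
sample = "name,city\nAna,café\n".encode("latin-1")
rows, used = read_csv_bytes(sample)
print(used, rows)
```

With pandas, the same idea would amount to retrying `pandas.read_csv` with a different `encoding` argument after a `UnicodeDecodeError`.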
| 2. Auto Preprocessing | |
| Intelligent missing value handling: Multiple imputation strategies (mean, median, mode, KNN) | |
| Outlier detection: IQR-based and Isolation Forest algorithms | |
| Feature scaling: StandardScaler and MinMaxScaler with automatic detection | |
| Automated encoding: Label encoding and one-hot encoding for categorical variables | |
| Feature engineering: Create interaction features, polynomial features, and binning | |
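
Two of the steps listed above, median imputation and IQR-based outlier detection, can be sketched with the standard library alone. The helper names `impute_median` and `iqr_outliers` and the multiplier `k=1.5` are assumptions for illustration. The app itself presumably relies on pandas and scikit-learn for these steps.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


ages = [23, 25, None, 27, 29, 31, 120]
filled = impute_median(ages)      # the missing age becomes the median
outliers = iqr_outliers(filled)   # values far outside the middle 50%
print(filled, outliers)
```

The median is used here rather than the mean because a single extreme value (120 in the sample) would pull a mean-based fill upward.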
| 3. Exploratory Data Analysis | |
| 6 comprehensive analysis tabs: Overview, Missing Data, Univariate, Bivariate, Multivariate, Pattern Discovery | |
| 30+ visualizations: Interactive plots for every data type and relationship | |
| Automatic insight generation: AI-powered pattern detection and anomaly identification | |
| Correlation analysis: Pearson, Spearman, and Kendall with significance testing | |
| Distribution analysis: Normality tests, QQ plots, and distribution fitting | |
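
To illustrate why the correlation analysis offers more than one coefficient, the sketch below computes Pearson and Spearman by hand. It is a simplified illustration that omits the significance testing mentioned above. `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return p-values as well; whether the app calls them is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # monotonic but non-linear (y = x**3)
print(pearson(x, y), spearman(x, y))
```

On this sample Spearman reports a perfect monotonic relationship while Pearson falls below 1, which is the kind of gap that makes comparing both coefficients useful.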
| 4. Interactive Visualizations | |
| 20+ plot types: Bar charts, scatter plots, line charts, box plots, violin plots, heatmaps, and more | |
| 3D visualizations: Interactive 3D scatter plots and surface plots | |
| Time series analysis: Decomposition, trend detection, and forecasting | |
| Custom styling: Professional, publication-ready visualizations | |
| Export capabilities: Download plots as PNG or interactive HTML | |
| 5. Machine Learning Pipeline | |
| 15+ algorithms: | |
| Classification: Logistic Regression, Random Forest, XGBoost, LightGBM, SVM, KNN | |
| Regression: Linear Regression, Ridge, Lasso, Decision Trees, Gradient Boosting | |
| Auto task detection: Automatically identifies classification vs regression problems | |
| Model comparison: Side-by-side performance metrics with best model highlighting | |
| Hyperparameter tuning: Grid search with cross-validation | |
| Feature importance: Built-in and permutation importance with visualizations | |
| Model interpretability: SHAP values and partial dependence plots | |
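
Auto task detection, as listed above, is typically a heuristic on the target column. The sketch below shows one plausible rule set. The function name `detect_task` and the `max_classes=20` threshold are assumptions, not the pipeline's documented behavior.

```python
def detect_task(target, max_classes=20):
    """Guess 'classification' or 'regression' from the target column values."""
    unique = {v for v in target if v is not None}
    # Text or boolean labels can only be classes.
    if any(isinstance(v, (str, bool)) for v in unique):
        return "classification"
    # A small set of whole numbers looks like encoded class labels.
    all_integral = all(float(v).is_integer() for v in unique)
    if all_integral and len(unique) <= max_classes:
        return "classification"
    # Everything else is treated as a continuous target.
    return "regression"


print(detect_task(["yes", "no", "yes"]))   # text labels
print(detect_task([0, 1, 1, 0]))           # few distinct integers
print(detect_task([12.5, 99.1, 40.3]))     # continuous values
```

A heuristic like this can misclassify integer-valued targets such as small counts, so letting the user override the detected task is a sensible safeguard.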
| 6. Statistical Analysis | |
| 20+ statistical tests: | |
| Parametric: t-tests (1-sample, independent, paired), ANOVA, Z-test | |
| Non-parametric: Mann-Whitney U, Wilcoxon, Kruskal-Wallis | |
| Post-hoc: Tukey HSD with confidence intervals | |
| Distribution fitting: Normal, exponential, gamma, beta, log-normal | |
| Time series analysis: Stationarity tests (ADF, KPSS), ACF/PACF, forecasting | |
| Probability analysis: Confidence intervals, bootstrap sampling, CDF/PDF fitting | |
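
The bootstrap sampling mentioned above can be sketched as a percentile confidence interval. This is a minimal illustration using only the standard library. The function name `bootstrap_ci`, the 2000 resamples, and the fixed seed are assumptions chosen for reproducibility.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # local generator keeps results reproducible
    # Resample with replacement and record the statistic each time.
    estimates = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


data = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17]
low, high = bootstrap_ci(data)
print(f"95% CI for the mean: [{low:.2f}, {high:.2f}]")
```

Passing `stat=statistics.median` gives an interval for the median without any change to the resampling logic, which is the main appeal of the bootstrap over closed-form intervals.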
| 7. AI Chatbot Assistant | |
| Natural language queries: "Show me the first 10 rows", "Plot histogram of age" | |
| Context-aware responses: Understands data context and column names | |
| Interactive visualizations: Creates plots based on conversation | |
| Smart filtering: "Show rows where income > 50000" | |
| Statistical questions: "What's the correlation between age and income?" | |
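
A filter query such as "Show rows where income > 50000" can be handled with simple pattern matching. The sketch below is a rule-based illustration only. How `chatbot.py` actually interprets queries is not described here, and `filter_rows`, `OPS`, and `PATTERN` are hypothetical names.

```python
import operator
import re

# Comparison operators the sketch understands.
OPS = {
    ">": operator.gt, "<": operator.lt, ">=": operator.ge,
    "<=": operator.le, "==": operator.eq, "!=": operator.ne,
}

# Two-character operators come first so ">=" is not read as ">".
PATTERN = re.compile(
    r"rows where\s+(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def filter_rows(query, rows):
    """Apply a 'rows where <column> <op> <number>' query to a list of dicts."""
    match = PATTERN.search(query)
    if match is None:
        raise ValueError(f"could not parse query: {query!r}")
    column, op, threshold = match.group(1), OPS[match.group(2)], float(match.group(3))
    if rows and column not in rows[0]:
        raise KeyError(f"unknown column: {column}")
    return [row for row in rows if op(row[column], threshold)]


people = [{"age": 31, "income": 42000}, {"age": 45, "income": 73000}]
result = filter_rows("Show rows where income > 50000", people)
print(result)
```

Checking the column name against the dataset before filtering mirrors the context-aware behavior described above: an unknown column produces a clear error instead of an empty result.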
| 8. Data Quality Assessment | |
| Comprehensive quality score: Combines completeness, uniqueness, consistency | |
| Anomaly detection: Isolation Forest and statistical methods | |
| Quality visualizations: Missing value heatmaps, data type distributions | |
| Actionable recommendations: Specific steps to improve data quality | |
| Exportable reports: Download detailed quality assessments | |
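
One way to combine completeness, uniqueness, and consistency into a single score is a weighted average. The sketch below is an assumption about how such a score could be built. The weights, the definition of each component, and the name `quality_score` are illustrative and not taken from `data_quality.py`.

```python
def quality_score(rows, weights=(0.4, 0.3, 0.3)):
    """Score in [0, 100] from completeness, uniqueness, and consistency."""
    columns = list(rows[0].keys())
    total_cells = len(rows) * len(columns)

    # Completeness: share of cells that are not missing.
    filled = sum(row[c] is not None for row in rows for c in columns)
    completeness = filled / total_cells

    # Uniqueness: share of rows that are not exact duplicates.
    distinct = {tuple(row[c] for c in columns) for row in rows}
    uniqueness = len(distinct) / len(rows)

    # Consistency: share of columns whose non-missing values share one type.
    def single_type(col):
        return len({type(row[col]) for row in rows if row[col] is not None}) <= 1

    consistency = sum(single_type(c) for c in columns) / len(columns)

    w_comp, w_uniq, w_cons = weights
    return 100 * (w_comp * completeness + w_uniq * uniqueness + w_cons * consistency)


records = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 2, "age": None},   # exact duplicate of the previous row
    {"id": 4, "age": "41"},   # age stored as text instead of a number
]
score = quality_score(records)
print(round(score, 1))
```

Keeping the three components separate before weighting also makes it straightforward to report which dimension is dragging the score down.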
| 9. Business Insights | |
| Automated pattern discovery: Clustering, trend detection, seasonal patterns | |
| Key driver analysis: Identifies most influential factors | |
| Anomaly alerts: Flags unusual patterns automatically | |
| Strategic recommendations: Data-driven business suggestions | |
| Export insights: Download comprehensive insight reports | |
| Installation | |
| Prerequisites | |
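| Python 3.9 or higher (several of the pinned versions below, such as numpy 1.24.3 and pandas 2.0.3, predate Python 3.12, so 3.9 to 3.11 is the safer choice) | |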
| pip package manager | |
| Git (optional) | |
| Step-by-Step Installation | |
| bash | |
| # 1. Clone the repository | |
| git clone https://github.com/yourusername/ai-data-analyst-pro.git | |
| cd ai-data-analyst-pro | |
| # 2. Create a virtual environment (recommended) | |
| # On Windows: | |
| python -m venv venv | |
| venv\Scripts\activate | |
| # On macOS/Linux: | |
| python3 -m venv venv | |
| source venv/bin/activate | |
| # 3. Install dependencies | |
| pip install -r requirements.txt | |
| # 4. Run the application | |
| streamlit run app.py | |
| After the app starts, open http://localhost:8501 in your browser. | |
| Dependencies | |
| txt | |
| streamlit==1.28.0 | |
| pandas==2.0.3 | |
| numpy==1.24.3 | |
| plotly==5.17.0 | |
| scikit-learn==1.3.0 | |
| scipy==1.11.1 | |
| statsmodels==0.14.0 | |
| xgboost==1.7.6 | |
| lightgbm==4.0.0 | |
| matplotlib==3.7.2 | |
| shap==0.42.1 | |
| openpyxl==3.1.2 | |
| Quick Start Guide | |
| 1. Upload Your Data | |
| Click on the "Upload Dataset" tab | |
| Drag & drop your CSV or Excel file (or click to browse) | |
| Watch as the system automatically validates and displays your data preview | |
| 2. Explore Your Data | |
| Navigate through the EDA tabs to discover patterns | |
| Use the chatbot to ask questions about your data | |
| Generate visualizations with just a few clicks | |
| 3. Clean & Prepare | |
| Go to the Preprocessing tab to handle missing values | |
| Detect and remove outliers automatically | |
| Scale features and encode categorical variables | |
| 4. Build Models | |
| Navigate to the Machine Learning tab | |
| Select your target variable (the AI will suggest the best options) | |
| Choose models and let the pipeline train and compare them | |
| 5. Get Insights | |
| Review the automatically generated business insights | |
| Download comprehensive reports | |
| Share findings with your team | |
| Project Structure | |
| text | |
| ai-data-analyst-pro/ | |
| │ | |
| ├── app.py # Main application with Streamlit UI | |
| ├── chatbot.py # AI-powered data chatbot | |
| ├── data_preprocessing.py # Data cleaning and transformation | |
| ├── dataset_overview.py # Comprehensive EDA visualizations | |
| ├── data_quality.py # Quality metrics and anomaly detection | |
| ├── insights.py # Automated business insights | |
| ├── ml_pipeline.py # Machine learning pipeline | |
| ├── statistical_analysis.py # Statistical tests and analysis | |
| ├── visualization.py # Interactive plot generation | |
| ├── utils.py # Helper functions and utilities | |
| │ | |
| ├── requirements.txt # Project dependencies | |
| ├── Dockerfile # Docker configuration | |
| ├── Procfile # Heroku deployment config | |
| ├── .gitignore # Git ignore rules | |
| │ | |
| └── README.md # Project documentation | |
| Deployment | |
| # A u t o _ |