📋 Project Overview
AI Data Analyst Pro is a no-code data analysis platform that uses AI to help users extract insights from their data. Built with Streamlit, it combines machine learning, statistical analysis, and interactive visualizations in a single interface designed for both technical and non-technical users.
✨ What Makes It Special?
Natural Language Interface: Ask questions about your data in plain English and get instant answers
Zero Configuration: Upload your data and let the AI automatically detect column types, suggest analyses, and generate insights
Enterprise-Grade Analysis: Professional statistical tests, ML algorithms, and visualizations without writing a single line of code
Real-Time Collaboration: Share insights instantly with interactive dashboards
🎯 Key Features
1. 📤 Smart Data Upload
Multi-format support: CSV and Excel files, with automatic encoding detection for CSV (UTF-8, Latin-1/ISO-8859-1)
Intelligent validation: Real-time data quality checks with instant feedback
Large file handling: Optimized for files up to 200 MB, with optional sampling
Smart preview: Interactive data viewer with column statistics and type detection
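The encoding fallback mentioned above can be sketched as follows. This is a minimal illustration rather than the project's actual loader: `read_csv_with_fallback` is a hypothetical helper, and it simply tries each candidate encoding in order instead of inspecting the bytes. Latin-1 goes last because it accepts any byte sequence and therefore never fails.

```python
import io

import pandas as pd


# Hypothetical helper for illustration; not taken from the project's code.
def read_csv_with_fallback(raw: bytes, encodings=("utf-8", "latin-1")):
    """Parse raw CSV bytes using the first encoding that decodes cleanly.

    Returns the DataFrame together with the encoding that worked.
    """
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate
        return pd.read_csv(io.StringIO(text)), enc
    raise ValueError("none of the candidate encodings could decode the file")


# Bytes written as Latin-1 are not valid UTF-8, so the fallback kicks in.
raw = "name,city\nRené,Zürich\n".encode("latin-1")
df, used_encoding = read_csv_with_fallback(raw)
print(used_encoding)
print(df)
```

Decoding before parsing keeps the failure mode explicit: a wrong guess raises `UnicodeDecodeError` instead of silently producing garbled column values.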
2. 🧹 Auto Preprocessing
Intelligent missing value handling: Multiple imputation strategies (mean, median, mode, KNN)
Outlier detection: IQR-based and Isolation Forest algorithms
Feature scaling: StandardScaler and MinMaxScaler with automatic detection
Automated encoding: Label encoding and one-hot encoding for categorical variables
Feature engineering: Create interaction features, polynomial features, and binning
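As a rough sketch of two of the steps listed above, the snippet below applies median imputation and then flags outliers with the IQR rule. The column name, the sample values, and the conventional 1.5 × IQR threshold are assumptions made for illustration; the project's `data_preprocessing.py` may do this differently.

```python
import pandas as pd

# Illustrative data: one missing value and one extreme value.
df = pd.DataFrame({"income": [42_000, 45_000, None, 47_000, 44_000, 400_000]})

# Median imputation: the median is robust to the extreme value.
median_income = df["income"].median()
df["income"] = df["income"].fillna(median_income)

# IQR rule: flag values more than 1.5 * IQR outside the quartiles.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["is_outlier"] = ~df["income"].between(lower, upper)

print(df)
```

Flagging outliers in a separate column, rather than dropping rows immediately, leaves the decision to remove them reversible.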
3. 🔍 Exploratory Data Analysis
6 comprehensive analysis tabs: Overview, Missing Data, Univariate, Bivariate, Multivariate, Pattern Discovery
30+ visualizations: Interactive plots for every data type and relationship
Automatic insight generation: AI-powered pattern detection and anomaly identification
Correlation analysis: Pearson, Spearman, and Kendall with significance testing
Distribution analysis: Normality tests, Q-Q plots, and distribution fitting
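The three correlation measures named above can be compared with SciPy, which returns a coefficient and a p-value for each. The data here is synthetic and chosen to be monotonic but non-linear, so the rank-based measures reach 1 while Pearson does not. It is a sketch of the technique, not the project's implementation.

```python
import numpy as np
from scipy import stats

# Synthetic example: y grows monotonically with x, but not linearly.
x = np.arange(1, 11, dtype=float)
y = x ** 3

methods = {
    "pearson": stats.pearsonr,    # linear association
    "spearman": stats.spearmanr,  # rank-based, monotonic association
    "kendall": stats.kendalltau,  # rank-based, concordant vs. discordant pairs
}

results = {}
for name, func in methods.items():
    coefficient, p_value = func(x, y)
    results[name] = (float(coefficient), float(p_value))

for name, (coefficient, p_value) in results.items():
    print(f"{name:>8}: coefficient = {coefficient:.3f}, p = {p_value:.2g}")
```

A gap between Pearson and the rank-based coefficients, as in this example, is a hint that the relationship is monotonic but curved.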
4. 📈 Interactive Visualizations
20+ plot types: Bar charts, scatter plots, line charts, box plots, violin plots, heatmaps, and more
3D visualizations: Interactive 3D scatter plots and surface plots
Time series analysis: Decomposition, trend detection, and forecasting
Custom styling: Professional, publication-ready visualizations
Export capabilities: Download plots as PNG or interactive HTML
5. 🤖 Machine Learning Pipeline
15+ algorithms:
Classification: Logistic Regression, Random Forest, XGBoost, LightGBM, SVM, KNN
Regression: Linear Regression, Ridge, Lasso, Decision Trees, Gradient Boosting
Auto task detection: Automatically identifies classification vs regression problems
Model comparison: Side-by-side performance metrics with best model highlighting
Hyperparameter tuning: Grid search with cross-validation
Feature importance: Built-in and permutation importance with visualizations
Model interpretability: SHAP values and partial dependence plots
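To make auto task detection and grid search with cross-validation more concrete, here is a minimal sketch using scikit-learn's bundled iris dataset. The `detect_task` heuristic and its `max_classes` threshold are assumptions for illustration, and the parameter grid is arbitrary; none of this is taken from the project's `ml_pipeline.py`.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


# Hypothetical heuristic: non-numeric, boolean, or low-cardinality integer
# targets are treated as classification; everything else as regression.
def detect_task(target: pd.Series, max_classes: int = 20) -> str:
    if pd.api.types.is_bool_dtype(target) or not pd.api.types.is_numeric_dtype(target):
        return "classification"
    if pd.api.types.is_integer_dtype(target) and target.nunique() <= max_classes:
        return "classification"
    return "regression"


iris = load_iris(as_frame=True)
X, y = iris.data, iris.target
task = detect_task(y)
print("detected task:", task)

# Grid search over the regularization strength with 5-fold cross-validation.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

A heuristic like this misclassifies some targets (for example, integer counts with few distinct values), which is one reason to let the user override the detected task.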
6. 📐 Statistical Analysis
20+ statistical tests:
Parametric: t-tests (1-sample, independent, paired), ANOVA, Z-test
Non-parametric: Mann-Whitney U, Wilcoxon, Kruskal-Wallis
Post-hoc: Tukey HSD with confidence intervals
Distribution fitting: Normal, exponential, gamma, beta, log-normal
Time series analysis: Stationarity tests (ADF, KPSS), ACF/PACF, forecasting
Probability analysis: Confidence intervals, bootstrap sampling, CDF/PDF fitting
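Bootstrap sampling, listed above under probability analysis, can be illustrated with a percentile confidence interval for the mean. The function name, the number of resamples, and the sample values are assumptions for this sketch; the project may use a different bootstrap variant.

```python
import numpy as np


# Hypothetical helper: percentile bootstrap confidence interval for the mean.
def bootstrap_mean_ci(values, n_resamples=2000, confidence=0.95, seed=0):
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Draw index matrices so each row is one resample with replacement.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    resample_means = values[idx].mean(axis=1)
    alpha = 1.0 - confidence
    lower, upper = np.quantile(resample_means, [alpha / 2, 1.0 - alpha / 2])
    return float(lower), float(upper)


sample = [12.1, 11.8, 12.6, 13.0, 11.5, 12.9, 12.2, 12.4]
low, high = bootstrap_mean_ci(sample)
print(f"mean = {np.mean(sample):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The percentile method makes no normality assumption, which makes it a reasonable default when the sample is small or skewed.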
7. 💬 AI Chatbot Assistant
Natural language queries: "Show me the first 10 rows", "Plot histogram of age"
Context-aware responses: Understands data context and column names
Interactive visualizations: Creates plots based on conversation
Smart filtering: "Show rows where income > 50000"
Statistical questions: "What's the correlation between age and income?"
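A query such as "Show rows where income > 50000" has to be translated into a DataFrame operation somewhere. The sketch below does this with a single regular expression. It is a deliberately simple, rule-based stand-in: the README does not say how `chatbot.py` interprets queries, and `answer_filter_query` is a hypothetical name.

```python
import operator
import re

import pandas as pd

# Supported comparison operators, mapped to their Python equivalents.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}

# Two-character operators come first so ">=" is not read as ">".
FILTER_PATTERN = re.compile(
    r"show rows where (\w+)\s*(>=|<=|==|>|<)\s*(-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


# Hypothetical helper: returns the filtered rows, or None if the query
# does not match the pattern or names an unknown column.
def answer_filter_query(query: str, df: pd.DataFrame):
    match = FILTER_PATTERN.search(query)
    if match is None:
        return None
    column, op, value = match.groups()
    if column not in df.columns:
        return None
    return df[OPERATORS[op](df[column], float(value))]


df = pd.DataFrame({"age": [25, 41, 37], "income": [30_000, 60_000, 75_000]})
print(answer_filter_query("Show rows where income > 50000", df))
```

Mapping operators through a lookup table avoids passing user text to `eval` or `DataFrame.query`, which matters when queries come from an untrusted chat box.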
8. 📊 Data Quality Assessment
Comprehensive quality score: Combines completeness, uniqueness, consistency
Anomaly detection: Isolation Forest and statistical methods
Quality visualizations: Missing value heatmaps, data type distributions
Actionable recommendations: Specific steps to improve data quality
Exportable reports: Download detailed quality assessments
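One way to turn completeness and uniqueness into a single number is sketched below. The definitions and the equal weighting are assumptions, and consistency is left out because the README does not define how it is measured; the project's `data_quality.py` may compute the score differently.

```python
import pandas as pd


# Hypothetical scoring function for illustration only.
def quality_score(df: pd.DataFrame) -> dict:
    # Completeness: share of cells that are not missing.
    completeness = 1.0 - float(df.isna().mean().mean())
    # Uniqueness: share of rows that are not exact duplicates of an earlier row.
    uniqueness = 1.0 - float(df.duplicated().mean())
    return {
        "completeness": completeness,
        "uniqueness": uniqueness,
        "overall": (completeness + uniqueness) / 2,
    }


df = pd.DataFrame({"a": [1, 2, 2, None], "b": ["x", "y", "y", "z"]})
score = quality_score(df)
print(score)
```

Reporting the components alongside the overall score shows which dimension is pulling the number down.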
9. 💡 Business Insights
Automated pattern discovery: Clustering, trend detection, seasonal patterns
Key driver analysis: Identifies most influential factors
Anomaly alerts: Flags unusual patterns automatically
Strategic recommendations: Data-driven business suggestions
Export insights: Download comprehensive insight reports
🛠️ Installation
Prerequisites
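Python 3.9 or higher (the pinned versions under Dependencies were released before Python 3.12, so 3.9 to 3.11 is the safest choice)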
pip package manager
Git (optional)
Step-by-Step Installation
```bash
# 1. Clone the repository
git clone https://github.com/yourusername/ai-data-analyst-pro.git
cd ai-data-analyst-pro

# 2. Create a virtual environment (recommended)
# On Windows:
python -m venv venv
venv\Scripts\activate
# On macOS/Linux:
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run the application
streamlit run app.py
```
After running, open your browser and navigate to http://localhost:8501 🎉
📦 Dependencies
```txt
streamlit==1.28.0
pandas==2.0.3
numpy==1.24.3
plotly==5.17.0
scikit-learn==1.3.0
scipy==1.11.1
statsmodels==0.14.0
xgboost==1.7.6
lightgbm==4.0.0
matplotlib==3.7.2
shap==0.42.1
openpyxl==3.1.2
```
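Because every dependency is pinned to an exact version, it can be useful to confirm that the active environment matches the pins before launching the app. The snippet below is a small sketch using only the standard library; `check_pins` is a hypothetical helper, and it only understands simple `name==version` lines.

```python
from importlib import metadata


# Hypothetical helper: maps each pinned package to a tuple of
# (wanted_version, installed_version), with None when it is not installed.
def check_pins(requirements_text: str) -> dict:
    report = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and unpinned entries
        name, _, wanted = line.partition("==")
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


pins = "pandas==2.0.3\nnumpy==1.24.3\n"
for name, (wanted, installed) in check_pins(pins).items():
    status = "ok" if installed == wanted else f"installed: {installed}"
    print(f"{name}=={wanted}: {status}")
```

In practice the text would come from reading `requirements.txt`; the two-line string here just keeps the example self-contained.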
🚀 Quick Start Guide
1. Upload Your Data
Click on the "Upload Dataset" tab
Drag & drop your CSV or Excel file (or click to browse)
Watch as the system automatically validates and displays your data preview
2. Explore Your Data
Navigate through the EDA tabs to discover patterns
Use the chatbot to ask questions about your data
Generate visualizations with just a few clicks
3. Clean & Prepare
Go to the Preprocessing tab to handle missing values
Detect and remove outliers automatically
Scale features and encode categorical variables
4. Build Models
Navigate to the Machine Learning tab
Select your target variable (the AI will suggest the best options)
Choose models and let the pipeline train and compare them
5. Get Insights
Review the automatically generated business insights
Download comprehensive reports
Share findings with your team
📁 Project Structure
```text
ai-data-analyst-pro/
│
├── app.py                    # Main application with Streamlit UI
├── chatbot.py                # AI-powered data chatbot
├── data_preprocessing.py     # Data cleaning and transformation
├── dataset_overview.py       # Comprehensive EDA visualizations
├── data_quality.py           # Quality metrics and anomaly detection
├── insights.py               # Automated business insights
├── ml_pipeline.py            # Machine learning pipeline
├── statistical_analysis.py   # Statistical tests and analysis
├── visualization.py          # Interactive plot generation
├── utils.py                  # Helper functions and utilities
│
├── requirements.txt          # Project dependencies
├── Dockerfile                # Docker configuration
├── Procfile                  # Heroku deployment config
├── .gitignore                # Git ignore rules
│
└── README.md                 # Project documentation
```
🚒 Deployment
# Auto_Data_Analytics