AkashGogineni18 commited on
Commit
0b52104
Β·
1 Parent(s): f7e5edf

intial code

Browse files
Dockerfile ADDED
@@ -0,0 +1,39 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Fixed Dockerfile - works with your existing web_app.py
2
+ FROM python:3.11-slim
3
+
4
+ # Set working directory
5
+ WORKDIR /app
6
+
7
+ # Environment variables
8
+ ENV PYTHONUNBUFFERED=1
9
+ ENV PYTHONDONTWRITEBYTECODE=1
10
+ ENV PYTHONPATH=/app:/app/src
11
+
12
+ # Install system dependencies
13
+ RUN apt-get update && apt-get install -y \
14
+ gcc \
15
+ g++ \
16
+ && rm -rf /var/lib/apt/lists/*
17
+
18
+ # Copy and install requirements first (for better caching)
19
+ COPY requirements.txt .
20
+ RUN pip install --no-cache-dir --upgrade pip setuptools wheel && \
21
+ pip install --no-cache-dir -r requirements.txt
22
+
23
+ # Copy ALL files (this ensures both .py files are copied)
24
+ COPY . .
25
+
26
+ # Ensure files have correct permissions
27
+ RUN chmod +r *.py
28
+
29
+ # Debug: Show what files we have
30
+ RUN echo "Files in /app:" && ls -la /app/
31
+
32
+ # Create necessary directories
33
+ RUN mkdir -p /app/temp /app/analysis_output
34
+
35
+ # Expose port
36
+ EXPOSE 7860
37
+
38
+ # Run Streamlit with explicit file path
39
+ CMD ["python", "-m", "streamlit", "run", "/app/web_app.py", "--server.port=7860", "--server.address=0.0.0.0", "--server.headless=true", "--server.enableCORS=false", "--server.enableXsrfProtection=false"]
README 2.md ADDED
@@ -0,0 +1,124 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: AIDA - AI Data Analysis Agent
3
+ emoji: πŸ€–
4
+ colorFrom: blue
5
+ colorTo: purple
6
+ sdk: docker
7
+ pinned: false
8
+ license: mit
9
+ ---
10
+ πŸ€– AIDA - AI Data Analysis Agent
11
+ Transform your raw data into actionable business insights with the power of AI.
12
+ AIDA is an intelligent data analysis system powered by Llama 3 and LangGraph that automatically:
13
+
14
+ πŸ“Š Analyzes dataset structure and quality
15
+ 🧠 Generates AI-powered insights
16
+ πŸ“ˆ Creates intelligent visualizations
17
+ 🎯 Provides actionable business recommendations
18
+
19
+ ✨ Features
20
+
21
+ πŸ” Intelligent Analysis: AI automatically understands your data structure
22
+ πŸ“Š Smart Visualizations: Creates the most appropriate charts for your data
23
+ πŸ’‘ Business Insights: Generates meaningful patterns and trends
24
+ 🎯 Actionable Recommendations: Provides specific, measurable action items
25
+ 🌐 Beautiful Interface: Modern, responsive web interface
26
+
27
+ πŸš€ How to Use
28
+
29
+ Set API Key: Get your free API key from Groq Console
30
+ Upload Data: Support for CSV, Excel, and JSON files
31
+ AI Analysis: Let the AI agents analyze your data automatically
32
+ Get Insights: Review generated insights and recommendations
33
+ Download Results: Export analysis reports and enhanced datasets
34
+
35
+ πŸ“Š Supported File Formats
36
+
37
+ CSV files (.csv) - Most common format
38
+ Excel files (.xlsx, .xls) - Spreadsheet data
39
+ JSON files (.json) - Structured data
40
+
41
+ πŸ€– AI Agents & Workflow
42
+ AIDA uses a sophisticated multi-agent system powered by LangGraph to analyze your data intelligently:
43
+ Agent Architecture
44
+ πŸ” Data Profiler Agent
45
+
46
+ Analyzes dataset structure and characteristics
47
+ Identifies data types, missing values, and quality issues
48
+ Generates initial dataset overview
49
+
50
+ πŸ“Š Column Analyzer Agent
51
+
52
+ Performs detailed analysis of each column
53
+ Calculates statistical measures and distributions
54
+ Identifies patterns and anomalies in individual features
55
+
56
+ 🧠 Insight Generator Agent
57
+
58
+ Uses AI to generate meaningful business insights
59
+ Identifies correlations and relationships
60
+ Discovers hidden patterns in the data
61
+
62
+ πŸ“ˆ Visualization Planner Agent
63
+
64
+ Intelligently selects optimal chart types
65
+ Plans visualization strategy based on data characteristics
66
+ Ensures maximum insight communication
67
+
68
+ 🎨 Chart Creator Agent
69
+
70
+ Creates interactive visualizations
71
+ Generates multiple chart types automatically
72
+ Optimizes visual presentation for clarity
73
+
74
+ 🎯 Recommendation Engine Agent
75
+
76
+ Formulates actionable business recommendations
77
+ Provides specific, measurable action items
78
+ Prioritizes recommendations by potential impact
79
+
80
+ Workflow Process
81
+ πŸ“Š Dataset Upload
82
+ ↓
83
+ πŸ” Data Profiling β†’ πŸ“Š Column Analysis β†’ 🧠 Insight Generation
84
+ ↓ ↓ ↓
85
+ πŸ“ˆ Visualization Planning β†’ 🎨 Chart Creation β†’ 🎯 Recommendations
86
+ ↓
87
+ βœ… Complete Analysis Report
88
+ Each agent operates autonomously while maintaining context through LangGraph's state management, ensuring comprehensive and coherent analysis.
89
+ πŸ”§ Technology Stack
90
+
91
+ AI Models: Llama 3 (70B and 8B variants)
92
+ Agent Framework: LangGraph for intelligent multi-agent workflows
93
+ State Management: TypedDict for structured agent communication
94
+ Frontend: Streamlit with custom CSS styling
95
+ Visualization: Plotly for interactive charts
96
+ Data Processing: Pandas and NumPy
97
+ API Integration: Groq API for LLM access
98
+
99
+ 🌟 Perfect For
100
+
101
+ Business Analysts - Quick data insights
102
+ Data Scientists - Rapid exploratory analysis
103
+ Managers - Data-driven decision making
104
+ Students - Learning data analysis patterns
105
+ Researchers - Dataset understanding
106
+
107
+ πŸ›‘οΈ Privacy & Security
108
+
109
+ No data is stored permanently
110
+ All processing happens in your session
111
+ Files are automatically cleaned up
112
+ API keys are handled securely
113
+
114
+ 🎯 Get Started
115
+ Simply upload your dataset and let AIDA's AI agents work their magic! The system will automatically:
116
+
117
+ Profile your dataset structure
118
+ Analyze data quality and patterns
119
+ Generate business insights
120
+ Create optimal visualizations
121
+ Recommend actionable next steps
122
+
123
+
124
+ Powered by Llama 3 β€’ Built with LangGraph β€’ Designed for Business Impact
__pycache__/data_analysis_agent.cpython-311.pyc ADDED
Binary file (38.1 kB). View file
 
data_analysis_agent.py ADDED
@@ -0,0 +1,657 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import pandas as pd
3
+ import numpy as np
4
+ import matplotlib.pyplot as plt
5
+ import seaborn as sns
6
+ import plotly.express as px
7
+ import plotly.graph_objects as go
8
+ from plotly.subplots import make_subplots
9
+ import warnings
10
+ warnings.filterwarnings('ignore')
11
+
12
+ from typing import Dict, List, Any, Optional, TypedDict
13
+ import json
14
+ from datetime import datetime
15
+ import logging
16
+
17
+ # LangGraph and LLM imports
18
+ from langgraph.graph import StateGraph, END
19
+ from langchain_groq import ChatGroq
20
+ from langchain_core.messages import HumanMessage, SystemMessage
21
+ from langchain_core.prompts import ChatPromptTemplate
22
+
23
+ # Configure logging
24
+ logging.basicConfig(level=logging.INFO)
25
+ logger = logging.getLogger(__name__)
26
+
27
+ class AnalysisState(TypedDict):
28
+ """State structure for the analysis workflow"""
29
+ dataset: pd.DataFrame
30
+ dataset_info: Dict[str, Any]
31
+ column_analysis: Dict[str, Any]
32
+ insights: List[str]
33
+ visualizations: List[Dict[str, Any]]
34
+ recommendations: List[str]
35
+ current_step: str
36
+ error_messages: List[str]
37
+
38
+ class DataAnalysisAgent:
39
+ def __init__(self, groq_api_key: str, model_name: str = "llama3-70b-8192"):
40
+ """Initialize the Data Analysis Agent"""
41
+ # Fixed: Use correct model name format
42
+ self.llm = ChatGroq(
43
+ groq_api_key=groq_api_key,
44
+ model_name=model_name, # Fixed: Use standard model names
45
+ temperature=0.1,
46
+ max_tokens=2000
47
+ )
48
+
49
+ # Set up the analysis workflow graph
50
+ self.workflow = self._create_workflow()
51
+
52
+ def _create_workflow(self) -> StateGraph:
53
+ """Create the LangGraph workflow for data analysis"""
54
+ workflow = StateGraph(AnalysisState)
55
+
56
+ # Add nodes for each analysis step
57
+ workflow.add_node("data_profiler", self._profile_dataset)
58
+ workflow.add_node("column_analyzer", self._analyze_columns)
59
+ workflow.add_node("insight_generator", self._generate_insights)
60
+ workflow.add_node("visualization_planner", self._plan_visualizations)
61
+ workflow.add_node("chart_creator", self._create_charts)
62
+ workflow.add_node("recommendation_engine", self._generate_recommendations)
63
+
64
+ # Define the workflow edges
65
+ workflow.add_edge("data_profiler", "column_analyzer")
66
+ workflow.add_edge("column_analyzer", "insight_generator")
67
+ workflow.add_edge("insight_generator", "visualization_planner")
68
+ workflow.add_edge("visualization_planner", "chart_creator")
69
+ workflow.add_edge("chart_creator", "recommendation_engine")
70
+ workflow.add_edge("recommendation_engine", END)
71
+
72
+ # Set entry point
73
+ workflow.set_entry_point("data_profiler")
74
+
75
+ return workflow.compile()
76
+
77
+ def _profile_dataset(self, state: AnalysisState) -> AnalysisState:
78
+ """Profile the dataset to understand its structure and characteristics"""
79
+ logger.info("Profiling dataset...")
80
+
81
+ try:
82
+ df = state["dataset"]
83
+
84
+ # Basic dataset information
85
+ dataset_info = {
86
+ "shape": df.shape,
87
+ "columns": list(df.columns),
88
+ "dtypes": {col: str(dtype) for col, dtype in df.dtypes.to_dict().items()}, # Fixed: Convert to string
89
+ "memory_usage": int(df.memory_usage(deep=True).sum()), # Fixed: Convert to int
90
+ "null_counts": df.isnull().sum().to_dict(),
91
+ "duplicate_rows": int(df.duplicated().sum()), # Fixed: Convert to int
92
+ "numeric_columns": df.select_dtypes(include=[np.number]).columns.tolist(),
93
+ "categorical_columns": df.select_dtypes(include=['object', 'category']).columns.tolist(),
94
+ "datetime_columns": df.select_dtypes(include=['datetime64']).columns.tolist()
95
+ }
96
+
97
+ # Use LLM to generate initial insights about the dataset
98
+ prompt = f"""
99
+ Analyze this dataset profile and provide initial observations:
100
+
101
+ Dataset Shape: {dataset_info['shape']}
102
+ Columns: {dataset_info['columns']}
103
+ Data Types: {dataset_info['dtypes']}
104
+ Missing Values: {dataset_info['null_counts']}
105
+ Duplicate Rows: {dataset_info['duplicate_rows']}
106
+
107
+ Provide a brief analysis of the dataset structure, data quality issues, and potential analysis opportunities.
108
+ """
109
+
110
+ response = self.llm.invoke([HumanMessage(content=prompt)])
111
+ dataset_info["llm_profile"] = response.content
112
+
113
+ state["dataset_info"] = dataset_info
114
+ state["current_step"] = "data_profiler"
115
+
116
+ except Exception as e:
117
+ logger.error(f"Error in data profiling: {str(e)}")
118
+ # Ensure error_messages exists and add fallback dataset_info
119
+ if "error_messages" not in state:
120
+ state["error_messages"] = []
121
+ if "dataset_info" not in state:
122
+ state["dataset_info"] = {}
123
+
124
+ # Add basic fallback info
125
+ try:
126
+ df = state["dataset"]
127
+ state["dataset_info"] = {
128
+ "shape": df.shape,
129
+ "columns": list(df.columns),
130
+ "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
131
+ "numeric_columns": df.select_dtypes(include=[np.number]).columns.tolist(),
132
+ "categorical_columns": df.select_dtypes(include=['object', 'category']).columns.tolist(),
133
+ "datetime_columns": df.select_dtypes(include=['datetime64']).columns.tolist(),
134
+ "null_counts": df.isnull().sum().to_dict(),
135
+ "duplicate_rows": int(df.duplicated().sum()),
136
+ "memory_usage": int(df.memory_usage(deep=True).sum())
137
+ }
138
+ except Exception:
139
+ # Ultimate fallback
140
+ state["dataset_info"] = {
141
+ "shape": [0, 0],
142
+ "columns": [],
143
+ "dtypes": {},
144
+ "numeric_columns": [],
145
+ "categorical_columns": [],
146
+ "datetime_columns": [],
147
+ "null_counts": {},
148
+ "duplicate_rows": 0,
149
+ "memory_usage": 0
150
+ }
151
+
152
+ state["error_messages"].append(f"Data profiling error: {str(e)}")
153
+
154
+ return state
155
+
156
+ def _analyze_columns(self, state: AnalysisState) -> AnalysisState:
157
+ """Analyze individual columns in detail"""
158
+ logger.info("Analyzing columns...")
159
+
160
+ try:
161
+ df = state["dataset"]
162
+ column_analysis = {}
163
+
164
+ for column in df.columns:
165
+ col_data = df[column]
166
+
167
+ analysis = {
168
+ "dtype": str(col_data.dtype),
169
+ "null_count": int(col_data.isnull().sum()), # Fixed: Convert to int
170
+ "null_percentage": float((col_data.isnull().sum() / len(col_data)) * 100), # Fixed: Convert to float
171
+ "unique_count": int(col_data.nunique()), # Fixed: Convert to int
172
+ "unique_percentage": float((col_data.nunique() / len(col_data)) * 100) # Fixed: Convert to float
173
+ }
174
+
175
+ if col_data.dtype in ['int64', 'float64']:
176
+ analysis.update({
177
+ "mean": float(col_data.mean()) if not pd.isna(col_data.mean()) else None, # Fixed: Handle NaN
178
+ "median": float(col_data.median()) if not pd.isna(col_data.median()) else None,
179
+ "std": float(col_data.std()) if not pd.isna(col_data.std()) else None,
180
+ "min": float(col_data.min()) if not pd.isna(col_data.min()) else None,
181
+ "max": float(col_data.max()) if not pd.isna(col_data.max()) else None,
182
+ "skewness": float(col_data.skew()) if not pd.isna(col_data.skew()) else None,
183
+ "kurtosis": float(col_data.kurtosis()) if not pd.isna(col_data.kurtosis()) else None
184
+ })
185
+ elif col_data.dtype == 'object':
186
+ try:
187
+ top_values = col_data.value_counts().head(5).to_dict()
188
+ analysis.update({
189
+ "top_values": top_values,
190
+ "avg_length": float(col_data.astype(str).str.len().mean()),
191
+ "max_length": int(col_data.astype(str).str.len().max())
192
+ })
193
+ except Exception:
194
+ analysis.update({
195
+ "top_values": {},
196
+ "avg_length": 0,
197
+ "max_length": 0
198
+ })
199
+
200
+ column_analysis[column] = analysis
201
+
202
+ # Use LLM to interpret column analysis
203
+ prompt = f"""
204
+ Analyze these column statistics and identify patterns, anomalies, and insights:
205
+
206
+ {json.dumps(column_analysis, indent=2, default=str)}
207
+
208
+ Focus on:
209
+ 1. Data quality issues
210
+ 2. Distribution patterns
211
+ 3. Potential relationships between columns
212
+ 4. Outliers or anomalies
213
+ 5. Business insights
214
+ """
215
+
216
+ response = self.llm.invoke([HumanMessage(content=prompt)])
217
+ column_analysis["llm_interpretation"] = response.content
218
+
219
+ state["column_analysis"] = column_analysis
220
+ state["current_step"] = "column_analyzer"
221
+
222
+ except Exception as e:
223
+ logger.error(f"Error in column analysis: {str(e)}")
224
+ if "error_messages" not in state:
225
+ state["error_messages"] = []
226
+ if "column_analysis" not in state:
227
+ state["column_analysis"] = {}
228
+ state["error_messages"].append(f"Column analysis error: {str(e)}")
229
+
230
+ return state
231
+
232
+ def _generate_insights(self, state: AnalysisState) -> AnalysisState:
233
+ """Generate insights from the data analysis"""
234
+ logger.info("Generating insights...")
235
+
236
+ try:
237
+ df = state["dataset"]
238
+ dataset_info = state["dataset_info"]
239
+
240
+ # Ensure required keys exist in dataset_info
241
+ if "numeric_columns" not in dataset_info:
242
+ dataset_info["numeric_columns"] = df.select_dtypes(include=[np.number]).columns.tolist()
243
+ if "categorical_columns" not in dataset_info:
244
+ dataset_info["categorical_columns"] = df.select_dtypes(include=['object', 'category']).columns.tolist()
245
+
246
+ # Correlation analysis for numeric columns
247
+ correlations = {}
248
+ numeric_cols = dataset_info.get("numeric_columns", [])
249
+ if len(numeric_cols) > 1:
250
+ corr_matrix = df[numeric_cols].corr()
251
+ high_correlations = []
252
+ for i in range(len(corr_matrix.columns)):
253
+ for j in range(i+1, len(corr_matrix.columns)):
254
+ corr_val = corr_matrix.iloc[i, j]
255
+ if not pd.isna(corr_val) and abs(corr_val) > 0.7: # Fixed: Check for NaN
256
+ high_correlations.append({
257
+ "col1": corr_matrix.columns[i],
258
+ "col2": corr_matrix.columns[j],
259
+ "correlation": float(corr_val) # Fixed: Convert to float
260
+ })
261
+ correlations["high_correlations"] = high_correlations
262
+
263
+ # Use LLM to generate comprehensive insights
264
+ prompt = f"""
265
+ Based on the dataset analysis, generate key insights and findings:
266
+
267
+ Dataset Info: {json.dumps(dataset_info, indent=2, default=str)}
268
+ High Correlations: {json.dumps(correlations, indent=2, default=str)}
269
+
270
+ Generate 5-10 specific, actionable insights that would be valuable for business decision-making.
271
+ Focus on trends, patterns, anomalies, and opportunities.
272
+ """
273
+
274
+ response = self.llm.invoke([HumanMessage(content=prompt)])
275
+ insights = response.content.split('\n')
276
+ insights = [insight.strip() for insight in insights if insight.strip()]
277
+
278
+ state["insights"] = insights
279
+ state["current_step"] = "insight_generator"
280
+
281
+ except Exception as e:
282
+ logger.error(f"Error in insight generation: {str(e)}")
283
+ if "error_messages" not in state:
284
+ state["error_messages"] = []
285
+ if "insights" not in state:
286
+ state["insights"] = []
287
+ state["error_messages"].append(f"Insight generation error: {str(e)}")
288
+
289
+ return state
290
+
291
+ def _plan_visualizations(self, state: AnalysisState) -> AnalysisState:
292
+ """Plan appropriate visualizations based on data characteristics"""
293
+ logger.info("Planning visualizations...")
294
+
295
+ try:
296
+ dataset_info = state["dataset_info"]
297
+ insights = state["insights"]
298
+
299
+ # Ensure required keys exist
300
+ if "numeric_columns" not in dataset_info:
301
+ df = state["dataset"]
302
+ dataset_info["numeric_columns"] = df.select_dtypes(include=[np.number]).columns.tolist()
303
+ dataset_info["categorical_columns"] = df.select_dtypes(include=['object', 'category']).columns.tolist()
304
+
305
+ # Use LLM to plan visualizations
306
+ prompt = f"""
307
+ Plan the most effective visualizations for this dataset:
308
+
309
+ Dataset Info: {json.dumps(dataset_info, indent=2, default=str)}
310
+ Key Insights: {insights}
311
+
312
+ Suggest 5-8 different visualization types with:
313
+ 1. Chart type (histogram, scatter, bar, line, heatmap, etc.)
314
+ 2. Columns to use
315
+ 3. Purpose/insight to communicate
316
+ 4. Title and description
317
+
318
+ Return as a JSON list with this structure:
319
+ [
320
+ {{
321
+ "type": "histogram",
322
+ "columns": ["column_name"],
323
+ "title": "Distribution of...",
324
+ "description": "Shows the...",
325
+ "purpose": "Understand distribution"
326
+ }}
327
+ ]
328
+ """
329
+
330
+ response = self.llm.invoke([HumanMessage(content=prompt)])
331
+ try:
332
+ # Extract JSON from response
333
+ json_start = response.content.find('[')
334
+ json_end = response.content.rfind(']') + 1
335
+ if json_start >= 0 and json_end > json_start:
336
+ viz_plan = json.loads(response.content[json_start:json_end])
337
+ else:
338
+ viz_plan = self._create_default_viz_plan(dataset_info)
339
+ except Exception:
340
+ # Fallback visualization plan
341
+ viz_plan = self._create_default_viz_plan(dataset_info)
342
+
343
+ state["visualizations"] = viz_plan
344
+ state["current_step"] = "visualization_planner"
345
+
346
+ except Exception as e:
347
+ logger.error(f"Error in visualization planning: {str(e)}")
348
+ if "error_messages" not in state:
349
+ state["error_messages"] = []
350
+ if "visualizations" not in state:
351
+ state["visualizations"] = []
352
+ state["error_messages"].append(f"Visualization planning error: {str(e)}")
353
+ # Ensure we have dataset_info for fallback
354
+ if "dataset_info" not in state:
355
+ state["dataset_info"] = {}
356
+ state["visualizations"] = self._create_default_viz_plan(state["dataset_info"])
357
+
358
+ return state
359
+
360
+ def _create_default_viz_plan(self, dataset_info: Dict) -> List[Dict]:
361
+ """Create a default visualization plan"""
362
+ viz_plan = []
363
+
364
+ # Ensure keys exist with defaults
365
+ numeric_columns = dataset_info.get("numeric_columns", [])
366
+ categorical_columns = dataset_info.get("categorical_columns", [])
367
+
368
+ # Distribution plots for numeric columns
369
+ for col in numeric_columns[:3]:
370
+ viz_plan.append({
371
+ "type": "histogram",
372
+ "columns": [col],
373
+ "title": f"Distribution of {col}",
374
+ "description": f"Shows the distribution pattern of {col}",
375
+ "purpose": "Understand data distribution"
376
+ })
377
+
378
+ # Bar plots for categorical columns
379
+ for col in categorical_columns[:2]:
380
+ viz_plan.append({
381
+ "type": "bar",
382
+ "columns": [col],
383
+ "title": f"Frequency of {col}",
384
+ "description": f"Shows the frequency of different {col} values",
385
+ "purpose": "Understand categorical distribution"
386
+ })
387
+
388
+ # Correlation heatmap if multiple numeric columns
389
+ if len(numeric_columns) > 1:
390
+ viz_plan.append({
391
+ "type": "heatmap",
392
+ "columns": numeric_columns,
393
+ "title": "Correlation Matrix",
394
+ "description": "Shows correlations between numeric variables",
395
+ "purpose": "Identify relationships"
396
+ })
397
+
398
+ return viz_plan
399
+
400
+ def _create_charts(self, state: AnalysisState) -> AnalysisState:
401
+ """Create the planned visualizations"""
402
+ logger.info("Creating charts...")
403
+
404
+ try:
405
+ df = state["dataset"]
406
+ viz_plans = state["visualizations"]
407
+
408
+ # Fixed: Use a working matplotlib style
409
+ try:
410
+ plt.style.use('default') # Fixed: Use default instead of seaborn-v0_8
411
+ except:
412
+ pass # If style fails, continue with default
413
+
414
+ for i, viz in enumerate(viz_plans):
415
+ try:
416
+ fig, ax = plt.subplots(figsize=(10, 6))
417
+
418
+ if viz["type"] == "histogram":
419
+ col = viz["columns"][0]
420
+ if col in df.columns and df[col].dtype in ['int64', 'float64']:
421
+ df[col].dropna().hist(bins=30, ax=ax, alpha=0.7) # Fixed: Drop NaN values
422
+ ax.set_title(viz["title"])
423
+ ax.set_xlabel(col)
424
+ ax.set_ylabel('Frequency')
425
+
426
+ elif viz["type"] == "bar":
427
+ col = viz["columns"][0]
428
+ if col in df.columns:
429
+ value_counts = df[col].value_counts().head(10)
430
+ value_counts.plot(kind='bar', ax=ax)
431
+ ax.set_title(viz["title"])
432
+ ax.set_xlabel(col)
433
+ ax.set_ylabel('Count')
434
+ plt.xticks(rotation=45)
435
+
436
+ elif viz["type"] == "heatmap":
437
+ numeric_cols = [col for col in viz["columns"] if col in df.columns and df[col].dtype in ['int64', 'float64']]
438
+ if len(numeric_cols) > 1:
439
+ corr_matrix = df[numeric_cols].corr()
440
+ # Fixed: Use matplotlib imshow instead of seaborn
441
+ im = ax.imshow(corr_matrix, cmap='coolwarm', aspect='auto')
442
+ ax.set_xticks(range(len(corr_matrix.columns)))
443
+ ax.set_yticks(range(len(corr_matrix.columns)))
444
+ ax.set_xticklabels(corr_matrix.columns, rotation=45)
445
+ ax.set_yticklabels(corr_matrix.columns)
446
+ ax.set_title(viz["title"])
447
+ plt.colorbar(im, ax=ax)
448
+
449
+ elif viz["type"] == "scatter":
450
+ if len(viz["columns"]) >= 2:
451
+ col1, col2 = viz["columns"][0], viz["columns"][1]
452
+ if col1 in df.columns and col2 in df.columns:
453
+ clean_data = df[[col1, col2]].dropna() # Fixed: Remove NaN values
454
+ ax.scatter(clean_data[col1], clean_data[col2], alpha=0.6)
455
+ ax.set_xlabel(col1)
456
+ ax.set_ylabel(col2)
457
+ ax.set_title(viz["title"])
458
+
459
+ plt.tight_layout()
460
+ plt.savefig(f'chart_{i+1}_{viz["type"]}.png', dpi=300, bbox_inches='tight')
461
+ plt.close()
462
+
463
+ except Exception as e:
464
+ logger.warning(f"Failed to create {viz['type']} chart: {str(e)}")
465
+ plt.close() # Fixed: Ensure figure is closed even on error
466
+ continue
467
+
468
+ state["current_step"] = "chart_creator"
469
+
470
+ except Exception as e:
471
+ logger.error(f"Error in chart creation: {str(e)}")
472
+ if "error_messages" not in state:
473
+ state["error_messages"] = []
474
+ state["error_messages"].append(f"Chart creation error: {str(e)}")
475
+
476
+ return state
477
+
478
+ def _generate_recommendations(self, state: AnalysisState) -> AnalysisState:
479
+ """Generate actionable recommendations based on analysis"""
480
+ logger.info("Generating recommendations...")
481
+
482
+ try:
483
+ insights = state["insights"]
484
+ dataset_info = state["dataset_info"]
485
+
486
+ # Use LLM to generate recommendations
487
+ prompt = f"""
488
+ Based on the complete data analysis, generate specific, actionable recommendations:
489
+
490
+ Dataset Info: {json.dumps(dataset_info, indent=2, default=str)}
491
+ Key Insights: {insights}
492
+
493
+ Generate 5-10 specific recommendations that include:
494
+ 1. Data quality improvements
495
+ 2. Business opportunities
496
+ 3. Further analysis suggestions
497
+ 4. Action items for stakeholders
498
+
499
+ Make recommendations specific, measurable, and actionable.
500
+ """
501
+
502
+ response = self.llm.invoke([HumanMessage(content=prompt)])
503
+ recommendations = response.content.split('\n')
504
+ recommendations = [rec.strip() for rec in recommendations if rec.strip()]
505
+
506
+ state["recommendations"] = recommendations
507
+ state["current_step"] = "recommendation_engine"
508
+
509
+ except Exception as e:
510
+ logger.error(f"Error in recommendation generation: {str(e)}")
511
+ if "error_messages" not in state:
512
+ state["error_messages"] = []
513
+ if "recommendations" not in state:
514
+ state["recommendations"] = []
515
+ state["error_messages"].append(f"Recommendation generation error: {str(e)}")
516
+
517
+ return state
518
+
519
+ def analyze_dataset(self, dataset_path: str) -> Dict[str, Any]:
520
+ """Main method to analyze a dataset"""
521
+ logger.info(f"Starting analysis of dataset: {dataset_path}")
522
+
523
+ try:
524
+ # Load dataset
525
+ if dataset_path.endswith('.csv'):
526
+ df = pd.read_csv(dataset_path)
527
+ elif dataset_path.endswith(('.xlsx', '.xls')):
528
+ df = pd.read_excel(dataset_path)
529
+ elif dataset_path.endswith('.json'):
530
+ df = pd.read_json(dataset_path)
531
+ else:
532
+ raise ValueError("Unsupported file format. Use CSV, Excel, or JSON.")
533
+
534
+ # Initialize state with all required fields
535
+ initial_state = AnalysisState(
536
+ dataset=df,
537
+ dataset_info={},
538
+ column_analysis={},
539
+ insights=[],
540
+ visualizations=[],
541
+ recommendations=[],
542
+ current_step="",
543
+ error_messages=[]
544
+ )
545
+
546
+ # Run the workflow
547
+ final_state = self.workflow.invoke(initial_state)
548
+
549
+ # Prepare results
550
+ results = {
551
+ "dataset_info": final_state.get("dataset_info", {}),
552
+ "column_analysis": final_state.get("column_analysis", {}),
553
+ "insights": final_state.get("insights", []),
554
+ "visualizations": final_state.get("visualizations", []),
555
+ "recommendations": final_state.get("recommendations", []),
556
+ "analysis_timestamp": datetime.now().isoformat(),
557
+ "errors": final_state.get("error_messages", [])
558
+ }
559
+
560
+ # Generate summary report
561
+ self._generate_report(results, dataset_path)
562
+
563
+ logger.info("Analysis completed successfully!")
564
+ return results
565
+
566
+ except Exception as e:
567
+ logger.error(f"Error in dataset analysis: {str(e)}")
568
+ return {"error": str(e)}
569
+
570
+ def _generate_report(self, results: Dict[str, Any], dataset_path: str):
571
+ """Generate a comprehensive analysis report"""
572
+ try:
573
+ report_content = f"""
574
+ # Data Analysis Report
575
+ ## Dataset: {dataset_path}
576
+ ## Analysis Date: {results['analysis_timestamp']}
577
+
578
+ ### Dataset Overview
579
+ - Shape: {results['dataset_info'].get('shape', 'N/A')}
580
+ - Columns: {len(results['dataset_info'].get('columns', []))}
581
+ - Missing Values: {sum(results['dataset_info'].get('null_counts', {}).values())}
582
+ - Duplicate Rows: {results['dataset_info'].get('duplicate_rows', 'N/A')}
583
+
584
+ ### Key Insights
585
+ """
586
+
587
+ for i, insight in enumerate(results.get('insights', []), 1):
588
+ report_content += f"{i}. {insight}\n"
589
+
590
+ report_content += "\n### Recommendations\n"
591
+ for i, rec in enumerate(results.get('recommendations', []), 1):
592
+ report_content += f"{i}. {rec}\n"
593
+
594
+ # Save report
595
+ with open('analysis_report.md', 'w') as f:
596
+ f.write(report_content)
597
+
598
+ print("Analysis report saved as 'analysis_report.md'")
599
+ except Exception as e:
600
+ logger.error(f"Error generating report: {str(e)}")
601
+
602
+ # Usage example and configuration
603
+ class DataAnalysisConfig:
604
+ """Configuration class for easy customization"""
605
+
606
+ def __init__(self):
607
+ self.groq_api_key = os.environ.get('GROQ_API_KEY')
608
+ self.model_name = "llama3-70b-8192" # Fixed: Use correct model name
609
+ self.output_directory = "analysis_output"
610
+ self.chart_style = "default" # Fixed: Use default style
611
+
612
+ def validate(self):
613
+ """Validate configuration"""
614
+ if not self.groq_api_key:
615
+ raise ValueError("GROQ_API_KEY environment variable is required")
616
+
617
+ if not os.path.exists(self.output_directory):
618
+ os.makedirs(self.output_directory)
619
+
620
+ def main():
621
+ """Main function to run the data analysis system"""
622
+
623
+ # Example usage
624
+ config = DataAnalysisConfig()
625
+
626
+ try:
627
+ config.validate()
628
+ except ValueError as e:
629
+ print(f"Configuration error: {e}")
630
+ print("Please set the GROQ_API_KEY environment variable")
631
+ return
632
+
633
+ # Initialize the agent
634
+ agent = DataAnalysisAgent(
635
+ groq_api_key=config.groq_api_key,
636
+ model_name=config.model_name
637
+ )
638
+
639
+ # Example: Analyze a dataset
640
+ dataset_path = "your_dataset.csv" # Replace with your dataset path
641
+
642
+ if os.path.exists(dataset_path):
643
+ results = agent.analyze_dataset(dataset_path)
644
+
645
+ if "error" not in results:
646
+ print("Analysis completed successfully!")
647
+ print(f"Generated {len(results['insights'])} insights")
648
+ print(f"Created {len(results['visualizations'])} visualizations")
649
+ print(f"Provided {len(results['recommendations'])} recommendations")
650
+ else:
651
+ print(f"Analysis failed: {results['error']}")
652
+ else:
653
+ print(f"Dataset file not found: {dataset_path}")
654
+ print("Please provide a valid dataset path")
655
+
656
+ if __name__ == "__main__":
657
+ main()
requirements.txt ADDED
@@ -0,0 +1,24 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Core dependencies
2
+ pandas>=2.0.0
3
+ numpy>=1.24.0
4
+ matplotlib>=3.7.0
5
+ plotly>=5.15.0
6
+
7
+ # AI/ML dependencies
8
+ langchain>=0.1.0
9
+ langchain-groq>=0.1.0
10
+ langgraph>=0.0.40
11
+
12
+ # File handling
13
+ openpyxl>=3.1.0
14
+ python-dotenv>=1.0.0
15
+
16
+ # Web interface
17
+ streamlit>=1.28.0
18
+
19
+ # Data processing utilities
20
+ scipy>=1.11.0
21
+
22
+ # Additional dependencies for stability
23
+ requests>=2.28.0
24
+ typing-extensions>=4.0.0
web_app.py ADDED
@@ -0,0 +1,1551 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # web_app.py
2
+ # Beautiful Web interface for the AI Data Analysis Agent System
3
+
4
+ import streamlit as st
5
+ import pandas as pd
6
+ import plotly.express as px
7
+ import plotly.graph_objects as go
8
+ from plotly.subplots import make_subplots
9
+ import io
10
+ import base64
11
+ from datetime import datetime
12
+ import json
13
+ import os
14
+ import sys
15
+ from pathlib import Path
16
+ import time
17
+
18
+
19
+ # Add the current directory to path to import our agent
20
+ sys.path.append(str(Path(__file__).parent))
21
+
22
+ try:
23
+ from data_analysis_agent import DataAnalysisAgent, DataAnalysisConfig
24
+ except ImportError:
25
+ st.error("❌ Please ensure data_analysis_agent.py is in the same directory")
26
+ st.info("Download both files and place them in the same folder")
27
+ st.stop()
28
+
29
+ # Page configuration
30
+ st.set_page_config(
31
+ page_title="AI Data Analysis Agent",
32
+ page_icon="πŸ€–",
33
+ layout="wide",
34
+ initial_sidebar_state="expanded",
35
+ menu_items={
36
+ 'Get Help': 'https://github.com/yourusername/ai-data-analysis-agent',
37
+ 'Report a bug': "https://github.com/yourusername/ai-data-analysis-agent/issues",
38
+ 'About': "# AI Data Analysis Agent\nPowered by Llama 3 & LangGraph"
39
+ }
40
+ )
41
+
42
+ # Custom CSS for beautiful styling
43
+ st.markdown("""
44
+ <style>
45
+ /* Import Google Fonts */
46
+ @import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap');
47
+
48
+ /* Global Styles */
49
+ .main .block-container {
50
+ padding-top: 2rem;
51
+ max-width: 1200px;
52
+ }
53
+
54
+ /* Main Header */
55
+ .main-header {
56
+ font-family: 'Inter', sans-serif;
57
+ font-size: 3.5rem;
58
+ font-weight: 700;
59
+ text-align: center;
60
+ margin: 2rem 0;
61
+ background: linear-gradient(135deg, #1e40af 0%, #3b82f6 50%, #06b6d4 100%);
62
+ -webkit-background-clip: text;
63
+ -webkit-text-fill-color: transparent;
64
+ background-clip: text;
65
+ text-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
66
+ }
67
+
68
+ /* Subtitle */
69
+ .subtitle {
70
+ font-family: 'Inter', sans-serif;
71
+ font-size: 1.2rem;
72
+ text-align: center;
73
+ color: #64748b;
74
+ margin-bottom: 3rem;
75
+ font-weight: 400;
76
+ }
77
+
78
+ /* Feature Cards */
79
+ .feature-card {
80
+ background: linear-gradient(145deg, #ffffff 0%, #f8fafc 100%);
81
+ border: 1px solid #e2e8f0;
82
+ border-radius: 16px;
83
+ padding: 2rem;
84
+ margin: 1rem 0;
85
+ box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1), 0 2px 4px -1px rgba(0, 0, 0, 0.06);
86
+ transition: all 0.3s ease;
87
+ height: 100%;
88
+ }
89
+
90
+ .feature-card:hover {
91
+ transform: translateY(-4px);
92
+ box-shadow: 0 20px 25px -5px rgba(0, 0, 0, 0.1), 0 10px 10px -5px rgba(0, 0, 0, 0.04);
93
+ }
94
+
95
+ .feature-icon {
96
+ font-size: 3rem;
97
+ margin-bottom: 1rem;
98
+ display: block;
99
+ }
100
+
101
+ .feature-title {
102
+ font-family: 'Inter', sans-serif;
103
+ font-size: 1.5rem;
104
+ font-weight: 600;
105
+ color: #1e293b;
106
+ margin-bottom: 0.5rem;
107
+ }
108
+
109
+ .feature-description {
110
+ color: #64748b;
111
+ font-size: 1rem;
112
+ line-height: 1.6;
113
+ }
114
+
115
+ /* Metric Cards */
116
+ .metric-container {
117
+ display: flex;
118
+ gap: 1rem;
119
+ margin: 2rem 0;
120
+ }
121
+
122
+ .metric-card {
123
+ background: linear-gradient(135deg, #4f46e5 0%, #7c3aed 100%);
124
+ color: white;
125
+ padding: 1.5 rem;
126
+ border-radius: 12px;
127
+ text-align: center;
128
+ box-shadow: 0 10px 15px -3px rgba(0, 0, 0, 0.1);
129
+ flex: 1;
130
+ transition: transform 0.2s ease;
131
+ }
132
+
133
+ .metric-card:hover {
134
+ transform: scale(1.05);
135
+ }
136
+
137
+ .metric-value {
138
+ font-size: 2rem;
139
+ font-weight: 700;
140
+ margin-bottom: 0.5rem;
141
+ }
142
+
143
+ .metric-label {
144
+ font-size: 0.9rem;
145
+ opacity: 0.9;
146
+ font-weight: 500;
147
+ }
148
+
149
+ /* Insight and Recommendation Boxes */
150
+ .insight-box {
151
+ background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%);
152
+ border-left: 5px solid #3b82f6;
153
+ padding: 1.5rem;
154
+ margin: 1rem 0;
155
+ border-radius: 0 12px 12px 0;
156
+ box-shadow: 0 4px 6px rgba(0, 0, 0, 0.05);
157
+ transition: all 0.3s ease;
158
+ }
159
+
160
+ .insight-box:hover {
161
+ transform: translateX(4px);
162
+ box-shadow: 0 8px 25px rgba(0, 0, 0, 0.1);
163
+ }
164
+
165
+ .recommendation-box {
166
+ background: linear-gradient(135deg, #f0fdf4 0%, #dcfce7 100%);
167
+ border-left: 5px solid #22c55e;
168
+ padding: 1.5rem;
169
+ margin: 1rem 0;
170
+ border-radius: 0 12px 12px 0;
171
+ box-shadow: 0 4px 6px rgba(0, 0, 0, 0.05);
172
+ transition: all 0.3s ease;
173
+ }
174
+
175
+ .recommendation-box:hover {
176
+ transform: translateX(4px);
177
+ box-shadow: 0 8px 25px rgba(0, 0, 0, 0.1);
178
+ }
179
+
180
+ /* Upload Area */
181
+ .upload-area {
182
+ border: 2px dashed #cbd5e1;
183
+ border-radius: 12px;
184
+ padding: 3rem 2rem;
185
+ text-align: center;
186
+ background: linear-gradient(135deg, #f8fafc 0%, #f1f5f9 100%);
187
+ margin: 2rem 0;
188
+ transition: all 0.3s ease;
189
+ }
190
+
191
+ .upload-area:hover {
192
+ border-color: #3b82f6;
193
+ background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%);
194
+ }
195
+
196
+ /* Progress Bar */
197
+ .stProgress > div > div > div > div {
198
+ background: linear-gradient(135deg, #3b82f6 0%, #8b5cf6 100%);
199
+ border-radius: 10px;
200
+ }
201
+
202
+ /* Buttons */
203
+ .stButton > button {
204
+ background: linear-gradient(135deg, #3b82f6 0%, #8b5cf6 100%);
205
+ color: white;
206
+ border: none;
207
+ border-radius: 12px;
208
+ padding: 0.75rem 2rem;
209
+ font-weight: 600;
210
+ font-size: 1rem;
211
+ transition: all 0.3s ease;
212
+ box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
213
+ }
214
+
215
+ .stButton > button:hover {
216
+ transform: translateY(-2px);
217
+ box-shadow: 0 8px 15px rgba(0, 0, 0, 0.2);
218
+ }
219
+
220
+ /* Sidebar Styling */
221
+ .css-1d391kg {
222
+ background: linear-gradient(180deg, #1e293b 0%, #334155 100%);
223
+ }
224
+
225
+ .css-1d391kg .sidebar-content {
226
+ color: white;
227
+ }
228
+
229
+ /* Tab Styling */
230
+ .stTabs [data-baseweb="tab-list"] {
231
+ gap: 8px;
232
+ }
233
+
234
+ .stTabs [data-baseweb="tab"] {
235
+ height: 50px;
236
+ background: linear-gradient(135deg, #f1f5f9 0%, #e2e8f0 100%);
237
+ border-radius: 12px;
238
+ border: 1px solid #cbd5e1;
239
+ color: #475569;
240
+ font-weight: 500;
241
+ transition: all 0.3s ease;
242
+ }
243
+
244
+ .stTabs [aria-selected="true"] {
245
+ background: linear-gradient(135deg, #3b82f6 0%, #8b5cf6 100%);
246
+ color: white;
247
+ border: 1px solid #3b82f6;
248
+ }
249
+
250
+ /* Success/Warning/Error Messages */
251
+ .stSuccess {
252
+ background: linear-gradient(135deg, #dcfce7 0%, #bbf7d0 100%);
253
+ border: 1px solid #22c55e;
254
+ border-radius: 12px;
255
+ }
256
+
257
+ .stWarning {
258
+ background: linear-gradient(135deg, #fef3c7 0%, #fde68a 100%);
259
+ border: 1px solid #f59e0b;
260
+ border-radius: 12px;
261
+ }
262
+
263
+ .stError {
264
+ background: linear-gradient(135deg, #fee2e2 0%, #fecaca 100%);
265
+ border: 1px solid #ef4444;
266
+ border-radius: 12px;
267
+ }
268
+
269
+ /* Animation */
270
+ @keyframes fadeInUp {
271
+ from {
272
+ opacity: 0;
273
+ transform: translateY(30px);
274
+ }
275
+ to {
276
+ opacity: 1;
277
+ transform: translateY(0);
278
+ }
279
+ }
280
+
281
+ .animate-fade-in {
282
+ animation: fadeInUp 0.6s ease-out;
283
+ }
284
+
285
+ /* Data Table Styling */
286
+ .stDataFrame {
287
+ border-radius: 12px;
288
+ overflow: hidden;
289
+ box-shadow: 0 4px 6px rgba(0, 0, 0, 0.05);
290
+ }
291
+
292
+ /* Expander Styling */
293
+ .streamlit-expanderHeader {
294
+ background: linear-gradient(135deg, #f8fafc 0%, #f1f5f9 100%);
295
+ border-radius: 12px;
296
+ border: 1px solid #e2e8f0;
297
+ }
298
+
299
+ /* Footer */
300
+ .footer {
301
+ text-align: center;
302
+ padding: 3rem 0;
303
+ color: #64748b;
304
+ font-size: 0.9rem;
305
+ border-top: 1px solid #e2e8f0;
306
+ margin-top: 4rem;
307
+ }
308
+
309
+ .footer a {
310
+ color: #3b82f6;
311
+ text-decoration: none;
312
+ font-weight: 500;
313
+ }
314
+
315
+ .footer a:hover {
316
+ text-decoration: underline;
317
+ }
318
+
319
+ /* Loading Animation */
320
+ .loading-container {
321
+ display: flex;
322
+ justify-content: center;
323
+ align-items: center;
324
+ padding: 2rem;
325
+ }
326
+
327
+ .loading-spinner {
328
+ border: 4px solid #f3f4f6;
329
+ border-top: 4px solid #3b82f6;
330
+ border-radius: 50%;
331
+ width: 40px;
332
+ height: 40px;
333
+ animation: spin 1s linear infinite;
334
+ }
335
+
336
+ @keyframes spin {
337
+ 0% { transform: rotate(0deg); }
338
+ 100% { transform: rotate(360deg); }
339
+ }
340
+ </style>
341
+ """, unsafe_allow_html=True)
342
+
343
+ def initialize_session_state():
344
+ """Initialize session state variables"""
345
+ if 'analysis_results' not in st.session_state:
346
+ st.session_state.analysis_results = None
347
+ if 'dataset' not in st.session_state:
348
+ st.session_state.dataset = None
349
+ if 'agent' not in st.session_state:
350
+ st.session_state.agent = None
351
+ if 'groq_api_key' not in st.session_state:
352
+ st.session_state.groq_api_key = ""
353
+ if 'model_name' not in st.session_state:
354
+ st.session_state.model_name = "llama3-70b-8192"
355
+ if 'analysis_complete' not in st.session_state:
356
+ st.session_state.analysis_complete = False
357
+
358
+ def create_agent():
359
+ """Create and configure the data analysis agent"""
360
+ try:
361
+ # Check environment variable first, then session state
362
+ groq_api_key = os.environ.get('GROQ_API_KEY') or st.session_state.get('groq_api_key', '')
363
+ if not groq_api_key:
364
+ return None
365
+
366
+ agent = DataAnalysisAgent(
367
+ groq_api_key=groq_api_key,
368
+ model_name=st.session_state.get('model_name', 'llama3-70b-8192')
369
+ )
370
+ return agent
371
+ except Exception as e:
372
+ st.error(f"Failed to create agent: {str(e)}")
373
+ return None
374
+
375
+ def sidebar_config():
376
+ """Configure the beautiful sidebar"""
377
+ with st.sidebar:
378
+ st.markdown("""
379
+ <div style='text-align: center; padding: 1rem 0;'>
380
+ <div style='font-size: 4.5rem; margin-bottom: 0 rem;'>πŸ€–</div>
381
+ <h1 style='
382
+ background: linear-gradient(135deg, #1e40af 0%, #3b82f6 50%, #06b6d4 100%);
383
+ -webkit-background-clip: text;
384
+ -webkit-text-fill-color: transparent;
385
+ background-clip: text;
386
+ margin: 0;
387
+ font-size: 1.6rem;
388
+ font-weight: 700;
389
+ '>AI Agents on action</h1>
390
+ <p style='color: #94a3b8; margin: 0.5rem 0 0 0; font-size: 0.9rem;'>Powered by Llama 3</p>
391
+ </div>
392
+ """, unsafe_allow_html=True)
393
+
394
+ st.markdown("---")
395
+
396
+ # Check for environment variable first
397
+ env_api_key = os.environ.get('GROQ_API_KEY')
398
+
399
+ if env_api_key:
400
+ st.success("βœ… API Key Configured")
401
+ st.session_state.groq_api_key = env_api_key
402
+ api_key_configured = True
403
+ else:
404
+ st.subheader("πŸ”‘ API Setup")
405
+ st.info("πŸ’‘ Set GROQ_API_KEY environment variable")
406
+
407
+ groq_api_key = st.text_input(
408
+ "Groq API Key",
409
+ type="password",
410
+ value=st.session_state.groq_api_key,
411
+ help="Get your API key from console.groq.com"
412
+ )
413
+
414
+ if groq_api_key:
415
+ st.session_state.groq_api_key = groq_api_key
416
+ api_key_configured = True
417
+ else:
418
+ api_key_configured = False
419
+
420
+ st.markdown("---")
421
+
422
+ # Model Selection
423
+ st.subheader("🧠 AI Model")
424
+ model_options = {
425
+ "llama3-70b-8192": "Llama 3 70B (Recommended)",
426
+ "llama3-8b-8192": "Llama 3 8B (Faster)",
427
+ "mixtral-8x7b-32768": "Mixtral 8x7B"
428
+ }
429
+
430
+ selected_model = st.selectbox(
431
+ "Choose Model",
432
+ options=list(model_options.keys()),
433
+ format_func=lambda x: model_options[x],
434
+ index=0
435
+ )
436
+ st.session_state.model_name = selected_model
437
+
438
+ st.markdown("---")
439
+
440
+ # Analysis Options
441
+ st.subheader("βš™οΈ Analysis Settings")
442
+
443
+ industry_type = st.selectbox(
444
+ "Industry Focus",
445
+ ["General", "Retail", "Healthcare", "Finance", "Manufacturing", "Technology"],
446
+ help="Customize insights for your industry"
447
+ )
448
+ st.session_state.industry_type = industry_type
449
+
450
+ enable_advanced = st.toggle(
451
+ "Advanced Analysis",
452
+ value=True,
453
+ help="Include correlation analysis and advanced insights"
454
+ )
455
+ st.session_state.enable_advanced = enable_advanced
456
+
457
+ auto_insights = st.toggle(
458
+ "Auto-Generate Insights",
459
+ value=True,
460
+ help="Automatically generate business insights"
461
+ )
462
+ st.session_state.auto_insights = auto_insights
463
+
464
+ st.markdown("---")
465
+
466
+ # Quick Stats with dynamic insights count
467
+ if st.session_state.dataset is not None:
468
+ st.subheader("πŸ“Š Dataset Info")
469
+ df = st.session_state.dataset
470
+
471
+ col1, col2 = st.columns(2)
472
+ with col1:
473
+ st.metric("Rows", f"{df.shape[0]:,}")
474
+ st.metric("Columns", df.shape[1])
475
+ with col2:
476
+ st.metric("Missing", f"{df.isnull().sum().sum():,}")
477
+ st.metric("Size", f"{df.memory_usage(deep=True).sum() / 1024**2:.1f} MB")
478
+
479
+ # Show insights count if analysis is complete (now shows top 5)
480
+ if st.session_state.analysis_results:
481
+ insights = st.session_state.analysis_results.get('insights', [])
482
+ recommendations = st.session_state.analysis_results.get('recommendations', [])
483
+
484
+ # Process to get clean counts (max 5 each)
485
+ processed_insights_count = min(len([i for i in insights if isinstance(i, str) and len(i.strip()) > 20]), 5)
486
+ processed_recommendations_count = min(len([r for r in recommendations if isinstance(r, str) and len(r.strip()) > 20]), 5)
487
+
488
+ st.markdown("---")
489
+ st.subheader("🧠 Analysis Results")
490
+
491
+ col1, col2 = st.columns(2)
492
+ with col1:
493
+ st.metric("πŸ’‘ Top Insights", processed_insights_count)
494
+ with col2:
495
+ st.metric("🎯 Top Recommendations", processed_recommendations_count)
496
+
497
+ st.markdown("---")
498
+
499
+ # Help Section
500
+ with st.expander("πŸ’‘ Quick Help"):
501
+ st.markdown("""
502
+ **Supported Formats:**
503
+ - CSV files (.csv)
504
+ - Excel files (.xlsx, .xls)
505
+ - JSON files (.json)
506
+
507
+ **Best Practices:**
508
+ - Clean column names
509
+ - Handle missing values
510
+ - Include date columns
511
+ - Mix numeric & categorical data
512
+
513
+ **Need Help?**
514
+ - [Documentation](https://github.com/yourusername/ai-data-analysis-agent)
515
+ - [Examples](https://github.com/yourusername/ai-data-analysis-agent/examples)
516
+ """)
517
+
518
+ return api_key_configured
519
+
520
+ def display_hero_section():
521
+ """Display the beautiful hero section"""
522
+ st.markdown('<div class="main-header animate-fade-in">AIDA-AI Data Analyzer </div>', unsafe_allow_html=True)
523
+
524
+ st.markdown("""
525
+ <div class="subtitle animate-fade-in">
526
+ Transform your raw data into actionable business insights with the power of AI.<br>
527
+ Upload, analyze, and discover patterns automatically using intelligent agents.
528
+ </div>
529
+ """, unsafe_allow_html=True)
530
+
531
+ def display_features():
532
+ """Display feature cards"""
533
+ st.markdown("### ✨ What This AI Agent Can Do")
534
+
535
+ col1, col2, col3 = st.columns(3)
536
+
537
+ with col1:
538
+ st.markdown("""
539
+ <div class="feature-card">
540
+ <div class="feature-icon">🧠</div>
541
+ <div class="feature-title">Intelligent Analysis</div>
542
+ <div class="feature-description">
543
+ Our AI automatically understands your data structure, identifies patterns,
544
+ and generates meaningful insights without any manual configuration.
545
+ </div>
546
+ </div>
547
+ """, unsafe_allow_html=True)
548
+
549
+ with col2:
550
+ st.markdown("""
551
+ <div class="feature-card">
552
+ <div class="feature-icon">πŸ“Š</div>
553
+ <div class="feature-title">Smart Visualizations</div>
554
+ <div class="feature-description">
555
+ Automatically creates the most appropriate charts and graphs for your data,
556
+ with interactive visualizations.
557
+ </div>
558
+ </div>
559
+ """, unsafe_allow_html=True)
560
+
561
+ with col3:
562
+ st.markdown("""
563
+ <div class="feature-card">
564
+ <div class="feature-icon">🎯</div>
565
+ <div class="feature-title">Actionable Recommendations</div>
566
+ <div class="feature-description">
567
+ Get specific, measurable recommendations for improving your business
568
+ based on data-driven insights.
569
+ </div>
570
+ </div>
571
+ """, unsafe_allow_html=True)
572
+
573
+ def upload_dataset():
574
+ """Beautiful dataset upload section"""
575
+ st.markdown("### πŸ“Š Upload Your Dataset")
576
+
577
+ uploaded_file = st.file_uploader(
578
+ "",
579
+ type=['csv', 'xlsx', 'xls', 'json'],
580
+ help="Drag and drop your file here or click to browse",
581
+ label_visibility="collapsed"
582
+ )
583
+
584
+ if uploaded_file is not None:
585
+ try:
586
+ # Show loading spinner
587
+ with st.spinner("πŸ” Processing your dataset..."):
588
+ time.sleep(1) # Small delay for UX
589
+
590
+ # Read the file based on extension
591
+ if uploaded_file.name.endswith('.csv'):
592
+ df = pd.read_csv(uploaded_file)
593
+ elif uploaded_file.name.endswith(('.xlsx', '.xls')):
594
+ df = pd.read_excel(uploaded_file)
595
+ elif uploaded_file.name.endswith('.json'):
596
+ df = pd.read_json(uploaded_file)
597
+ else:
598
+ st.error("Unsupported file format")
599
+ return False
600
+
601
+ st.session_state.dataset = df
602
+ st.session_state.uploaded_filename = uploaded_file.name
603
+
604
+ # Success message
605
+ st.success(f"βœ… Successfully loaded **{uploaded_file.name}**")
606
+
607
+ # Beautiful metrics display
608
+ col1, col2, col3, col4 = st.columns(4)
609
+
610
+ with col1:
611
+ st.markdown(f"""
612
+ <div class="metric-card">
613
+ <div class="metric-value">{df.shape[0]:,}</div>
614
+ <div class="metric-label">Rows</div>
615
+ </div>
616
+ """, unsafe_allow_html=True)
617
+
618
+ with col2:
619
+ st.markdown(f"""
620
+ <div class="metric-card">
621
+ <div class="metric-value">{df.shape[1]}</div>
622
+ <div class="metric-label">Columns</div>
623
+ </div>
624
+ """, unsafe_allow_html=True)
625
+
626
+ with col3:
627
+ missing = df.isnull().sum().sum()
628
+ st.markdown(f"""
629
+ <div class="metric-card">
630
+ <div class="metric-value">{missing:,}</div>
631
+ <div class="metric-label">Missing Values</div>
632
+ </div>
633
+ """, unsafe_allow_html=True)
634
+
635
+ with col4:
636
+ size_mb = df.memory_usage(deep=True).sum() / 1024**2
637
+ st.markdown(f"""
638
+ <div class="metric-card">
639
+ <div class="metric-value">{size_mb:.1f} MB</div>
640
+ <div class="metric-label">File Size</div>
641
+ </div>
642
+ """, unsafe_allow_html=True)
643
+
644
+ st.markdown("<br>", unsafe_allow_html=True)
645
+
646
+ # Data preview with beautiful styling
647
+ st.markdown("#### πŸ“‹ Data Preview")
648
+ st.dataframe(
649
+ df.head(10),
650
+ use_container_width=True,
651
+ height=300
652
+ )
653
+
654
+ # Column information in expandable section
655
+ with st.expander("πŸ“Š Detailed Column Information", expanded=False):
656
+ col_info = pd.DataFrame({
657
+ 'Column': df.columns,
658
+ 'Type': df.dtypes.astype(str),
659
+ 'Non-Null': df.count(),
660
+ 'Null Count': df.isnull().sum(),
661
+ 'Unique Values': df.nunique(),
662
+ 'Sample Data': [str(df[col].iloc[0]) if len(df) > 0 else '' for col in df.columns]
663
+ })
664
+ st.dataframe(col_info, use_container_width=True)
665
+
666
+ return True
667
+
668
+ except Exception as e:
669
+ st.error(f"❌ Error reading file: {str(e)}")
670
+ return False
671
+ else:
672
+ # Show upload placeholder
673
+ st.markdown("""
674
+ <div class="upload-area">
675
+ <div style="font-size: 3rem; margin-bottom: 1rem;">πŸ“</div>
676
+ <div style="font-size: 1.2rem; font-weight: 600; margin-bottom: 0.5rem;">
677
+ Drop your dataset here
678
+ </div>
679
+ <div style="color: #64748b;">
680
+ Supports CSV, Excel, and JSON files β€’ Max 200MB
681
+ </div>
682
+ </div>
683
+ """, unsafe_allow_html=True)
684
+
685
+ return False
686
+
687
+ def run_analysis():
688
+ """Run the AI analysis with beautiful progress indicators"""
689
+ if st.session_state.dataset is None:
690
+ st.warning("Please upload a dataset first.")
691
+ return
692
+
693
+ # Check for API key from environment or session state
694
+ api_key = os.environ.get('GROQ_API_KEY') or st.session_state.get('groq_api_key')
695
+ if not api_key:
696
+ st.warning("Please set GROQ_API_KEY environment variable or enter it in the sidebar.")
697
+ return
698
+
699
+ # Create agent
700
+ with st.spinner("πŸ€– Initializing AI agent..."):
701
+ agent = create_agent()
702
+ if agent is None:
703
+ st.error("Failed to initialize AI agent. Check your API key.")
704
+ return
705
+
706
+ st.session_state.agent = agent
707
+
708
+ # Save dataset temporarily
709
+ temp_file = "temp_dataset.csv"
710
+ st.session_state.dataset.to_csv(temp_file, index=False)
711
+
712
+ # Beautiful progress tracking
713
+ progress_container = st.container()
714
+
715
+ with progress_container:
716
+ st.markdown("### πŸš€ AI Analysis in Progress")
717
+
718
+ # Progress bar
719
+ progress_bar = st.progress(0)
720
+ status_text = st.empty()
721
+
722
+ # Step indicators
723
+ steps = [
724
+ ("πŸ”", "Analyzing dataset structure"),
725
+ ("πŸ“Š", "Examining columns and data quality"),
726
+ ("🧠", "Generating AI insights"),
727
+ ("πŸ“ˆ", "Planning visualizations"),
728
+ ("🎨", "Creating charts"),
729
+ ("🎯", "Formulating recommendations")
730
+ ]
731
+
732
+ step_cols = st.columns(len(steps))
733
+ step_indicators = []
734
+
735
+ for i, (icon, desc) in enumerate(steps):
736
+ with step_cols[i]:
737
+ step_indicators.append(st.empty())
738
+ step_indicators[i].markdown(f"""
739
+ <div style="text-align: center; padding: 1rem; opacity: 0.3;">
740
+ <div style="font-size: 2rem;">{icon}</div>
741
+ <div style="font-size: 0.8rem; margin-top: 0.5rem;">{desc}</div>
742
+ </div>
743
+ """, unsafe_allow_html=True)
744
+
745
+ try:
746
+ # Step 1
747
+ step_indicators[0].markdown(f"""
748
+ <div style="text-align: center; padding: 1rem; opacity: 1; background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%); border-radius: 12px;">
749
+ <div style="font-size: 2rem;">πŸ”</div>
750
+ <div style="font-size: 0.8rem; margin-top: 0.5rem; font-weight: 600;">Analyzing Structure</div>
751
+ </div>
752
+ """, unsafe_allow_html=True)
753
+ status_text.markdown("**πŸ” AI agent analyzing dataset structure...**")
754
+ progress_bar.progress(15)
755
+ time.sleep(1)
756
+
757
+ # Step 2
758
+ step_indicators[1].markdown(f"""
759
+ <div style="text-align: center; padding: 1rem; opacity: 1; background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%); border-radius: 12px;">
760
+ <div style="font-size: 2rem;">πŸ“Š</div>
761
+ <div style="font-size: 0.8rem; margin-top: 0.5rem; font-weight: 600;">Examining Data</div>
762
+ </div>
763
+ """, unsafe_allow_html=True)
764
+ status_text.markdown("**πŸ“Š Analyzing columns and data quality...**")
765
+ progress_bar.progress(30)
766
+ time.sleep(1)
767
+
768
+ # Step 3
769
+ step_indicators[2].markdown(f"""
770
+ <div style="text-align: center; padding: 1rem; opacity: 1; background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%); border-radius: 12px;">
771
+ <div style="font-size: 2rem;">🧠</div>
772
+ <div style="font-size: 0.8rem; margin-top: 0.5rem; font-weight: 600;">AI Thinking</div>
773
+ </div>
774
+ """, unsafe_allow_html=True)
775
+ status_text.markdown("**🧠 Generating insights with AI...**")
776
+ progress_bar.progress(50)
777
+ time.sleep(1)
778
+
779
+ # Step 4
780
+ step_indicators[3].markdown(f"""
781
+ <div style="text-align: center; padding: 1rem; opacity: 1; background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%); border-radius: 12px;">
782
+ <div style="font-size: 2rem;">πŸ“ˆ</div>
783
+ <div style="font-size: 0.8rem; margin-top: 0.5rem; font-weight: 600;">Planning Charts</div>
784
+ </div>
785
+ """, unsafe_allow_html=True)
786
+ status_text.markdown("**πŸ“ˆ Planning optimal visualizations...**")
787
+ progress_bar.progress(70)
788
+ time.sleep(1)
789
+
790
+ # Step 5
791
+ step_indicators[4].markdown(f"""
792
+ <div style="text-align: center; padding: 1rem; opacity: 1; background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%); border-radius: 12px;">
793
+ <div style="font-size: 2rem;">🎨</div>
794
+ <div style="font-size: 0.8rem; margin-top: 0.5rem; font-weight: 600;">Creating Charts</div>
795
+ </div>
796
+ """, unsafe_allow_html=True)
797
+ status_text.markdown("**🎨 Creating beautiful visualizations...**")
798
+ progress_bar.progress(85)
799
+
800
+ # Run the actual analysis
801
+ results = agent.analyze_dataset(temp_file)
802
+
803
+ # Step 6
804
+ step_indicators[5].markdown(f"""
805
+ <div style="text-align: center; padding: 1rem; opacity: 1; background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%); border-radius: 12px;">
806
+ <div style="font-size: 2rem;">🎯</div>
807
+ <div style="font-size: 0.8rem; margin-top: 0.5rem; font-weight: 600;">Final Recommendations</div>
808
+ </div>
809
+ """, unsafe_allow_html=True)
810
+ status_text.markdown("**🎯 Formulating actionable recommendations...**")
811
+ progress_bar.progress(100)
812
+
813
+ # Clean up temp file
814
+ if os.path.exists(temp_file):
815
+ os.remove(temp_file)
816
+
817
+ if "error" in results:
818
+ st.error(f"❌ Analysis failed: {results['error']}")
819
+ return
820
+
821
+ st.session_state.analysis_results = results
822
+ st.session_state.analysis_complete = True
823
+
824
+ # Success animation
825
+ status_text.markdown("**βœ… Analysis completed successfully!**")
826
+
827
+ # Show completion message
828
+ st.balloons()
829
+ time.sleep(1)
830
+
831
+ # Clear progress and show results
832
+ progress_container.empty()
833
+ st.rerun()
834
+
835
+ except Exception as e:
836
+ st.error(f"❌ Analysis failed: {str(e)}")
837
+ if os.path.exists(temp_file):
838
+ os.remove(temp_file)
839
+
840
+ def display_results():
841
+ """Display beautiful analysis results"""
842
+ results = st.session_state.analysis_results
843
+ if results is None:
844
+ return
845
+
846
+ # Results header
847
+ st.markdown("""
848
+ <div style="text-align: center; margin: 3rem 0;">
849
+ <h1 style="font-size: 2.5rem; color: #1e293b; margin-bottom: 0.5rem;">πŸ“Š Analysis Complete!</h1>
850
+ <p style="font-size: 1.1rem; color: #64748b;">Here are your AI-generated insights and recommendations</p>
851
+ </div>
852
+ """, unsafe_allow_html=True)
853
+
854
+ # Dataset Overview with beautiful cards
855
+ st.markdown("### πŸ“‹ Dataset Overview")
856
+ info = results.get('dataset_info', {})
857
+
858
+ col1, col2, col3, col4, col5 = st.columns(5)
859
+
860
+ metrics = [
861
+ ("πŸ“Š", "Total Rows", f"{info.get('shape', [0])[0]:,}", "#3b82f6"),
862
+ ("πŸ“‹", "Columns", str(info.get('shape', [0, 0])[1]), "#8b5cf6"),
863
+ ("πŸ”’", "Numeric", str(len(info.get('numeric_columns', []))), "#06b6d4"),
864
+ ("πŸ“", "Categorical", str(len(info.get('categorical_columns', []))), "#10b981"),
865
+ ("✨", "Quality Score", f"{max(0, 100 - (sum(info.get('null_counts', {}).values()) / max(info.get('shape', [1, 1])[0] * info.get('shape', [1, 1])[1], 1) * 100)):.0f}%", "#f59e0b")
866
+ ]
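# Worked example of the quality-score formula in the metrics list above (numbers are
# hypothetical): a 1,000-row x 8-column dataset has 8,000 cells, so 400 missing values
# give max(0, 100 - 400 / 8000 * 100) = 95.0, displayed as "95%".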
867
+
868
+ for i, (icon, label, value, color) in enumerate(metrics):
869
+ with [col1, col2, col3, col4, col5][i]:
870
+ st.markdown(f"""
871
+ <div style="
872
+ background: linear-gradient(135deg, {color}15 0%, {color}25 100%);
873
+ border: 2px solid {color}30;
874
+ border-radius: 16px;
875
+ padding: 1.5rem;
876
+ text-align: center;
877
+ margin: 0.5rem 0;
878
+ transition: transform 0.2s ease;
879
+ ">
880
+ <div style="font-size: 2rem; margin-bottom: 0.5rem;">{icon}</div>
881
+ <div style="font-size: 1.8rem; font-weight: 700; color: {color}; margin-bottom: 0.25rem;">{value}</div>
882
+ <div style="font-size: 0.9rem; color: #64748b; font-weight: 500;">{label}</div>
883
+ </div>
884
+ """, unsafe_allow_html=True)
885
+
886
+ st.markdown("<br>", unsafe_allow_html=True)
887
+
888
+ # Key Insights Section - Extract complete insights with headers and content combined
889
+ st.markdown("### πŸ’‘ Key Insights")
890
+ insights = results.get('insights', [])
891
+
892
+ if insights:
893
+ # Combine all insight text and parse properly
894
+ full_text = ' '.join(str(item) for item in insights)
895
+
896
+ # Extract complete insights (header + content) using regex
897
+ import re
898
+
899
+ # Pattern to match **Insight X:** followed by content until next insight or end
900
+ insight_pattern = r'\*\*Insight (\d+):(.*?)(?=\*\*Insight \d+:|$)'
901
+ matches = re.findall(insight_pattern, full_text, re.DOTALL)
902
+
903
+ processed_insights = []
904
+ for match in matches:
905
+ insight_num, content = match
906
+ clean_content = content.strip().strip('*').strip()  # remove stray markdown asterisks on both sides
907
+ if len(clean_content) > 20:
908
+ processed_insights.append(clean_content)
909
+
910
+ # Take top 5 insights
911
+ top_insights = processed_insights[:5]
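# Minimal illustrative sketch of what the pattern above extracts, assuming the agent
# emits the "**Insight N:** ..." form (the sample string is hypothetical):
_sample = "**Insight 1:** Revenue grew 12% QoQ. **Insight 2:** Churn is highest in EMEA."
_demo = [c.strip().strip('*').strip() for _, c in re.findall(insight_pattern, _sample, re.DOTALL)]
# _demo -> ['Revenue grew 12% QoQ.', 'Churn is highest in EMEA.']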
912
+
913
+ if top_insights:
914
+ st.markdown(f"**Top {len(top_insights)} key insights from your data:**")
915
+ st.markdown("<br>", unsafe_allow_html=True)
916
+
917
+ for i, insight in enumerate(top_insights):
918
+ st.markdown(f"""
919
+ <div class="insight-box animate-fade-in">
920
+ <div style="display: flex; align-items: flex-start; gap: 1rem;">
921
+ <div style="
922
+ background: #3b82f6;
923
+ color: white;
924
+ border-radius: 50%;
925
+ width: 32px;
926
+ height: 32px;
927
+ display: flex;
928
+ align-items: center;
929
+ justify-content: center;
930
+ font-weight: bold;
931
+ font-size: 0.9rem;
932
+ flex-shrink: 0;
933
+ ">{i+1}</div>
934
+ <div style="flex: 1;">
935
+ <strong style="color: #1e293b;">πŸ’‘ Key Insight {i+1}:</strong><br>
936
+ <span style="color: #475569; line-height: 1.6;">{insight}</span>
937
+ </div>
938
+ </div>
939
+ </div>
940
+ """, unsafe_allow_html=True)
941
+ else:
942
+ st.info("πŸ” No insights could be extracted from the analysis.")
943
+ else:
944
+ st.info("πŸ” No insights were generated.")
945
+
946
+ # Interactive Visualizations Section
947
+ st.markdown("### πŸ“ˆ Interactive Data Exploration")
948
+
949
+ if st.session_state.dataset is not None:
950
+ df = st.session_state.dataset
951
+
952
+ # Beautiful tabs
953
+ tab1, tab2, tab3, tab4 = st.tabs([
954
+ "πŸ“Š Distributions",
955
+ "πŸ”— Correlations",
956
+ "πŸ“ˆ Trends & Patterns",
957
+ "🎯 Custom Analysis"
958
+ ])
959
+
960
+ with tab1:
961
+ st.markdown("#### πŸ“Š Distribution Analysis")
962
+ numeric_cols = df.select_dtypes(include=['number']).columns.tolist()
963
+
964
+ if len(numeric_cols) > 0:
965
+ # Column selector at the top
966
+ selected_col = st.selectbox(
967
+ "Select column to analyze",
968
+ numeric_cols,
969
+ key="dist_col"
970
+ )
971
+
972
+ st.markdown("<br>", unsafe_allow_html=True)
973
+
974
+ # Show all three plots side by side
975
+ col1, col2, col3 = st.columns(3)
976
+
977
+ with col1:
978
+ st.markdown("**Histogram**")
979
+ fig_hist = px.histogram(
980
+ df,
981
+ x=selected_col,
982
+ title=f"Histogram",
983
+ nbins=30,
984
+ color_discrete_sequence=['#3b82f6']
985
+ )
986
+ fig_hist.update_layout(
987
+ height=380,
988
+ plot_bgcolor='rgba(0,0,0,0)',
989
+ paper_bgcolor='rgba(0,0,0,0)',
990
+ title_font_size=14,
991
+ margin=dict(t=40, b=40, l=40, r=40)
992
+ )
993
+ st.plotly_chart(fig_hist, use_container_width=True)
994
+
995
+ with col2:
996
+ st.markdown("**Box Plot**")
997
+ fig_box = px.box(
998
+ df,
999
+ y=selected_col,
1000
+ title=f"Box Plot",
1001
+ color_discrete_sequence=['#8b5cf6']
1002
+ )
1003
+ fig_box.update_layout(
1004
+ height=380,
1005
+ plot_bgcolor='rgba(0,0,0,0)',
1006
+ paper_bgcolor='rgba(0,0,0,0)',
1007
+ title_font_size=14,
1008
+ margin=dict(t=40, b=40, l=40, r=40)
1009
+ )
1010
+ st.plotly_chart(fig_box, use_container_width=True)
1011
+
1012
+ with col3:
1013
+ st.markdown("**Violin Plot**")
1014
+ fig_violin = px.violin(
1015
+ df,
1016
+ y=selected_col,
1017
+ title=f"Violin Plot",
1018
+ color_discrete_sequence=['#06b6d4']
1019
+ )
1020
+ fig_violin.update_layout(
1021
+ height=380,
1022
+ plot_bgcolor='rgba(0,0,0,0)',
1023
+ paper_bgcolor='rgba(0,0,0,0)',
1024
+ title_font_size=14,
1025
+ margin=dict(t=40, b=40, l=40, r=40)
1026
+ )
1027
+ st.plotly_chart(fig_violin, use_container_width=True)
1028
+
1029
+ # Statistics cards below the plots
1030
+ st.markdown("#### πŸ“Š Statistical Summary")
1031
+ stats_col1, stats_col2, stats_col3, stats_col4, stats_col5 = st.columns(5)
1032
+
1033
+ stats = [
1034
+ ("Mean", f"{df[selected_col].mean():.2f}", "#3b82f6"),
1035
+ ("Median", f"{df[selected_col].median():.2f}", "#8b5cf6"),
1036
+ ("Std Dev", f"{df[selected_col].std():.2f}", "#06b6d4"),
1037
+ ("Min", f"{df[selected_col].min():.2f}", "#10b981"),
1038
+ ("Max", f"{df[selected_col].max():.2f}", "#f59e0b")
1039
+ ]
1040
+
1041
+ for i, (label, value, color) in enumerate(stats):
1042
+ with [stats_col1, stats_col2, stats_col3, stats_col4, stats_col5][i]:
1043
+ st.markdown(f"""
1044
+ <div style="
1045
+ background: {color}15;
1046
+ border: 1px solid {color}30;
1047
+ border-radius: 12px;
1048
+ padding: 1rem;
1049
+ text-align: center;
1050
+ ">
1051
+ <div style="font-size: 1.4rem; font-weight: 700; color: {color};">{value}</div>
1052
+ <div style="font-size: 0.85rem; color: #64748b; margin-top: 0.25rem;">{label}</div>
1053
+ </div>
1054
+ """, unsafe_allow_html=True)
1055
+ else:
1056
+ st.info("πŸ“Š No numeric columns found for distribution analysis.")
1057
+
1058
+ with tab2:
1059
+ st.markdown("#### πŸ”— Correlation Analysis")
1060
+
1061
+ if len(numeric_cols) > 1:
1062
+ # Correlation matrix heatmap
1063
+ corr_matrix = df[numeric_cols].corr()
1064
+
1065
+ fig = px.imshow(
1066
+ corr_matrix,
1067
+ text_auto=True,
1068
+ aspect="auto",
1069
+ title="Correlation Matrix",
1070
+ color_continuous_scale="RdBu_r",
1071
+ zmin=-1,
1072
+ zmax=1
1073
+ )
1074
+ fig.update_layout(
1075
+ height=500,
1076
+ plot_bgcolor='rgba(0,0,0,0)',
1077
+ paper_bgcolor='rgba(0,0,0,0)'
1078
+ )
1079
+ st.plotly_chart(fig, use_container_width=True)
1080
+
1081
+ # Top correlations
1082
+ st.markdown("#### πŸ”— Strongest Correlations")
1083
+ correlations = []
1084
+ for i in range(len(corr_matrix.columns)):
1085
+ for j in range(i+1, len(corr_matrix.columns)):
1086
+ corr_val = corr_matrix.iloc[i, j]
1087
+ if not pd.isna(corr_val):
1088
+ correlations.append({
1089
+ 'Variable 1': corr_matrix.columns[i],
1090
+ 'Variable 2': corr_matrix.columns[j],
1091
+ 'Correlation': corr_val,
1092
+ 'Strength': abs(corr_val)
1093
+ })
1094
+
1095
+ if correlations:
1096
+ corr_df = pd.DataFrame(correlations)
1097
+ corr_df = corr_df.sort_values('Strength', ascending=False).head(10)
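# Illustrative note: the nested loops above visit only the upper triangle of the
# correlation matrix (j > i), so each pair is counted once and the diagonal
# self-correlations (always 1.0) are skipped. For hypothetical numeric columns
# ['price', 'qty', 'revenue'] the visited pairs are (price, qty), (price, revenue),
# (qty, revenue), i.e. n * (n - 1) / 2 pairs for n columns, then sorted by |r|.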
1098
+
1099
+ # Display as beautiful cards
1100
+ for _, row in corr_df.head(5).iterrows():
1101
+ strength = "Strong" if row['Strength'] > 0.7 else "Moderate" if row['Strength'] > 0.5 else "Weak"
1102
+ color = "#ef4444" if row['Strength'] > 0.7 else "#f59e0b" if row['Strength'] > 0.5 else "#10b981"
1103
+
1104
+ st.markdown(f"""
1105
+ <div style="
1106
+ background: {color}15;
1107
+ border-left: 4px solid {color};
1108
+ border-radius: 8px;
1109
+ padding: 1rem;
1110
+ margin: 0.5rem 0;
1111
+ ">
1112
+ <div style="font-weight: 600; color: #1e293b; margin-bottom: 0.5rem;">
1113
+ {row['Variable 1']} ↔ {row['Variable 2']}
1114
+ </div>
1115
+ <div style="color: #64748b;">
1116
+ Correlation: <strong style="color: {color};">{row['Correlation']:.3f}</strong>
1117
+ ({strength} relationship)
1118
+ </div>
1119
+ </div>
1120
+ """, unsafe_allow_html=True)
1121
+ else:
1122
+ st.info("πŸ”— Need at least 2 numeric columns for correlation analysis.")
1123
+
1124
+ with tab3:
1125
+ st.markdown("#### πŸ“ˆ Trends & Patterns")
1126
+
1127
+ date_cols = df.select_dtypes(include=['datetime64']).columns.tolist()
1128
+ cat_cols = df.select_dtypes(include=['object', 'category']).columns.tolist()
1129
+
1130
+ if len(date_cols) > 0 and len(numeric_cols) > 0:
1131
+ col1, col2 = st.columns(2)
1132
+ with col1:
1133
+ date_col = st.selectbox("Date column", date_cols, key="trend_date")
1134
+ with col2:
1135
+ value_col = st.selectbox("Value column", numeric_cols, key="trend_value")
1136
+
1137
+ df_sorted = df.sort_values(date_col)
1138
+ fig = px.line(
1139
+ df_sorted,
1140
+ x=date_col,
1141
+ y=value_col,
1142
+ title=f"{value_col} Over Time",
1143
+ color_discrete_sequence=['#3b82f6']
1144
+ )
1145
+ fig.update_layout(height=400)
1146
+ st.plotly_chart(fig, use_container_width=True)
1147
+
1148
+ elif cat_cols and numeric_cols:
1149
+ st.markdown("#### πŸ“Š Category-based Analysis")
1150
+
1151
+ col1, col2, col3 = st.columns(3)
1152
+ with col1:
1153
+ cat_col = st.selectbox("Category", cat_cols, key="cat_trend")
1154
+ with col2:
1155
+ num_col = st.selectbox("Numeric value", numeric_cols, key="num_trend")
1156
+ with col3:
1157
+ agg_func = st.selectbox("Aggregation", ["mean", "sum", "count", "median"])
1158
+
1159
+ if agg_func == "count":
1160
+ grouped = df.groupby(cat_col).size().reset_index(name='count')
1161
+ y_col = 'count'
1162
+ else:
1163
+ grouped = df.groupby(cat_col)[num_col].agg(agg_func).reset_index()
1164
+ y_col = num_col
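# Illustrative sketch of the two aggregation paths above (column names are hypothetical):
#   df.groupby('region').size()                 # "count": rows per category, no numeric column needed
#   df.groupby('region')['sales'].agg('mean')   # mean/sum/median of the chosen numeric column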
1165
+
1166
+ fig = px.bar(
1167
+ grouped,
1168
+ x=cat_col,
1169
+ y=y_col,
1170
+ title=f"{agg_func.title()} of {num_col if agg_func != 'count' else 'Count'} by {cat_col}",
1171
+ color_discrete_sequence=['#8b5cf6']
1172
+ )
1173
+ fig.update_layout(height=400)
1174
+ st.plotly_chart(fig, use_container_width=True)
1175
+ else:
1176
+ st.info("πŸ“ˆ Upload data with date columns or categorical data to see trends.")
1177
+
1178
+ with tab4:
1179
+ st.markdown("#### 🎯 Custom Analysis Builder")
1180
+
1181
+ col1, col2 = st.columns([1, 2])
1182
+
1183
+ with col1:
1184
+ viz_type = st.selectbox(
1185
+ "Chart Type",
1186
+ ["Scatter Plot", "Bar Chart", "Pie Chart", "Sunburst", "Treemap"]
1187
+ )
1188
+
1189
+ if viz_type == "Scatter Plot" and len(numeric_cols) >= 2:
1190
+ x_col = st.selectbox("X-axis", numeric_cols, key="custom_x")
1191
+ y_col = st.selectbox("Y-axis", numeric_cols, key="custom_y")
1192
+ color_col = st.selectbox("Color by", ["None"] + list(df.columns), key="custom_color")
1193
+ size_col = st.selectbox("Size by", ["None"] + numeric_cols, key="custom_size")
1194
+
1195
+ elif viz_type in ["Bar Chart", "Pie Chart"] and cat_cols:
1196
+ cat_col = st.selectbox("Category", cat_cols, key="custom_cat")
1197
+ if numeric_cols:
1198
+ val_col = st.selectbox("Value (optional)", ["Count"] + numeric_cols, key="custom_val")
1199
+ else:
1200
+ val_col = "Count"
1201
+
1202
+ with col2:
1203
+ try:
1204
+ if viz_type == "Scatter Plot" and len(numeric_cols) >= 2:
1205
+ fig = px.scatter(
1206
+ df,
1207
+ x=x_col,
1208
+ y=y_col,
1209
+ color=None if color_col == "None" else color_col,
1210
+ size=None if size_col == "None" else size_col,
1211
+ title=f"{y_col} vs {x_col}",
1212
+ color_discrete_sequence=['#3b82f6'],
1213
+ hover_data=df.columns[:5].tolist()
1214
+ )
1215
+ fig.update_layout(height=500)
1216
+ st.plotly_chart(fig, use_container_width=True)
1217
+
1218
+ elif viz_type == "Pie Chart" and cat_cols:
1219
+ if val_col == "Count":
1220
+ value_counts = df[cat_col].value_counts().head(8)
1221
+ fig = px.pie(
1222
+ values=value_counts.values,
1223
+ names=value_counts.index,
1224
+ title=f"Distribution of {cat_col}"
1225
+ )
1226
+ else:
1227
+ grouped = df.groupby(cat_col)[val_col].sum().head(8)
1228
+ fig = px.pie(
1229
+ values=grouped.values,
1230
+ names=grouped.index,
1231
+ title=f"{val_col} by {cat_col}"
1232
+ )
1233
+ fig.update_layout(height=500)
1234
+ st.plotly_chart(fig, use_container_width=True)
1235
+
1236
+ except Exception as e:
1237
+ st.error(f"Error creating visualization: {str(e)}")
1238
+
1239
+ # Recommendations Section - Extract complete recommendations with headers and content combined
1240
+ st.markdown("### 🎯 AI-Generated Recommendations")
1241
+ recommendations = results.get('recommendations', [])
1242
+
1243
+ if recommendations:
1244
+ # Combine all recommendation text and parse properly
1245
+ full_text = ' '.join(str(item) for item in recommendations)
1246
+
1247
+ # Extract complete recommendations using regex
1248
+ import re
1249
+
1250
+ # Pattern to match recommendations (various formats)
1251
+ rec_patterns = [
1252
+ r'\*\*.*?(\d+):(.*?)(?=\*\*.*?\d+:|$)', # **Something 1:** format
1253
+ r'(\d+)\.\s+(.*?)(?=\d+\.|$)', # 1. format
1254
+ ]
1255
+
1256
+ processed_recommendations = []
1257
+ for pattern in rec_patterns:
1258
+ matches = re.findall(pattern, full_text, re.DOTALL)
1259
+ if matches:
1260
+ for match in matches:
1261
+ if len(match) == 2:
1262
+ rec_num, content = match
1263
+ clean_content = content.strip().strip('*').strip()  # remove stray markdown asterisks on both sides
1264
+ if len(clean_content) > 20:
1265
+ processed_recommendations.append(clean_content)
1266
+ break
1267
+
1268
+ # Take top 5 recommendations
1269
+ top_recommendations = processed_recommendations[:5]
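# Illustrative sketch of the fallback patterns above: the first targets
# "**Recommendation 1:** ..." style headers, the second plain numbered lists
# (the sample string is hypothetical):
#   re.findall(r'(\d+)\.\s+(.*?)(?=\d+\.|$)',
#              "1. Segment customers by region. 2. Reduce churn in EMEA.", re.DOTALL)
#   -> [('1', 'Segment customers by region. '), ('2', 'Reduce churn in EMEA.')]
# Note: the lookahead splits on any "<digit>." sequence, so a decimal inside a
# recommendation (e.g. "3.5%") can cut it short.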
1270
+
1271
+ if top_recommendations:
1272
+ st.markdown(f"**Top {len(top_recommendations)} actionable recommendations:**")
1273
+ st.markdown("<br>", unsafe_allow_html=True)
1274
+
1275
+ for i, rec in enumerate(top_recommendations):
1276
+ st.markdown(f"""
1277
+ <div class="recommendation-box animate-fade-in">
1278
+ <div style="display: flex; align-items: flex-start; gap: 1rem;">
1279
+ <div style="
1280
+ background: #22c55e;
1281
+ color: white;
1282
+ border-radius: 50%;
1283
+ width: 32px;
1284
+ height: 32px;
1285
+ display: flex;
1286
+ align-items: center;
1287
+ justify-content: center;
1288
+ font-weight: bold;
1289
+ font-size: 0.9rem;
1290
+ flex-shrink: 0;
1291
+ ">{i+1}</div>
1292
+ <div style="flex: 1;">
1293
+ <strong style="color: #1e293b;">🎯 Recommendation {i+1}:</strong><br>
1294
+ <span style="color: #475569; line-height: 1.6;">{rec}</span>
1295
+ </div>
1296
+ </div>
1297
+ </div>
1298
+ """, unsafe_allow_html=True)
1299
+ else:
1300
+ st.info("🎯 No recommendations could be extracted from the analysis.")
1301
+ else:
1302
+ st.info("🎯 No recommendations were generated.")
1303
+
1304
+ # Download Results Section
1305
+ st.markdown("### πŸ’Ύ Download Your Results")
1306
+
1307
+ col1, col2, col3 = st.columns(3)
1308
+
1309
+ download_items = [
1310
+ ("πŸ“„", "Analysis Report (JSON)", "Download complete analysis", "json"),
1311
+ ("πŸ“Š", "Enhanced Dataset (CSV)", "Download processed data", "csv"),
1312
+ ("πŸ“‹", "Executive Summary (MD)", "Download business report", "md")
1313
+ ]
1314
+
1315
+ for i, (icon, title, desc, file_type) in enumerate(download_items):
1316
+ with [col1, col2, col3][i]:
1317
+ st.markdown(f"""
1318
+ <div style="
1319
+ background: linear-gradient(135deg, #f8fafc 0%, #f1f5f9 100%);
1320
+ border: 2px solid #e2e8f0;
1321
+ border-radius: 16px;
1322
+ padding: 1.5rem;
1323
+ text-align: center;
1324
+ margin: 0.5rem 0;
1325
+ transition: all 0.3s ease;
1326
+ ">
1327
+ <div style="font-size: 2.5rem; margin-bottom: 1rem;">{icon}</div>
1328
+ <div style="font-size: 1.1rem; font-weight: 600; margin-bottom: 0.5rem; color: #1e293b;">{title}</div>
1329
+ <div style="font-size: 0.9rem; color: #64748b; margin-bottom: 1rem;">{desc}</div>
1330
+ """, unsafe_allow_html=True)
1331
+
1332
+ if file_type == "json":
1333
+ data = json.dumps(results, indent=2, default=str)
1334
+ filename = f"analysis_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
1335
+ mime = "application/json"
1336
+ elif file_type == "csv":
1337
+ data = st.session_state.dataset.to_csv(index=False)
1338
+ filename = f"enhanced_dataset_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
1339
+ mime = "text/csv"
1340
+ else: # md
1341
+ data = generate_report(results)
1342
+ filename = f"executive_summary_{datetime.now().strftime('%Y%m%d_%H%M%S')}.md"
1343
+ mime = "text/markdown"
1344
+
1345
+ st.download_button(
1346
+ label=f"Download {file_type.upper()}",
1347
+ data=data,
1348
+ file_name=filename,
1349
+ mime=mime,
1350
+ use_container_width=True
1351
+ )
1352
+
1353
+ st.markdown("</div>", unsafe_allow_html=True)
1354
+
1355
+ def generate_report(results):
1356
+ """Generate a beautiful markdown report"""
1357
+ filename = getattr(st.session_state, 'uploaded_filename', 'dataset')
1358
+
1359
+ report = f"""# πŸ€– AI Data Analysis Executive Summary
1360
+
1361
+ **Dataset:** {filename}
1362
+ **Generated:** {datetime.now().strftime('%B %d, %Y at %I:%M %p')}
1363
+ **Powered by:** Llama 3 & LangGraph AI Agents
1364
+
1365
+ ---
1366
+
1367
+ ## πŸ“Š Executive Overview
1368
+
1369
+ This report presents key findings from an AI-powered analysis of your dataset. Our advanced language models have identified patterns, trends, and opportunities that can drive business decisions.
1370
+
1371
+ ### Dataset Metrics
1372
+ - **Total Records:** {results.get('dataset_info', {}).get('shape', [0])[0]:,}
1373
+ - **Columns:** {len(results.get('dataset_info', {}).get('columns', []))}
1374
+ - **Data Quality Score:** {max(0, 100 - (sum(results.get('dataset_info', {}).get('null_counts', {}).values()) / max(results.get('dataset_info', {}).get('shape', [1, 1])[0] * results.get('dataset_info', {}).get('shape', [1, 1])[1], 1) * 100)):.0f}%
1375
+
1376
+ ---
1377
+
1378
+ ## πŸ’‘ Strategic Insights
1379
+
1380
+ Our AI analysis has uncovered the following key insights:
1381
+
1382
+ """
1383
+
1384
+ insights = results.get('insights', [])
1385
+ if insights:
1386
+ for i, insight in enumerate(insights, 1):
1387
+ report += f"**{i}.** {insight}\n\n"
1388
+ else:
1389
+ report += "*No specific insights were generated for this dataset.*\n\n"
1390
+
1391
+ report += """---
1392
+
1393
+ ## 🎯 Recommended Actions
1394
+
1395
+ Based on the data analysis, we recommend the following strategic actions:
1396
+
1397
+ """
1398
+
1399
+ recommendations = results.get('recommendations', [])
1400
+ if recommendations:
1401
+ for i, rec in enumerate(recommendations, 1):
1402
+ report += f"**{i}.** {rec}\n\n"
1403
+ else:
1404
+ report += "*No specific recommendations were generated for this dataset.*\n\n"
1405
+
1406
+ report += f"""---
1407
+
1408
+ ## πŸ”§ Technical Summary
1409
+
1410
+ - **Analysis Completed:** {results.get('analysis_timestamp', 'N/A')}
1411
+ - **Visualizations Created:** {len(results.get('visualizations', []))}
1412
+ - **Processing Errors:** {len(results.get('errors', []))}
1413
+ - **AI Model Used:** Llama 3 (70B parameters)
1414
+
1415
+ ---
1416
+
1417
+ ## πŸ“ˆ Next Steps
1418
+
1419
+ 1. **Review Insights:** Analyze each insight for immediate actionable opportunities
1420
+ 2. **Implement Recommendations:** Prioritize recommendations based on business impact
1421
+ 3. **Monitor Progress:** Track key metrics identified in this analysis
1422
+ 4. **Iterate:** Re-run the analysis regularly as new data becomes available
1423
+
1424
+ ---
1425
+
1426
+ *This report was generated automatically by our AI Data Analysis Agent. For questions or support, please contact your data team.*
1427
+ """
1428
+
1429
+ return report
1430
+
1431
+ def main():
1432
+ """Main application function with beautiful design"""
1433
+ initialize_session_state()
1434
+
1435
+ # Check if analysis is complete to show results immediately
1436
+ if st.session_state.analysis_complete and st.session_state.analysis_results:
1437
+ display_results()
1438
+
1439
+ # Add a "Start New Analysis" button
1440
+ st.markdown("---")
1441
+ col1, col2, col3 = st.columns([1, 1, 1])
1442
+ with col2:
1443
+ if st.button("πŸ”„ Start New Analysis", use_container_width=True):
1444
+ # Reset session state
1445
+ st.session_state.analysis_results = None
1446
+ st.session_state.analysis_complete = False
1447
+ st.session_state.dataset = None
1448
+ st.rerun()
1449
+ return
1450
+
1451
+ # Hero Section
1452
+ display_hero_section()
1453
+
1454
+ # Feature showcase
1455
+ display_features()
1456
+
1457
+ # Sidebar configuration
1458
+ api_configured = sidebar_config()
1459
+
1460
+ if not api_configured:
1461
+ # Beautiful warning with setup instructions
1462
+ st.markdown("""
1463
+ <div style="
1464
+ background: linear-gradient(135deg, #fef3c7 0%, #fde68a 100%);
1465
+ border: 2px solid #f59e0b;
1466
+ border-radius: 16px;
1467
+ padding: 2rem;
1468
+ margin: 2rem 0;
1469
+ text-align: center;
1470
+ ">
1471
+ <div style="font-size: 3rem; margin-bottom: 1rem;">πŸ”‘</div>
1472
+ <h3 style="color: #92400e; margin-bottom: 1rem;">API Key Required</h3>
1473
+ <p style="color: #78350f; margin-bottom: 1.5rem;">
1474
+ Please configure your Groq API key to unlock the power of AI analysis
1475
+ </p>
1476
+ </div>
1477
+ """, unsafe_allow_html=True)
1478
+
1479
+ # Expandable setup guide
1480
+ with st.expander("πŸš€ Quick Setup Guide", expanded=True):
1481
+ st.markdown("""
1482
+ ### Option 1: Environment Variable (Recommended)
1483
+ ```bash
1484
+ export GROQ_API_KEY="your_api_key_here"
1485
+ streamlit run web_app.py
1486
+ ```
1487
+
1488
+ ### Option 2: Manual Entry
1489
+ 1. Visit [Groq Console](https://console.groq.com/) πŸ”—
1490
+ 2. Create a free account and generate your API key
1491
+ 3. Enter the key in the sidebar ←
1492
+ 4. Upload your dataset and start analyzing!
1493
+
1494
+ ### Supported File Formats
1495
+ - **CSV files** (.csv) - Most common format
1496
+ - **Excel files** (.xlsx, .xls) - Spreadsheet data
1497
+ - **JSON files** (.json) - Structured data
1498
+
1499
+ ### Tips for Best Results
1500
+ - Ensure clean, well-structured data
1501
+ - Include meaningful column names
1502
+ - Mix of numeric and categorical columns works best
1503
+ - Date/time columns enable trend analysis
1504
+ """)
1505
+ return
1506
+
1507
+ # Main content area with beautiful layout
1508
+ st.markdown("---")
1509
+
1510
+ # Dataset upload section
1511
+ dataset_uploaded = upload_dataset()
1512
+
1513
+ # Analysis section
1514
+ if dataset_uploaded:
1515
+ st.markdown("---")
1516
+
1517
+ # Center the analyze button with beautiful styling
1518
+ col1, col2, col3 = st.columns([1, 2, 1])
1519
+ with col2:
1520
+ if st.button(
1521
+ "πŸš€ Analyze My Data with AI",
1522
+ type="primary",
1523
+ use_container_width=True,
1524
+ help="Start the AI-powered analysis of your dataset"
1525
+ ):
1526
+ run_analysis()
1527
+
1528
+ # Footer
1529
+ st.markdown("""
1530
+ <div class="footer">
1531
+ <div style="max-width: 800px; margin: 0 auto;">
1532
+ <div style="font-size: 1.5rem; margin-bottom: 1rem;">πŸ€–βœ¨</div>
1533
+ <p style="margin-bottom: 1rem;">
1534
+ <strong>AI Data Analysis Agent</strong> - Transform your data into actionable insights
1535
+ </p>
1536
+ <p style="font-size: 0.85rem; margin-bottom: 1rem;">
1537
+ Powered by <strong>Llama 3</strong> β€’ Built with <strong>LangGraph</strong> β€’
1538
+ Designed with <strong>Streamlit</strong>
1539
+ </p>
1540
+ <div style="display: flex; justify-content: center; gap: 2rem; font-size: 0.9rem;">
1541
+ <a href="#" style="color: #3b82f6; text-decoration: none;">πŸ“– Documentation</a>
1542
+ <a href="#" style="color: #3b82f6; text-decoration: none;">πŸ› Report Issues</a>
1543
+ <a href="#" style="color: #3b82f6; text-decoration: none;">⭐ Give Feedback</a>
1544
+ <a href="#" style="color: #3b82f6; text-decoration: none;">πŸ’‘ Feature Requests</a>
1545
+ </div>
1546
+ </div>
1547
+ </div>
1548
+ """, unsafe_allow_html=True)
1549
+
1550
+ if __name__ == "__main__":
1551
+ main()