AkashGogineni18 commited on
Commit
0b52104
Β·
1 Parent(s): f7e5edf

intial code

Browse files
Dockerfile ADDED
@@ -0,0 +1,39 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Fixed Dockerfile - works with your existing web_app.py
2
+ FROM python:3.11-slim
3
+
4
+ # Set working directory
5
+ WORKDIR /app
6
+
7
+ # Environment variables
8
+ ENV PYTHONUNBUFFERED=1
9
+ ENV PYTHONDONTWRITEBYTECODE=1
10
+ ENV PYTHONPATH=/app:/app/src
11
+
12
+ # Install system dependencies
13
+ RUN apt-get update && apt-get install -y \
14
+ gcc \
15
+ g++ \
16
+ && rm -rf /var/lib/apt/lists/*
17
+
18
+ # Copy and install requirements first (for better caching)
19
+ COPY requirements.txt .
20
+ RUN pip install --no-cache-dir --upgrade pip setuptools wheel && \
21
+ pip install --no-cache-dir -r requirements.txt
22
+
23
+ # Copy ALL files (this ensures both .py files are copied)
24
+ COPY . .
25
+
26
+ # Ensure files have correct permissions
27
+ RUN chmod +r *.py
28
+
29
+ # Debug: Show what files we have
30
+ RUN echo "Files in /app:" && ls -la /app/
31
+
32
+ # Create necessary directories
33
+ RUN mkdir -p /app/temp /app/analysis_output
34
+
35
+ # Expose port
36
+ EXPOSE 7860
37
+
38
+ # Run Streamlit with explicit file path
39
+ CMD ["python", "-m", "streamlit", "run", "/app/web_app.py", "--server.port=7860", "--server.address=0.0.0.0", "--server.headless=true", "--server.enableCORS=false", "--server.enableXsrfProtection=false"]
README 2.md ADDED
@@ -0,0 +1,124 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: AIDA - AI Data Analysis Agent
3
+ emoji: πŸ€–
4
+ colorFrom: blue
5
+ colorTo: purple
6
+ sdk: docker
7
+ pinned: false
8
+ license: mit
9
+ ---
10
+ πŸ€– AIDA - AI Data Analysis Agent
11
+ Transform your raw data into actionable business insights with the power of AI.
12
+ AIDA is an intelligent data analysis system powered by Llama 3 and LangGraph that automatically:
13
+
14
+ πŸ“Š Analyzes dataset structure and quality
15
+ 🧠 Generates AI-powered insights
16
+ πŸ“ˆ Creates intelligent visualizations
17
+ 🎯 Provides actionable business recommendations
18
+
19
+ ✨ Features
20
+
21
+ πŸ” Intelligent Analysis: AI automatically understands your data structure
22
+ πŸ“Š Smart Visualizations: Creates the most appropriate charts for your data
23
+ πŸ’‘ Business Insights: Generates meaningful patterns and trends
24
+ 🎯 Actionable Recommendations: Provides specific, measurable action items
25
+ 🌐 Beautiful Interface: Modern, responsive web interface
26
+
27
+ πŸš€ How to Use
28
+
29
+ Set API Key: Get your free API key from Groq Console
30
+ Upload Data: Support for CSV, Excel, and JSON files
31
+ AI Analysis: Let the AI agents analyze your data automatically
32
+ Get Insights: Review generated insights and recommendations
33
+ Download Results: Export analysis reports and enhanced datasets
34
+
35
+ πŸ“Š Supported File Formats
36
+
37
+ CSV files (.csv) - Most common format
38
+ Excel files (.xlsx, .xls) - Spreadsheet data
39
+ JSON files (.json) - Structured data
40
+
41
+ πŸ€– AI Agents & Workflow
42
+ AIDA uses a sophisticated multi-agent system powered by LangGraph to analyze your data intelligently:
43
+ Agent Architecture
44
+ πŸ” Data Profiler Agent
45
+
46
+ Analyzes dataset structure and characteristics
47
+ Identifies data types, missing values, and quality issues
48
+ Generates initial dataset overview
49
+
50
+ πŸ“Š Column Analyzer Agent
51
+
52
+ Performs detailed analysis of each column
53
+ Calculates statistical measures and distributions
54
+ Identifies patterns and anomalies in individual features
55
+
56
+ 🧠 Insight Generator Agent
57
+
58
+ Uses AI to generate meaningful business insights
59
+ Identifies correlations and relationships
60
+ Discovers hidden patterns in the data
61
+
62
+ πŸ“ˆ Visualization Planner Agent
63
+
64
+ Intelligently selects optimal chart types
65
+ Plans visualization strategy based on data characteristics
66
+ Ensures maximum insight communication
67
+
68
+ 🎨 Chart Creator Agent
69
+
70
+ Creates interactive visualizations
71
+ Generates multiple chart types automatically
72
+ Optimizes visual presentation for clarity
73
+
74
+ 🎯 Recommendation Engine Agent
75
+
76
+ Formulates actionable business recommendations
77
+ Provides specific, measurable action items
78
+ Prioritizes recommendations by potential impact
79
+
80
+ Workflow Process
81
+ πŸ“Š Dataset Upload
82
+ ↓
83
+ πŸ” Data Profiling β†’ πŸ“Š Column Analysis β†’ 🧠 Insight Generation
84
+ ↓ ↓ ↓
85
+ πŸ“ˆ Visualization Planning β†’ 🎨 Chart Creation β†’ 🎯 Recommendations
86
+ ↓
87
+ βœ… Complete Analysis Report
88
+ Each agent operates autonomously while maintaining context through LangGraph's state management, ensuring comprehensive and coherent analysis.
89
+ πŸ”§ Technology Stack
90
+
91
+ AI Models: Llama 3 (70B and 8B variants)
92
+ Agent Framework: LangGraph for intelligent multi-agent workflows
93
+ State Management: TypedDict for structured agent communication
94
+ Frontend: Streamlit with custom CSS styling
95
+ Visualization: Plotly for interactive charts
96
+ Data Processing: Pandas and NumPy
97
+ API Integration: Groq API for LLM access
98
+
99
+ 🌟 Perfect For
100
+
101
+ Business Analysts - Quick data insights
102
+ Data Scientists - Rapid exploratory analysis
103
+ Managers - Data-driven decision making
104
+ Students - Learning data analysis patterns
105
+ Researchers - Dataset understanding
106
+
107
+ πŸ›‘οΈ Privacy & Security
108
+
109
+ No data is stored permanently
110
+ All processing happens in your session
111
+ Files are automatically cleaned up
112
+ API keys are handled securely
113
+
114
+ 🎯 Get Started
115
+ Simply upload your dataset and let AIDA's AI agents work their magic! The system will automatically:
116
+
117
+ Profile your dataset structure
118
+ Analyze data quality and patterns
119
+ Generate business insights
120
+ Create optimal visualizations
121
+ Recommend actionable next steps
122
+
123
+
124
+ Powered by Llama 3 β€’ Built with LangGraph β€’ Designed for Business Impact
__pycache__/data_analysis_agent.cpython-311.pyc ADDED
Binary file (38.1 kB). View file
 
data_analysis_agent.py ADDED
@@ -0,0 +1,657 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import pandas as pd
3
+ import numpy as np
4
+ import matplotlib.pyplot as plt
5
+ import seaborn as sns
6
+ import plotly.express as px
7
+ import plotly.graph_objects as go
8
+ from plotly.subplots import make_subplots
9
+ import warnings
10
+ warnings.filterwarnings('ignore')
11
+
12
+ from typing import Dict, List, Any, Optional, TypedDict
13
+ import json
14
+ from datetime import datetime
15
+ import logging
16
+
17
+ # LangGraph and LLM imports
18
+ from langgraph.graph import StateGraph, END
19
+ from langchain_groq import ChatGroq
20
+ from langchain_core.messages import HumanMessage, SystemMessage
21
+ from langchain_core.prompts import ChatPromptTemplate
22
+
23
+ # Configure logging
24
+ logging.basicConfig(level=logging.INFO)
25
+ logger = logging.getLogger(__name__)
26
+
27
+ class AnalysisState(TypedDict):
28
+ """State structure for the analysis workflow"""
29
+ dataset: pd.DataFrame
30
+ dataset_info: Dict[str, Any]
31
+ column_analysis: Dict[str, Any]
32
+ insights: List[str]
33
+ visualizations: List[Dict[str, Any]]
34
+ recommendations: List[str]
35
+ current_step: str
36
+ error_messages: List[str]
37
+
38
+ class DataAnalysisAgent:
39
+ def __init__(self, groq_api_key: str, model_name: str = "llama3-70b-8192"):
40
+ """Initialize the Data Analysis Agent"""
41
+ # Fixed: Use correct model name format
42
+ self.llm = ChatGroq(
43
+ groq_api_key=groq_api_key,
44
+ model_name=model_name, # Fixed: Use standard model names
45
+ temperature=0.1,
46
+ max_tokens=2000
47
+ )
48
+
49
+ # Set up the analysis workflow graph
50
+ self.workflow = self._create_workflow()
51
+
52
+ def _create_workflow(self) -> StateGraph:
53
+ """Create the LangGraph workflow for data analysis"""
54
+ workflow = StateGraph(AnalysisState)
55
+
56
+ # Add nodes for each analysis step
57
+ workflow.add_node("data_profiler", self._profile_dataset)
58
+ workflow.add_node("column_analyzer", self._analyze_columns)
59
+ workflow.add_node("insight_generator", self._generate_insights)
60
+ workflow.add_node("visualization_planner", self._plan_visualizations)
61
+ workflow.add_node("chart_creator", self._create_charts)
62
+ workflow.add_node("recommendation_engine", self._generate_recommendations)
63
+
64
+ # Define the workflow edges
65
+ workflow.add_edge("data_profiler", "column_analyzer")
66
+ workflow.add_edge("column_analyzer", "insight_generator")
67
+ workflow.add_edge("insight_generator", "visualization_planner")
68
+ workflow.add_edge("visualization_planner", "chart_creator")
69
+ workflow.add_edge("chart_creator", "recommendation_engine")
70
+ workflow.add_edge("recommendation_engine", END)
71
+
72
+ # Set entry point
73
+ workflow.set_entry_point("data_profiler")
74
+
75
+ return workflow.compile()
76
+
77
+ def _profile_dataset(self, state: AnalysisState) -> AnalysisState:
78
+ """Profile the dataset to understand its structure and characteristics"""
79
+ logger.info("Profiling dataset...")
80
+
81
+ try:
82
+ df = state["dataset"]
83
+
84
+ # Basic dataset information
85
+ dataset_info = {
86
+ "shape": df.shape,
87
+ "columns": list(df.columns),
88
+ "dtypes": {col: str(dtype) for col, dtype in df.dtypes.to_dict().items()}, # Fixed: Convert to string
89
+ "memory_usage": int(df.memory_usage(deep=True).sum()), # Fixed: Convert to int
90
+ "null_counts": df.isnull().sum().to_dict(),
91
+ "duplicate_rows": int(df.duplicated().sum()), # Fixed: Convert to int
92
+ "numeric_columns": df.select_dtypes(include=[np.number]).columns.tolist(),
93
+ "categorical_columns": df.select_dtypes(include=['object', 'category']).columns.tolist(),
94
+ "datetime_columns": df.select_dtypes(include=['datetime64']).columns.tolist()
95
+ }
96
+
97
+ # Use LLM to generate initial insights about the dataset
98
+ prompt = f"""
99
+ Analyze this dataset profile and provide initial observations:
100
+
101
+ Dataset Shape: {dataset_info['shape']}
102
+ Columns: {dataset_info['columns']}
103
+ Data Types: {dataset_info['dtypes']}
104
+ Missing Values: {dataset_info['null_counts']}
105
+ Duplicate Rows: {dataset_info['duplicate_rows']}
106
+
107
+ Provide a brief analysis of the dataset structure, data quality issues, and potential analysis opportunities.
108
+ """
109
+
110
+ response = self.llm.invoke([HumanMessage(content=prompt)])
111
+ dataset_info["llm_profile"] = response.content
112
+
113
+ state["dataset_info"] = dataset_info
114
+ state["current_step"] = "data_profiler"
115
+
116
+ except Exception as e:
117
+ logger.error(f"Error in data profiling: {str(e)}")
118
+ # Ensure error_messages exists and add fallback dataset_info
119
+ if "error_messages" not in state:
120
+ state["error_messages"] = []
121
+ if "dataset_info" not in state:
122
+ state["dataset_info"] = {}
123
+
124
+ # Add basic fallback info
125
+ try:
126
+ df = state["dataset"]
127
+ state["dataset_info"] = {
128
+ "shape": df.shape,
129
+ "columns": list(df.columns),
130
+ "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
131
+ "numeric_columns": df.select_dtypes(include=[np.number]).columns.tolist(),
132
+ "categorical_columns": df.select_dtypes(include=['object', 'category']).columns.tolist(),
133
+ "datetime_columns": df.select_dtypes(include=['datetime64']).columns.tolist(),
134
+ "null_counts": df.isnull().sum().to_dict(),
135
+ "duplicate_rows": int(df.duplicated().sum()),
136
+ "memory_usage": int(df.memory_usage(deep=True).sum())
137
+ }
138
+ except Exception:
139
+ # Ultimate fallback
140
+ state["dataset_info"] = {
141
+ "shape": [0, 0],
142
+ "columns": [],
143
+ "dtypes": {},
144
+ "numeric_columns": [],
145
+ "categorical_columns": [],
146
+ "datetime_columns": [],
147
+ "null_counts": {},
148
+ "duplicate_rows": 0,
149
+ "memory_usage": 0
150
+ }
151
+
152
+ state["error_messages"].append(f"Data profiling error: {str(e)}")
153
+
154
+ return state
155
+
156
+ def _analyze_columns(self, state: AnalysisState) -> AnalysisState:
157
+ """Analyze individual columns in detail"""
158
+ logger.info("Analyzing columns...")
159
+
160
+ try:
161
+ df = state["dataset"]
162
+ column_analysis = {}
163
+
164
+ for column in df.columns:
165
+ col_data = df[column]
166
+
167
+ analysis = {
168
+ "dtype": str(col_data.dtype),
169
+ "null_count": int(col_data.isnull().sum()), # Fixed: Convert to int
170
+ "null_percentage": float((col_data.isnull().sum() / len(col_data)) * 100), # Fixed: Convert to float
171
+ "unique_count": int(col_data.nunique()), # Fixed: Convert to int
172
+ "unique_percentage": float((col_data.nunique() / len(col_data)) * 100) # Fixed: Convert to float
173
+ }
174
+
175
+ if col_data.dtype in ['int64', 'float64']:
176
+ analysis.update({
177
+ "mean": float(col_data.mean()) if not pd.isna(col_data.mean()) else None, # Fixed: Handle NaN
178
+ "median": float(col_data.median()) if not pd.isna(col_data.median()) else None,
179
+ "std": float(col_data.std()) if not pd.isna(col_data.std()) else None,
180
+ "min": float(col_data.min()) if not pd.isna(col_data.min()) else None,
181
+ "max": float(col_data.max()) if not pd.isna(col_data.max()) else None,
182
+ "skewness": float(col_data.skew()) if not pd.isna(col_data.skew()) else None,
183
+ "kurtosis": float(col_data.kurtosis()) if not pd.isna(col_data.kurtosis()) else None
184
+ })
185
+ elif col_data.dtype == 'object':
186
+ try:
187
+ top_values = col_data.value_counts().head(5).to_dict()
188
+ analysis.update({
189
+ "top_values": top_values,
190
+ "avg_length": float(col_data.astype(str).str.len().mean()),
191
+ "max_length": int(col_data.astype(str).str.len().max())
192
+ })
193
+ except Exception:
194
+ analysis.update({
195
+ "top_values": {},
196
+ "avg_length": 0,
197
+ "max_length": 0
198
+ })
199
+
200
+ column_analysis[column] = analysis
201
+
202
+ # Use LLM to interpret column analysis
203
+ prompt = f"""
204
+ Analyze these column statistics and identify patterns, anomalies, and insights:
205
+
206
+ {json.dumps(column_analysis, indent=2, default=str)}
207
+
208
+ Focus on:
209
+ 1. Data quality issues
210
+ 2. Distribution patterns
211
+ 3. Potential relationships between columns
212
+ 4. Outliers or anomalies
213
+ 5. Business insights
214
+ """
215
+
216
+ response = self.llm.invoke([HumanMessage(content=prompt)])
217
+ column_analysis["llm_interpretation"] = response.content
218
+
219
+ state["column_analysis"] = column_analysis
220
+ state["current_step"] = "column_analyzer"
221
+
222
+ except Exception as e:
223
+ logger.error(f"Error in column analysis: {str(e)}")
224
+ if "error_messages" not in state:
225
+ state["error_messages"] = []
226
+ if "column_analysis" not in state:
227
+ state["column_analysis"] = {}
228
+ state["error_messages"].append(f"Column analysis error: {str(e)}")
229
+
230
+ return state
231
+
232
+ def _generate_insights(self, state: AnalysisState) -> AnalysisState:
233
+ """Generate insights from the data analysis"""
234
+ logger.info("Generating insights...")
235
+
236
+ try:
237
+ df = state["dataset"]
238
+ dataset_info = state["dataset_info"]
239
+
240
+ # Ensure required keys exist in dataset_info
241
+ if "numeric_columns" not in dataset_info:
242
+ dataset_info["numeric_columns"] = df.select_dtypes(include=[np.number]).columns.tolist()
243
+ if "categorical_columns" not in dataset_info:
244
+ dataset_info["categorical_columns"] = df.select_dtypes(include=['object', 'category']).columns.tolist()
245
+
246
+ # Correlation analysis for numeric columns
247
+ correlations = {}
248
+ numeric_cols = dataset_info.get("numeric_columns", [])
249
+ if len(numeric_cols) > 1:
250
+ corr_matrix = df[numeric_cols].corr()
251
+ high_correlations = []
252
+ for i in range(len(corr_matrix.columns)):
253
+ for j in range(i+1, len(corr_matrix.columns)):
254
+ corr_val = corr_matrix.iloc[i, j]
255
+ if not pd.isna(corr_val) and abs(corr_val) > 0.7: # Fixed: Check for NaN
256
+ high_correlations.append({
257
+ "col1": corr_matrix.columns[i],
258
+ "col2": corr_matrix.columns[j],
259
+ "correlation": float(corr_val) # Fixed: Convert to float
260
+ })
261
+ correlations["high_correlations"] = high_correlations
262
+
263
+ # Use LLM to generate comprehensive insights
264
+ prompt = f"""
265
+ Based on the dataset analysis, generate key insights and findings:
266
+
267
+ Dataset Info: {json.dumps(dataset_info, indent=2, default=str)}
268
+ High Correlations: {json.dumps(correlations, indent=2, default=str)}
269
+
270
+ Generate 5-10 specific, actionable insights that would be valuable for business decision-making.
271
+ Focus on trends, patterns, anomalies, and opportunities.
272
+ """
273
+
274
+ response = self.llm.invoke([HumanMessage(content=prompt)])
275
+ insights = response.content.split('\n')
276
+ insights = [insight.strip() for insight in insights if insight.strip()]
277
+
278
+ state["insights"] = insights
279
+ state["current_step"] = "insight_generator"
280
+
281
+ except Exception as e:
282
+ logger.error(f"Error in insight generation: {str(e)}")
283
+ if "error_messages" not in state:
284
+ state["error_messages"] = []
285
+ if "insights" not in state:
286
+ state["insights"] = []
287
+ state["error_messages"].append(f"Insight generation error: {str(e)}")
288
+
289
+ return state
290
+
291
+ def _plan_visualizations(self, state: AnalysisState) -> AnalysisState:
292
+ """Plan appropriate visualizations based on data characteristics"""
293
+ logger.info("Planning visualizations...")
294
+
295
+ try:
296
+ dataset_info = state["dataset_info"]
297
+ insights = state["insights"]
298
+
299
+ # Ensure required keys exist
300
+ if "numeric_columns" not in dataset_info:
301
+ df = state["dataset"]
302
+ dataset_info["numeric_columns"] = df.select_dtypes(include=[np.number]).columns.tolist()
303
+ dataset_info["categorical_columns"] = df.select_dtypes(include=['object', 'category']).columns.tolist()
304
+
305
+ # Use LLM to plan visualizations
306
+ prompt = f"""
307
+ Plan the most effective visualizations for this dataset:
308
+
309
+ Dataset Info: {json.dumps(dataset_info, indent=2, default=str)}
310
+ Key Insights: {insights}
311
+
312
+ Suggest 5-8 different visualization types with:
313
+ 1. Chart type (histogram, scatter, bar, line, heatmap, etc.)
314
+ 2. Columns to use
315
+ 3. Purpose/insight to communicate
316
+ 4. Title and description
317
+
318
+ Return as a JSON list with this structure:
319
+ [
320
+ {{
321
+ "type": "histogram",
322
+ "columns": ["column_name"],
323
+ "title": "Distribution of...",
324
+ "description": "Shows the...",
325
+ "purpose": "Understand distribution"
326
+ }}
327
+ ]
328
+ """
329
+
330
+ response = self.llm.invoke([HumanMessage(content=prompt)])
331
+ try:
332
+ # Extract JSON from response
333
+ json_start = response.content.find('[')
334
+ json_end = response.content.rfind(']') + 1
335
+ if json_start >= 0 and json_end > json_start:
336
+ viz_plan = json.loads(response.content[json_start:json_end])
337
+ else:
338
+ viz_plan = self._create_default_viz_plan(dataset_info)
339
+ except Exception:
340
+ # Fallback visualization plan
341
+ viz_plan = self._create_default_viz_plan(dataset_info)
342
+
343
+ state["visualizations"] = viz_plan
344
+ state["current_step"] = "visualization_planner"
345
+
346
+ except Exception as e:
347
+ logger.error(f"Error in visualization planning: {str(e)}")
348
+ if "error_messages" not in state:
349
+ state["error_messages"] = []
350
+ if "visualizations" not in state:
351
+ state["visualizations"] = []
352
+ state["error_messages"].append(f"Visualization planning error: {str(e)}")
353
+ # Ensure we have dataset_info for fallback
354
+ if "dataset_info" not in state:
355
+ state["dataset_info"] = {}
356
+ state["visualizations"] = self._create_default_viz_plan(state["dataset_info"])
357
+
358
+ return state
359
+
360
+ def _create_default_viz_plan(self, dataset_info: Dict) -> List[Dict]:
361
+ """Create a default visualization plan"""
362
+ viz_plan = []
363
+
364
+ # Ensure keys exist with defaults
365
+ numeric_columns = dataset_info.get("numeric_columns", [])
366
+ categorical_columns = dataset_info.get("categorical_columns", [])
367
+
368
+ # Distribution plots for numeric columns
369
+ for col in numeric_columns[:3]:
370
+ viz_plan.append({
371
+ "type": "histogram",
372
+ "columns": [col],
373
+ "title": f"Distribution of {col}",
374
+ "description": f"Shows the distribution pattern of {col}",
375
+ "purpose": "Understand data distribution"
376
+ })
377
+
378
+ # Bar plots for categorical columns
379
+ for col in categorical_columns[:2]:
380
+ viz_plan.append({
381
+ "type": "bar",
382
+ "columns": [col],
383
+ "title": f"Frequency of {col}",
384
+ "description": f"Shows the frequency of different {col} values",
385
+ "purpose": "Understand categorical distribution"
386
+ })
387
+
388
+ # Correlation heatmap if multiple numeric columns
389
+ if len(numeric_columns) > 1:
390
+ viz_plan.append({
391
+ "type": "heatmap",
392
+ "columns": numeric_columns,
393
+ "title": "Correlation Matrix",
394
+ "description": "Shows correlations between numeric variables",
395
+ "purpose": "Identify relationships"
396
+ })
397
+
398
+ return viz_plan
399
+
400
+ def _create_charts(self, state: AnalysisState) -> AnalysisState:
401
+ """Create the planned visualizations"""
402
+ logger.info("Creating charts...")
403
+
404
+ try:
405
+ df = state["dataset"]
406
+ viz_plans = state["visualizations"]
407
+
408
+ # Fixed: Use a working matplotlib style
409
+ try:
410
+ plt.style.use('default') # Fixed: Use default instead of seaborn-v0_8
411
+ except:
412
+ pass # If style fails, continue with default
413
+
414
+ for i, viz in enumerate(viz_plans):
415
+ try:
416
+ fig, ax = plt.subplots(figsize=(10, 6))
417
+
418
+ if viz["type"] == "histogram":
419
+ col = viz["columns"][0]
420
+ if col in df.columns and df[col].dtype in ['int64', 'float64']:
421
+ df[col].dropna().hist(bins=30, ax=ax, alpha=0.7) # Fixed: Drop NaN values
422
+ ax.set_title(viz["title"])
423
+ ax.set_xlabel(col)
424
+ ax.set_ylabel('Frequency')
425
+
426
+ elif viz["type"] == "bar":
427
+ col = viz["columns"][0]
428
+ if col in df.columns:
429
+ value_counts = df[col].value_counts().head(10)
430
+ value_counts.plot(kind='bar', ax=ax)
431
+ ax.set_title(viz["title"])
432
+ ax.set_xlabel(col)
433
+ ax.set_ylabel('Count')
434
+ plt.xticks(rotation=45)
435
+
436
+ elif viz["type"] == "heatmap":
437
+ numeric_cols = [col for col in viz["columns"] if col in df.columns and df[col].dtype in ['int64', 'float64']]
438
+ if len(numeric_cols) > 1:
439
+ corr_matrix = df[numeric_cols].corr()
440
+ # Fixed: Use matplotlib imshow instead of seaborn
441
+ im = ax.imshow(corr_matrix, cmap='coolwarm', aspect='auto')
442
+ ax.set_xticks(range(len(corr_matrix.columns)))
443
+ ax.set_yticks(range(len(corr_matrix.columns)))
444
+ ax.set_xticklabels(corr_matrix.columns, rotation=45)
445
+ ax.set_yticklabels(corr_matrix.columns)
446
+ ax.set_title(viz["title"])
447
+ plt.colorbar(im, ax=ax)
448
+
449
+ elif viz["type"] == "scatter":
450
+ if len(viz["columns"]) >= 2:
451
+ col1, col2 = viz["columns"][0], viz["columns"][1]
452
+ if col1 in df.columns and col2 in df.columns:
453
+ clean_data = df[[col1, col2]].dropna() # Fixed: Remove NaN values
454
+ ax.scatter(clean_data[col1], clean_data[col2], alpha=0.6)
455
+ ax.set_xlabel(col1)
456
+ ax.set_ylabel(col2)
457
+ ax.set_title(viz["title"])
458
+
459
+ plt.tight_layout()
460
+ plt.savefig(f'chart_{i+1}_{viz["type"]}.png', dpi=300, bbox_inches='tight')
461
+ plt.close()
462
+
463
+ except Exception as e:
464
+ logger.warning(f"Failed to create {viz['type']} chart: {str(e)}")
465
+ plt.close() # Fixed: Ensure figure is closed even on error
466
+ continue
467
+
468
+ state["current_step"] = "chart_creator"
469
+
470
+ except Exception as e:
471
+ logger.error(f"Error in chart creation: {str(e)}")
472
+ if "error_messages" not in state:
473
+ state["error_messages"] = []
474
+ state["error_messages"].append(f"Chart creation error: {str(e)}")
475
+
476
+ return state
477
+
478
+ def _generate_recommendations(self, state: AnalysisState) -> AnalysisState:
479
+ """Generate actionable recommendations based on analysis"""
480
+ logger.info("Generating recommendations...")
481
+
482
+ try:
483
+ insights = state["insights"]
484
+ dataset_info = state["dataset_info"]
485
+
486
+ # Use LLM to generate recommendations
487
+ prompt = f"""
488
+ Based on the complete data analysis, generate specific, actionable recommendations:
489
+
490
+ Dataset Info: {json.dumps(dataset_info, indent=2, default=str)}
491
+ Key Insights: {insights}
492
+
493
+ Generate 5-10 specific recommendations that include:
494
+ 1. Data quality improvements
495
+ 2. Business opportunities
496
+ 3. Further analysis suggestions
497
+ 4. Action items for stakeholders
498
+
499
+ Make recommendations specific, measurable, and actionable.
500
+ """
501
+
502
+ response = self.llm.invoke([HumanMessage(content=prompt)])
503
+ recommendations = response.content.split('\n')
504
+ recommendations = [rec.strip() for rec in recommendations if rec.strip()]
505
+
506
+ state["recommendations"] = recommendations
507
+ state["current_step"] = "recommendation_engine"
508
+
509
+ except Exception as e:
510
+ logger.error(f"Error in recommendation generation: {str(e)}")
511
+ if "error_messages" not in state:
512
+ state["error_messages"] = []
513
+ if "recommendations" not in state:
514
+ state["recommendations"] = []
515
+ state["error_messages"].append(f"Recommendation generation error: {str(e)}")
516
+
517
+ return state
518
+
519
+ def analyze_dataset(self, dataset_path: str) -> Dict[str, Any]:
520
+ """Main method to analyze a dataset"""
521
+ logger.info(f"Starting analysis of dataset: {dataset_path}")
522
+
523
+ try:
524
+ # Load dataset
525
+ if dataset_path.endswith('.csv'):
526
+ df = pd.read_csv(dataset_path)
527
+ elif dataset_path.endswith(('.xlsx', '.xls')):
528
+ df = pd.read_excel(dataset_path)
529
+ elif dataset_path.endswith('.json'):
530
+ df = pd.read_json(dataset_path)
531
+ else:
532
+ raise ValueError("Unsupported file format. Use CSV, Excel, or JSON.")
533
+
534
+ # Initialize state with all required fields
535
+ initial_state = AnalysisState(
536
+ dataset=df,
537
+ dataset_info={},
538
+ column_analysis={},
539
+ insights=[],
540
+ visualizations=[],
541
+ recommendations=[],
542
+ current_step="",
543
+ error_messages=[]
544
+ )
545
+
546
+ # Run the workflow
547
+ final_state = self.workflow.invoke(initial_state)
548
+
549
+ # Prepare results
550
+ results = {
551
+ "dataset_info": final_state.get("dataset_info", {}),
552
+ "column_analysis": final_state.get("column_analysis", {}),
553
+ "insights": final_state.get("insights", []),
554
+ "visualizations": final_state.get("visualizations", []),
555
+ "recommendations": final_state.get("recommendations", []),
556
+ "analysis_timestamp": datetime.now().isoformat(),
557
+ "errors": final_state.get("error_messages", [])
558
+ }
559
+
560
+ # Generate summary report
561
+ self._generate_report(results, dataset_path)
562
+
563
+ logger.info("Analysis completed successfully!")
564
+ return results
565
+
566
+ except Exception as e:
567
+ logger.error(f"Error in dataset analysis: {str(e)}")
568
+ return {"error": str(e)}
569
+
570
+ def _generate_report(self, results: Dict[str, Any], dataset_path: str):
571
+ """Generate a comprehensive analysis report"""
572
+ try:
573
+ report_content = f"""
574
+ # Data Analysis Report
575
+ ## Dataset: {dataset_path}
576
+ ## Analysis Date: {results['analysis_timestamp']}
577
+
578
+ ### Dataset Overview
579
+ - Shape: {results['dataset_info'].get('shape', 'N/A')}
580
+ - Columns: {len(results['dataset_info'].get('columns', []))}
581
+ - Missing Values: {sum(results['dataset_info'].get('null_counts', {}).values())}
582
+ - Duplicate Rows: {results['dataset_info'].get('duplicate_rows', 'N/A')}
583
+
584
+ ### Key Insights
585
+ """
586
+
587
+ for i, insight in enumerate(results.get('insights', []), 1):
588
+ report_content += f"{i}. {insight}\n"
589
+
590
+ report_content += "\n### Recommendations\n"
591
+ for i, rec in enumerate(results.get('recommendations', []), 1):
592
+ report_content += f"{i}. {rec}\n"
593
+
594
+ # Save report
595
+ with open('analysis_report.md', 'w') as f:
596
+ f.write(report_content)
597
+
598
+ print("Analysis report saved as 'analysis_report.md'")
599
+ except Exception as e:
600
+ logger.error(f"Error generating report: {str(e)}")
601
+
602
+ # Usage example and configuration
603
+ class DataAnalysisConfig:
604
+ """Configuration class for easy customization"""
605
+
606
+ def __init__(self):
607
+ self.groq_api_key = os.environ.get('GROQ_API_KEY')
608
+ self.model_name = "llama3-70b-8192" # Fixed: Use correct model name
609
+ self.output_directory = "analysis_output"
610
+ self.chart_style = "default" # Fixed: Use default style
611
+
612
+ def validate(self):
613
+ """Validate configuration"""
614
+ if not self.groq_api_key:
615
+ raise ValueError("GROQ_API_KEY environment variable is required")
616
+
617
+ if not os.path.exists(self.output_directory):
618
+ os.makedirs(self.output_directory)
619
+
620
+ def main():
621
+ """Main function to run the data analysis system"""
622
+
623
+ # Example usage
624
+ config = DataAnalysisConfig()
625
+
626
+ try:
627
+ config.validate()
628
+ except ValueError as e:
629
+ print(f"Configuration error: {e}")
630
+ print("Please set the GROQ_API_KEY environment variable")
631
+ return
632
+
633
+ # Initialize the agent
634
+ agent = DataAnalysisAgent(
635
+ groq_api_key=config.groq_api_key,
636
+ model_name=config.model_name
637
+ )
638
+
639
+ # Example: Analyze a dataset
640
+ dataset_path = "your_dataset.csv" # Replace with your dataset path
641
+
642
+ if os.path.exists(dataset_path):
643
+ results = agent.analyze_dataset(dataset_path)
644
+
645
+ if "error" not in results:
646
+ print("Analysis completed successfully!")
647
+ print(f"Generated {len(results['insights'])} insights")
648
+ print(f"Created {len(results['visualizations'])} visualizations")
649
+ print(f"Provided {len(results['recommendations'])} recommendations")
650
+ else:
651
+ print(f"Analysis failed: {results['error']}")
652
+ else:
653
+ print(f"Dataset file not found: {dataset_path}")
654
+ print("Please provide a valid dataset path")
655
+
656
+ if __name__ == "__main__":
657
+ main()
requirements.txt ADDED
@@ -0,0 +1,24 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Core dependencies
2
+ pandas>=2.0.0
3
+ numpy>=1.24.0
4
+ matplotlib>=3.7.0
5
+ plotly>=5.15.0
6
+
7
+ # AI/ML dependencies
8
+ langchain>=0.1.0
9
+ langchain-groq>=0.1.0
10
+ langgraph>=0.0.40
11
+
12
+ # File handling
13
+ openpyxl>=3.1.0
14
+ python-dotenv>=1.0.0
15
+
16
+ # Web interface
17
+ streamlit>=1.28.0
18
+
19
+ # Data processing utilities
20
+ scipy>=1.11.0
21
+
22
+ # Additional dependencies for stability
23
+ requests>=2.28.0
24
+ typing-extensions>=4.0.0
web_app.py ADDED
@@ -0,0 +1,1551 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # web_app.py
2
+ # Beautiful Web interface for the AI Data Analysis Agent System
3
+
4
+ import streamlit as st
5
+ import pandas as pd
6
+ import plotly.express as px
7
+ import plotly.graph_objects as go
8
+ from plotly.subplots import make_subplots
9
+ import io
10
+ import base64
11
+ from datetime import datetime
12
+ import json
13
+ import os
14
+ import sys
15
+ from pathlib import Path
16
+ import time
17
+
18
+
19
+ # Add the current directory to path to import our agent
20
+ sys.path.append(str(Path(__file__).parent))
21
+
22
+ try:
23
+ from data_analysis_agent import DataAnalysisAgent, DataAnalysisConfig
24
+ except ImportError:
25
+ st.error("❌ Please ensure data_analysis_agent.py is in the same directory")
26
+ st.info("Download both files and place them in the same folder")
27
+ st.stop()
28
+
29
+ # Page configuration
30
+ st.set_page_config(
31
+ page_title="AI Data Analysis Agent",
32
+ page_icon="πŸ€–",
33
+ layout="wide",
34
+ initial_sidebar_state="expanded",
35
+ menu_items={
36
+ 'Get Help': 'https://github.com/yourusername/ai-data-analysis-agent',
37
+ 'Report a bug': "https://github.com/yourusername/ai-data-analysis-agent/issues",
38
+ 'About': "# AI Data Analysis Agent\nPowered by Llama 3 & LangGraph"
39
+ }
40
+ )
41
+
42
+ # Custom CSS for beautiful styling
43
+ st.markdown("""
44
+ <style>
45
+ /* Import Google Fonts */
46
+ @import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap');
47
+
48
+ /* Global Styles */
49
+ .main .block-container {
50
+ padding-top: 2rem;
51
+ max-width: 1200px;
52
+ }
53
+
54
+ /* Main Header */
55
+ .main-header {
56
+ font-family: 'Inter', sans-serif;
57
+ font-size: 3.5rem;
58
+ font-weight: 700;
59
+ text-align: center;
60
+ margin: 2rem 0;
61
+ background: linear-gradient(135deg, #1e40af 0%, #3b82f6 50%, #06b6d4 100%);
62
+ -webkit-background-clip: text;
63
+ -webkit-text-fill-color: transparent;
64
+ background-clip: text;
65
+ text-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
66
+ }
67
+
68
+ /* Subtitle */
69
+ .subtitle {
70
+ font-family: 'Inter', sans-serif;
71
+ font-size: 1.2rem;
72
+ text-align: center;
73
+ color: #64748b;
74
+ margin-bottom: 3rem;
75
+ font-weight: 400;
76
+ }
77
+
78
+ /* Feature Cards */
79
+ .feature-card {
80
+ background: linear-gradient(145deg, #ffffff 0%, #f8fafc 100%);
81
+ border: 1px solid #e2e8f0;
82
+ border-radius: 16px;
83
+ padding: 2rem;
84
+ margin: 1rem 0;
85
+ box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1), 0 2px 4px -1px rgba(0, 0, 0, 0.06);
86
+ transition: all 0.3s ease;
87
+ height: 100%;
88
+ }
89
+
90
+ .feature-card:hover {
91
+ transform: translateY(-4px);
92
+ box-shadow: 0 20px 25px -5px rgba(0, 0, 0, 0.1), 0 10px 10px -5px rgba(0, 0, 0, 0.04);
93
+ }
94
+
95
+ .feature-icon {
96
+ font-size: 3rem;
97
+ margin-bottom: 1rem;
98
+ display: block;
99
+ }
100
+
101
+ .feature-title {
102
+ font-family: 'Inter', sans-serif;
103
+ font-size: 1.5rem;
104
+ font-weight: 600;
105
+ color: #1e293b;
106
+ margin-bottom: 0.5rem;
107
+ }
108
+
109
+ .feature-description {
110
+ color: #64748b;
111
+ font-size: 1rem;
112
+ line-height: 1.6;
113
+ }
114
+
115
+ /* Metric Cards */
116
+ .metric-container {
117
+ display: flex;
118
+ gap: 1rem;
119
+ margin: 2rem 0;
120
+ }
121
+
122
+ .metric-card {
123
+ background: linear-gradient(135deg, #4f46e5 0%, #7c3aed 100%);
124
+ color: white;
125
+ padding: 1.5 rem;
126
+ border-radius: 12px;
127
+ text-align: center;
128
+ box-shadow: 0 10px 15px -3px rgba(0, 0, 0, 0.1);
129
+ flex: 1;
130
+ transition: transform 0.2s ease;
131
+ }
132
+
133
+ .metric-card:hover {
134
+ transform: scale(1.05);
135
+ }
136
+
137
+ .metric-value {
138
+ font-size: 2rem;
139
+ font-weight: 700;
140
+ margin-bottom: 0.5rem;
141
+ }
142
+
143
+ .metric-label {
144
+ font-size: 0.9rem;
145
+ opacity: 0.9;
146
+ font-weight: 500;
147
+ }
148
+
149
+ /* Insight and Recommendation Boxes */
150
+ .insight-box {
151
+ background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%);
152
+ border-left: 5px solid #3b82f6;
153
+ padding: 1.5rem;
154
+ margin: 1rem 0;
155
+ border-radius: 0 12px 12px 0;
156
+ box-shadow: 0 4px 6px rgba(0, 0, 0, 0.05);
157
+ transition: all 0.3s ease;
158
+ }
159
+
160
+ .insight-box:hover {
161
+ transform: translateX(4px);
162
+ box-shadow: 0 8px 25px rgba(0, 0, 0, 0.1);
163
+ }
164
+
165
+ .recommendation-box {
166
+ background: linear-gradient(135deg, #f0fdf4 0%, #dcfce7 100%);
167
+ border-left: 5px solid #22c55e;
168
+ padding: 1.5rem;
169
+ margin: 1rem 0;
170
+ border-radius: 0 12px 12px 0;
171
+ box-shadow: 0 4px 6px rgba(0, 0, 0, 0.05);
172
+ transition: all 0.3s ease;
173
+ }
174
+
175
+ .recommendation-box:hover {
176
+ transform: translateX(4px);
177
+ box-shadow: 0 8px 25px rgba(0, 0, 0, 0.1);
178
+ }
179
+
180
+ /* Upload Area */
181
+ .upload-area {
182
+ border: 2px dashed #cbd5e1;
183
+ border-radius: 12px;
184
+ padding: 3rem 2rem;
185
+ text-align: center;
186
+ background: linear-gradient(135deg, #f8fafc 0%, #f1f5f9 100%);
187
+ margin: 2rem 0;
188
+ transition: all 0.3s ease;
189
+ }
190
+
191
+ .upload-area:hover {
192
+ border-color: #3b82f6;
193
+ background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%);
194
+ }
195
+
196
+ /* Progress Bar */
197
+ .stProgress > div > div > div > div {
198
+ background: linear-gradient(135deg, #3b82f6 0%, #8b5cf6 100%);
199
+ border-radius: 10px;
200
+ }
201
+
202
+ /* Buttons */
203
+ .stButton > button {
204
+ background: linear-gradient(135deg, #3b82f6 0%, #8b5cf6 100%);
205
+ color: white;
206
+ border: none;
207
+ border-radius: 12px;
208
+ padding: 0.75rem 2rem;
209
+ font-weight: 600;
210
+ font-size: 1rem;
211
+ transition: all 0.3s ease;
212
+ box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
213
+ }
214
+
215
+ .stButton > button:hover {
216
+ transform: translateY(-2px);
217
+ box-shadow: 0 8px 15px rgba(0, 0, 0, 0.2);
218
+ }
219
+
220
+ /* Sidebar Styling */
221
+ .css-1d391kg {
222
+ background: linear-gradient(180deg, #1e293b 0%, #334155 100%);
223
+ }
224
+
225
+ .css-1d391kg .sidebar-content {
226
+ color: white;
227
+ }
228
+
229
+ /* Tab Styling */
230
+ .stTabs [data-baseweb="tab-list"] {
231
+ gap: 8px;
232
+ }
233
+
234
+ .stTabs [data-baseweb="tab"] {
235
+ height: 50px;
236
+ background: linear-gradient(135deg, #f1f5f9 0%, #e2e8f0 100%);
237
+ border-radius: 12px;
238
+ border: 1px solid #cbd5e1;
239
+ color: #475569;
240
+ font-weight: 500;
241
+ transition: all 0.3s ease;
242
+ }
243
+
244
+ .stTabs [aria-selected="true"] {
245
+ background: linear-gradient(135deg, #3b82f6 0%, #8b5cf6 100%);
246
+ color: white;
247
+ border: 1px solid #3b82f6;
248
+ }
249
+
250
+ /* Success/Warning/Error Messages */
251
+ .stSuccess {
252
+ background: linear-gradient(135deg, #dcfce7 0%, #bbf7d0 100%);
253
+ border: 1px solid #22c55e;
254
+ border-radius: 12px;
255
+ }
256
+
257
+ .stWarning {
258
+ background: linear-gradient(135deg, #fef3c7 0%, #fde68a 100%);
259
+ border: 1px solid #f59e0b;
260
+ border-radius: 12px;
261
+ }
262
+
263
+ .stError {
264
+ background: linear-gradient(135deg, #fee2e2 0%, #fecaca 100%);
265
+ border: 1px solid #ef4444;
266
+ border-radius: 12px;
267
+ }
268
+
269
+ /* Animation */
270
+ @keyframes fadeInUp {
271
+ from {
272
+ opacity: 0;
273
+ transform: translateY(30px);
274
+ }
275
+ to {
276
+ opacity: 1;
277
+ transform: translateY(0);
278
+ }
279
+ }
280
+
281
+ .animate-fade-in {
282
+ animation: fadeInUp 0.6s ease-out;
283
+ }
284
+
285
+ /* Data Table Styling */
286
+ .stDataFrame {
287
+ border-radius: 12px;
288
+ overflow: hidden;
289
+ box-shadow: 0 4px 6px rgba(0, 0, 0, 0.05);
290
+ }
291
+
292
+ /* Expander Styling */
293
+ .streamlit-expanderHeader {
294
+ background: linear-gradient(135deg, #f8fafc 0%, #f1f5f9 100%);
295
+ border-radius: 12px;
296
+ border: 1px solid #e2e8f0;
297
+ }
298
+
299
+ /* Footer */
300
+ .footer {
301
+ text-align: center;
302
+ padding: 3rem 0;
303
+ color: #64748b;
304
+ font-size: 0.9rem;
305
+ border-top: 1px solid #e2e8f0;
306
+ margin-top: 4rem;
307
+ }
308
+
309
+ .footer a {
310
+ color: #3b82f6;
311
+ text-decoration: none;
312
+ font-weight: 500;
313
+ }
314
+
315
+ .footer a:hover {
316
+ text-decoration: underline;
317
+ }
318
+
319
+ /* Loading Animation */
320
+ .loading-container {
321
+ display: flex;
322
+ justify-content: center;
323
+ align-items: center;
324
+ padding: 2rem;
325
+ }
326
+
327
+ .loading-spinner {
328
+ border: 4px solid #f3f4f6;
329
+ border-top: 4px solid #3b82f6;
330
+ border-radius: 50%;
331
+ width: 40px;
332
+ height: 40px;
333
+ animation: spin 1s linear infinite;
334
+ }
335
+
336
+ @keyframes spin {
337
+ 0% { transform: rotate(0deg); }
338
+ 100% { transform: rotate(360deg); }
339
+ }
340
+ </style>
341
+ """, unsafe_allow_html=True)
342
+
343
+ def initialize_session_state():
344
+ """Initialize session state variables"""
345
+ if 'analysis_results' not in st.session_state:
346
+ st.session_state.analysis_results = None
347
+ if 'dataset' not in st.session_state:
348
+ st.session_state.dataset = None
349
+ if 'agent' not in st.session_state:
350
+ st.session_state.agent = None
351
+ if 'groq_api_key' not in st.session_state:
352
+ st.session_state.groq_api_key = ""
353
+ if 'model_name' not in st.session_state:
354
+ st.session_state.model_name = "llama3-70b-8192"
355
+ if 'analysis_complete' not in st.session_state:
356
+ st.session_state.analysis_complete = False
357
+
358
+ def create_agent():
359
+ """Create and configure the data analysis agent"""
360
+ try:
361
+ # Check environment variable first, then session state
362
+ groq_api_key = os.environ.get('GROQ_API_KEY') or st.session_state.get('groq_api_key', '')
363
+ if not groq_api_key:
364
+ return None
365
+
366
+ agent = DataAnalysisAgent(
367
+ groq_api_key=groq_api_key,
368
+ model_name=st.session_state.get('model_name', 'llama3-70b-8192')
369
+ )
370
+ return agent
371
+ except Exception as e:
372
+ st.error(f"Failed to create agent: {str(e)}")
373
+ return None
374
+
375
+ def sidebar_config():
376
+ """Configure the beautiful sidebar"""
377
+ with st.sidebar:
378
+ st.markdown("""
379
+ <div style='text-align: center; padding: 1rem 0;'>
380
+ <div style='font-size: 4.5rem; margin-bottom: 0 rem;'>πŸ€–</div>
381
+ <h1 style='
382
+ background: linear-gradient(135deg, #1e40af 0%, #3b82f6 50%, #06b6d4 100%);
383
+ -webkit-background-clip: text;
384
+ -webkit-text-fill-color: transparent;
385
+ background-clip: text;
386
+ margin: 0;
387
+ font-size: 1.6rem;
388
+ font-weight: 700;
389
+ '>AI Agents on action</h1>
390
+ <p style='color: #94a3b8; margin: 0.5rem 0 0 0; font-size: 0.9rem;'>Powered by Llama 3</p>
391
+ </div>
392
+ """, unsafe_allow_html=True)
393
+
394
+ st.markdown("---")
395
+
396
+ # Check for environment variable first
397
+ env_api_key = os.environ.get('GROQ_API_KEY')
398
+
399
+ if env_api_key:
400
+ st.success("βœ… API Key Configured")
401
+ st.session_state.groq_api_key = env_api_key
402
+ api_key_configured = True
403
+ else:
404
+ st.subheader("πŸ”‘ API Setup")
405
+ st.info("πŸ’‘ Set GROQ_API_KEY environment variable")
406
+
407
+ groq_api_key = st.text_input(
408
+ "Groq API Key",
409
+ type="password",
410
+ value=st.session_state.groq_api_key,
411
+ help="Get your API key from console.groq.com"
412
+ )
413
+
414
+ if groq_api_key:
415
+ st.session_state.groq_api_key = groq_api_key
416
+ api_key_configured = True
417
+ else:
418
+ api_key_configured = False
419
+
420
+ st.markdown("---")
421
+
422
+ # Model Selection
423
+ st.subheader("🧠 AI Model")
424
+ model_options = {
425
+ "llama3-70b-8192": "Llama 3 70B (Recommended)",
426
+ "llama3-8b-8192": "Llama 3 8B (Faster)",
427
+ "mixtral-8x7b-32768": "Mixtral 8x7B"
428
+ }
429
+
430
+ selected_model = st.selectbox(
431
+ "Choose Model",
432
+ options=list(model_options.keys()),
433
+ format_func=lambda x: model_options[x],
434
+ index=0
435
+ )
436
+ st.session_state.model_name = selected_model
437
+
438
+ st.markdown("---")
439
+
440
+ # Analysis Options
441
+ st.subheader("βš™οΈ Analysis Settings")
442
+
443
+ industry_type = st.selectbox(
444
+ "Industry Focus",
445
+ ["General", "Retail", "Healthcare", "Finance", "Manufacturing", "Technology"],
446
+ help="Customize insights for your industry"
447
+ )
448
+ st.session_state.industry_type = industry_type
449
+
450
+ enable_advanced = st.toggle(
451
+ "Advanced Analysis",
452
+ value=True,
453
+ help="Include correlation analysis and advanced insights"
454
+ )
455
+ st.session_state.enable_advanced = enable_advanced
456
+
457
+ auto_insights = st.toggle(
458
+ "Auto-Generate Insights",
459
+ value=True,
460
+ help="Automatically generate business insights"
461
+ )
462
+ st.session_state.auto_insights = auto_insights
463
+
464
+ st.markdown("---")
465
+
466
+ # Quick Stats with dynamic insights count
467
+ if st.session_state.dataset is not None:
468
+ st.subheader("πŸ“Š Dataset Info")
469
+ df = st.session_state.dataset
470
+
471
+ col1, col2 = st.columns(2)
472
+ with col1:
473
+ st.metric("Rows", f"{df.shape[0]:,}")
474
+ st.metric("Columns", df.shape[1])
475
+ with col2:
476
+ st.metric("Missing", f"{df.isnull().sum().sum():,}")
477
+ st.metric("Size", f"{df.memory_usage(deep=True).sum() / 1024**2:.1f} MB")
478
+
479
+ # Show insights count if analysis is complete (now shows top 5)
480
+ if st.session_state.analysis_results:
481
+ insights = st.session_state.analysis_results.get('insights', [])
482
+ recommendations = st.session_state.analysis_results.get('recommendations', [])
483
+
484
+ # Process to get clean counts (max 5 each)
485
+ processed_insights_count = min(len([i for i in insights if isinstance(i, str) and len(i.strip()) > 20]), 5)
486
+ processed_recommendations_count = min(len([r for r in recommendations if isinstance(r, str) and len(r.strip()) > 20]), 5)
487
+
488
+ st.markdown("---")
489
+ st.subheader("🧠 Analysis Results")
490
+
491
+ col1, col2 = st.columns(2)
492
+ with col1:
493
+ st.metric("πŸ’‘ Top Insights", processed_insights_count)
494
+ with col2:
495
+ st.metric("🎯 Top Recommendations", processed_recommendations_count)
496
+
497
+ st.markdown("---")
498
+
499
+ # Help Section
500
+ with st.expander("πŸ’‘ Quick Help"):
501
+ st.markdown("""
502
+ **Supported Formats:**
503
+ - CSV files (.csv)
504
+ - Excel files (.xlsx, .xls)
505
+ - JSON files (.json)
506
+
507
+ **Best Practices:**
508
+ - Clean column names
509
+ - Handle missing values
510
+ - Include date columns
511
+ - Mix numeric & categorical data
512
+
513
+ **Need Help?**
514
+ - [Documentation](https://github.com/yourusername/ai-data-analysis-agent)
515
+ - [Examples](https://github.com/yourusername/ai-data-analysis-agent/examples)
516
+ """)
517
+
518
+ return api_key_configured
519
+
520
+ def display_hero_section():
521
+ """Display the beautiful hero section"""
522
+ st.markdown('<div class="main-header animate-fade-in">AIDA-AI Data Analyzer </div>', unsafe_allow_html=True)
523
+
524
+ st.markdown("""
525
+ <div class="subtitle animate-fade-in">
526
+ Transform your raw data into actionable business insights with the power of AI.<br>
527
+ Upload, analyze, and discover patterns automatically using intelligent agents.
528
+ </div>
529
+ """, unsafe_allow_html=True)
530
+
531
+ def display_features():
532
+ """Display feature cards"""
533
+ st.markdown("### ✨ What This AI Agent Can Do")
534
+
535
+ col1, col2, col3 = st.columns(3)
536
+
537
+ with col1:
538
+ st.markdown("""
539
+ <div class="feature-card">
540
+ <div class="feature-icon">🧠</div>
541
+ <div class="feature-title">Intelligent Analysis</div>
542
+ <div class="feature-description">
543
+ Our AI automatically understands your data structure, identifies patterns,
544
+ and generates meaningful insights without any manual configuration.
545
+ </div>
546
+ </div>
547
+ """, unsafe_allow_html=True)
548
+
549
+ with col2:
550
+ st.markdown("""
551
+ <div class="feature-card">
552
+ <div class="feature-icon">πŸ“Š</div>
553
+ <div class="feature-title">Smart Visualizations</div>
554
+ <div class="feature-description">
555
+ Automatically creates the most appropriate charts and graphs for your data,
556
+ with interactive visualizations.
557
+ </div>
558
+ </div>
559
+ """, unsafe_allow_html=True)
560
+
561
+ with col3:
562
+ st.markdown("""
563
+ <div class="feature-card">
564
+ <div class="feature-icon">🎯</div>
565
+ <div class="feature-title">Actionable Recommendations</div>
566
+ <div class="feature-description">
567
+ Get specific, measurable recommendations for improving your business
568
+ based on data-driven insights.
569
+ </div>
570
+ </div>
571
+ """, unsafe_allow_html=True)
572
+
573
+ def upload_dataset():
574
+ """Beautiful dataset upload section"""
575
+ st.markdown("### πŸ“Š Upload Your Dataset")
576
+
577
+ uploaded_file = st.file_uploader(
578
+ "",
579
+ type=['csv', 'xlsx', 'xls', 'json'],
580
+ help="Drag and drop your file here or click to browse",
581
+ label_visibility="collapsed"
582
+ )
583
+
584
+ if uploaded_file is not None:
585
+ try:
586
+ # Show loading spinner
587
+ with st.spinner("πŸ” Processing your dataset..."):
588
+ time.sleep(1) # Small delay for UX
589
+
590
+ # Read the file based on extension
591
+ if uploaded_file.name.endswith('.csv'):
592
+ df = pd.read_csv(uploaded_file)
593
+ elif uploaded_file.name.endswith(('.xlsx', '.xls')):
594
+ df = pd.read_excel(uploaded_file)
595
+ elif uploaded_file.name.endswith('.json'):
596
+ df = pd.read_json(uploaded_file)
597
+ else:
598
+ st.error("Unsupported file format")
599
+ return False
600
+
601
+ st.session_state.dataset = df
602
+ st.session_state.uploaded_filename = uploaded_file.name
603
+
604
+ # Success message
605
+ st.success(f"βœ… Successfully loaded **{uploaded_file.name}**")
606
+
607
+ # Beautiful metrics display
608
+ col1, col2, col3, col4 = st.columns(4)
609
+
610
+ with col1:
611
+ st.markdown(f"""
612
+ <div class="metric-card">
613
+ <div class="metric-value">{df.shape[0]:,}</div>
614
+ <div class="metric-label">Rows</div>
615
+ </div>
616
+ """, unsafe_allow_html=True)
617
+
618
+ with col2:
619
+ st.markdown(f"""
620
+ <div class="metric-card">
621
+ <div class="metric-value">{df.shape[1]}</div>
622
+ <div class="metric-label">Columns</div>
623
+ </div>
624
+ """, unsafe_allow_html=True)
625
+
626
+ with col3:
627
+ missing = df.isnull().sum().sum()
628
+ st.markdown(f"""
629
+ <div class="metric-card">
630
+ <div class="metric-value">{missing:,}</div>
631
+ <div class="metric-label">Missing Values</div>
632
+ </div>
633
+ """, unsafe_allow_html=True)
634
+
635
+ with col4:
636
+ size_mb = df.memory_usage(deep=True).sum() / 1024**2
637
+ st.markdown(f"""
638
+ <div class="metric-card">
639
+ <div class="metric-value">{size_mb:.1f} MB</div>
640
+ <div class="metric-label">File Size</div>
641
+ </div>
642
+ """, unsafe_allow_html=True)
643
+
644
+ st.markdown("<br>", unsafe_allow_html=True)
645
+
646
+ # Data preview with beautiful styling
647
+ st.markdown("#### πŸ“‹ Data Preview")
648
+ st.dataframe(
649
+ df.head(10),
650
+ use_container_width=True,
651
+ height=300
652
+ )
653
+
654
+ # Column information in expandable section
655
+ with st.expander("πŸ“Š Detailed Column Information", expanded=False):
656
+ col_info = pd.DataFrame({
657
+ 'Column': df.columns,
658
+ 'Type': df.dtypes.astype(str),
659
+ 'Non-Null': df.count(),
660
+ 'Null Count': df.isnull().sum(),
661
+ 'Unique Values': df.nunique(),
662
+ 'Sample Data': [str(df[col].iloc[0]) if len(df) > 0 else '' for col in df.columns]
663
+ })
664
+ st.dataframe(col_info, use_container_width=True)
665
+
666
+ return True
667
+
668
+ except Exception as e:
669
+ st.error(f"❌ Error reading file: {str(e)}")
670
+ return False
671
+ else:
672
+ # Show upload placeholder
673
+ st.markdown("""
674
+ <div class="upload-area">
675
+ <div style="font-size: 3rem; margin-bottom: 1rem;">πŸ“</div>
676
+ <div style="font-size: 1.2rem; font-weight: 600; margin-bottom: 0.5rem;">
677
+ Drop your dataset here
678
+ </div>
679
+ <div style="color: #64748b;">
680
+ Supports CSV, Excel, and JSON files β€’ Max 200MB
681
+ </div>
682
+ </div>
683
+ """, unsafe_allow_html=True)
684
+
685
+ return False
686
+
687
+ def run_analysis():
688
+ """Run the AI analysis with beautiful progress indicators"""
689
+ if st.session_state.dataset is None:
690
+ st.warning("Please upload a dataset first.")
691
+ return
692
+
693
+ # Check for API key from environment or session state
694
+ api_key = os.environ.get('GROQ_API_KEY') or st.session_state.get('groq_api_key')
695
+ if not api_key:
696
+ st.warning("Please set GROQ_API_KEY environment variable or enter it in the sidebar.")
697
+ return
698
+
699
+ # Create agent
700
+ with st.spinner("πŸ€– Initializing AI agent..."):
701
+ agent = create_agent()
702
+ if agent is None:
703
+ st.error("Failed to initialize AI agent. Check your API key.")
704
+ return
705
+
706
+ st.session_state.agent = agent
707
+
708
+ # Save dataset temporarily
709
+ temp_file = "temp_dataset.csv"
710
+ st.session_state.dataset.to_csv(temp_file, index=False)
711
+
712
+ # Beautiful progress tracking
713
+ progress_container = st.container()
714
+
715
+ with progress_container:
716
+ st.markdown("### πŸš€ AI Analysis in Progress")
717
+
718
+ # Progress bar
719
+ progress_bar = st.progress(0)
720
+ status_text = st.empty()
721
+
722
+ # Step indicators
723
+ steps = [
724
+ ("πŸ”", "Analyzing dataset structure"),
725
+ ("πŸ“Š", "Examining columns and data quality"),
726
+ ("🧠", "Generating AI insights"),
727
+ ("πŸ“ˆ", "Planning visualizations"),
728
+ ("🎨", "Creating charts"),
729
+ ("🎯", "Formulating recommendations")
730
+ ]
731
+
732
+ step_cols = st.columns(len(steps))
733
+ step_indicators = []
734
+
735
+ for i, (icon, desc) in enumerate(steps):
736
+ with step_cols[i]:
737
+ step_indicators.append(st.empty())
738
+ step_indicators[i].markdown(f"""
739
+ <div style="text-align: center; padding: 1rem; opacity: 0.3;">
740
+ <div style="font-size: 2rem;">{icon}</div>
741
+ <div style="font-size: 0.8rem; margin-top: 0.5rem;">{desc}</div>
742
+ </div>
743
+ """, unsafe_allow_html=True)
744
+
745
+ try:
746
+ # Step 1
747
+ step_indicators[0].markdown(f"""
748
+ <div style="text-align: center; padding: 1rem; opacity: 1; background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%); border-radius: 12px;">
749
+ <div style="font-size: 2rem;">πŸ”</div>
750
+ <div style="font-size: 0.8rem; margin-top: 0.5rem; font-weight: 600;">Analyzing Structure</div>
751
+ </div>
752
+ """, unsafe_allow_html=True)
753
+ status_text.markdown("**πŸ” AI agent analyzing dataset structure...**")
754
+ progress_bar.progress(15)
755
+ time.sleep(1)
756
+
757
+ # Step 2
758
+ step_indicators[1].markdown(f"""
759
+ <div style="text-align: center; padding: 1rem; opacity: 1; background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%); border-radius: 12px;">
760
+ <div style="font-size: 2rem;">πŸ“Š</div>
761
+ <div style="font-size: 0.8rem; margin-top: 0.5rem; font-weight: 600;">Examining Data</div>
762
+ </div>
763
+ """, unsafe_allow_html=True)
764
+ status_text.markdown("**πŸ“Š Analyzing columns and data quality...**")
765
+ progress_bar.progress(30)
766
+ time.sleep(1)
767
+
768
+ # Step 3
769
+ step_indicators[2].markdown(f"""
770
+ <div style="text-align: center; padding: 1rem; opacity: 1; background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%); border-radius: 12px;">
771
+ <div style="font-size: 2rem;">🧠</div>
772
+ <div style="font-size: 0.8rem; margin-top: 0.5rem; font-weight: 600;">AI Thinking</div>
773
+ </div>
774
+ """, unsafe_allow_html=True)
775
+ status_text.markdown("**🧠 Generating insights with AI...**")
776
+ progress_bar.progress(50)
777
+ time.sleep(1)
778
+
779
+ # Step 4
780
+ step_indicators[3].markdown(f"""
781
+ <div style="text-align: center; padding: 1rem; opacity: 1; background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%); border-radius: 12px;">
782
+ <div style="font-size: 2rem;">πŸ“ˆ</div>
783
+ <div style="font-size: 0.8rem; margin-top: 0.5rem; font-weight: 600;">Planning Charts</div>
784
+ </div>
785
+ """, unsafe_allow_html=True)
786
+ status_text.markdown("**πŸ“ˆ Planning optimal visualizations...**")
787
+ progress_bar.progress(70)
788
+ time.sleep(1)
789
+
790
+ # Step 5
791
+ step_indicators[4].markdown(f"""
792
+ <div style="text-align: center; padding: 1rem; opacity: 1; background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%); border-radius: 12px;">
793
+ <div style="font-size: 2rem;">🎨</div>
794
+ <div style="font-size: 0.8rem; margin-top: 0.5rem; font-weight: 600;">Creating Charts</div>
795
+ </div>
796
+ """, unsafe_allow_html=True)
797
+ status_text.markdown("**🎨 Creating beautiful visualizations...**")
798
+ progress_bar.progress(85)
799
+
800
+ # Run the actual analysis
801
+ results = agent.analyze_dataset(temp_file)
802
+
803
+ # Step 6
804
+ step_indicators[5].markdown(f"""
805
+ <div style="text-align: center; padding: 1rem; opacity: 1; background: linear-gradient(135deg, #eff6ff 0%, #dbeafe 100%); border-radius: 12px;">
806
+ <div style="font-size: 2rem;">🎯</div>
807
+ <div style="font-size: 0.8rem; margin-top: 0.5rem; font-weight: 600;">Final Recommendations</div>
808
+ </div>
809
+ """, unsafe_allow_html=True)
810
+ status_text.markdown("**🎯 Formulating actionable recommendations...**")
811
+ progress_bar.progress(100)
812
+
813
+ # Clean up temp file
814
+ if os.path.exists(temp_file):
815
+ os.remove(temp_file)
816
+
817
+ if "error" in results:
818
+ st.error(f"❌ Analysis failed: {results['error']}")
819
+ return
820
+
821
+ st.session_state.analysis_results = results
822
+ st.session_state.analysis_complete = True
823
+
824
+ # Success animation
825
+ status_text.markdown("**βœ… Analysis completed successfully!**")
826
+
827
+ # Show completion message
828
+ st.balloons()
829
+ time.sleep(1)
830
+
831
+ # Clear progress and show results
832
+ progress_container.empty()
833
+ st.rerun()
834
+
835
+ except Exception as e:
836
+ st.error(f"❌ Analysis failed: {str(e)}")
837
+ if os.path.exists(temp_file):
838
+ os.remove(temp_file)
839
+
840
+ def display_results():
841
+ """Display beautiful analysis results"""
842
+ results = st.session_state.analysis_results
843
+ if results is None:
844
+ return
845
+
846
+ # Results header
847
+ st.markdown("""
848
+ <div style="text-align: center; margin: 3rem 0;">
849
+ <h1 style="font-size: 2.5rem; color: #1e293b; margin-bottom: 0.5rem;">πŸ“Š Analysis Complete!</h1>
850
+ <p style="font-size: 1.1rem; color: #64748b;">Here are your AI-generated insights and recommendations</p>
851
+ </div>
852
+ """, unsafe_allow_html=True)
853
+
854
+ # Dataset Overview with beautiful cards
855
+ st.markdown("### πŸ“‹ Dataset Overview")
856
+ info = results.get('dataset_info', {})
857
+
858
+ col1, col2, col3, col4, col5 = st.columns(5)
859
+
860
+ metrics = [
861
+ ("πŸ“Š", "Total Rows", f"{info.get('shape', [0])[0]:,}", "#3b82f6"),
862
+ ("πŸ“‹", "Columns", str(info.get('shape', [0, 0])[1]), "#8b5cf6"),
863
+ ("πŸ”’", "Numeric", str(len(info.get('numeric_columns', []))), "#06b6d4"),
864
+ ("πŸ“", "Categorical", str(len(info.get('categorical_columns', []))), "#10b981"),
865
+ ("✨", "Quality Score", f"{max(0, 100 - (sum(info.get('null_counts', {}).values()) / max(info.get('shape', [1, 1])[0] * info.get('shape', [1, 1])[1], 1) * 100)):.0f}%", "#f59e0b")
866
+ ]
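# Worked example of the quality-score formula in the metrics list above (numbers are
# hypothetical): a 1,000-row x 8-column dataset has 8,000 cells, so 400 missing values
# give max(0, 100 - 400 / 8000 * 100) = 95.0, displayed as "95%".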
867
+
868
+ for i, (icon, label, value, color) in enumerate(metrics):
869
+ with [col1, col2, col3, col4, col5][i]:
870
+ st.markdown(f"""
871
+ <div style="
872
+ background: linear-gradient(135deg, {color}15 0%, {color}25 100%);
873
+ border: 2px solid {color}30;
874
+ border-radius: 16px;
875
+ padding: 1.5rem;
876
+ text-align: center;
877
+ margin: 0.5rem 0;
878
+ transition: transform 0.2s ease;
879
+ ">
880
+ <div style="font-size: 2rem; margin-bottom: 0.5rem;">{icon}</div>
881
+ <div style="font-size: 1.8rem; font-weight: 700; color: {color}; margin-bottom: 0.25rem;">{value}</div>
882
+ <div style="font-size: 0.9rem; color: #64748b; font-weight: 500;">{label}</div>
883
+ </div>
884
+ """, unsafe_allow_html=True)
885
+
886
+ st.markdown("<br>", unsafe_allow_html=True)
887
+
888
+ # Key Insights Section - Extract complete insights with headers and content combined
889
+ st.markdown("### πŸ’‘ Key Insights")
890
+ insights = results.get('insights', [])
891
+
892
+ if insights:
893
+ # Combine all insight text and parse properly
894
+ full_text = ' '.join(str(item) for item in insights)
895
+
896
+ # Extract complete insights (header + content) using regex
897
+ import re
898
+
899
+ # Pattern to match **Insight X:** followed by content until next insight or end
900
+ insight_pattern = r'\*\*Insight (\d+):(.*?)(?=\*\*Insight \d+:|$)'
901
+ matches = re.findall(insight_pattern, full_text, re.DOTALL)
902
+
903
+ processed_insights = []
904
+ for match in matches:
905
+ insight_num, content = match
906
+ clean_content = content.strip().strip('*').strip()  # remove stray markdown asterisks on both sides
907
+ if len(clean_content) > 20:
908
+ processed_insights.append(clean_content)
909
+
910
+ # Take top 5 insights
911
+ top_insights = processed_insights[:5]
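# Minimal illustrative sketch of what the pattern above extracts, assuming the agent
# emits the "**Insight N:** ..." form (the sample string is hypothetical):
_sample = "**Insight 1:** Revenue grew 12% QoQ. **Insight 2:** Churn is highest in EMEA."
_demo = [c.strip().strip('*').strip() for _, c in re.findall(insight_pattern, _sample, re.DOTALL)]
# _demo -> ['Revenue grew 12% QoQ.', 'Churn is highest in EMEA.']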
912
+
913
+ if top_insights:
914
+ st.markdown(f"**Top {len(top_insights)} key insights from your data:**")
915
+ st.markdown("<br>", unsafe_allow_html=True)
916
+
917
+ for i, insight in enumerate(top_insights):
918
+ st.markdown(f"""
919
+ <div class="insight-box animate-fade-in">
920
+ <div style="display: flex; align-items: flex-start; gap: 1rem;">
921
+ <div style="
922
+ background: #3b82f6;
923
+ color: white;
924
+ border-radius: 50%;
925
+ width: 32px;
926
+ height: 32px;
927
+ display: flex;
928
+ align-items: center;
929
+ justify-content: center;
930
+ font-weight: bold;
931
+ font-size: 0.9rem;
932
+ flex-shrink: 0;
933
+ ">{i+1}</div>
934
+ <div style="flex: 1;">
935
+ <strong style="color: #1e293b;">πŸ’‘ Key Insight {i+1}:</strong><br>
936
+ <span style="color: #475569; line-height: 1.6;">{insight}</span>
937
+ </div>
938
+ </div>
939
+ </div>
940
+ """, unsafe_allow_html=True)
941
+ else:
942
+ st.info("πŸ” No insights could be extracted from the analysis.")
943
+ else:
944
+ st.info("πŸ” No insights were generated.")
945
+
946
+ # Interactive Visualizations Section
947
+ st.markdown("### πŸ“ˆ Interactive Data Exploration")
948
+
949
+ if st.session_state.dataset is not None:
950
+ df = st.session_state.dataset
951
+
952
+ # Beautiful tabs
953
+ tab1, tab2, tab3, tab4 = st.tabs([
954
+ "πŸ“Š Distributions",
955
+ "πŸ”— Correlations",
956
+ "πŸ“ˆ Trends & Patterns",
957
+ "🎯 Custom Analysis"
958
+ ])
959
+
960
+ with tab1:
961
+ st.markdown("#### πŸ“Š Distribution Analysis")
962
+ numeric_cols = df.select_dtypes(include=['number']).columns.tolist()
963
+
964
+ if len(numeric_cols) > 0:
965
+ # Column selector at the top
966
+ selected_col = st.selectbox(
967
+ "Select column to analyze",
968
+ numeric_cols,
969
+ key="dist_col"
970
+ )
971
+
972
+ st.markdown("<br>", unsafe_allow_html=True)
973
+
974
+ # Show all three plots side by side
975
+ col1, col2, col3 = st.columns(3)
976
+
977
+ with col1:
978
+ st.markdown("**Histogram**")
979
+ fig_hist = px.histogram(
980
+ df,
981
+ x=selected_col,
982
+ title=f"Histogram",
983
+ nbins=30,
984
+ color_discrete_sequence=['#3b82f6']
985
+ )
986
+ fig_hist.update_layout(
987
+ height=380,
988
+ plot_bgcolor='rgba(0,0,0,0)',
989
+ paper_bgcolor='rgba(0,0,0,0)',
990
+ title_font_size=14,
991
+ margin=dict(t=40, b=40, l=40, r=40)
992
+ )
993
+ st.plotly_chart(fig_hist, use_container_width=True)
994
+
995
+ with col2:
996
+ st.markdown("**Box Plot**")
997
+ fig_box = px.box(
998
+ df,
999
+ y=selected_col,
1000
+ title=f"Box Plot",
1001
+ color_discrete_sequence=['#8b5cf6']
1002
+ )
1003
+ fig_box.update_layout(
1004
+ height=380,
1005
+ plot_bgcolor='rgba(0,0,0,0)',
1006
+ paper_bgcolor='rgba(0,0,0,0)',
1007
+ title_font_size=14,
1008
+ margin=dict(t=40, b=40, l=40, r=40)
1009
+ )
1010
+ st.plotly_chart(fig_box, use_container_width=True)
1011
+
1012
+ with col3:
1013
+ st.markdown("**Violin Plot**")
1014
+ fig_violin = px.violin(
1015
+ df,
1016
+ y=selected_col,
1017
+ title=f"Violin Plot",
1018
+ color_discrete_sequence=['#06b6d4']
1019
+ )
1020
+ fig_violin.update_layout(
1021
+ height=380,
1022
+ plot_bgcolor='rgba(0,0,0,0)',
1023
+ paper_bgcolor='rgba(0,0,0,0)',
1024
+ title_font_size=14,
1025
+ margin=dict(t=40, b=40, l=40, r=40)
1026
+ )
1027
+ st.plotly_chart(fig_violin, use_container_width=True)
1028
+
1029
+ # Statistics cards below the plots
1030
+ st.markdown("#### πŸ“Š Statistical Summary")
1031
+ stats_col1, stats_col2, stats_col3, stats_col4, stats_col5 = st.columns(5)
1032
+
1033
+ stats = [
1034
+ ("Mean", f"{df[selected_col].mean():.2f}", "#3b82f6"),
1035
+ ("Median", f"{df[selected_col].median():.2f}", "#8b5cf6"),
1036
+ ("Std Dev", f"{df[selected_col].std():.2f}", "#06b6d4"),
1037
+ ("Min", f"{df[selected_col].min():.2f}", "#10b981"),
1038
+ ("Max", f"{df[selected_col].max():.2f}", "#f59e0b")
1039
+ ]
1040
+
1041
+ for i, (label, value, color) in enumerate(stats):
1042
+ with [stats_col1, stats_col2, stats_col3, stats_col4, stats_col5][i]:
1043
+ st.markdown(f"""
1044
+ <div style="
1045
+ background: {color}15;
1046
+ border: 1px solid {color}30;
1047
+ border-radius: 12px;
1048
+ padding: 1rem;
1049
+ text-align: center;
1050
+ ">
1051
+ <div style="font-size: 1.4rem; font-weight: 700; color: {color};">{value}</div>
1052
+ <div style="font-size: 0.85rem; color: #64748b; margin-top: 0.25rem;">{label}</div>
1053
+ </div>
1054
+ """, unsafe_allow_html=True)
1055
+ else:
1056
+ st.info("πŸ“Š No numeric columns found for distribution analysis.")
1057
+
1058
+ with tab2:
1059
+ st.markdown("#### πŸ”— Correlation Analysis")
1060
+
1061
+ if len(numeric_cols) > 1:
1062
+ # Correlation matrix heatmap
1063
+ corr_matrix = df[numeric_cols].corr()
1064
+
1065
+ fig = px.imshow(
1066
+ corr_matrix,
1067
+ text_auto=True,
1068
+ aspect="auto",
1069
+ title="Correlation Matrix",
1070
+ color_continuous_scale="RdBu_r",
1071
+ zmin=-1,
1072
+ zmax=1
1073
+ )
1074
+ fig.update_layout(
1075
+ height=500,
1076
+ plot_bgcolor='rgba(0,0,0,0)',
1077
+ paper_bgcolor='rgba(0,0,0,0)'
1078
+ )
1079
+ st.plotly_chart(fig, use_container_width=True)
1080
+
1081
+ # Top correlations
1082
+ st.markdown("#### πŸ”— Strongest Correlations")
1083
+ correlations = []
1084
+ for i in range(len(corr_matrix.columns)):
1085
+ for j in range(i+1, len(corr_matrix.columns)):
1086
+ corr_val = corr_matrix.iloc[i, j]
1087
+ if not pd.isna(corr_val):
1088
+ correlations.append({
1089
+ 'Variable 1': corr_matrix.columns[i],
1090
+ 'Variable 2': corr_matrix.columns[j],
1091
+ 'Correlation': corr_val,
1092
+ 'Strength': abs(corr_val)
1093
+ })
1094
+
1095
+ if correlations:
1096
+ corr_df = pd.DataFrame(correlations)
1097
+ corr_df = corr_df.sort_values('Strength', ascending=False).head(10)
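# Illustrative note: the nested loops above visit only the upper triangle of the
# correlation matrix (j > i), so each pair is counted once and the diagonal
# self-correlations (always 1.0) are skipped. For hypothetical numeric columns
# ['price', 'qty', 'revenue'] the visited pairs are (price, qty), (price, revenue),
# (qty, revenue), i.e. n * (n - 1) / 2 pairs for n columns, then sorted by |r|.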
1098
+
1099
+ # Display as beautiful cards
1100
+ for _, row in corr_df.head(5).iterrows():
1101
+ strength = "Strong" if row['Strength'] > 0.7 else "Moderate" if row['Strength'] > 0.5 else "Weak"
1102
+ color = "#ef4444" if row['Strength'] > 0.7 else "#f59e0b" if row['Strength'] > 0.5 else "#10b981"
1103
+
1104
+ st.markdown(f"""
1105
+ <div style="
1106
+ background: {color}15;
1107
+ border-left: 4px solid {color};
1108
+ border-radius: 8px;
1109
+ padding: 1rem;
1110
+ margin: 0.5rem 0;
1111
+ ">
1112
+ <div style="font-weight: 600; color: #1e293b; margin-bottom: 0.5rem;">
1113
+ {row['Variable 1']} ↔ {row['Variable 2']}
1114
+ </div>
1115
+ <div style="color: #64748b;">
1116
+ Correlation: <strong style="color: {color};">{row['Correlation']:.3f}</strong>
1117
+ ({strength} relationship)
1118
+ </div>
1119
+ </div>
1120
+ """, unsafe_allow_html=True)
1121
+ else:
1122
+ st.info("πŸ”— Need at least 2 numeric columns for correlation analysis.")
1123
+
1124
+ with tab3:
1125
+ st.markdown("#### πŸ“ˆ Trends & Patterns")
1126
+
1127
+ date_cols = df.select_dtypes(include=['datetime64']).columns.tolist()
1128
+ cat_cols = df.select_dtypes(include=['object', 'category']).columns.tolist()
1129
+
1130
+ if len(date_cols) > 0 and len(numeric_cols) > 0:
1131
+ col1, col2 = st.columns(2)
1132
+ with col1:
1133
+ date_col = st.selectbox("Date column", date_cols, key="trend_date")
1134
+ with col2:
1135
+ value_col = st.selectbox("Value column", numeric_cols, key="trend_value")
1136
+
1137
+ df_sorted = df.sort_values(date_col)
1138
+ fig = px.line(
1139
+ df_sorted,
1140
+ x=date_col,
1141
+ y=value_col,
1142
+ title=f"{value_col} Over Time",
1143
+ color_discrete_sequence=['#3b82f6']
1144
+ )
1145
+ fig.update_layout(height=400)
1146
+ st.plotly_chart(fig, use_container_width=True)
1147
+
1148
+ elif cat_cols and numeric_cols:
1149
+ st.markdown("#### πŸ“Š Category-based Analysis")
1150
+
1151
+ col1, col2, col3 = st.columns(3)
1152
+ with col1:
1153
+ cat_col = st.selectbox("Category", cat_cols, key="cat_trend")
1154
+ with col2:
1155
+ num_col = st.selectbox("Numeric value", numeric_cols, key="num_trend")
1156
+ with col3:
1157
+ agg_func = st.selectbox("Aggregation", ["mean", "sum", "count", "median"])
1158
+
1159
+ if agg_func == "count":
1160
+ grouped = df.groupby(cat_col).size().reset_index(name='count')
1161
+ y_col = 'count'
1162
+ else:
1163
+ grouped = df.groupby(cat_col)[num_col].agg(agg_func).reset_index()
1164
+ y_col = num_col
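# Illustrative sketch of the two aggregation paths above (column names are hypothetical):
#   df.groupby('region').size()                 # "count": rows per category, no numeric column needed
#   df.groupby('region')['sales'].agg('mean')   # mean/sum/median of the chosen numeric column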
1165
+
1166
+ fig = px.bar(
1167
+ grouped,
1168
+ x=cat_col,
1169
+ y=y_col,
1170
+ title=f"{agg_func.title()} of {num_col if agg_func != 'count' else 'Count'} by {cat_col}",
1171
+ color_discrete_sequence=['#8b5cf6']
1172
+ )
1173
+ fig.update_layout(height=400)
1174
+ st.plotly_chart(fig, use_container_width=True)
1175
+ else:
1176
+ st.info("πŸ“ˆ Upload data with date columns or categorical data to see trends.")
1177
+
1178
+ with tab4:
1179
+ st.markdown("#### 🎯 Custom Analysis Builder")
1180
+
1181
+ col1, col2 = st.columns([1, 2])
1182
+
1183
+ with col1:
1184
+ viz_type = st.selectbox(
1185
+ "Chart Type",
1186
+ ["Scatter Plot", "Bar Chart", "Pie Chart", "Sunburst", "Treemap"]
1187
+ )
1188
+
1189
+ if viz_type == "Scatter Plot" and len(numeric_cols) >= 2:
1190
+ x_col = st.selectbox("X-axis", numeric_cols, key="custom_x")
1191
+ y_col = st.selectbox("Y-axis", numeric_cols, key="custom_y")
1192
+ color_col = st.selectbox("Color by", ["None"] + list(df.columns), key="custom_color")
1193
+ size_col = st.selectbox("Size by", ["None"] + numeric_cols, key="custom_size")
1194
+
1195
+ elif viz_type in ["Bar Chart", "Pie Chart"] and cat_cols:
1196
+ cat_col = st.selectbox("Category", cat_cols, key="custom_cat")
1197
+ if numeric_cols:
1198
+ val_col = st.selectbox("Value (optional)", ["Count"] + numeric_cols, key="custom_val")
1199
+ else:
1200
+ val_col = "Count"
1201
+
1202
+ with col2:
1203
+ try:
1204
+ if viz_type == "Scatter Plot" and len(numeric_cols) >= 2:
1205
+ fig = px.scatter(
1206
+ df,
1207
+ x=x_col,
1208
+ y=y_col,
1209
+ color=None if color_col == "None" else color_col,
1210
+ size=None if size_col == "None" else size_col,
1211
+ title=f"{y_col} vs {x_col}",
1212
+ color_discrete_sequence=['#3b82f6'],
1213
+ hover_data=df.columns[:5].tolist()
1214
+ )
1215
+ fig.update_layout(height=500)
1216
+ st.plotly_chart(fig, use_container_width=True)
1217
+
1218
+ elif viz_type == "Pie Chart" and cat_cols:
1219
+ if val_col == "Count":
1220
+ value_counts = df[cat_col].value_counts().head(8)
1221
+ fig = px.pie(
1222
+ values=value_counts.values,
1223
+ names=value_counts.index,
1224
+ title=f"Distribution of {cat_col}"
1225
+ )
1226
+ else:
1227
+ grouped = df.groupby(cat_col)[val_col].sum().head(8)
1228
+ fig = px.pie(
1229
+ values=grouped.values,
1230
+ names=grouped.index,
1231
+ title=f"{val_col} by {cat_col}"
1232
+ )
1233
+ fig.update_layout(height=500)
1234
+ st.plotly_chart(fig, use_container_width=True)
1235
+
1236
+ except Exception as e:
1237
+ st.error(f"Error creating visualization: {str(e)}")
1238
+
1239
+ # Recommendations Section - Extract complete recommendations with headers and content combined
1240
+ st.markdown("### 🎯 AI-Generated Recommendations")
1241
+ recommendations = results.get('recommendations', [])
1242
+
1243
+ if recommendations:
1244
+ # Combine all recommendation text and parse properly
1245
+ full_text = ' '.join(str(item) for item in recommendations)
1246
+
1247
+ # Extract complete recommendations using regex
1248
+ import re
1249
+
1250
+ # Pattern to match recommendations (various formats)
1251
+ rec_patterns = [
1252
+ r'\*\*.*?(\d+):(.*?)(?=\*\*.*?\d+:|$)', # **Something 1:** format
1253
+ r'(\d+)\.\s+(.*?)(?=\d+\.|$)', # 1. format
1254
+ ]
1255
+
1256
+ processed_recommendations = []
1257
+ for pattern in rec_patterns:
1258
+ matches = re.findall(pattern, full_text, re.DOTALL)
1259
+ if matches:
1260
+ for match in matches:
1261
+ if len(match) == 2:
1262
+ rec_num, content = match
1263
+ clean_content = content.strip().strip('*').strip()  # remove stray markdown asterisks on both sides
1264
+ if len(clean_content) > 20:
1265
+ processed_recommendations.append(clean_content)
1266
+ break
1267
+
1268
+ # Take top 5 recommendations
1269
+ top_recommendations = processed_recommendations[:5]
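# Illustrative sketch of the fallback patterns above: the first targets
# "**Recommendation 1:** ..." style headers, the second plain numbered lists
# (the sample string is hypothetical):
#   re.findall(r'(\d+)\.\s+(.*?)(?=\d+\.|$)',
#              "1. Segment customers by region. 2. Reduce churn in EMEA.", re.DOTALL)
#   -> [('1', 'Segment customers by region. '), ('2', 'Reduce churn in EMEA.')]
# Note: the lookahead splits on any "<digit>." sequence, so a decimal inside a
# recommendation (e.g. "3.5%") can cut it short.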
1270
+
1271
+ if top_recommendations:
1272
+ st.markdown(f"**Top {len(top_recommendations)} actionable recommendations:**")
1273
+ st.markdown("<br>", unsafe_allow_html=True)
1274
+
1275
+ for i, rec in enumerate(top_recommendations):
1276
+ st.markdown(f"""
1277
+ <div class="recommendation-box animate-fade-in">
1278
+ <div style="display: flex; align-items: flex-start; gap: 1rem;">
1279
+ <div style="
1280
+ background: #22c55e;
1281
+ color: white;
1282
+ border-radius: 50%;
1283
+ width: 32px;
1284
+ height: 32px;
1285
+ display: flex;
1286
+ align-items: center;
1287
+ justify-content: center;
1288
+ font-weight: bold;
1289
+ font-size: 0.9rem;
1290
+ flex-shrink: 0;
1291
+ ">{i+1}</div>
1292
+ <div style="flex: 1;">
1293
+ <strong style="color: #1e293b;">🎯 Recommendation {i+1}:</strong><br>
1294
+ <span style="color: #475569; line-height: 1.6;">{rec}</span>
1295
+ </div>
1296
+ </div>
1297
+ </div>
1298
+ """, unsafe_allow_html=True)
1299
+ else:
1300
+ st.info("🎯 No recommendations could be extracted from the analysis.")
1301
+ else:
1302
+ st.info("🎯 No recommendations were generated.")
1303
+
1304
+ # Download Results Section
1305
+ st.markdown("### πŸ’Ύ Download Your Results")
1306
+
1307
+ col1, col2, col3 = st.columns(3)
1308
+
1309
+ download_items = [
1310
+ ("πŸ“„", "Analysis Report (JSON)", "Download complete analysis", "json"),
1311
+ ("πŸ“Š", "Enhanced Dataset (CSV)", "Download processed data", "csv"),
1312
+ ("πŸ“‹", "Executive Summary (MD)", "Download business report", "md")
1313
+ ]
1314
+
1315
+ for i, (icon, title, desc, file_type) in enumerate(download_items):
1316
+ with [col1, col2, col3][i]:
1317
+ st.markdown(f"""
1318
+ <div style="
1319
+ background: linear-gradient(135deg, #f8fafc 0%, #f1f5f9 100%);
1320
+ border: 2px solid #e2e8f0;
1321
+ border-radius: 16px;
1322
+ padding: 1.5rem;
1323
+ text-align: center;
1324
+ margin: 0.5rem 0;
1325
+ transition: all 0.3s ease;
1326
+ ">
1327
+ <div style="font-size: 2.5rem; margin-bottom: 1rem;">{icon}</div>
1328
+ <div style="font-size: 1.1rem; font-weight: 600; margin-bottom: 0.5rem; color: #1e293b;">{title}</div>
1329
+ <div style="font-size: 0.9rem; color: #64748b; margin-bottom: 1rem;">{desc}</div>
1330
+ """, unsafe_allow_html=True)
1331
+
1332
+ if file_type == "json":
1333
+ data = json.dumps(results, indent=2, default=str)
1334
+ filename = f"analysis_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
1335
+ mime = "application/json"
1336
+ elif file_type == "csv":
1337
+ data = st.session_state.dataset.to_csv(index=False)
1338
+ filename = f"enhanced_dataset_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
1339
+ mime = "text/csv"
1340
+ else: # md
1341
+ data = generate_report(results)
1342
+ filename = f"executive_summary_{datetime.now().strftime('%Y%m%d_%H%M%S')}.md"
1343
+ mime = "text/markdown"
1344
+
1345
+ st.download_button(
1346
+ label=f"Download {file_type.upper()}",
1347
+ data=data,
1348
+ file_name=filename,
1349
+ mime=mime,
1350
+ use_container_width=True
1351
+ )
1352
+
1353
+ st.markdown("</div>", unsafe_allow_html=True)
1354
+
1355
+ def generate_report(results):
1356
+ """Generate a beautiful markdown report"""
1357
+ filename = getattr(st.session_state, 'uploaded_filename', 'dataset')
1358
+
1359
+ report = f"""# πŸ€– AI Data Analysis Executive Summary
1360
+
1361
+ **Dataset:** {filename}
1362
+ **Generated:** {datetime.now().strftime('%B %d, %Y at %I:%M %p')}
1363
+ **Powered by:** Llama 3 & LangGraph AI Agents
1364
+
1365
+ ---
1366
+
1367
+ ## πŸ“Š Executive Overview
1368
+
1369
+ This report presents key findings from an AI-powered analysis of your dataset. Our advanced language models have identified patterns, trends, and opportunities that can drive business decisions.
1370
+
1371
+ ### Dataset Metrics
1372
+ - **Total Records:** {results.get('dataset_info', {}).get('shape', [0])[0]:,}
1373
+ - **Columns:** {len(results.get('dataset_info', {}).get('columns', []))}
1374
+ - **Data Quality Score:** {max(0, 100 - (sum(results.get('dataset_info', {}).get('null_counts', {}).values()) / max(results.get('dataset_info', {}).get('shape', [1, 1])[0] * results.get('dataset_info', {}).get('shape', [1, 1])[1], 1) * 100)):.0f}%
1375
+
1376
+ ---
1377
+
1378
+ ## πŸ’‘ Strategic Insights
1379
+
1380
+ Our AI analysis has uncovered the following key insights:
1381
+
1382
+ """
1383
+
1384
+ insights = results.get('insights', [])
1385
+ if insights:
1386
+ for i, insight in enumerate(insights, 1):
1387
+ report += f"**{i}.** {insight}\n\n"
1388
+ else:
1389
+ report += "*No specific insights were generated for this dataset.*\n\n"
1390
+
1391
+ report += """---
1392
+
1393
+ ## 🎯 Recommended Actions
1394
+
1395
+ Based on the data analysis, we recommend the following strategic actions:
1396
+
1397
+ """
1398
+
1399
+ recommendations = results.get('recommendations', [])
1400
+ if recommendations:
1401
+ for i, rec in enumerate(recommendations, 1):
1402
+ report += f"**{i}.** {rec}\n\n"
1403
+ else:
1404
+ report += "*No specific recommendations were generated for this dataset.*\n\n"
1405
+
1406
+ report += f"""---
1407
+
1408
+ ## πŸ”§ Technical Summary
1409
+
1410
+ - **Analysis Completed:** {results.get('analysis_timestamp', 'N/A')}
1411
+ - **Visualizations Created:** {len(results.get('visualizations', []))}
1412
+ - **Processing Errors:** {len(results.get('errors', []))}
1413
+ - **AI Model Used:** Llama 3 (70B parameters)
1414
+
1415
+ ---
1416
+
1417
+ ## πŸ“ˆ Next Steps
1418
+
1419
+ 1. **Review Insights:** Analyze each insight for immediate actionable opportunities
1420
+ 2. **Implement Recommendations:** Prioritize recommendations based on business impact
1421
+ 3. **Monitor Progress:** Track key metrics identified in this analysis
1422
+ 4. **Iterate:** Re-run the analysis regularly as new data becomes available
1423
+
1424
+ ---
1425
+
1426
+ *This report was generated automatically by our AI Data Analysis Agent. For questions or support, please contact your data team.*
1427
+ """
1428
+
1429
+ return report
1430
+
1431
+ def main():
1432
+ """Main application function with beautiful design"""
1433
+ initialize_session_state()
1434
+
1435
+ # Check if analysis is complete to show results immediately
1436
+ if st.session_state.analysis_complete and st.session_state.analysis_results:
1437
+ display_results()
1438
+
1439
+ # Add a "Start New Analysis" button
1440
+ st.markdown("---")
1441
+ col1, col2, col3 = st.columns([1, 1, 1])
1442
+ with col2:
1443
+ if st.button("πŸ”„ Start New Analysis", use_container_width=True):
1444
+ # Reset session state
1445
+ st.session_state.analysis_results = None
1446
+ st.session_state.analysis_complete = False
1447
+ st.session_state.dataset = None
1448
+ st.rerun()
1449
+ return
1450
+
1451
+ # Hero Section
1452
+ display_hero_section()
1453
+
1454
+ # Feature showcase
1455
+ display_features()
1456
+
1457
+ # Sidebar configuration
1458
+ api_configured = sidebar_config()
1459
+
1460
+ if not api_configured:
1461
+ # Beautiful warning with setup instructions
1462
+ st.markdown("""
1463
+ <div style="
1464
+ background: linear-gradient(135deg, #fef3c7 0%, #fde68a 100%);
1465
+ border: 2px solid #f59e0b;
1466
+ border-radius: 16px;
1467
+ padding: 2rem;
1468
+ margin: 2rem 0;
1469
+ text-align: center;
1470
+ ">
1471
+ <div style="font-size: 3rem; margin-bottom: 1rem;">πŸ”‘</div>
1472
+ <h3 style="color: #92400e; margin-bottom: 1rem;">API Key Required</h3>
1473
+ <p style="color: #78350f; margin-bottom: 1.5rem;">
1474
+ Please configure your Groq API key to unlock the power of AI analysis
1475
+ </p>
1476
+ </div>
1477
+ """, unsafe_allow_html=True)
1478
+
1479
+ # Expandable setup guide
1480
+ with st.expander("πŸš€ Quick Setup Guide", expanded=True):
1481
+ st.markdown("""
1482
+ ### Option 1: Environment Variable (Recommended)
1483
+ ```bash
1484
+ export GROQ_API_KEY="your_api_key_here"
1485
+ streamlit run web_app.py
1486
+ ```
1487
+
1488
+ ### Option 2: Manual Entry
1489
+ 1. Visit [Groq Console](https://console.groq.com/) πŸ”—
1490
+ 2. Create a free account and generate your API key
1491
+ 3. Enter the key in the sidebar ←
1492
+ 4. Upload your dataset and start analyzing!
1493
+
1494
+ ### Supported File Formats
1495
+ - **CSV files** (.csv) - Most common format
1496
+ - **Excel files** (.xlsx, .xls) - Spreadsheet data
1497
+ - **JSON files** (.json) - Structured data
1498
+
1499
+ ### Tips for Best Results
1500
+ - Ensure clean, well-structured data
1501
+ - Include meaningful column names
1502
+ - Mix of numeric and categorical columns works best
1503
+ - Date/time columns enable trend analysis
1504
+ """)
1505
+ return
1506
+
1507
+ # Main content area with beautiful layout
1508
+ st.markdown("---")
1509
+
1510
+ # Dataset upload section
1511
+ dataset_uploaded = upload_dataset()
1512
+
1513
+ # Analysis section
1514
+ if dataset_uploaded:
1515
+ st.markdown("---")
1516
+
1517
+ # Center the analyze button with beautiful styling
1518
+ col1, col2, col3 = st.columns([1, 2, 1])
1519
+ with col2:
1520
+ if st.button(
1521
+ "πŸš€ Analyze My Data with AI",
1522
+ type="primary",
1523
+ use_container_width=True,
1524
+ help="Start the AI-powered analysis of your dataset"
1525
+ ):
1526
+ run_analysis()
1527
+
1528
+ # Footer
1529
+ st.markdown("""
1530
+ <div class="footer">
1531
+ <div style="max-width: 800px; margin: 0 auto;">
1532
+ <div style="font-size: 1.5rem; margin-bottom: 1rem;">πŸ€–βœ¨</div>
1533
+ <p style="margin-bottom: 1rem;">
1534
+ <strong>AI Data Analysis Agent</strong> - Transform your data into actionable insights
1535
+ </p>
1536
+ <p style="font-size: 0.85rem; margin-bottom: 1rem;">
1537
+ Powered by <strong>Llama 3</strong> β€’ Built with <strong>LangGraph</strong> β€’
1538
+ Designed with <strong>Streamlit</strong>
1539
+ </p>
1540
+ <div style="display: flex; justify-content: center; gap: 2rem; font-size: 0.9rem;">
1541
+ <a href="#" style="color: #3b82f6; text-decoration: none;">πŸ“– Documentation</a>
1542
+ <a href="#" style="color: #3b82f6; text-decoration: none;">πŸ› Report Issues</a>
1543
+ <a href="#" style="color: #3b82f6; text-decoration: none;">⭐ Give Feedback</a>
1544
+ <a href="#" style="color: #3b82f6; text-decoration: none;">πŸ’‘ Feature Requests</a>
1545
+ </div>
1546
+ </div>
1547
+ </div>
1548
+ """, unsafe_allow_html=True)
1549
+
1550
+ if __name__ == "__main__":
1551
+ main()