niddijoris committed on
Commit
790e0e9
·
1 Parent(s): 456b98f

Upload Streamlit app

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ src/data/car_prices.csv filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,19 +1,89 @@
1
  ---
2
- title: ChatWithData
3
- emoji: 🚀
4
- colorFrom: red
5
- colorTo: red
6
- sdk: docker
7
- app_port: 8501
8
- tags:
9
- - streamlit
10
- pinned: false
11
- short_description: Streamlit template space
12
  ---
13
 
14
- # Welcome to Streamlit!
15
 
16
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
 
17
 
18
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
19
- forums](https://discuss.streamlit.io).
 
1
+ # 🚗 Data Insights App
2
+
3
+ An AI-powered data analysis platform that allows users to explore a massive car auction dataset (558,000+ records) using natural language. The app combines **Streamlit** for the interface, **SQLite** for data management, and **OpenAI's GPT-4** for intelligent analysis and dynamic chart generation.
4
+
5
+ ---
6
+ ## [Hugging Face](https://huggingface.co/spaces/niddijoris/ChatWithData)
7
+
8
+ ## Application Gallery
9
+
10
+ | Dashboard Overview | AI Chat & Analytics |
11
+ | :---: | :---: |
12
+ | ![Dashboard Overview](/screenshots/1.png) | ![AI Chat & Analytics](/screenshots/2.png) |
13
+ | **Real-time Statistics & Insights** | **Intelligent Querying & Data Analysis** |
14
+
15
+ | Dynamic Chart Generation | Safety & Logs |
16
+ | :---: | :---: |
17
+ | ![Chart Generation](/screenshots/3.png) | ![Console Logs](/screenshots/4.png) |
18
+ | **AI-driven Visualizations** | **Security Guardrails & Activity Monitoring** |
19
+
20
  ---
21
+
22
+ ## 🌟 Key Features
23
+
24
+ ### 🤖 Intelligent AI Agent
25
+ - **Natural Language Querying**: Ask questions like "What is the average price of a BMW?" or "Compare prices between California and Florida".
26
+ - **Dynamic Chart Generation**: Ask for visualizations (bar, line, pie, scatter) and the AI will generate them instantly.
27
+ - **Context-Aware Support**: If the agent can't help, it offers to create a GitHub support ticket with the chat history.
28
+
29
+ ### 🛡️ Secure Data Management
30
+ - **ReadOnly Safety**: Strict SQL validation ensures only `SELECT` queries are executed. Dangerous operations (`DELETE`, `DROP`, `UPDATE`) are automatically blocked.
31
+ - **Privacy Guardrails**: The agent never communicates the full dataset, only relevant snippets (limited to 100 rows).
32
+
33
+ ### 📊 Business Intelligence
34
+ - **Real-time Stats**: Instantly see total inventory, average prices, and price/year ranges in the sidebar.
35
+ - **Automated Insights**: Interactive top-make comparisons and condition distribution charts.
36
+ - **Console Monitoring**: A live developer console in the sidebar shows every action the AI and database are taking.
37
+
38
  ---
39
 
40
+ ## 🚀 Quick Start
41
+
42
+ ### 1. Prerequisites
43
+ - Python 3.9+
44
+ - OpenAI API Key
45
+
46
+ ### 2. Setup
47
+ ```bash
48
+ # Clone the repository and enter directory
49
+ cd "Capstone folder"
50
+
51
+ # Create and activate virtual environment
52
+ python3 -m venv .venv
53
+ source .venv/bin/activate
54
+
55
+ # Install dependencies
56
+ pip install -r requirements.txt
57
+ ```
58
+
59
+ ### 3. Configuration
60
+ Copy `.env.example` to `.env` and fill in your keys:
61
+ ```bash
62
+ cp .env.example .env
63
+ ```
64
+ **Required**: `OPENAI_API_KEY`
65
+ **Optional**: `GITHUB_TOKEN` and `GITHUB_REPO` (for support tickets)
66
+
67
+ ### 4. Run the Application
68
+ Use the automated run script to ensure the correct environment is used:
69
+ ```bash
70
+ chmod +x run.sh
71
+ ./run.sh
72
+ ```
73
+
74
+ ---
75
+
76
+ ## Project Architecture
77
+ - **`app.py`**: Main Streamlit interface.
78
+ - **`agent/`**: AI logic and tool definitions.
79
+ - **`database/`**: Safe SQL execution and CSV-to-SQLite ingestion.
80
+ - **`support/`**: GitHub API integration for support tickets.
81
+ - **`ui/`**: Chart generation (Plotly) and styling.
82
+ - **`utils/`**: Custom Streamlit-integrated logger.
83
+
84
+ ---
85
 
86
+ ## 🛡️ Security Policy
87
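+ This application is designed with safety as a priority. The `SafetyValidator` enforces a whitelist of allowed SQL operations (`SELECT` only) and rejects queries that contain data-modifying keywords (`DELETE`, `DROP`, `TRUNCATE`, `ALTER`, `UPDATE`, `INSERT`, `CREATE`, `REPLACE`), to protect against SQL injection and unauthorized data modification.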
88
 
89
+ 🛡️ **Active Protections**: Only SELECT | All dangerous keywords blocked | Data remains secure
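Reviewer sketch: the README describes the `SafetyValidator` only in prose, and `safety_validator.py` itself is not visible in this part of the diff. The following is a minimal illustration of a SELECT-only check of that kind, not the committed implementation. The two keyword lists mirror `config.py` from this commit; the function body, the rejection messages, and the decision to reject embedded semicolons are assumptions. The `(is_valid, error_msg)` return shape matches how `db_manager.py` calls `validate_query`.

```python
import re
from typing import Tuple

# Mirrors the lists in src/config.py from this commit.
ALLOWED_SQL_OPERATIONS = ["SELECT"]
DANGEROUS_SQL_KEYWORDS = [
    "DELETE", "DROP", "TRUNCATE", "ALTER",
    "UPDATE", "INSERT", "CREATE", "REPLACE",
]


def validate_query(query: str) -> Tuple[bool, str]:
    """Hypothetical validator: returns (is_valid, error_message)."""
    # Ignore surrounding whitespace and any trailing semicolons.
    stripped = query.strip().rstrip(";").strip()
    if not stripped:
        return False, "Empty query"

    # Assumption: reject stacked statements outright (conservative; this
    # also rejects a semicolon that appears inside a string literal).
    if ";" in stripped:
        return False, "Multiple statements are not allowed"

    # Whitelist check on the leading keyword.
    first_word = stripped.split(None, 1)[0].upper()
    if first_word not in ALLOWED_SQL_OPERATIONS:
        return False, "Only SELECT queries are allowed"

    # Blocklist check on whole words, case-insensitive.
    for keyword in DANGEROUS_SQL_KEYWORDS:
        if re.search(rf"\b{keyword}\b", stripped, flags=re.IGNORECASE):
            return False, f"Blocked keyword: {keyword}"

    return True, ""
```

One trade-off of a whole-word blocklist: a legitimate query that calls SQLite's `REPLACE()` string function would be rejected by this sketch, because `REPLACE` is on the keyword list.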
 
requirements.txt CHANGED
@@ -1,3 +1,6 @@
1
- altair
2
- pandas
3
- streamlit
1
+ streamlit==1.31.0
2
+ openai>=2.17.0
3
+ pandas==2.2.0
4
+ plotly==5.18.0
5
+ PyGithub==2.1.1
6
+ python-dotenv==1.0.1
src/.env.example ADDED
@@ -0,0 +1,16 @@
1
+ # Copy this file to .env and fill in your actual API keys
2
+
3
+ # OpenAI API Configuration (REQUIRED)
4
+ OPENAI_API_KEY=sk-your-openai-api-key-here
5
+
6
+ # GitHub Integration (OPTIONAL - for support tickets)
7
+ # Get a token from: https://github.com/settings/tokens
8
+ # Required scopes: repo
9
+ GITHUB_TOKEN=
10
+ GITHUB_REPO=
11
+
12
+ # Database Configuration (OPTIONAL - defaults work fine)
13
+ DATABASE_PATH=./database/car_prices.db
14
+
15
+ # Application Settings (OPTIONAL)
16
+ LOG_LEVEL=INFO
src/QUICKSTART.md ADDED
@@ -0,0 +1,70 @@
1
+ # Quick Start Guide
2
+
3
+ ## Setup Steps
4
+
5
+ 1. **Create and activate virtual environment** (if not already done):
6
+ ```bash
7
+ python3 -m venv .venv
8
+ source .venv/bin/activate # On macOS/Linux
9
+ ```
10
+
11
+ 2. **Install dependencies**:
12
+ ```bash
13
+ pip install -r requirements.txt
14
+ ```
15
+
16
+ 3. **Configure environment variables**:
17
+ ```bash
18
+ cp .env.example .env
19
+ # Edit .env and add your OPENAI_API_KEY
20
+ ```
21
+
22
+ 4. **Run the application**:
23
+ ```bash
24
+ # Option 1: Use the run script
25
+ ./run.sh
26
+
27
+ # Option 2: Run directly with .venv
28
+ .venv/bin/streamlit run app.py
29
+ ```
30
+
31
+ **IMPORTANT**: Always use `.venv/bin/streamlit` or the `run.sh` script to ensure you're using the virtual environment's packages, not system-wide packages.
32
+
33
+ ## Troubleshooting
34
+
35
+ ### TypeError: __init__() got an unexpected keyword argument 'proxies'
36
+
37
+ If you encounter this error, reinstall OpenAI with compatible dependencies:
38
+
39
+ ```bash
40
+ .venv/bin/pip uninstall -y openai httpx httpcore
41
+ .venv/bin/pip install openai==1.54.0
42
+ ```
43
+
44
+ ### GitHub Integration Warning
45
+
46
+ If you see "GitHub initialization failed: 401 Bad credentials", this is normal if you haven't configured GitHub support. The app will use mock mode for support tickets. To enable real GitHub integration:
47
+
48
+ 1. Create a GitHub Personal Access Token at https://github.com/settings/tokens
49
+ 2. Add to your `.env` file:
50
+ ```
51
+ GITHUB_TOKEN=your_token_here
52
+ GITHUB_REPO=username/repo-name
53
+ ```
54
+
55
+ ## First Run
56
+
57
+ On first run, the app will:
58
+ 1. Create SQLite database from `data/car_prices.csv` (takes ~10-20 seconds)
59
+ 2. Load 558,837 car records
60
+ 3. Create indexes for faster queries
61
+ 4. Launch web interface at http://localhost:8501
62
+
63
+ ## Sample Queries to Try
64
+
65
+ - "What's the average price of BMW cars?"
66
+ - "Show me the top 5 most expensive models"
67
+ - "How many cars were sold in California?"
68
+ - "What's the price difference between automatic and manual transmission?"
69
+
70
+ Enjoy exploring your car data! 🚗
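Reviewer sketch: the "First Run" steps (build the SQLite table from the CSV, add indexes, then answer questions with `SELECT` queries) can be tried on a small scale with only the standard library. The committed code uses pandas for ingestion; this version swaps in `csv` and an in-memory database so it runs anywhere. The three rows are made-up toy values for illustration, not records from `car_prices.csv`, and the trimmed five-column schema is an assumption.

```python
import csv
import io
import sqlite3

# Toy rows for illustration only; NOT taken from the real dataset.
SAMPLE_CSV = """year,make,model,state,sellingprice
2014,BMW,3 Series,ca,30000
2013,Ford,Fusion,fl,12000
2015,BMW,X5,ca,45000
"""


def load_csv(conn: sqlite3.Connection, csv_text: str) -> int:
    """Create the cars table, insert every CSV row, and add indexes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.execute(
        "CREATE TABLE cars ("
        "year INTEGER, make TEXT, model TEXT, state TEXT, sellingprice REAL)"
    )
    rows = [
        (r["year"], r["make"], r["model"], r["state"], r["sellingprice"])
        for r in reader
    ]
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?, ?)", rows)
    # Indexes on columns that the sample questions filter by.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_make ON cars(make)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_state ON cars(state)")
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, SAMPLE_CSV)

# "What's the average price of BMW cars?" as a parameterized SELECT.
bmw_avg = conn.execute(
    "SELECT AVG(sellingprice) FROM cars WHERE make = ?", ("BMW",)
).fetchone()[0]

# "How many cars were sold in California?" (state code assumed to be 'ca').
ca_count = conn.execute(
    "SELECT COUNT(*) FROM cars WHERE state = ?", ("ca",)
).fetchone()[0]
```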
src/README.md ADDED
@@ -0,0 +1,89 @@
1
+ # 🚗 Data Insights App
2
+
3
+ An AI-powered data analysis platform that allows users to explore a massive car auction dataset (558,000+ records) using natural language. The app combines **Streamlit** for the interface, **SQLite** for data management, and **OpenAI's GPT-4** for intelligent analysis and dynamic chart generation.
4
+
5
+ ---
6
+ ## [Hugging Face](https://huggingface.co/spaces/niddijoris/ChatWithData)
7
+
8
+ ## Application Gallery
9
+
10
+ | Dashboard Overview | AI Chat & Analytics |
11
+ | :---: | :---: |
12
+ | ![Dashboard Overview](/screenshots/1.png) | ![AI Chat & Analytics](/screenshots/2.png) |
13
+ | **Real-time Statistics & Insights** | **Intelligent Querying & Data Analysis** |
14
+
15
+ | Dynamic Chart Generation | Safety & Logs |
16
+ | :---: | :---: |
17
+ | ![Chart Generation](/screenshots/3.png) | ![Console Logs](/screenshots/4.png) |
18
+ | **AI-driven Visualizations** | **Security Guardrails & Activity Monitoring** |
19
+
20
+ ---
21
+
22
+ ## 🌟 Key Features
23
+
24
+ ### 🤖 Intelligent AI Agent
25
+ - **Natural Language Querying**: Ask questions like "What is the average price of a BMW?" or "Compare prices between California and Florida".
26
+ - **Dynamic Chart Generation**: Ask for visualizations (bar, line, pie, scatter) and the AI will generate them instantly.
27
+ - **Context-Aware Support**: If the agent can't help, it offers to create a GitHub support ticket with the chat history.
28
+
29
+ ### 🛡️ Secure Data Management
30
+ - **ReadOnly Safety**: Strict SQL validation ensures only `SELECT` queries are executed. Dangerous operations (`DELETE`, `DROP`, `UPDATE`) are automatically blocked.
31
+ - **Privacy Guardrails**: The agent never communicates the full dataset, only relevant snippets (limited to 100 rows).
32
+
33
+ ### 📊 Business Intelligence
34
+ - **Real-time Stats**: Instantly see total inventory, average prices, and price/year ranges in the sidebar.
35
+ - **Automated Insights**: Interactive top-make comparisons and condition distribution charts.
36
+ - **Console Monitoring**: A live developer console in the sidebar shows every action the AI and database are taking.
37
+
38
+ ---
39
+
40
+ ## 🚀 Quick Start
41
+
42
+ ### 1. Prerequisites
43
+ - Python 3.9+
44
+ - OpenAI API Key
45
+
46
+ ### 2. Setup
47
+ ```bash
48
+ # Clone the repository and enter directory
49
+ cd "Capstone folder"
50
+
51
+ # Create and activate virtual environment
52
+ python3 -m venv .venv
53
+ source .venv/bin/activate
54
+
55
+ # Install dependencies
56
+ pip install -r requirements.txt
57
+ ```
58
+
59
+ ### 3. Configuration
60
+ Copy `.env.example` to `.env` and fill in your keys:
61
+ ```bash
62
+ cp .env.example .env
63
+ ```
64
+ **Required**: `OPENAI_API_KEY`
65
+ **Optional**: `GITHUB_TOKEN` and `GITHUB_REPO` (for support tickets)
66
+
67
+ ### 4. Run the Application
68
+ Use the automated run script to ensure the correct environment is used:
69
+ ```bash
70
+ chmod +x run.sh
71
+ ./run.sh
72
+ ```
73
+
74
+ ---
75
+
76
+ ## Project Architecture
77
+ - **`app.py`**: Main Streamlit interface.
78
+ - **`agent/`**: AI logic and tool definitions.
79
+ - **`database/`**: Safe SQL execution and CSV-to-SQLite ingestion.
80
+ - **`support/`**: GitHub API integration for support tickets.
81
+ - **`ui/`**: Chart generation (Plotly) and styling.
82
+ - **`utils/`**: Custom Streamlit-integrated logger.
83
+
84
+ ---
85
+
86
+ ## 🛡️ Security Policy
87
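+ This application is designed with safety as a priority. The `SafetyValidator` enforces a whitelist of allowed SQL operations (`SELECT` only) and rejects queries that contain data-modifying keywords (`DELETE`, `DROP`, `TRUNCATE`, `ALTER`, `UPDATE`, `INSERT`, `CREATE`, `REPLACE`), to protect against SQL injection and unauthorized data modification.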
88
+
89
+ 🛡️ **Active Protections**: Only SELECT | All dangerous keywords blocked | Data remains secure
src/agent/__init__.py ADDED
@@ -0,0 +1,5 @@
+ """Agent package initialization"""
+ from agent.ai_agent import AIAgent
+ from agent.tools import AgentTools
+
+ __all__ = ['AIAgent', 'AgentTools']
src/agent/ai_agent.py ADDED
@@ -0,0 +1,184 @@
+ """
+ AI Agent - OpenAI-powered assistant with function calling
+ """
+ import json
+ from typing import List, Dict, Any, Optional
+ import logging
+ from openai import OpenAI
+
+ from config import OPENAI_API_KEY, OPENAI_MODEL
+ from agent.tools import AgentTools
+
+
+ class AIAgent:
+     """AI Agent powered by OpenAI with function calling capabilities"""
+
+     SYSTEM_PROMPT = """You are a helpful data analyst assistant for a car auction/pricing database.
+ Your role is to help users understand and query car pricing data.
+
+ IMPORTANT GUIDELINES:
+ 1. **Data Privacy**: Never pass the entire dataset to your responses. Only use the tools to query specific data.
+ 2. **Safety**: You can only execute SELECT queries. Any attempt to modify data (DELETE, UPDATE, INSERT, DROP) will be blocked.
+ 3. **Tool Usage**:
+    - Use `query_database` for specific data queries
+    - Use `get_database_statistics` for general overviews and statistics
+    - Use `generate_chart` when the user asks for a chart, visualization, or trend analysis. Choose the most appropriate chart type (bar, column, line, pie, scatter).
+    - Use `create_support_ticket` when you cannot help or user requests human assistance
+ 4. **Support Escalation**: If you cannot answer a question or the user seems frustrated, proactively suggest creating a support ticket.
+ 5. **Clear Communication**: Explain your findings clearly with relevant numbers and insights.
+
+ DATABASE SCHEMA:
+ - Table: cars
+ - Columns: year, make, model, trim, body, transmission, vin, state, condition, odometer, color, interior, seller, mmr, sellingprice, saledate
+
+ Be concise, helpful, and data-driven in your responses."""
+
+     def __init__(self, tools: AgentTools):
+         self.tools = tools
+         self.client = OpenAI(api_key=OPENAI_API_KEY)
+         self.model = OPENAI_MODEL
+         self.logger = logging.getLogger(__name__)
+         self.conversation_history: List[Dict[str, Any]] = []
+
+         # Initialize with system prompt
+         self.conversation_history.append({
+             "role": "system",
+             "content": self.SYSTEM_PROMPT
+         })
+
+     def chat(self, user_message: str) -> Dict[str, Any]:
+         """
+         Process a user message and return AI response with metadata
+
+         Args:
+             user_message: User's question or request
+
+         Returns:
+             Dictionary with 'content' (str) and optional 'chart' (dict)
+         """
+         try:
+             # Add user message to history
+             self.conversation_history.append({
+                 "role": "user",
+                 "content": user_message
+             })
+
+             # Get AI response with function calling
+             return self._get_ai_response()
+
+         except Exception as e:
+             error_msg = f"Error processing message: {str(e)}"
+             self.logger.error(error_msg)
+             return {
+                 "content": f"❌ {error_msg}",
+                 "chart": None
+             }
+
+     def _get_ai_response(self, max_iterations: int = 5) -> Dict[str, Any]:
+         """
+         Get AI response with function calling loop
+
+         Args:
+             max_iterations: Maximum number of function calling iterations
+
+         Returns:
+             Dictionary with 'content' and optional 'chart'
+         """
+         iteration = 0
+         last_chart = None
+
+         while iteration < max_iterations:
+             iteration += 1
+
+             # Call OpenAI API
+             response = self.client.chat.completions.create(
+                 model=self.model,
+                 messages=self.conversation_history,
+                 tools=AgentTools.get_tool_definitions(),
+                 tool_choice="auto"
+             )
+
+             message = response.choices[0].message
+
+             # Check if AI wants to call a function
+             if message.tool_calls:
+                 # Add assistant message to history
+                 self.conversation_history.append({
+                     "role": "assistant",
+                     "content": message.content,
+                     "tool_calls": [
+                         {
+                             "id": tc.id,
+                             "type": tc.type,
+                             "function": {
+                                 "name": tc.function.name,
+                                 "arguments": tc.function.arguments
+                             }
+                         }
+                         for tc in message.tool_calls
+                     ]
+                 })
+
+                 # Execute each tool call
+                 for tool_call in message.tool_calls:
+                     function_name = tool_call.function.name
+                     function_args = json.loads(tool_call.function.arguments)
+
+                     self.logger.info(f"AI calling function: {function_name}")
+
+                     # Execute the tool
+                     result = self.tools.execute_tool(function_name, function_args)
+
+                     # Capture chart result if it's a chart
+                     if result.get('is_chart'):
+                         last_chart = result.get('chart_config')
+
+                     # Add function result to history
+                     self.conversation_history.append({
+                         "role": "tool",
+                         "tool_call_id": tool_call.id,
+                         "content": json.dumps(result)
+                     })
+
+                 # Continue loop to get final response
+                 continue
+
+             else:
+                 # No more function calls, return final response
+                 final_response = message.content or "I apologize, but I couldn't generate a response."
+
+                 # Add to history
+                 self.conversation_history.append({
+                     "role": "assistant",
+                     "content": final_response
+                 })
+
+                 return {
+                     "content": final_response,
+                     "chart": last_chart
+                 }
+
+         # Max iterations reached
+         return {
+             "content": "I apologize, but I'm having trouble processing your request. Would you like me to create a support ticket for human assistance?",
+             "chart": None
+         }
+
+     def reset_conversation(self):
+         """Reset conversation history"""
+         self.conversation_history = [{
+             "role": "system",
+             "content": self.SYSTEM_PROMPT
+         }]
+         self.logger.info("Conversation history reset")
+
+     def get_conversation_context(self) -> str:
+         """Get conversation history as formatted string for support tickets"""
+         context = []
+         for msg in self.conversation_history:
+             if msg["role"] == "user":
+                 context.append(f"User: {msg['content']}")
+             elif msg["role"] == "assistant" and msg.get("content"):
+                 context.append(f"Assistant: {msg['content']}")
+
+         return "\n\n".join(context)
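Reviewer sketch: the control flow of `_get_ai_response` (ask the model, run any requested tools, feed the results back, stop at a text answer or an iteration cap) can be exercised without network access. In this illustration the model and the tool executor are stub callables, and messages are plain dicts with a flattened `tool_calls` shape. Those shapes, and the names `run_tool_loop`, `fake_model`, and `fake_execute`, are assumptions for the example; the OpenAI SDK returns objects, not these dicts.

```python
import json
from typing import Any, Callable, Dict, List


def run_tool_loop(
    model_step: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
    execute_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    history: List[Dict[str, Any]],
    max_iterations: int = 5,
) -> Dict[str, Any]:
    """Loop until the model answers in text or the cap is reached."""
    last_chart = None
    for _ in range(max_iterations):
        message = model_step(history)
        tool_calls = message.get("tool_calls") or []

        if not tool_calls:
            # Plain text answer: record it and stop.
            content = message.get("content") or "No response generated."
            history.append({"role": "assistant", "content": content})
            return {"content": content, "chart": last_chart}

        # Record the assistant turn that requested the tools.
        history.append({
            "role": "assistant",
            "content": message.get("content"),
            "tool_calls": tool_calls,
        })
        for call in tool_calls:
            result = execute_tool(call["name"], json.loads(call["arguments"]))
            if result.get("is_chart"):
                last_chart = result.get("chart_config")
            # Feed each tool result back to the model as a "tool" message.
            history.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })

    return {"content": "Iteration limit reached.", "chart": None}


def fake_model(history: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Stub model: requests one query, then answers once it sees the result."""
    if history[-1]["role"] == "tool":
        return {"content": "There are 2 BMWs.", "tool_calls": None}
    return {
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "name": "query_database",
            "arguments": json.dumps({"sql_query": "SELECT COUNT(*) FROM cars"}),
        }],
    }


def fake_execute(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Stub tool executor with a canned result."""
    return {"success": True, "data": [{"count": 2}]}


history = [
    {"role": "system", "content": "You are a data analyst."},
    {"role": "user", "content": "How many BMWs?"},
]
reply = run_tool_loop(fake_model, fake_execute, history)
roles = [m["role"] for m in history]
```

The iteration cap matters because a model can keep requesting tools indefinitely; the committed code falls back to offering a support ticket when the cap is hit.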
src/agent/tools.py ADDED
@@ -0,0 +1,269 @@
+ """
+ AI Agent Tools - Function definitions for OpenAI function calling
+ """
+ import json
+ from typing import Dict, Any, Optional
+ import logging
+
+ from database.db_manager import DatabaseManager
+ from support.github_integration import GitHubSupport
+
+
+ class AgentTools:
+     """Tools available to the AI agent via function calling"""
+
+     def __init__(self, db_manager: DatabaseManager, github_support: Optional[GitHubSupport] = None):
+         self.db_manager = db_manager
+         self.github_support = github_support
+         self.logger = logging.getLogger(__name__)
+
+     @staticmethod
+     def get_tool_definitions() -> list:
+         """
+         Get OpenAI function definitions for all available tools
+
+         Returns:
+             List of tool definitions in OpenAI format
+         """
+         return [
+             {
+                 "type": "function",
+                 "function": {
+                     "name": "query_database",
+                     "description": "Execute a SQL SELECT query on the car prices database. Use this to retrieve specific data based on user questions. Only SELECT queries are allowed for safety. The database contains car auction data with columns: year, make, model, trim, body, transmission, vin, state, condition, odometer, color, interior, seller, mmr, sellingprice, saledate.",
+                     "parameters": {
+                         "type": "object",
+                         "properties": {
+                             "sql_query": {
+                                 "type": "string",
+                                 "description": "The SQL SELECT query to execute. Must be a valid SELECT statement. Example: 'SELECT AVG(sellingprice) FROM cars WHERE make = \"BMW\"'"
+                             }
+                         },
+                         "required": ["sql_query"]
+                     }
+                 }
+             },
+             {
+                 "type": "function",
+                 "function": {
+                     "name": "get_database_statistics",
+                     "description": "Get comprehensive statistics and aggregated information about the car prices database. Use this when user asks for general information, overview, or statistics about the data. Returns total records, price statistics, top makes/models, condition distribution, and year range.",
+                     "parameters": {
+                         "type": "object",
+                         "properties": {},
+                         "required": []
+                     }
+                 }
+             },
+             {
+                 "type": "function",
+                 "function": {
+                     "name": "create_support_ticket",
+                     "description": "Create a support ticket to reach a human for help. Use this when the user explicitly asks for human support, or when you cannot answer their question adequately. The ticket will be created as a GitHub issue.",
+                     "parameters": {
+                         "type": "object",
+                         "properties": {
+                             "title": {
+                                 "type": "string",
+                                 "description": "Brief title summarizing the support request"
+                             },
+                             "description": {
+                                 "type": "string",
+                                 "description": "Detailed description of the issue or question, including conversation context"
+                             },
+                             "priority": {
+                                 "type": "string",
+                                 "enum": ["low", "medium", "high"],
+                                 "description": "Priority level of the support request"
+                             }
+                         },
+                         "required": ["title", "description"]
+                     }
+                 }
+             },
+             {
+                 "type": "function",
+                 "function": {
+                     "name": "generate_chart",
+                     "description": "Generate a dynamic chart based on a SQL query. Use this when the user asks for a chart, visualization, or comparison that would look better as a graph. You must provide a valid SQL SELECT query and chart configurations.",
+                     "parameters": {
+                         "type": "object",
+                         "properties": {
+                             "sql_query": {
+                                 "type": "string",
+                                 "description": "SQL SELECT query to get data for the chart. Example: 'SELECT make, AVG(sellingprice) FROM cars GROUP BY make'"
+                             },
+                             "chart_type": {
+                                 "type": "string",
+                                 "enum": ["bar", "column", "line", "pie", "scatter"],
+                                 "description": "Type of chart to generate"
+                             },
+                             "title": {
+                                 "type": "string",
+                                 "description": "Title of the chart"
+                             },
+                             "x_label": {
+                                 "type": "string",
+                                 "description": "Label for the X-axis (column name from query)"
+                             },
+                             "y_label": {
+                                 "type": "string",
+                                 "description": "Label for the Y-axis (column name from query)"
+                             }
+                         },
+                         "required": ["sql_query", "chart_type", "title"]
+                     }
+                 }
+             }
+         ]
+
+     def execute_tool(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
+         """
+         Execute a tool based on function call from OpenAI
+
+         Args:
+             tool_name: Name of the tool to execute
+             arguments: Arguments for the tool
+
+         Returns:
+             Result dictionary from the tool execution
+         """
+         self.logger.info(f"Executing tool: {tool_name} with args: {arguments}")
+
+         if tool_name == "query_database":
+             return self._query_database(arguments.get("sql_query", ""))
+
+         elif tool_name == "get_database_statistics":
+             return self._get_database_statistics()
+
+         elif tool_name == "create_support_ticket":
+             return self._create_support_ticket(
+                 title=arguments.get("title", ""),
+                 description=arguments.get("description", ""),
+                 priority=arguments.get("priority", "medium")
+             )
+
+         elif tool_name == "generate_chart":
+             return self._generate_chart(
+                 sql_query=arguments.get("sql_query", ""),
+                 chart_type=arguments.get("chart_type", "bar"),
+                 title=arguments.get("title", ""),
+                 x_label=arguments.get("x_label"),
+                 y_label=arguments.get("y_label")
+             )
+
+         else:
+             return {
+                 "success": False,
+                 "error": f"Unknown tool: {tool_name}"
+             }
+
+     def _query_database(self, sql_query: str) -> Dict[str, Any]:
+         """Execute a database query"""
+         self.logger.info(f"Executing query: {sql_query}")
+         result = self.db_manager.execute_query(sql_query)
+
+         # Format result for AI consumption
+         if result['success']:
+             # Limit data sent to AI to avoid token limits
+             data = result['data']
+             if len(data) > 100:
+                 return {
+                     "success": True,
+                     "message": f"Query returned {len(data)} rows (showing first 100)",
+                     "data": data[:100],
+                     "row_count": len(data),
+                     "truncated": True
+                 }
+             else:
+                 return {
+                     "success": True,
+                     "message": f"Query returned {len(data)} rows",
+                     "data": data,
+                     "row_count": len(data),
+                     "truncated": False
+                 }
+         else:
+             return {
+                 "success": False,
+                 "error": result['error']
+             }
+
+     def _get_database_statistics(self) -> Dict[str, Any]:
+         """Get database statistics"""
+         self.logger.info("Retrieving database statistics")
+         stats = self.db_manager.get_statistics()
+
+         if stats:
+             return {
+                 "success": True,
+                 "statistics": stats
+             }
+         else:
+             return {
+                 "success": False,
+                 "error": "Failed to retrieve statistics"
+             }
+
+     def _create_support_ticket(self, title: str, description: str, priority: str = "medium") -> Dict[str, Any]:
+         """Create a support ticket"""
+         self.logger.info(f"Creating support ticket: {title}")
+
+         if self.github_support:
+             result = self.github_support.create_issue(
+                 title=title,
+                 body=description,
+                 labels=["support", f"priority-{priority}"]
+             )
+             return result
+         else:
+             # Mock support ticket if GitHub not configured
+             return {
+                 "success": True,
+                 "message": "Support ticket created (mock mode - GitHub not configured)",
+                 "ticket_id": "MOCK-001",
+                 "title": title,
+                 "priority": priority
+             }
+
+     def _generate_chart(
+         self,
+         sql_query: str,
+         chart_type: str,
+         title: str,
+         x_label: Optional[str] = None,
+         y_label: Optional[str] = None
+     ) -> Dict[str, Any]:
+         """Execute query and return chart configuration"""
+         self.logger.info(f"Generating chart: {chart_type} - {title}")
+
+         # Execute query first
+         query_result = self._query_database(sql_query)
+
+         if query_result['success']:
+             data = query_result['data']
+             if not data:
+                 return {
+                     "success": False,
+                     "error": "Query returned no data for the chart."
+                 }
+
+             # Use provided labels or infer from data
+             cols = list(data[0].keys())
+             x_axis = x_label if x_label in cols else cols[0]
+             y_axis = y_label if y_label in cols else (cols[1] if len(cols) > 1 else cols[0])
+
+             return {
+                 "success": True,
+                 "is_chart": True,
+                 "chart_config": {
+                     "type": chart_type,
+                     "title": title,
+                     "x_label": x_axis,
+                     "y_label": y_axis,
+                     "data": data
+                 },
+                 "message": f"Successfully generated {chart_type} chart: {title}"
+             }
+         else:
+             return query_result
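Reviewer sketch: two small rules in `tools.py` are easy to check in isolation, namely the 100-row cap applied to query results and the fallback used to choose chart axes when the model supplies a label that is not a column in the result. The helper names `truncate_rows` and `infer_axes` are hypothetical and do not exist in the commit; the logic is lifted from `_query_database` and `_generate_chart` as shown in the diff.

```python
from typing import Any, Dict, List, Optional, Tuple

MAX_ROWS = 100  # same cap as _query_database in the diff


def truncate_rows(rows: List[Any], limit: int = MAX_ROWS) -> Dict[str, Any]:
    """Cap the rows passed to the model while reporting the full count."""
    return {
        "data": rows[:limit],
        "row_count": len(rows),
        "truncated": len(rows) > limit,
    }


def infer_axes(
    rows: List[Dict[str, Any]],
    x_label: Optional[str] = None,
    y_label: Optional[str] = None,
) -> Tuple[str, str]:
    """Use a requested label only if it is a real column, else fall back."""
    cols = list(rows[0].keys())
    x_axis = x_label if x_label in cols else cols[0]
    y_axis = y_label if y_label in cols else (cols[1] if len(cols) > 1 else cols[0])
    return x_axis, y_axis


# Illustrative values only.
sample_rows = [{"make": "BMW", "avg_price": 21000.0}]
default_axes = infer_axes(sample_rows)
fallback_axes = infer_axes(sample_rows, x_label="brand", y_label="avg_price")
capped = truncate_rows(list(range(150)))
```

Because `_generate_chart` obtains its rows through `_query_database`, charts in the committed code are also limited to the first 100 rows of the result.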
src/config.py ADDED
@@ -0,0 +1,48 @@
+ """
+ Configuration management for Data Insights App
+ """
+ import os
+ from pathlib import Path
+ from dotenv import load_dotenv
+
+ # Load environment variables from .env file
+ load_dotenv()
+
+ # Base paths
+ BASE_DIR = Path(__file__).parent
+ DATA_DIR = BASE_DIR / "data"
+ DATABASE_DIR = BASE_DIR / "database"
+
+ # API Configuration
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
+ OPENAI_MODEL = "gpt-4-turbo-preview"  # Model with function calling support
+
+ # GitHub Configuration (Optional)
+ GITHUB_TOKEN = os.getenv("GITHUB_TOKEN", "")
+ GITHUB_REPO = os.getenv("GITHUB_REPO", "")
+ GITHUB_FOLDER = os.getenv("GITHUB_FOLDER", "")  # Optional folder/project prefix for issues
+
+ # Database Configuration
+ DATABASE_PATH = os.getenv("DATABASE_PATH", str(DATABASE_DIR / "car_prices.db"))
+ CSV_DATA_PATH = DATA_DIR / "car_prices.csv"
+
+ # Application Settings
+ LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
+ MAX_LOG_ENTRIES = 100  # Maximum number of log entries to keep in sidebar
+
+ # Sample queries for user guidance
+ SAMPLE_QUERIES = [
+     "What's the average selling price of BMW cars?",
+     "Show me the top 5 most expensive car models",
+     "How many cars were sold in California?",
+     "What's the price difference between automatic and manual transmission?",
+     "Show statistics about cars in excellent condition",
+     "Which seller has the most cars in the database?",
+ ]
+
+ # Safety settings
+ ALLOWED_SQL_OPERATIONS = ["SELECT"]
+ DANGEROUS_SQL_KEYWORDS = [
+     "DELETE", "DROP", "TRUNCATE", "ALTER",
+     "UPDATE", "INSERT", "CREATE", "REPLACE"
+ ]
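Reviewer sketch: `config.py` reads every setting with `os.getenv(name, default)`, so a missing `OPENAI_API_KEY` becomes an empty string and only surfaces when the first API call fails. One possible way to report it at startup instead is shown below. The helpers `load_settings` and `missing_required` are hypothetical and not part of the commit; the key names and defaults are the ones used in `config.py` and `.env.example`.

```python
import os
from typing import Dict, List, Mapping

REQUIRED_KEYS = ["OPENAI_API_KEY"]
# Optional settings and the defaults config.py falls back to.
OPTIONAL_DEFAULTS = {"GITHUB_TOKEN": "", "GITHUB_REPO": "", "LOG_LEVEL": "INFO"}


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect settings from an environment-like mapping, applying defaults."""
    settings = {key: env.get(key, "") for key in REQUIRED_KEYS}
    settings.update(
        {key: env.get(key, default) for key, default in OPTIONAL_DEFAULTS.items()}
    )
    return settings


def missing_required(settings: Mapping[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


# Passing a plain dict keeps the example independent of the real environment.
configured = load_settings({"OPENAI_API_KEY": "sk-test"})
unconfigured = load_settings({})
```

An app could call `missing_required` once at startup and show a clear message in the UI before any chat request is attempted.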
src/data/car_prices.csv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:32ba3ce51664e6a12c0c927ed193b41e3c4743fdf18bc0317389892aed27f556
3
+ size 88047552
src/database/__init__.py ADDED
@@ -0,0 +1,5 @@
+ """Database package initialization"""
+ from database.db_manager import DatabaseManager
+ from database.safety_validator import SafetyValidator
+
+ __all__ = ['DatabaseManager', 'SafetyValidator']
src/database/db_manager.py ADDED
@@ -0,0 +1,240 @@
1
+ """
2
+ Database Manager - Handles SQLite database operations and CSV data ingestion
3
+ """
4
+ import sqlite3
5
+ import pandas as pd
6
+ from pathlib import Path
7
+ from typing import List, Dict, Any, Optional
8
+ import logging
9
+
10
+ from config import DATABASE_PATH, CSV_DATA_PATH
11
+ from database.safety_validator import SafetyValidator
12
+
13
+
14
+ class DatabaseManager:
15
+ """Manages database connections and operations"""
16
+
17
+ def __init__(self, db_path: str = DATABASE_PATH):
18
+ self.db_path = db_path
19
+ self.validator = SafetyValidator()
20
+ self.logger = logging.getLogger(__name__)
21
+
22
+ # Ensure database directory exists
23
+ Path(db_path).parent.mkdir(parents=True, exist_ok=True)
24
+
25
+ # Initialize database
26
+ self._initialize_database()
27
+
28
+ def _initialize_database(self):
29
+ """Initialize database and load data from CSV if needed"""
30
+ db_exists = Path(self.db_path).exists()
31
+
32
+ if not db_exists:
33
+ self.logger.info("Database not found. Creating new database from CSV...")
34
+ self._load_csv_to_database()
35
+ else:
36
+ self.logger.info(f"Database found at {self.db_path}")
37
+
38
+ def _load_csv_to_database(self):
39
+ """Load car_prices.csv into SQLite database"""
40
+ try:
41
+ # Check if CSV exists
42
+ if not CSV_DATA_PATH.exists():
43
+ raise FileNotFoundError(f"CSV file not found: {CSV_DATA_PATH}")
44
+
45
+ self.logger.info(f"Loading data from {CSV_DATA_PATH}...")
46
+
47
+ # Read CSV with pandas
48
+ df = pd.read_csv(CSV_DATA_PATH)
49
+
50
+ # Clean column names (remove spaces, lowercase)
51
+ df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')
52
+
53
+ # Connect to database
54
+ conn = sqlite3.connect(self.db_path)
55
+
56
+ # Write to SQLite
57
+ df.to_sql('cars', conn, if_exists='replace', index=False)
58
+
59
+ # Create indexes for common queries
60
+ cursor = conn.cursor()
61
+ cursor.execute("CREATE INDEX IF NOT EXISTS idx_make ON cars(make)")
62
+ cursor.execute("CREATE INDEX IF NOT EXISTS idx_model ON cars(model)")
63
+ cursor.execute("CREATE INDEX IF NOT EXISTS idx_year ON cars(year)")
64
+ cursor.execute("CREATE INDEX IF NOT EXISTS idx_state ON cars(state)")
65
+
66
+ conn.commit()
67
+ conn.close()
68
+
69
+ self.logger.info(f"Successfully loaded {len(df)} records into database")
70
+
71
+ except Exception as e:
72
+ self.logger.error(f"Error loading CSV to database: {e}")
73
+ raise
74
+
75
+ def execute_query(self, query: str, params: Optional[tuple] = None) -> Dict[str, Any]:
76
+ """
77
+ Execute a SQL query with safety validation
78
+
79
+ Args:
80
+ query: SQL query to execute
81
+ params: Optional parameters for parameterized queries
82
+
83
+ Returns:
84
+ Dictionary with 'success', 'data', 'error', and 'row_count' keys
85
+ """
86
+ # Validate query safety
87
+ is_valid, error_msg = self.validator.validate_query(query)
88
+
89
+ if not is_valid:
90
+ self.logger.warning(f"Blocked unsafe query: {query}")
91
+ return {
92
+ 'success': False,
93
+ 'data': None,
94
+ 'error': error_msg,
95
+ 'row_count': 0
96
+ }
97
+
98
+ try:
99
+ conn = sqlite3.connect(self.db_path)
100
+ conn.row_factory = sqlite3.Row # Enable column access by name
101
+ cursor = conn.cursor()
102
+
103
+ # Execute query
104
+ if params:
105
+ cursor.execute(query, params)
106
+ else:
107
+ cursor.execute(query)
108
+
109
+ # Fetch results
110
+ rows = cursor.fetchall()
111
+
112
+ # Convert to list of dictionaries
113
+ data = [dict(row) for row in rows]
114
+
115
+ conn.close()
116
+
117
+ self.logger.info(f"Query executed successfully. Returned {len(data)} rows.")
118
+
119
+ return {
120
+ 'success': True,
121
+ 'data': data,
122
+ 'error': None,
123
+ 'row_count': len(data)
124
+ }
125
+
126
+ except Exception as e:
127
+ error_msg = f"Database error: {str(e)}"
128
+ self.logger.error(error_msg)
129
+ return {
130
+ 'success': False,
131
+ 'data': None,
132
+ 'error': error_msg,
133
+ 'row_count': 0
134
+ }
135
+
136
+ def get_statistics(self) -> Dict[str, Any]:
137
+ """Get aggregated statistics about the database"""
138
+ try:
139
+ stats = {}
140
+
141
+ conn = sqlite3.connect(self.db_path)
142
+ cursor = conn.cursor()
143
+
144
+ # Total records
145
+ cursor.execute("SELECT COUNT(*) FROM cars")
146
+ stats['total_records'] = cursor.fetchone()[0]
147
+
148
+ # Price statistics
149
+ cursor.execute("""
150
+ SELECT
151
+ AVG(sellingprice) as avg_price,
152
+ MIN(sellingprice) as min_price,
153
+ MAX(sellingprice) as max_price
154
+ FROM cars
155
+ WHERE sellingprice IS NOT NULL AND sellingprice > 0
156
+ """)
157
+ price_stats = cursor.fetchone()
158
+ stats['avg_price'] = round(price_stats[0], 2) if price_stats[0] else 0
159
+ stats['min_price'] = price_stats[1] if price_stats[1] else 0
160
+ stats['max_price'] = price_stats[2] if price_stats[2] else 0
161
+
162
+ # Top 5 makes by count
163
+ cursor.execute("""
164
+ SELECT make, COUNT(*) as count
165
+ FROM cars
166
+ GROUP BY make
167
+ ORDER BY count DESC
168
+ LIMIT 5
169
+ """)
170
+ stats['top_makes'] = [
171
+ {'make': row[0], 'count': row[1]}
172
+ for row in cursor.fetchall()
173
+ ]
174
+
175
+ # Top 5 models by count
176
+ cursor.execute("""
177
+ SELECT model, COUNT(*) as count
178
+ FROM cars
179
+ GROUP BY model
180
+ ORDER BY count DESC
181
+ LIMIT 5
182
+ """)
183
+ stats['top_models'] = [
184
+ {'model': row[0], 'count': row[1]}
185
+ for row in cursor.fetchall()
186
+ ]
187
+
188
+ # Condition distribution
189
+ cursor.execute("""
190
+ SELECT condition, COUNT(*) as count
191
+ FROM cars
192
+ WHERE condition IS NOT NULL
193
+ GROUP BY condition
194
+ ORDER BY count DESC
195
+ """)
196
+ stats['condition_distribution'] = [
197
+ {'condition': row[0], 'count': row[1]}
198
+ for row in cursor.fetchall()
199
+ ]
200
+
201
+ # Year range
202
+ cursor.execute("SELECT MIN(year), MAX(year) FROM cars")
203
+ year_range = cursor.fetchone()
204
+ stats['year_range'] = {
205
+ 'min': year_range[0],
206
+ 'max': year_range[1]
207
+ }
208
+
209
+ conn.close()
210
+
211
+ self.logger.info("Statistics retrieved successfully")
212
+ return stats
213
+
214
+ except Exception as e:
215
+ self.logger.error(f"Error getting statistics: {e}")
216
+ return {}
217
+
218
+ def get_table_info(self) -> Dict[str, Any]:
219
+ """Get information about the database schema"""
220
+ try:
221
+ conn = sqlite3.connect(self.db_path)
222
+ cursor = conn.cursor()
223
+
224
+ # Get column information
225
+ cursor.execute("PRAGMA table_info(cars)")
226
+ columns = [
227
+ {'name': row[1], 'type': row[2]}
228
+ for row in cursor.fetchall()
229
+ ]
230
+
231
+ conn.close()
232
+
233
+ return {
234
+ 'table_name': 'cars',
235
+ 'columns': columns
236
+ }
237
+
238
+ except Exception as e:
239
+ self.logger.error(f"Error getting table info: {e}")
240
+ return {}
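Review note on `execute_query` above: the connection is closed only on the success path, so a failing query leaves cleanup to the garbage collector. The sketch below shows one possible tightening and is not the commit's code. It wraps the connection in `contextlib.closing` and opens the file through a read-only SQLite URI, so writes fail at the engine level even if a statement slips past the validator. The helper name `run_readonly_query` is hypothetical; the returned keys mirror those documented in `execute_query`.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Any, Dict, Optional


def run_readonly_query(db_path: str, query: str,
                       params: Optional[tuple] = None) -> Dict[str, Any]:
    """Hypothetical helper: run one query on a read-only connection."""
    # mode=ro makes SQLite itself reject writes, independent of string checks.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        # closing() guarantees conn.close() even when execute() raises.
        with closing(sqlite3.connect(uri, uri=True)) as conn:
            conn.row_factory = sqlite3.Row  # column access by name
            rows = conn.execute(query, params or ()).fetchall()
            data = [dict(row) for row in rows]
        return {"success": True, "data": data,
                "error": None, "row_count": len(data)}
    except sqlite3.Error as exc:
        return {"success": False, "data": None,
                "error": f"Database error: {exc}", "row_count": 0}
```

A read-only connection complements the `SafetyValidator` rather than replacing it: the validator produces friendly error messages, while the URI flag acts as a backstop.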
src/database/safety_validator.py ADDED
@@ -0,0 +1,72 @@
1
+ """
2
+ SQL Safety Validator - Prevents dangerous database operations
3
+ """
4
+ import re
5
+ from typing import Tuple
6
+ from config import ALLOWED_SQL_OPERATIONS, DANGEROUS_SQL_KEYWORDS
7
+
8
+
9
+ class SafetyValidator:
10
+ """Validates SQL queries to prevent dangerous operations"""
11
+
12
+ @staticmethod
13
+ def validate_query(query: str) -> Tuple[bool, str]:
14
+ """
15
+ Validate if a SQL query is safe to execute
16
+
17
+ Args:
18
+ query: SQL query string to validate
19
+
20
+ Returns:
21
+ Tuple of (is_valid, error_message)
22
+ - is_valid: True if query is safe, False otherwise
23
+ - error_message: Empty string if valid, error description if invalid
24
+ """
25
+ if not query or not query.strip():
26
+ return False, "Empty query provided"
27
+
28
+ # Normalize query for checking
29
+ normalized_query = query.strip().upper()
30
+
31
+ # Check for dangerous keywords
32
+ for keyword in DANGEROUS_SQL_KEYWORDS:
33
+ # Use word boundaries to avoid false positives
34
+ pattern = r'\b' + re.escape(keyword) + r'\b'
35
+ if re.search(pattern, normalized_query):
36
+ return False, (
37
+ f"🚫 BLOCKED: Query contains dangerous operation '{keyword}'. "
38
+ f"Only SELECT queries are allowed for safety reasons."
39
+ )
40
+
41
+ # Ensure query starts with SELECT
42
+ if not normalized_query.startswith('SELECT'):
43
+ return False, (
44
+ "🚫 BLOCKED: Only SELECT queries are allowed. "
45
+ "This application is read-only to prevent accidental data modification."
46
+ )
47
+
48
+ # Additional checks for SQL injection patterns
49
+ suspicious_patterns = [
50
+ r';.*?(DELETE|DROP|UPDATE|INSERT)', # Multiple statements
51
+ r'--', # SQL comments (potential injection)
52
+ r'/\*.*?\*/', # Block comments
53
+ ]
54
+
55
+ for pattern in suspicious_patterns:
56
+ if re.search(pattern, normalized_query, re.IGNORECASE | re.DOTALL):
57
+ return False, (
58
+ "🚫 BLOCKED: Query contains suspicious patterns that may indicate "
59
+ "SQL injection or multiple statements. Please use simple SELECT queries."
60
+ )
61
+
62
+ return True, ""
63
+
64
+ @staticmethod
65
+ def get_safety_message() -> str:
66
+ """Get a message explaining safety restrictions"""
67
+ return (
68
+ "🛡️ **Safety Features Active**\n\n"
69
+ f"✅ Allowed operations: {', '.join(ALLOWED_SQL_OPERATIONS)}\n"
70
+ f"❌ Blocked operations: {', '.join(DANGEROUS_SQL_KEYWORDS)}\n\n"
71
+ "This ensures your data remains safe from accidental modifications."
72
+ )
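Review note: to make the word-boundary behaviour of `validate_query` concrete, the sketch below reimplements only the keyword check. The keyword list is an assumption inferred from the tests in this commit (`config.DANGEROUS_SQL_KEYWORDS` is not shown in this part of the diff), and `find_blocked_keyword` is a hypothetical name.

```python
import re
from typing import Optional

# Assumed list; the real one lives in config.DANGEROUS_SQL_KEYWORDS.
ASSUMED_DANGEROUS_KEYWORDS = ["DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE"]


def find_blocked_keyword(query: str) -> Optional[str]:
    """Return the first dangerous keyword found as a whole word, else None."""
    normalized = query.strip().upper()
    for keyword in ASSUMED_DANGEROUS_KEYWORDS:
        # re.escape keeps the pattern literal if a keyword ever contains
        # regex metacharacters.
        if re.search(r"\b" + re.escape(keyword) + r"\b", normalized):
            return keyword
    return None


# An identifier that merely contains a keyword is not matched: "_" and
# letters are word characters, so no boundary occurs inside the name.
print(find_blocked_keyword("SELECT last_updated FROM cars"))

# A keyword inside a string literal is still matched, which is a
# false positive for an otherwise harmless SELECT.
print(find_blocked_keyword("SELECT * FROM cars WHERE trim = 'Drop Top'"))
```

Matching against the whole query string is conservative: it rejects legitimate searches for values such as `'Drop Top'`. That trade-off may be acceptable for a read-only assistant, but it is worth knowing when a user reports a blocked query.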
src/streamlit_app.py CHANGED
@@ -1,40 +1,369 @@
1
- import altair as alt
2
- import numpy as np
3
- import pandas as pd
4
  import streamlit as st
5
 
6
- """
7
- # Welcome to Streamlit!
8
 
9
- Edit `/streamlit_app.py` to customize this app to your heart's desire :heart:.
10
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
11
- forums](https://discuss.streamlit.io).
12
 
13
- In the meantime, below is an example of what you can do with just a few lines of code:
14
- """
15
 
16
- num_points = st.slider("Number of points in spiral", 1, 10000, 1100)
17
- num_turns = st.slider("Number of turns in spiral", 1, 300, 31)
18
-
19
- indices = np.linspace(0, 1, num_points)
20
- theta = 2 * np.pi * num_turns * indices
21
- radius = indices
22
-
23
- x = radius * np.cos(theta)
24
- y = radius * np.sin(theta)
25
-
26
- df = pd.DataFrame({
27
- "x": x,
28
- "y": y,
29
- "idx": indices,
30
- "rand": np.random.randn(num_points),
31
- })
32
-
33
- st.altair_chart(alt.Chart(df, height=700, width=700)
34
- .mark_point(filled=True)
35
- .encode(
36
- x=alt.X("x", axis=None),
37
- y=alt.Y("y", axis=None),
38
- color=alt.Color("idx", legend=None, scale=alt.Scale()),
39
- size=alt.Size("rand", legend=None, scale=alt.Scale(range=[1, 150])),
40
- ))
 
1
+ """
2
+ Data Insights App - Main Streamlit Application
3
+ """
4
  import streamlit as st
5
+ import pandas as pd
6
+ from datetime import datetime
7
 
8
+ from config import OPENAI_API_KEY, SAMPLE_QUERIES, MAX_LOG_ENTRIES
9
+ from database import DatabaseManager
10
+ from agent import AIAgent, AgentTools
11
+ from support import GitHubSupport
12
+ from utils import setup_logging, get_logs, clear_logs
13
+ from ui import (
14
+ create_price_distribution_chart,
15
+ create_top_makes_chart,
16
+ create_condition_pie_chart,
17
+ create_price_by_make_chart,
18
+ create_dynamic_chart
19
+ )
20
 
21
 
22
+ # Page configuration
23
+ st.set_page_config(
24
+ page_title="Data Insights App",
25
+ page_icon="🚗",
26
+ layout="wide",
27
+ initial_sidebar_state="expanded"
28
+ )
29
+
30
+ # Custom CSS for better styling
31
+ st.markdown("""
32
+ <style>
33
+ .main-header {
34
+ font-size: 2.5rem;
35
+ font-weight: bold;
36
+ color: #1f77b4;
37
+ margin-bottom: 0.5rem;
38
+ }
39
+ .sub-header {
40
+ font-size: 1.2rem;
41
+ color: #666;
42
+ margin-bottom: 2rem;
43
+ }
44
+ .stat-card {
45
+ background-color: #f0f2f6;
46
+ padding: 1rem;
47
+ border-radius: 0.5rem;
48
+ margin-bottom: 1rem;
49
+ }
50
+ .stat-value {
51
+ font-size: 1.8rem;
52
+ font-weight: bold;
53
+ color: #1f77b4;
54
+ }
55
+ .stat-label {
56
+ font-size: 0.9rem;
57
+ color: #666;
58
+ }
59
+ .log-entry {
60
+ font-family: monospace;
61
+ font-size: 0.85rem;
62
+ padding: 0.3rem;
63
+ margin: 0.2rem 0;
64
+ border-left: 3px solid #ddd;
65
+ padding-left: 0.5rem;
66
+ }
67
+ .log-info {
68
+ border-left-color: #2ca02c;
69
+ }
70
+ .log-warning {
71
+ border-left-color: #ff7f0e;
72
+ }
73
+ .log-error {
74
+ border-left-color: #d62728;
75
+ }
76
+ </style>
77
+ """, unsafe_allow_html=True)
78
+
79
+
80
+ def initialize_session_state():
81
+ """Initialize Streamlit session state variables"""
82
+ if 'initialized' not in st.session_state:
83
+ # Set up logging
84
+ setup_logging(level="INFO", max_entries=MAX_LOG_ENTRIES)
85
+
86
+ # Initialize database
87
+ st.session_state.db_manager = DatabaseManager()
88
+
89
+ # Initialize GitHub support
90
+ st.session_state.github_support = GitHubSupport()
91
+
92
+ # Initialize agent tools and AI agent
93
+ st.session_state.tools = AgentTools(
94
+ db_manager=st.session_state.db_manager,
95
+ github_support=st.session_state.github_support
96
+ )
97
+ st.session_state.agent = AIAgent(tools=st.session_state.tools)
98
+
99
+ # Chat history
100
+ st.session_state.messages = []
101
+
102
+ # Statistics cache
103
+ st.session_state.stats = None
104
+ st.session_state.stats_loaded = False
105
+
106
+ st.session_state.initialized = True
107
+
108
+
109
+ def load_statistics():
110
+ """Load database statistics (cached)"""
111
+ if not st.session_state.stats_loaded:
112
+ st.session_state.stats = st.session_state.db_manager.get_statistics()
113
+ st.session_state.stats_loaded = True
114
+ return st.session_state.stats
115
+
116
+
117
+ def render_sidebar():
118
+ """Render sidebar with logs, stats, and charts"""
119
+ with st.sidebar:
120
+ st.markdown("### 🎛️ Control Panel")
121
+
122
+ # API Key check
123
+ if not OPENAI_API_KEY:
124
+ st.error("⚠️ OPENAI_API_KEY not set! Please configure your .env file.")
125
+ st.stop()
126
+ else:
127
+ st.success("✅ OpenAI API Connected")
128
+
129
+ st.divider()
130
+
131
+ # Database Statistics
132
+ st.markdown("### 📊 Database Overview")
133
+ stats = load_statistics()
134
+
135
+ if stats:
136
+ col1, col2 = st.columns(2)
137
+ with col1:
138
+ st.markdown(f"""
139
+ <div class="stat-card">
140
+ <div class="stat-value">{stats.get('total_records', 0):,}</div>
141
+ <div class="stat-label">Total Cars</div>
142
+ </div>
143
+ """, unsafe_allow_html=True)
144
+
145
+ with col2:
146
+ avg_price = stats.get('avg_price', 0)
147
+ st.markdown(f"""
148
+ <div class="stat-card">
149
+ <div class="stat-value">${avg_price:,.0f}</div>
150
+ <div class="stat-label">Avg Price</div>
151
+ </div>
152
+ """, unsafe_allow_html=True)
153
+
154
+ # Price range
155
+ min_price = stats.get('min_price', 0)
156
+ max_price = stats.get('max_price', 0)
157
+ st.markdown(f"**Price Range:** ${min_price:,} - ${max_price:,}")
158
+
159
+ # Year range
160
+ year_range = stats.get('year_range', {})
161
+ st.markdown(f"**Year Range:** {year_range.get('min', 'N/A')} - {year_range.get('max', 'N/A')}")
162
+
163
+ st.divider()
164
+
165
+ # Charts
166
+ st.markdown("### 📈 Insights")
167
+
168
+ if stats:
169
+ # Top makes chart
170
+ with st.expander("🏆 Top Makes", expanded=False):
171
+ fig = create_top_makes_chart(stats)
172
+ st.plotly_chart(fig, use_container_width=True)
173
+
174
+ # Condition distribution
175
+ with st.expander("🔍 Condition Distribution", expanded=False):
176
+ fig = create_condition_pie_chart(stats)
177
+ st.plotly_chart(fig, use_container_width=True)
178
+
179
+ # Average price by make
180
+ with st.expander("💰 Avg Price by Make", expanded=False):
181
+ fig = create_price_by_make_chart(st.session_state.db_manager)
182
+ st.plotly_chart(fig, use_container_width=True)
183
+
184
+ st.divider()
185
+
186
+ # Sample Queries
187
+ st.markdown("### 💡 Sample Queries")
188
+ for i, query in enumerate(SAMPLE_QUERIES[:4]):
189
+ if st.button(f"📝 {query[:40]}...", key=f"sample_{i}", use_container_width=True):
190
+ st.session_state.sample_query = query
191
+ st.rerun()
192
+
193
+ st.divider()
194
+
195
+ # Console Logs
196
+ st.markdown("### 🖥️ Console Logs")
197
+
198
+ col1, col2 = st.columns([3, 1])
199
+ with col2:
200
+ if st.button("🗑️ Clear", use_container_width=True):
201
+ clear_logs()
202
+ st.rerun()
203
+
204
+ # Display logs
205
+ logs = get_logs()
206
+
207
+ if logs:
208
+ log_container = st.container(height=300)
209
+ with log_container:
210
+ for log in reversed(logs[-50:]): # Show last 50 logs
211
+ level = log['level'].lower()
212
+ css_class = f"log-{level}"
213
+
214
+ st.markdown(f"""
215
+ <div class="log-entry {css_class}">
216
+ <strong>[{log['timestamp']}]</strong> {log['level']}: {log['message']}
217
+ </div>
218
+ """, unsafe_allow_html=True)
219
+ else:
220
+ st.info("No logs yet. Start chatting to see activity!")
221
+
222
+
223
+ def render_chat_interface():
224
+ """Render main chat interface"""
225
+ # Header
226
+ st.markdown('<div class="main-header">🚗 Car Data Insights Assistant</div>', unsafe_allow_html=True)
227
+ st.markdown('<div class="sub-header">Ask questions about car auction data powered by AI</div>', unsafe_allow_html=True)
228
+
229
+ # Display chat messages
230
+ for message in st.session_state.messages:
231
+ with st.chat_message(message["role"]):
232
+ st.markdown(message["content"])
233
+ if message.get("chart"):
234
+ chart_config = message["chart"]
235
+ fig = create_dynamic_chart(
236
+ data=chart_config['data'],
237
+ chart_type=chart_config['type'],
238
+ title=chart_config['title'],
239
+ x_label=chart_config['x_label'],
240
+ y_label=chart_config['y_label']
241
+ )
242
+ st.plotly_chart(fig, use_container_width=True)
243
+
244
+ # Handle sample query selection
245
+ if 'sample_query' in st.session_state:
246
+ user_input = st.session_state.sample_query
247
+ del st.session_state.sample_query
248
+ else:
249
+ user_input = st.chat_input("Ask me anything about the car data...")
250
+ # Process user input
251
+ if user_input:
252
+ # Add user message to chat
253
+ st.session_state.messages.append({"role": "user", "content": user_input})
254
+
255
+ with st.chat_message("user"):
256
+ st.markdown(user_input)
257
+
258
+ # Get AI response
259
+ with st.chat_message("assistant"):
260
+ with st.spinner("Thinking..."):
261
+ response_data = st.session_state.agent.chat(user_input)
262
+ content = response_data["content"]
263
+ chart = response_data.get("chart")
264
+
265
+ st.markdown(content)
266
+ if chart:
267
+ fig = create_dynamic_chart(
268
+ data=chart['data'],
269
+ chart_type=chart['type'],
270
+ title=chart['title'],
271
+ x_label=chart['x_label'],
272
+ y_label=chart['y_label']
273
+ )
274
+ st.plotly_chart(fig, use_container_width=True)
275
+
276
+ # Add assistant response to chat
277
+ st.session_state.messages.append({
278
+ "role": "assistant",
279
+ "content": content,
280
+ "chart": chart
281
+ })
282
+
283
+ st.rerun()
284
+
285
+
286
+ def render_support_section():
287
+ """Render support ticket creation section"""
288
+ st.divider()
289
+
290
+ with st.expander("🎫 Need Human Support?", expanded=False):
291
+ st.markdown("""
292
+ If the AI assistant can't help you, create a support ticket to reach a human expert.
293
+ Your conversation history will be included automatically.
294
+ """)
295
+
296
+ col1, col2 = st.columns([3, 1])
297
+
298
+ with col1:
299
+ ticket_title = st.text_input(
300
+ "Issue Summary",
301
+ placeholder="Brief description of your issue..."
302
+ )
303
+
304
+ with col2:
305
+ priority = st.selectbox("Priority", ["low", "medium", "high"])
306
+
307
+ ticket_description = st.text_area(
308
+ "Details",
309
+ placeholder="Provide more details about your issue...",
310
+ height=100
311
+ )
312
+
313
+ if st.button("📤 Create Support Ticket", type="primary"):
314
+ if not ticket_title:
315
+ st.error("Please provide a ticket title")
316
+ else:
317
+ # Get conversation context
318
+ context = st.session_state.agent.get_conversation_context()
319
+
320
+ # Create full description with context
321
+ full_description = f"{ticket_description}\n\n---\n\n**Conversation History:**\n\n{context}"
322
+
323
+ # Create ticket
324
+ result = st.session_state.tools.execute_tool(
325
+ "create_support_ticket",
326
+ {
327
+ "title": ticket_title,
328
+ "description": full_description,
329
+ "priority": priority
330
+ }
331
+ )
332
+
333
+ if result.get('success'):
334
+ st.success(f"✅ {result.get('message')}")
335
+ if 'issue_url' in result:
336
+ st.markdown(f"**Issue URL:** {result['issue_url']}")
337
+ elif 'ticket_id' in result:
338
+ st.markdown(f"**Ticket ID:** {result['ticket_id']}")
339
+ else:
340
+ st.error(f"❌ {result.get('error')}")
341
+
342
+
343
+ def main():
344
+ """Main application entry point"""
345
+ # Initialize
346
+ initialize_session_state()
347
+
348
+ # Render sidebar
349
+ render_sidebar()
350
+
351
+ # Render main chat interface
352
+ render_chat_interface()
353
+
354
+ # Render support section
355
+ render_support_section()
356
+
357
+ # Footer
358
+ st.divider()
359
+ st.markdown("""
360
+ <div style="text-align: center; color: #666; font-size: 0.9rem;">
361
+ 🛡️ <strong>Safety Features Active:</strong> Only SELECT queries allowed |
362
+ All dangerous operations blocked |
363
+ Data remains secure
364
+ </div>
365
+ """, unsafe_allow_html=True)
366
+
367
 
368
+ if __name__ == "__main__":
369
+ main()
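Review note: `render_sidebar` interpolates `log['message']` into HTML with `unsafe_allow_html=True`, and log messages can contain user-supplied text (for example, the text of a blocked query). The sketch below escapes the dynamic fields first. It assumes each log entry is a dict with `timestamp`, `level`, and `message` keys, as used in the sidebar code; `format_log_entry` is a hypothetical helper, not part of the commit.

```python
import html

# Levels that have a dedicated CSS class in the app's <style> block.
KNOWN_LEVELS = {"info", "warning", "error"}


def format_log_entry(log: dict) -> str:
    """Build the HTML for one log line with all dynamic fields escaped."""
    level = str(log.get("level", "INFO"))
    # Fall back to the base style for levels without a dedicated class.
    css_class = f"log-{level.lower()}" if level.lower() in KNOWN_LEVELS else ""
    return (
        f'<div class="log-entry {css_class}">'
        f"<strong>[{html.escape(str(log.get('timestamp', '')))}]</strong> "
        f"{html.escape(level)}: {html.escape(str(log.get('message', '')))}"
        "</div>"
    )
```

Inside the log loop this could be used as `st.markdown(format_log_entry(log), unsafe_allow_html=True)`.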
src/support/__init__.py ADDED
@@ -0,0 +1,4 @@
1
+ """Support package initialization"""
2
+ from support.github_integration import GitHubSupport
3
+
4
+ __all__ = ['GitHubSupport']
src/support/github_integration.py ADDED
@@ -0,0 +1,168 @@
1
+ """
2
+ GitHub Integration for Support Tickets
3
+ """
4
+ import os
5
+ import logging
6
+ from typing import Dict, Any, Optional, List
7
+ from github import Github, GithubException, Auth
8
+
9
+ from config import GITHUB_TOKEN, GITHUB_REPO
10
+
11
+
12
+ class GitHubSupport:
13
+ """Handles GitHub issue creation for support tickets"""
14
+
15
+ def __init__(self, token: Optional[str] = None, repo_name: Optional[str] = None, folder: Optional[str] = None):
16
+ self.token = token or GITHUB_TOKEN
17
+ self.repo_name = repo_name or GITHUB_REPO
18
+ self.folder = folder or os.getenv("GITHUB_FOLDER", "")
19
+ self.logger = logging.getLogger(__name__)
20
+
21
+ self.github = None
22
+ self.repo = None
23
+
24
+ if self.token and self.repo_name:
25
+ try:
26
+ # Use Auth.Token for authentication
27
+ auth = Auth.Token(self.token)
28
+ self.github = Github(auth=auth)
29
+ self.repo = self.github.get_repo(self.repo_name)
30
+ self.logger.info(f"GitHub integration initialized for repo: {self.repo_name}")
31
+ except Exception as e:
32
+ self.logger.warning(f"GitHub initialization failed: {e}")
33
+
34
+ def is_configured(self) -> bool:
35
+ """Check if GitHub integration is properly configured"""
36
+ return self.github is not None and self.repo is not None
37
+
38
+ def _ensure_label_exists(self, label_name: str, color: str = "0075ca"):
39
+ """Ensures a label exists in the repository, creates it if it doesn't"""
40
+ if not self.repo:
41
+ return
42
+ try:
43
+ self.repo.get_label(label_name)
44
+ except GithubException:
45
+ try:
46
+ self.repo.create_label(name=label_name, color=color)
47
+ self.logger.info(f"Created new label: {label_name}")
48
+ except Exception as e:
49
+ self.logger.warning(f"Could not create label {label_name}: {e}")
50
+
51
+ def create_issue(
52
+ self,
53
+ title: str,
54
+ body: str,
55
+ labels: Optional[List[str]] = None
56
+ ) -> Dict[str, Any]:
57
+ """
58
+ Create a GitHub issue as a support ticket
59
+
60
+ Args:
61
+ title: Issue title
62
+ body: Issue description
63
+ labels: Optional list of labels to add
64
+
65
+ Returns:
66
+ Dictionary with success status and issue details
67
+ """
68
+ if not self.is_configured():
69
+ self.logger.warning("GitHub not configured, using mock mode")
70
+ return self._create_mock_ticket(title, body, labels)
71
+
72
+ try:
73
+ issue_labels = []
74
+
75
+ # 1. Handle folder label and title decoration
76
+ if self.folder:
77
+ folder_label = self.folder.lower().replace("/", "-").replace(" ", "-")
78
+ self._ensure_label_exists(folder_label, "0075ca") # Blue
79
+ issue_labels.append(folder_label)
80
+
81
+ # Decorate title
82
+ prefixed_title = f"[{self.folder}] {title}"
83
+ # Add details to body
84
+ full_body = f"**Project:** {self.folder}\n\n**Description:**\n{body}"
85
+ else:
86
+ prefixed_title = title
87
+ full_body = body
88
+
89
+ # 2. Add customer-support label (ensure it exists)
90
+ self._ensure_label_exists("customer-support", "d73a4a") # Reddish
91
+ issue_labels.append("customer-support")
92
+
93
+ # 3. Add any additional labels provided
94
+ if labels:
95
+ for label in labels:
96
+ if label not in issue_labels:
97
+ # Ensure these labels exist too
98
+ self._ensure_label_exists(label, "e6e6e6") # Light gray
99
+ issue_labels.append(label)
100
+
101
+ # Create the issue
102
+ issue = self.repo.create_issue(
103
+ title=prefixed_title,
104
+ body=full_body,
105
+ labels=issue_labels
106
+ )
107
+
108
+ self.logger.info(f"Created GitHub issue #{issue.number}: {prefixed_title}")
109
+
110
+ return {
111
+ "success": True,
112
+ "message": f"Ticket created successfully! Ticket ID: #{issue.number}",
113
+ "issue_number": issue.number,
114
+ "issue_url": issue.html_url,
115
+ "title": prefixed_title
116
+ }
117
+
118
+ except GithubException as e:
119
+ # Handle Validation Failed with more detail
120
+ error_data = getattr(e, 'data', {})
121
+ error_msg = error_data.get('message', str(e))
122
+ errors = error_data.get('errors', [])
123
+
124
+ full_error = f"GitHub API error: {error_msg}"
125
+ if errors:
126
+ full_error += f" - Details: {str(errors)}"
127
+
128
+ self.logger.error(full_error)
129
+ return {
130
+ "success": False,
131
+ "error": full_error
132
+ }
133
+ except Exception as e:
134
+ error_msg = f"Error creating issue: {str(e)}"
135
+ self.logger.error(error_msg)
136
+ return {
137
+ "success": False,
138
+ "error": error_msg
139
+ }
147
+
148
+ def _create_mock_ticket(
149
+ self,
150
+ title: str,
151
+ body: str,
152
+ labels: Optional[List[str]] = None
153
+ ) -> Dict[str, Any]:
154
+ """Create a mock support ticket when GitHub is not configured"""
155
+ import random
156
+
157
+ mock_id = f"MOCK-{random.randint(1000, 9999)}"
158
+
159
+ self.logger.info(f"Created mock support ticket: {mock_id}")
160
+
161
+ return {
162
+ "success": True,
163
+ "message": "Support ticket created (Mock Mode - GitHub not configured)",
164
+ "ticket_id": mock_id,
165
+ "title": title,
166
+ "labels": labels or [],
167
+ "note": "To enable real GitHub integration, set GITHUB_TOKEN and GITHUB_REPO in your .env file"
168
+ }
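Review note: the folder handling in `create_issue` can be followed with a small standalone sketch. It mirrors the string transformations shown in the diff without calling GitHub; `decorate_issue` is a hypothetical name used only for illustration.

```python
from typing import List, Optional, Tuple


def decorate_issue(title: str, body: str, folder: str = "",
                   labels: Optional[List[str]] = None) -> Tuple[str, str, List[str]]:
    """Return (title, body, labels) as create_issue assembles them."""
    issue_labels: List[str] = []
    if folder:
        # Normalize the folder into a label, e.g. "Data Apps/Cars" -> "data-apps-cars".
        issue_labels.append(folder.lower().replace("/", "-").replace(" ", "-"))
        title = f"[{folder}] {title}"
        body = f"**Project:** {folder}\n\n**Description:**\n{body}"
    # Every ticket carries the customer-support label.
    issue_labels.append("customer-support")
    for label in labels or []:
        if label not in issue_labels:  # skip duplicates, keep order
            issue_labels.append(label)
    return title, body, issue_labels
```

Keeping this logic in a pure function would also make it testable without a GitHub token, in the same style as `test_safety_validator.py`.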
src/tests/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """Tests package initialization"""
src/tests/test_safety_validator.py ADDED
@@ -0,0 +1,113 @@
1
+ """
2
+ Tests for Safety Validator
3
+ """
4
+ import pytest
5
+ from database.safety_validator import SafetyValidator
6
+
7
+
8
+ class TestSafetyValidator:
9
+ """Test cases for SQL safety validation"""
10
+
11
+ def setup_method(self):
12
+ """Set up test fixtures"""
13
+ self.validator = SafetyValidator()
14
+
15
+ def test_valid_select_query(self):
16
+ """Test that valid SELECT queries pass validation"""
17
+ query = "SELECT * FROM cars WHERE make = 'BMW'"
18
+ is_valid, error = self.validator.validate_query(query)
19
+ assert is_valid is True
20
+ assert error == ""
21
+
22
+ def test_delete_query_blocked(self):
23
+ """Test that DELETE queries are blocked"""
24
+ query = "DELETE FROM cars WHERE id = 1"
25
+ is_valid, error = self.validator.validate_query(query)
26
+ assert is_valid is False
27
+ assert "DELETE" in error
28
+
29
+ def test_drop_query_blocked(self):
30
+ """Test that DROP queries are blocked"""
31
+ query = "DROP TABLE cars"
32
+ is_valid, error = self.validator.validate_query(query)
33
+ assert is_valid is False
34
+ assert "DROP" in error
35
+
36
+ def test_update_query_blocked(self):
37
+ """Test that UPDATE queries are blocked"""
38
+ query = "UPDATE cars SET price = 0"
39
+ is_valid, error = self.validator.validate_query(query)
40
+ assert is_valid is False
41
+ assert "UPDATE" in error
42
+
43
+ def test_insert_query_blocked(self):
44
+ """Test that INSERT queries are blocked"""
45
+ query = "INSERT INTO cars VALUES (1, 'test')"
46
+ is_valid, error = self.validator.validate_query(query)
47
+ assert is_valid is False
48
+ assert "INSERT" in error
49
+
50
+ def test_truncate_query_blocked(self):
51
+ """Test that TRUNCATE queries are blocked"""
52
+ query = "TRUNCATE TABLE cars"
53
+ is_valid, error = self.validator.validate_query(query)
54
+ assert is_valid is False
55
+ assert "TRUNCATE" in error
56
+
57
+ def test_alter_query_blocked(self):
58
+ """Test that ALTER queries are blocked"""
59
+ query = "ALTER TABLE cars ADD COLUMN test VARCHAR(50)"
60
+ is_valid, error = self.validator.validate_query(query)
61
+ assert is_valid is False
62
+ assert "ALTER" in error
63
+
64
+ def test_empty_query(self):
65
+ """Test that empty queries are rejected"""
66
+ query = ""
67
+ is_valid, error = self.validator.validate_query(query)
68
+ assert is_valid is False
69
+ assert "Empty query" in error
70
+
71
+ def test_non_select_query(self):
72
+ """Test that non-SELECT queries are rejected"""
73
+ query = "SHOW TABLES"
74
+ is_valid, error = self.validator.validate_query(query)
75
+ assert is_valid is False
76
+ assert "Only SELECT" in error
77
+
78
+ def test_sql_injection_attempt(self):
79
+ """Test that SQL injection patterns are detected"""
80
+ query = "SELECT * FROM cars; DELETE FROM cars"
81
+ is_valid, error = self.validator.validate_query(query)
82
+ assert is_valid is False
83
+
84
+ def test_complex_select_query(self):
85
+ """Test that complex SELECT queries pass"""
86
+ query = """
87
+ SELECT make, model, AVG(sellingprice) as avg_price
88
+ FROM cars
89
+ WHERE year > 2010
90
+ GROUP BY make, model
91
+ ORDER BY avg_price DESC
92
+ LIMIT 10
93
+ """
94
+ is_valid, error = self.validator.validate_query(query)
95
+ assert is_valid is True
96
+ assert error == ""
97
+
98
+ def test_case_insensitive_blocking(self):
99
+ """Test that dangerous keywords are blocked regardless of case"""
100
+ queries = [
101
+ "delete from cars",
102
+ "DELETE FROM cars",
103
+ "DeLeTe FrOm cars"
104
+ ]
105
+
106
+ for query in queries:
107
+ is_valid, error = self.validator.validate_query(query)
108
+ assert is_valid is False
109
+ assert "DELETE" in error.upper()
110
+
111
+
112
+ if __name__ == "__main__":
113
+ pytest.main([__file__, "-v"])
src/ui/__init__.py ADDED
@@ -0,0 +1,15 @@
1
+ """UI package initialization"""
2
+ from ui.charts import (
3
+ create_price_distribution_chart,
4
+ create_top_makes_chart,
5
+ create_condition_pie_chart,
6
+ create_price_by_make_chart,
7
+ create_dynamic_chart
8
+ )
9
+
10
+ __all__ = [
11
+ 'create_price_distribution_chart',
12
+ 'create_top_makes_chart',
13
+ 'create_condition_pie_chart',
14
+ 'create_price_by_make_chart', 'create_dynamic_chart'
15
+ ]
src/ui/charts.py ADDED
@@ -0,0 +1,216 @@
1
+ """
2
+ Chart generation for business insights visualization
3
+ """
4
+ import plotly.express as px
5
+ import plotly.graph_objects as go
6
+ import pandas as pd
7
+ from typing import Dict, Any
8
+
9
+
+ def create_price_distribution_chart(data: pd.DataFrame) -> go.Figure:
+     """
+     Create a histogram showing price distribution
+
+     Args:
+         data: DataFrame with sellingprice column
+
+     Returns:
+         Plotly figure
+     """
+     fig = px.histogram(
+         data,
+         x='sellingprice',
+         nbins=50,
+         title='Car Price Distribution',
+         labels={'sellingprice': 'Selling Price ($)', 'count': 'Number of Cars'},
+         color_discrete_sequence=['#1f77b4']
+     )
+
+     fig.update_layout(
+         showlegend=False,
+         height=300,
+         margin=dict(l=20, r=20, t=40, b=20)
+     )
+
+     return fig
+
+
+ def create_top_makes_chart(stats: Dict[str, Any]) -> go.Figure:
+     """
+     Create a bar chart showing top car makes
+
+     Args:
+         stats: Statistics dictionary with top_makes data
+
+     Returns:
+         Plotly figure
+     """
+     top_makes = stats.get('top_makes', [])
+
+     if not top_makes:
+         return go.Figure()
+
+     makes = [item['make'] for item in top_makes]
+     counts = [item['count'] for item in top_makes]
+
+     fig = go.Figure(data=[
+         go.Bar(
+             x=makes,
+             y=counts,
+             marker_color='#2ca02c',
+             text=counts,
+             textposition='auto'
+         )
+     ])
+
+     fig.update_layout(
+         title='Top 5 Car Makes',
+         xaxis_title='Make',
+         yaxis_title='Number of Cars',
+         height=300,
+         margin=dict(l=20, r=20, t=40, b=20)
+     )
+
+     return fig
+
+
+ def create_condition_pie_chart(stats: Dict[str, Any]) -> go.Figure:
+     """
+     Create a pie chart showing condition distribution
+
+     Args:
+         stats: Statistics dictionary with condition_distribution data
+
+     Returns:
+         Plotly figure
+     """
+     condition_dist = stats.get('condition_distribution', [])
+
+     if not condition_dist:
+         return go.Figure()
+
+     # Take top 10 conditions
+     condition_dist = condition_dist[:10]
+
+     conditions = [str(item['condition']) for item in condition_dist]
+     counts = [item['count'] for item in condition_dist]
+
+     fig = go.Figure(data=[
+         go.Pie(
+             labels=conditions,
+             values=counts,
+             hole=0.3
+         )
+     ])
+
+     fig.update_layout(
+         title='Car Condition Distribution',
+         height=300,
+         margin=dict(l=20, r=20, t=40, b=20)
+     )
+
+     return fig
+
+
+ def create_price_by_make_chart(db_manager) -> go.Figure:
+     """Create a bar chart of average price by make (top 10)"""
+     query = """
+         SELECT make, AVG(sellingprice) as avg_price
+         FROM cars
+         GROUP BY make
+         ORDER BY avg_price DESC
+         LIMIT 10
+     """
+     result = db_manager.execute_query(query)
+
+     if result['success'] and result['data']:
+         df = pd.DataFrame(result['data'])
+         fig = px.bar(
+             df,
+             x='make',
+             y='avg_price',
+             title='Top 10 Average Prices by Make',
+             labels={'make': 'Make', 'avg_price': 'Average Price ($)'},
+             template='plotly_white',
+             color='avg_price',
+             color_continuous_scale='Blues'
+         )
+         return fig
+     else:
+         # Return empty figure if data fails
+         return go.Figure()
+
+
+ def create_dynamic_chart(data: list, chart_type: str, title: str, x_label: str, y_label: str) -> go.Figure:
+     """
+     Create a dynamic chart based on data and configuration provided by the AI agent.
+
+     Args:
+         data: List of dictionaries containing the data
+         chart_type: Type of chart ('bar', 'column', 'line', 'pie', 'scatter')
+         title: Chart title
+         x_label: Name of the column for X axis
+         y_label: Name of the column for Y axis (or value for pie)
+
+     Returns:
+         Plotly Figure object
+     """
+     if not data:
+         return go.Figure()
+
+     df = pd.DataFrame(data)
+
+     # Ensure labels exist in dataframe, if not, use first columns
+     if x_label not in df.columns:
+         x_label = df.columns[0]
+     if y_label not in df.columns and len(df.columns) > 1:
+         y_label = df.columns[1]
+     elif y_label not in df.columns:
+         y_label = x_label
+
+     if chart_type.lower() in ['bar', 'column']:
+         fig = px.bar(
+             df,
+             x=x_label,
+             y=y_label,
+             title=title,
+             template='plotly_white',
+             color=y_label if y_label != x_label else None
+         )
+     elif chart_type.lower() == 'line':
+         fig = px.line(
+             df,
+             x=x_label,
+             y=y_label,
+             title=title,
+             template='plotly_white',
+             markers=True
+         )
+     elif chart_type.lower() == 'pie':
+         fig = px.pie(
+             df,
+             names=x_label,
+             values=y_label,
+             title=title,
+             template='plotly_white'
+         )
+     elif chart_type.lower() == 'scatter':
+         fig = px.scatter(
+             df,
+             x=x_label,
+             y=y_label,
+             title=title,
+             template='plotly_white',
+             color=y_label if y_label != x_label else None
+         )
+     else:
+         # Fallback to bar chart
+         fig = px.bar(df, x=x_label, y=y_label, title=title, template='plotly_white')
+
+     fig.update_layout(
+         margin=dict(l=20, r=20, t=40, b=20),
+         xaxis_title=x_label,
+         yaxis_title=y_label if chart_type.lower() != 'pie' else ""
+     )
+
+     return fig
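Review note: the column fallback in `create_dynamic_chart` can be checked without Plotly or pandas. The following is a minimal sketch, assuming a hypothetical helper `resolve_axis_labels` (not part of this commit) that applies the same rules to a plain list of column names.

```python
from typing import List, Tuple


def resolve_axis_labels(columns: List[str], x_label: str, y_label: str) -> Tuple[str, str]:
    """Hypothetical helper mirroring the fallback rules in create_dynamic_chart."""
    if not columns:
        raise ValueError("at least one column is required")
    # Unknown x label: fall back to the first column.
    if x_label not in columns:
        x_label = columns[0]
    # Unknown y label: prefer the second column, else reuse the x column.
    if y_label not in columns:
        y_label = columns[1] if len(columns) > 1 else x_label
    return x_label, y_label


# Both labels unknown, two columns available.
print(resolve_axis_labels(["make", "avg_price"], "brand", "price"))
# Single-column result: y falls back to the x column.
print(resolve_axis_labels(["make"], "make", "count"))
```

When the result has a single column, x and y end up identical, which is why the chart code passes `color=None` in that case.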
src/utils/__init__.py ADDED
@@ -0,0 +1,4 @@
+ """Utils package initialization"""
+ from utils.logger import setup_logging, get_logs, clear_logs, add_log
+
+ __all__ = ['setup_logging', 'get_logs', 'clear_logs', 'add_log']
src/utils/logger.py ADDED
@@ -0,0 +1,93 @@
+ """
+ Utility functions for logging in Streamlit
+ """
+ import logging
+ from datetime import datetime
+ from typing import List, Dict
+ import streamlit as st
+
+
+ class StreamlitLogHandler(logging.Handler):
+     """Custom logging handler that stores logs in Streamlit session state"""
+
+     def __init__(self, max_entries: int = 100):
+         super().__init__()
+         self.max_entries = max_entries
+
+         # Initialize session state for logs if not exists
+         if 'console_logs' not in st.session_state:
+             st.session_state.console_logs = []
+
+     def emit(self, record: logging.LogRecord):
+         """Emit a log record to session state"""
+         try:
+             log_entry = {
+                 'timestamp': datetime.now().strftime('%H:%M:%S'),
+                 'level': record.levelname,
+                 'message': self.format(record),
+                 'logger': record.name
+             }
+
+             # Add to session state
+             st.session_state.console_logs.append(log_entry)
+
+             # Keep only last N entries
+             if len(st.session_state.console_logs) > self.max_entries:
+                 st.session_state.console_logs = st.session_state.console_logs[-self.max_entries:]
+
+         except Exception:
+             self.handleError(record)
+
+
+ def setup_logging(level: str = "INFO", max_entries: int = 100):
+     """
+     Set up logging with Streamlit handler
+
+     Args:
+         level: Logging level (DEBUG, INFO, WARNING, ERROR), case-insensitive
+         max_entries: Maximum number of log entries to keep
+     """
+     # Create streamlit handler (unknown level names fall back to INFO)
+     streamlit_handler = StreamlitLogHandler(max_entries=max_entries)
+     streamlit_handler.setLevel(getattr(logging, level.upper(), logging.INFO))
+
+     # Create formatter
+     formatter = logging.Formatter('%(name)s - %(message)s')
+     streamlit_handler.setFormatter(formatter)
+
+     # Configure root logger
+     root_logger = logging.getLogger()
+     root_logger.setLevel(getattr(logging, level.upper(), logging.INFO))
+
+     # Remove existing handlers and add streamlit handler
+     root_logger.handlers = []
+     root_logger.addHandler(streamlit_handler)
+
+     # Also add console handler for debugging
+     console_handler = logging.StreamHandler()
+     console_handler.setFormatter(formatter)
+     root_logger.addHandler(console_handler)
+
+
+ def get_logs() -> List[Dict]:
+     """Get all logs from session state"""
+     return st.session_state.get('console_logs', [])
+
+
+ def clear_logs():
+     """Clear all logs from session state"""
+     st.session_state.console_logs = []
+
+
+ def add_log(level: str, message: str, logger_name: str = "app"):
+     """
+     Manually add a log entry
+
+     Args:
+         level: Log level (INFO, WARNING, ERROR, DEBUG)
+         message: Log message
+         logger_name: Name of the logger
+     """
+     logger = logging.getLogger(logger_name)
+     log_method = getattr(logger, level.lower(), logger.info)
+     log_method(message)
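Review note: the "keep only the last N entries" behaviour of `StreamlitLogHandler` can be checked outside a Streamlit session. The following is a minimal sketch, assuming a hypothetical `ListLogHandler` (not part of this commit) that writes to a plain list instead of `st.session_state`. It uses the same formatter string as `setup_logging`.

```python
import logging
from typing import Dict, List


class ListLogHandler(logging.Handler):
    """Hypothetical stand-in: keeps the last max_entries records in a list."""

    def __init__(self, store: List[Dict[str, str]], max_entries: int = 100):
        super().__init__()
        self.store = store
        self.max_entries = max_entries

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.store.append({
                "level": record.levelname,
                "message": self.format(record),
                "logger": record.name,
            })
            # Trim in place so callers holding a reference see the same list.
            if len(self.store) > self.max_entries:
                del self.store[:-self.max_entries]
        except Exception:
            self.handleError(record)


logs: List[Dict[str, str]] = []
handler = ListLogHandler(logs, max_entries=3)
handler.setFormatter(logging.Formatter("%(name)s - %(message)s"))

demo_logger = logging.getLogger("demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo isolated from the root logger
demo_logger.addHandler(handler)

for i in range(5):
    demo_logger.info("event %d", i)

# Only the three most recent messages remain in the buffer.
print([entry["message"] for entry in logs])
```

Trimming in place is a choice made for this sketch. The handler in the commit reassigns `st.session_state.console_logs` to a sliced copy, which has the same visible effect.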