# 📱 Data Insights App with AI Agent An intelligent data insights application that helps users query and analyze mobile phone data using an AI agent. The application features function calling, safety mechanisms, Jira integration for support tickets, and a modern Streamlit UI. ## 🌟 Features - **AI-Powered Assistant**: Uses OpenAI GPT-4o with function calling to intelligently query data - **Business Dashboard**: Visual metrics and charts showing key business insights - **Safety Features**: Blocks all write operations (INSERT, UPDATE, DELETE, DROP, TRUNCATE, ALTER) - **Support Ticket Integration**: Jira integration for creating support tickets - **Comprehensive Logging**: Detailed console logging for all operations - **Data Privacy**: Only limited data chunks are sent to LLM, not the entire dataset - **Interactive UI**: Built with Streamlit for a modern, responsive interface ## 📋 Requirements - Python 3.12+ - OpenAI API key (GPT-5-nano access) - Jira account and API token (for support ticket feature) - CSV data file: `Global Mobile Prices 2025.csv` ## 🚀 Installation ### Step 1: Clone or Download the Project ```bash cd /path/to/project/GenerativeAI ``` ### Step 2: Install Dependencies ```bash uv venv python 3.12 uv sync ``` The `pyproject.toml` includes: - `streamlit` - Web UI framework - `openai` - OpenAI API client - `pandas` - Data manipulation - `plotly` - Interactive charts - `python-dotenv` - Environment variable management - `jira` - Jira API integration - `numpy` - Numerical operations ### Step 3: Configure Environment Variables Create a `.env` file in the project root: ```bash cp .env.example .env ``` Edit `.env` and add your credentials: ```env # OpenAI API Configuration OPENAI_API_KEY=sk-your-openai-api-key-here # Jira Configuration JIRA_SERVER=https://your-domain.atlassian.net JIRA_EMAIL=your_email@example.com JIRA_API_TOKEN=your_jira_api_token_here JIRA_PROJECT_KEY=SUP # Application Configuration CSV_FILE_PATH=Global Mobile Prices 2025.csv MAX_RESULTS_TO_LLM=20 ``` #### Getting API Credentials: **OpenAI API Key:** 1. Go to https://platform.openai.com/api-keys 2. Create a new API key 3. Copy and paste into `.env` **Jira API Token:** 1. Go to https://id.atlassian.com/manage-profile/security/api-tokens 2. Create a new API token 3. Copy the token to `.env` 4. Set your Jira server URL and email ### Step 4: Prepare Data File Ensure `Global Mobile Prices 2025.csv` is in the project root directory. The CSV should have the following columns: - brand, model, price_usd, ram_gb, storage_gb, camera_mp, battery_mah - display_size_inch, charging_watt, 5g_support, os, processor - rating, release_month, year ## 🎯 Usage ### Running the Application ```bash streamlit run app.py ``` The application will open in your default browser at `http://localhost:8501` ### Console Logs The application logs all operations to the console. You'll see: - Function calls made by the AI agent - Query execution and results - Safety checks and blocked operations - Error messages and warnings Example log output: ``` 2025-11-16 10:30:15 - database - INFO - DataManager initialized with 500 rows 2025-11-16 10:30:20 - agent - INFO - User message: What are the top 5 most expensive phones? 2025-11-16 10:30:21 - tools - INFO - Function called: search_phones_by_criteria 2025-11-16 10:30:21 - tools - INFO - Parameters: {'limit': 5} 2025-11-16 10:30:21 - database - INFO - Filter returned 5 results (limit: 5) ``` ## 📖 Workflow Example ### 1. Application Startup When you launch the app, you'll see: - **Business Overview Dashboard**: Shows total phones, average price, price range, and number of brands - **Data Visualizations**: Charts showing price distribution, top brands, OS distribution, and 5G support - **Sample Queries**: Pre-defined example queries to help you get started ![App Startup - Dashboard with metrics and charts](screenshots/UI_screenshot_1.png) ### 2. Using Sample Queries Click on any sample query button (e.g., "What are the top 5 most expensive phones?"): ![Sample Query Click](screenshots/UI_screenshot_2.png) ### 3. AI Agent Processing The agent processes your query using function calling: - Agent analyzes the user's intent - Selects appropriate tool(s) to use - Executes function calls with parameters - Returns formatted results ![Agent Response with Results](screenshots/UI_screenshot_3.png) **Console Output:** ``` 2025-11-16 10:35:42 - agent - INFO - User message: What are the top 5 most expensive phones? 2025-11-16 10:35:43 - agent - INFO - Agent making 1 tool call(s) 2025-11-16 10:35:43 - tools - INFO - Function called: search_phones_by_criteria 2025-11-16 10:35:43 - tools - INFO - Parameters: {'limit': 5} 2025-11-16 10:35:43 - database - INFO - Filtering data with criteria: {} 2025-11-16 10:35:43 - database - INFO - Filter returned 5 results (limit: 5) 2025-11-16 10:35:44 - agent - INFO - Agent final response: Here are the top 5 most expensive phones... ``` ### 4. Custom Queries Type your own questions in the chat input: - "Show me Apple phones under $1000" - "What's the average price by brand?" - "Find phones with 5G support and 12GB RAM" - "Compare average prices between Android and iOS" ![Custom Query Example](screenshots/most_expensive_phones_function_call.png) ### 5. Viewing Function Calls Expand the "View Function Calls" section to see technical details: ![Function Call Details](screenshots/terminal_log_1.png) ### 6. Creating Support Tickets When the agent suggests human help or you explicitly request it: 1. The agent detects keywords like "help", "support", "bug", "issue" 2. A suggestion to create a ticket appears 3. Click "Create Support Ticket" 4. Fill in the form with your issue details 5. Submit to create a Jira ticket ![Support Ticket Creation](screenshots/support_ticket_creation_flow.png) **Console Output:** ``` 2025-11-16 10:40:12 - support_ticket - INFO - Creating support ticket: Unable to find specific data 2025-11-16 10:40:13 - support_ticket - INFO - Successfully connected to Jira: https://your-domain.atlassian.net 2025-11-16 10:40:14 - support_ticket - INFO - Successfully created ticket: SUP-123 ``` ![Ticket Success Message](screenshots/jira_ticket.png) ### 7. Safety Features in Action If someone tries dangerous operations (simulation): ```python # Example blocked operation query = "DELETE FROM phones WHERE price > 1000" ``` **Console Output:** ``` 2025-11-16 10:45:30 - utils - WARNING - BLOCKED OPERATION: DELETE FROM phones WHERE price > 1000 2025-11-16 10:45:30 - utils - WARNING - Reason: Contains dangerous keyword: DELETE ``` ## 🛡️ Safety Features The application implements multiple safety layers: ### 1. Read-Only Operations - Only SELECT/read operations are allowed - All write operations are blocked ### 2. Blocked SQL Keywords The following operations are prevented: - `INSERT` - Cannot add new records - `UPDATE` - Cannot modify existing records - `DELETE` - Cannot delete records - `DROP` - Cannot drop tables - `TRUNCATE` - Cannot truncate tables - `ALTER` - Cannot alter table structure - `CREATE` - Cannot create new tables - `REPLACE` - Cannot replace data - `MERGE` - Cannot merge data - `EXEC/EXECUTE` - Cannot execute arbitrary code ### 3. Data Chunking - Maximum 20 results sent to LLM by default (configurable) - Prevents overwhelming the LLM context - Protects data privacy ### 4. Logging - All operations logged to console - Blocked attempts are logged with warnings - Full audit trail for security review ## 🔧 Architecture ### Components ``` ┌─────────────────────────────────────────────────────────┐ │ Streamlit UI (app.py) │ │ - Business Dashboard │ │ - Chat Interface │ │ - Support Ticket Form │ └────────────────┬────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ AI Agent (agent.py) │ │ - OpenAI GPT-4o Integration │ │ - Function Calling Logic │ │ - Conversation Management │ └────────────────┬────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Agent Tools (tools.py) │ │ - search_phones_by_criteria() │ │ - get_aggregated_statistics() │ │ - get_price_analysis() │ │ - get_available_brands() │ │ - get_dataset_overview() │ └────────────────┬────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Data Manager (database.py) │ │ - Pandas DataFrame Operations │ │ - Safety Checks │ │ - Query Filtering │ └────────────────┬────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ CSV Data Source │ │ Global Mobile Prices 2025.csv │ └─────────────────────────────────────────────────────────┘ Support Ticket Manager (support_ticket.py) │ ▼ Jira API ``` ### File Structure ``` GenerativeAI/ ├── app.py # Main Streamlit application ├── agent.py # AI agent with function calling ├── database.py # Data management with safety checks ├── tools.py # Function tools for agent ├── support_ticket.py # Jira ticket creation ├── utils.py # Utility functions and logging ├── config.py # Configuration management ├── requirements.txt # Python dependencies ├── .env # Environment variables (create from .env.example) ├── .env.example # Example environment variables ├── Global Mobile Prices 2025.csv # Data source └── README.md # This file ``` ## 🎨 Available Tools (Function Calling) The AI agent has access to 5 tools: ### 1. search_phones_by_criteria Search for phones matching specific criteria. **Parameters:** - `brand` (string): Phone brand - `min_price`, `max_price` (number): Price range - `ram_gb` (integer): RAM size - `storage_gb` (integer): Storage size - `os` (string): Operating system - `has_5g` (boolean): 5G support - `min_rating` (number): Minimum rating - `limit` (integer): Max results **Example Usage:** ``` User: "Find me Samsung phones under $800 with 5G" Agent calls: search_phones_by_criteria(brand="Samsung", max_price=800, has_5g=True) ``` ### 2. get_aggregated_statistics Get aggregated statistics by grouping. **Parameters:** - `group_by` (string): Column to group by - `metric` (string): Column to calculate - `aggregation` (string): Type (mean, sum, count, min, max) **Example Usage:** ``` User: "What's the average price by brand?" Agent calls: get_aggregated_statistics(group_by="brand", metric="price_usd", aggregation="mean") ``` ### 3. get_price_analysis Get detailed price analysis. **Parameters:** - `brand` (string, optional): Filter by brand - `category` (string, optional): Group by category **Example Usage:** ``` User: "Analyze Apple phone prices" Agent calls: get_price_analysis(brand="Apple") ``` ### 4. get_available_brands Get list of all brands. **Example Usage:** ``` User: "What brands are available?" Agent calls: get_available_brands() ``` ### 5. get_dataset_overview Get overview statistics. **Example Usage:** ``` User: "Tell me about the dataset" Agent calls: get_dataset_overview() ``` ## 🐛 Troubleshooting ### Issue: "Failed to load CSV" **Solution:** Ensure `Global Mobile Prices 2025.csv` exists in the project root directory. ### Issue: "OpenAI API Error" **Solution:** - Check your API key is correct in `.env` or sidebar - Ensure you have GPT-4o access - Check your OpenAI account has credits ### Issue: "Failed to connect to Jira" **Solution:** - Verify Jira server URL is correct (https://your-domain.atlassian.net) - Check API token is valid - Ensure project key exists in your Jira instance ### Issue: No data showing in dashboard **Solution:** Check console logs for errors during CSV loading ### Issue: Agent not responding **Solution:** - Check console for error messages - Verify API key is entered - Check internet connection ## 📊 Data Schema The CSV file should contain the following columns: | Column | Type | Description | |--------|------|-------------| | brand | string | Phone manufacturer | | model | string | Phone model name | | price_usd | float | Price in USD | | ram_gb | integer | RAM size in GB | | storage_gb | integer | Storage size in GB | | camera_mp | integer | Camera megapixels | | battery_mah | integer | Battery capacity | | display_size_inch | float | Screen size | | charging_watt | integer | Charging power | | 5g_support | string | "Yes" or "No" | | os | string | Operating system | | processor | string | Processor model | | rating | float | User rating (0-5) | | release_month | string | Release month | | year | integer | Release year | ## 🤝 Contributing To extend the application: 1. **Add new tools**: Add methods to `tools.py` and update `get_tool_definitions()` 2. **Enhance UI**: Modify `app.py` Streamlit components 3. **Add data sources**: Extend `database.py` to support multiple CSVs 4. **Custom safety rules**: Update `DANGEROUS_SQL_KEYWORDS` in `config.py` ## 📝 License This project is for educational and demonstration purposes. ## 🙏 Acknowledgments - Built with [Streamlit](https://streamlit.io/) - Charts by [Plotly](https://plotly.com/) - Issue tracking by [Jira](https://www.atlassian.com/software/jira) ## 📧 Support For issues or questions: 1. Use the in-app support ticket feature 2. Check console logs for detailed error messages 3. Review this README for troubleshooting steps ---