Spaces: Build error
Commit 1d55012 · Vlad Bastina committed · 0 parent(s)
Commit message: program files

Files changed:
- .gitattributes +1 -0
- .gitignore +2 -0
- .streamlit/config.toml +2 -0
- README.md +110 -0
- SalesData.csv +3 -0
- app_generated.py +481 -0
- app_hardcoded.py +511 -0
- packages.txt +1 -0
- requirements.txt +235 -0
- zega_logo.png +0 -0
.gitattributes ADDED
@@ -0,0 +1 @@
+*.csv filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,2 @@
+.env
+.streamlit/secrets.toml
.streamlit/config.toml ADDED
@@ -0,0 +1,2 @@
+[theme]
+base="light"
README.md ADDED
@@ -0,0 +1,110 @@
+# Data Analysis Agent Interface with Streamlit
+
+This Streamlit application provides an interface for interacting with a data analysis agent powered by OpenAI's language models. Users can ask questions about the data in a CSV file and receive answers as Pandas code, data tables, and visualizations. The application can also generate a PDF report of the analysis.
+
+## Features
+
+* **Natural Language Queries:** Ask questions about the data in plain English (or Romanian).
+* **Automatic Code Generation:** The agent generates Pandas code to answer the query.
+* **Data Display:** Results are displayed as interactive DataFrames.
+* **Visualization:** Generates a variety of plots (bar charts, pie charts, histograms, heatmaps, scatter plots, line plots, box plots, violin plots, area charts, and radar charts) based on the query and data.
+* **PDF Report Generation:** Download a PDF report containing the query, generated code, data table, and plots.
+* **Syntax-Highlighted Code:** The generated Python code is displayed in a scrollable, syntax-highlighted code block for easy readability.
+* **Collapsible Code Display:** The generated code is hidden by default, with an expander to reveal it on demand.
+* **Sample Questions:** Provides a set of sample questions to get started.
+* **Powered by ZEGA.ai:** Includes ZEGA.ai branding.
+
+## Getting Started
+
+### Prerequisites
+
+* Python 3.7+
+* An OpenAI API key
+* pdfkit, which requires wkhtmltopdf to be installed on your system:
+  * **Windows**: Download and install from [wkhtmltopdf.org](https://wkhtmltopdf.org/downloads.html), then add the `wkhtmltopdf/bin` directory to your system's PATH.
+  * **macOS**: `brew install wkhtmltopdf`
+  * **Linux (Debian/Ubuntu)**: `sudo apt-get install wkhtmltopdf`
+  * **Linux (CentOS/RHEL)**: `sudo yum install wkhtmltopdf`
+
+### Installation
+
+1. **Clone the repository:**
+
+   ```bash
+   git clone <your_repository_url>
+   cd <your_repository_directory>
+   ```
+
+2. **Install dependencies:**
+
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+   Create `requirements.txt` with the following contents:
+
+   ```
+   streamlit
+   pandas
+   matplotlib
+   plotly
+   python-dotenv
+   langchain
+   langchain-experimental
+   langchain-openai
+   seaborn
+   pdfkit
+   openai
+   ```
+
+3. **Create a `.env` file:**
+
+   Create a file named `.env` in the root directory of your project and add your OpenAI API key:
+
+   ```
+   OPENAI_API_KEY=your_openai_api_key_here
+   ```
+
+   Replace `your_openai_api_key_here` with your actual API key.
+
+4. **Place the CSV data file:**
+
+   Place the `SalesData.csv` file in the same directory as your script. If you use a different CSV file, update the `csv_path` variable in the script.
+
+5. **Place the Zega logo:**
+
+   Place `zega_logo.png` in the same directory.
+
+### Usage
+
+1. **Run the Streamlit app:**
+
+   ```bash
+   streamlit run your_script_name.py
+   ```
+
+   Replace `your_script_name.py` with the name of your Python script.
+
+2. **Interact with the app:**
+
+   * Select a sample question from the sidebar or enter your own question in the text area. Ask only one question at a time.
+   * Click the "Submit" button.
+   * The results (data table and plots) will be displayed.
+   * Click the "Show the code" expander to view the generated Pandas code.
+   * Click the "Download PDF" button to generate a PDF report.
+
+## File Structure
+
+* **`your_script_name.py`:** The main Streamlit application script.
+* **`.env`:** Contains your OpenAI API key (should *not* be committed to Git).
+* **`requirements.txt`:** Lists the required Python packages.
+* **`SalesData.csv`:** The CSV data file (or your custom data file).
+* **`zega_logo.png`:** The Zega logo.
+* **`exported_pdfs/`:** A directory (created automatically) where generated PDF reports are saved.
+* **`README.md`:** This file.
+
+## Important Notes
+
+* **Date Format:** The script is configured to handle dates in the European DD/MM/YYYY format; ensure your CSV data uses this format. The `parse_dates` argument in `pd.read_csv` is crucial for correct date handling.
+* **OpenAI API Key:** Keep your OpenAI API key secure. Do *not* commit the `.env` file to your Git repository; add `.env` to your `.gitignore` file.
+* **Error Handling:** The script includes basic error handling (checking for the CSV file), but you may want to add more robust error handling for production use.
+* **wkhtmltopdf:** Ensure `wkhtmltopdf` is correctly installed and accessible in your system's PATH for PDF generation to work.
+* **Prompt Engineering:** The quality of the generated code depends heavily on the prompt used in the `generate_code` function. The provided prompt is highly detailed and includes specific instructions for the agent. You may need to adjust it if you encounter issues or use a different CSV file with different column names or data structures.
+* **One Question:** The app is designed to process one question at a time. Asking multiple questions in a single input may lead to unexpected behavior.
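The DD/MM/YYYY note above can be pinned down with a tiny, self-contained pandas sketch (illustrative data, not part of the commit): for European dates, passing `dayfirst=True` alongside `parse_dates` resolves ambiguous values such as 03/04/2019 to 3 April rather than March 4.

```python
import io

import pandas as pd

# Illustrative two-row CSV with European DD/MM/YYYY dates.
csv_text = "Order Date,Sales\n03/04/2019,120.50\n25/12/2019,99.00\n"

# dayfirst=True makes pandas read 03/04/2019 as 3 April, not March 4.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Order Date"], dayfirst=True)

print(df["Order Date"].dt.month.tolist())  # [4, 12]
```

Without `dayfirst=True`, pandas would interpret 03/04/2019 as March 4 and the month list would come out as `[3, 12]`.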
SalesData.csv ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:05349b4feb225c6d0f0899ab7465d9346c052de0e21f07bec7b56bb6c4b27565
+size 22441174
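The three lines above are a Git LFS pointer, not the CSV itself: because of the `.gitattributes` rule, Git stores only this pointer while the ~22 MB file lives in LFS storage. The pointer's simple `key value` format can be parsed with a few lines of stdlib Python (illustrative helper, not part of the commit):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:05349b4feb225c6d0f0899ab7465d9346c052de0e21f07bec7b56bb6c4b27565\n"
    "size 22441174\n"
)
info = parse_lfs_pointer(pointer)
print(int(info["size"]))  # 22441174
```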
app_generated.py ADDED
@@ -0,0 +1,481 @@
+import streamlit as st
+import pandas as pd
+import matplotlib.pyplot as plt
+import plotly.express as px
+from dotenv import load_dotenv
+from langchain.agents.agent_types import AgentType
+from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
+from langchain_openai import ChatOpenAI
+import os
+import seaborn as sns
+import plotly.graph_objects as go
+import json
+import pdfkit
+import io
+import base64
+from matplotlib.backends.backend_agg import FigureCanvasAgg
+import html
+import re
+from openai import OpenAI
+from io import StringIO
+
+load_dotenv()
+
+# --- Configuration ---
+OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or st.secrets.get("OPENAI_API_KEY")
+
+client = OpenAI(api_key=OPENAI_API_KEY)
+csv_path = "SalesData.csv"
+
+if not os.path.exists(csv_path):
+    print(f"Error: CSV file '{csv_path}' not found.")
+    exit(1)
+
+def get_csv_sample(csv_path, sample_size=5):
+    """Reads a CSV file and returns column info, a sample, and the DataFrame."""
+    df = pd.read_csv(csv_path)
+    sample_df = df.sample(n=min(sample_size, len(df)), random_state=42)
+    return df.dtypes.to_string(), sample_df.to_string(index=False), df
+
+column_info, sample_str, _ = get_csv_sample(csv_path)
+
+# @observe()
+def chat(response_text):
+    return json.loads(response_text)  # Directly parse the JSON
+
+def generate_code(question, column_info, sample_str, csv_path, model_name="gpt-4o"):
+    """Asks OpenAI to generate Pandas code for a given question."""
+    prompt = f"""You are a highly skilled Python data analyst with expert-level proficiency in Pandas. Your task is to write **concise, correct, and efficient** Pandas code to answer a specific question about data contained within a CSV file. The code you generate must be self-contained, directly executable, and produce the correct numerical output or DataFrame structure.
+
+**CSV File Information:**
+
+* **Path:** '{csv_path}'
+* **Column Information:** (This tells you the names and data types of the columns)
+    ```
+    {column_info}
+    ```
+* **Sample Data:** (This gives you a glimpse of the data's structure. Note the European date format DD/MM/YYYY)
+    ```
+    {sample_str}
+    ```
+
+**Strict Requirements (Follow these EXACTLY):**
+0. **Multi-part Questions:**
+    * If the user asks a multi-part question, **reformat it** to process each part correctly while maintaining the original meaning. **Do not change the intent** of the question.
+    * **For multi-part questions**, the code should reflect how each part of the question is handled. You must ensure that each part is processed and combined correctly at the end.
+    * **Print a statement** explaining how you processed the multi-part question, e.g., `print("Question was split into parts for processing.")`.
+
+1. **Load Data and Parse Dates:** Your code *MUST* begin with the following line to load the data, correctly parsing *ALL* potential date columns:
+    ```python
+    import pandas as pd
+    df = pd.read_csv('{csv_path}', parse_dates=['Order Date'])
+    ```
+    Do *NOT* modify this line. The `parse_dates` argument is *critical* for correct date handling.
+
+2. **Imports:** Do *NOT* import any libraries other than pandas (which is already imported as `pd`). Do *NOT* use `numpy` or `datetime` directly, unless it is used within the context of parsing in read_csv. Pandas is sufficient for all tasks.
+
+3. **Output:**
+    * Store your final answer in a variable named `result`.
+    * Print the `result` variable using `print(result)`.
+    * Do *NOT* use `display()`.
+    * The output must be a Pandas DataFrame, Series, or a single value, as appropriate for the question. If it's a DataFrame or Series, ensure the index is reset where appropriate (e.g., after a `groupby()` followed by `.size()`).
+
+4. **Conciseness and Style:**
+    * Write the *most concise* and efficient Pandas code possible.
+    * Use method chaining (e.g., `df.groupby(...).sum().sort_values().head()`) whenever possible and appropriate.
+    * Avoid unnecessary intermediate variables unless they *significantly* improve readability.
+    * Use clear and understandable variable names for filtered dataframes (for example: df_2019, df_filtered, etc.)
+    * If calculating a percentage or distribution, combine operations efficiently, ideally in a single chained expression.
+
+5. **Correctness:** Your code *MUST* be syntactically correct Python and *MUST* produce the correct answer to the question. Double-check your logic, especially when grouping and aggregating. Pay close attention to the wording of the question.
+
+6. **Date and Time Conditions (Implicit Filtering):**
+    * **Any question that refers to dates, time periods, months, years, or uses phrases like "issued in," "policies from," "between [dates]," etc., *MUST* filter the data using the `DATA_SEM_OFERTA` column.** This is the *implied* date column for policy issuance. Do *NOT* ask the user which column to use; assume `DATA_SEM_OFERTA`.
+    * When filtering dates, use combined boolean conditions for efficiency, e.g., `df[(df['Order Date'].dt.year == 2019) & (df['Order Date'].dt.month == 12)]` rather than separate filtering steps.
+
+7. **Column Names:** Use the *exact* column names provided in the "CSV Column Information." Pay close attention to capitalization, spaces, and any special characters.
+
+8. **No Explanations:** Output *ONLY* the Python code. Do *NOT* include any comments, explanations, surrounding text, or markdown formatting (like ```python). Just the code.
+
+9. **Aggregation (VERY IMPORTANT):** When the question asks for:
+    * "top N" or "first N"
+    * "most frequent"
+    * "highest/lowest" (after grouping)
+    * "average/sum/count per [group]"
+    * **Calculate Percentage**: When percentage is asked, compute the correct percentage value
+
+    You *MUST* perform a `groupby()` operation *BEFORE* sorting or selecting the top N values. The correct order is:
+    1. Filter the DataFrame (if needed, using boolean indexing).
+    2. Group by the appropriate column(s) using `.groupby()`.
+    3. Apply an aggregation function (e.g., `.sum()`, `.mean()`, `.size()`, `.count()`, `.median()`).
+    4. *Then*, sort (if needed) using `.sort_values()` and/or select the top N (if needed) using `.nlargest()` or `.head()`.
+
+10. **Error Handling:** Assume the CSV file exists and is correctly formatted. You do *not* need to write any explicit error handling code.
+
+11. **Clarity:** Use clear and meaningful variable names if you create intermediate dataframes, but prioritize conciseness.
+
+**Column Usage Guidance:**
+
+13. "primele" means .nlargest and "ultimele" means .nsmallest
+    * Use *Product* when referring to specific items sold (e.g., "most popular product," "top-selling product").
+    * Use *City* when grouping or summarizing sales by location (e.g., "which city had the highest revenue?").
+    * Use *Order Date* for any time-based filtering (e.g., "sales in December," "transactions between January and March").
+    * Use *Sales* for financial aggregations (e.g., total revenue, average sale per transaction).
+    * Use *Quantity Ordered* when analyzing product demand (e.g., "most sold product in terms of units").
+    * Use *Hour* to analyze time-based trends (e.g., "which hour has the highest number of purchases?").
+
+**Question:**
+{question}
+"""
+
+    response = client.chat.completions.create(model=model_name,
+        temperature=0,  # Keep temperature at 0 for consistent, deterministic code
+        messages=[
+            {"role": "system", "content": "You are a helpful assistant that generates Python code."},
+            {"role": "user", "content": prompt}
+        ])
+
+    code_to_execute = response.choices[0].message.content.strip()
+    code_to_execute = code_to_execute.replace("```python", "").replace("```", "").strip()
+
+    return code_to_execute
+
+
+def execute_code(generated_code, csv_path):
+    """Executes the generated Pandas code and captures the output."""
+    local_vars = {"pd": pd, "__file__": csv_path}
+    exec(generated_code, {}, local_vars)
+    return local_vars.get("result")
+
+def generate_plot_code(question, dataframe, model_name="gpt-4o"):
+    """Asks OpenAI to generate plotting code based on the question and dataframe."""
+
+    # Convert dataframe to string representation
+    df_str = dataframe.to_string(index=False)
+    df_json = dataframe.to_json(orient="records")
+
+    prompt = f"""You are a data visualization expert. Create Python code to visualize the data below based on the user's question. The visualizations must comprehensively represent *all* the information returned by the query to effectively answer the question.
+
+**User Question:**
+{question}
+
+**Data (first few rows):**
+```
+{df_str}
+```
+
+**Data (JSON format):**
+```json
+{df_json}
+```
+
+**Requirements:**
+1. Create 4-7 different, meaningful visualizations that collectively represent all aspects of the data returned by the query, ensuring no key information is omitted.
+2. Ensure each visualization is simple, clear, and directly tied to a specific part of the data or question, while together they cover the full scope of the result.
+3. Use ONLY Matplotlib and Seaborn (avoid Plotly to prevent compatibility issues).
+4. Include proper titles, labels, and legends for clarity, reflecting the specific data being visualized.
+5. Use appropriate color schemes that are visually appealing and accessible (e.g., colorblind-friendly palettes like Seaborn's 'colorblind').
+6. Return a list of tuples containing the plot title and the base64-encoded image.
+7. Make sure to close all plt figures with plt.close() after adding each to the plots list to prevent memory issues.
+8. If the data includes categories (e.g., sucursale, produse, pachete), ensure these are fully represented across the plots (e.g., bar charts, pie charts, or grouped visuals).
+9. If the data includes numerical values (e.g., sales, totals), use appropriate plot types (e.g., bar, line, or scatter) to show trends, comparisons, or distributions.
+10. If the question involves time periods, ensure at least one visualization reflects the temporal aspect using the relevant date information.
+
+**Output Format:**
+Your code should ONLY include a function called `create_plots(data)` that takes a pandas DataFrame as input and returns a list of tuples containing the plot titles and the base64-encoded images.
+
+Return only the function definition without any explanations, imports, or additional code. Do NOT include any Streamlit-specific code.
+"""
+
+    response = client.chat.completions.create(model=model_name,
+        temperature=0.2,  # Slightly higher temperature for creative visualizations
+        messages=[
+            {"role": "system", "content": "You are a data visualization expert who creates Python code for plotting data."},
+            {"role": "user", "content": prompt}
+        ])
+
+    plot_code = response.choices[0].message.content.strip()
+    plot_code = plot_code.replace("```python", "").replace("```", "").strip()
+
+    return plot_code
+
+def execute_plot_code(plot_code, result_df):
+    """Executes the generated plotting code and captures the outputs."""
+    try:
+        # Create a dictionary with all the necessary imports
+        globals_dict = {
+            "pd": pd,
+            "plt": plt,
+            "px": px,
+            "sns": sns,
+            "go": go,
+            "io": io,
+            "base64": base64,
+            "np": __import__('numpy'),
+            "plotly": __import__('plotly')
+        }
+
+        # Create a local variables dictionary with the data
+        local_vars = {
+            "data": result_df
+        }
+
+        # Define the helper functions first
+        helper_code = """
+def fig_to_base64(fig):
+    buf = io.BytesIO()
+    fig.savefig(buf, format="png", bbox_inches="tight")
+    buf.seek(0)
+    img_str = base64.b64encode(buf.getvalue()).decode("utf-8")
+    buf.close()
+    return img_str
+
+def plotly_to_base64(fig):
+    # For Plotly figures, convert to image bytes and then to base64
+    img_bytes = fig.to_image(format="png", scale=2)
+    img_str = base64.b64encode(img_bytes).decode("utf-8")
+    return img_str
+"""
+
+        # Execute the helper functions first
+        exec(helper_code, globals_dict, local_vars)
+
+        # Then execute the plot code
+        exec(plot_code, globals_dict, local_vars)
+
+        # Get the plots from the create_plots function
+        if "create_plots" in local_vars:
+            plots = local_vars["create_plots"](result_df)
+            return plots
+        elif "plots" in local_vars:
+            return local_vars["plots"]
+        else:
+            return []
+    except Exception as e:
+        st.error(f"Error executing plot code: {str(e)}")
+        import traceback
+        st.error(traceback.format_exc())
+        return []
+
+def sanitize_filename(filename):
+    return re.sub(r'[^a-zA-Z0-9]', '_', filename)
+
+def generate_pdf(query, response_text, chat_response, plots):
+    query = html.unescape(query)
+    response_text = html.unescape(response_text)
+    escaped_query = html.escape(query)
+    escaped_response_text = html.escape(response_text)
+
+    html_content = f"""
+<!DOCTYPE html>
+<html lang="ro">
+<head>
+<title>Data Analysis Report</title>
+<meta charset="UTF-8">
+<style>
+body {{ font-family: Arial, sans-serif; margin: 20px; background-color: #f9f9f9; color: #333; }}
+h1 {{ color: #1f77b4; text-align: center; }}
+h3 {{ color: #2c3e50; border-bottom: 2px solid #ddd; padding-bottom: 5px; }}
+h4 {{ color: #2980b9; }}
+p {{ line-height: 1.6; background-color: #fff; padding: 10px; border-radius: 5px; box-shadow: 0 1px 3px rgba(0,0,0,0.1); }}
+pre {{ background-color: #ecf0f1; padding: 10px; border-radius: 5px; font-size: 12px; }}
+table {{ border-collapse: collapse; width: 100%; margin: 10px 0; page-break-inside: avoid; }}
+th, td {{ border: 1px solid #bdc3c7; padding: 10px; text-align: left; }}
+th {{ background-color: #3498db; color: white; }}
+td {{ background-color: #fff; }}
+img {{ max-width: 100%; height: auto; margin: 10px 0; page-break-inside: avoid; }}
+.section {{ margin-bottom: 20px; }}
+.no-break {{ page-break-inside: avoid; }}
+.powered-by {{ text-align: center; margin-top: 20px; font-size: 10px; color: #777; }}
+.logo {{ height: 100px; }}
+</style>
+</head>
+<body>
+<h1>Data Analysis Agent Interface</h1>
+<div class="section no-break"><h3>Query</h3><p>{escaped_query}</p></div>
+<div class="section no-break"><h3>Response</h3><p>{escaped_response_text}</p></div>
+<div class="section no-break">
+<h3>Raw Structured Response</h3>
+<h4>Metadata</h4><pre>{json.dumps(chat_response["metadata"], indent=2, ensure_ascii=False)}</pre>
+<h4>Data</h4>{pd.DataFrame(chat_response["data"]).to_html(index=False, classes="no-break", escape=False)}
+</div>
+<div class="section"><h3>Plots</h3>{"".join([f'<div class="no-break"><h4>{name}</h4><img src="data:image/png;base64,{base64}"/></div>' for name, base64 in plots])}</div>
+<div class="powered-by">Powered by <img src="data:image/png;base64,{get_zega_logo_base64()}" class="logo"></div>
+</body></html>
+"""
+
+    html_file = "temp.html"
+    sanitized_query = sanitize_filename(query)
+    os.makedirs("./exported_pdfs", exist_ok=True)
+    pdf_file = f"./exported_pdfs/{sanitized_query}.pdf"
+
+    try:
+        with open(html_file, "w", encoding="utf-8") as f:
+            f.write(html_content)
+        options = {'encoding': "UTF-8", 'custom-header': [('Content-Type', 'text/html; charset=UTF-8')], 'no-outline': None}
+        pdfkit.from_file(html_file, pdf_file, options=options)
+        os.remove(html_file)
+    except Exception as e:
+        raise
+    return pdf_file
+
+def get_zega_logo_base64():
+    try:
+        with open("zega_logo.png", "rb") as image_file:
+            encoded_string = base64.b64encode(image_file.read()).decode("utf-8")
+        return encoded_string
+    except Exception as e:
+        raise
+
+# Streamlit Interface
+st.title("Data Analysis Agent Interface")
+
+st.sidebar.markdown(
+    f"""
+    <div style="text-align: center;">
+        Powered by <img src="data:image/png;base64,{get_zega_logo_base64()}" style="height: 100px;">
+    </div>
+    """,
+    unsafe_allow_html=True,
+)
+st.sidebar.header("Sample Questions")
+
+sample_questions = [
+    "Top 5 cities with the highest sales?",
+    "Bottom 3 products by total sales?",
+    "Top 10 products with reference to items sold?",
+    "Top 10 products with reference to total sums sold?"
+]
+
+selected_question = st.sidebar.selectbox("Select a sample question:", sample_questions)
+user_query = st.text_area("Please write one question at a time.", value=selected_question, height=100)
+
+def process_query():
+    try:
+        # Step 1: Generate and execute code to get the data
+        generated_code = generate_code(user_query, column_info, sample_str, csv_path)
+        result = execute_code(generated_code, csv_path)
+
+        # Convert result to DataFrame if it's not already
+        if isinstance(result, pd.DataFrame):
+            result_df = result
+        elif isinstance(result, pd.Series):
+            result_df = result.reset_index()
+        elif isinstance(result, list):
+            if all(isinstance(item, dict) for item in result):
+                result_df = pd.DataFrame(result)
+            else:
+                result_df = pd.DataFrame({"value": result})
+        else:
+            result_df = pd.DataFrame({"value": [result]})
+
+        # Step 2: Generate and execute plotting code
+        plot_code = generate_plot_code(user_query, result_df)
+        plots = execute_plot_code(plot_code, result_df)
+
+        # Prepare the chat response
+        if isinstance(result, pd.DataFrame):
+            chat_response = {
+                "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                "data": result.to_dict(orient='records'),
+                "csv_data": result.to_dict(orient='records'),
+            }
+        elif isinstance(result, pd.Series):
+            result = result.reset_index()
+            chat_response = {
+                "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                "data": result.to_dict(orient='records'),
+                "csv_data": result.to_dict(orient='records'),
+            }
+        elif isinstance(result, list):
+            if all(isinstance(item, (int, float)) for item in result):
+                chat_response = {
+                    "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                    "data": [{"category": str(i), "value": v} for i, v in enumerate(result)],
+                    "csv_data": [{"category": str(i), "value": v} for i, v in enumerate(result)],
+                }
+            elif all(isinstance(item, dict) for item in result):
+                chat_response = {
+                    "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                    "data": result,
+                    "csv_data": result,
+                }
+            else:
+                st.warning("Result is a list with mixed data types. Please inspect.")
+                return
+        else:
+            chat_response = {
+                "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                "data": [{"category": "Result", "value": result}],
+                "csv_data": [{"category": "Result", "value": result}],
+            }
+
+        # Display the query and data
+        st.markdown(f"<h3 style='color: #2e86de;'>Question:</h3>", unsafe_allow_html=True)
+        st.markdown(f"<p style='color: #2e86de;'>{user_query}</p>", unsafe_allow_html=True)
+        st.write("-" * 200)
+
+        # Initially hide the code
+        with st.expander("Show the generated data code"):
+            st.code(generated_code, language="python")
+
+        with st.expander("Show the generated plotting code"):
+            st.code(plot_code, language="python")
+
+        st.write("-" * 200)
+
+        # Display the data
+        st.markdown("### Data:")
+        st.dataframe(result_df)
+        st.write("-" * 200)
+
+        # Display the plots
+        st.markdown("### Visualizations:")
+        for name, base64_img in plots:
+            st.markdown(f"#### {name}")
+            st.markdown(f'<img src="data:image/png;base64,{base64_img}" style="max-width:100%">', unsafe_allow_html=True)
+            st.write("-" * 100)
| 439 |
+
# Store the data for PDF generation
|
| 440 |
+
st.session_state["query"] = user_query
|
| 441 |
+
st.session_state["response_text"] = str(result)
|
| 442 |
+
st.session_state["chat_response"] = chat_response
|
| 443 |
+
st.session_state["plots"] = plots
|
| 444 |
+
st.session_state["generated_code"] = generated_code
|
| 445 |
+
st.session_state["plot_code"] = plot_code
|
| 446 |
+
|
| 447 |
+
except Exception as e:
|
| 448 |
+
st.error(f"An error occurred: {e}")
|
| 449 |
+
import traceback
|
| 450 |
+
st.error(traceback.format_exc())
|
| 451 |
+
|
| 452 |
+
if st.button("Submit"):
|
| 453 |
+
with st.spinner("Processing query..."):
|
| 454 |
+
try:
|
| 455 |
+
process_query()
|
| 456 |
+
except Exception as e:
|
| 457 |
+
st.error(f"An error occurred: {e}")
|
| 458 |
+
import traceback
|
| 459 |
+
st.error(traceback.format_exc())
|
| 460 |
+
|
| 461 |
+
if "chat_response" in st.session_state:
|
| 462 |
+
if st.button("Download PDF"):
|
| 463 |
+
with st.spinner("Generating PDF..."):
|
| 464 |
+
try:
|
| 465 |
+
pdf_file = generate_pdf(
|
| 466 |
+
st.session_state["query"],
|
| 467 |
+
st.session_state["response_text"],
|
| 468 |
+
st.session_state["chat_response"],
|
| 469 |
+
st.session_state["plots"]
|
| 470 |
+
)
|
| 471 |
+
with open(pdf_file, "rb") as f:
|
| 472 |
+
pdf_data = f.read()
|
| 473 |
+
sanitized_query = sanitize_filename(st.session_state["query"])
|
| 474 |
+
st.download_button(
|
| 475 |
+
label="Click Here to Download PDF",
|
| 476 |
+
data=pdf_data,
|
| 477 |
+
file_name=f"{sanitized_query}.pdf",
|
| 478 |
+
mime="application/pdf",
|
| 479 |
+
)
|
| 480 |
+
except Exception as e:
|
| 481 |
+
st.error(f"PDF generation failed: {e}")
|
app_hardcoded.py
ADDED
@@ -0,0 +1,511 @@
import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from dotenv import load_dotenv
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI
import os
import seaborn as sns
import plotly.graph_objects as go
import json
import pdfkit
import io
import base64
from matplotlib.backends.backend_agg import FigureCanvasAgg
import html
import re
from openai import OpenAI
from io import StringIO

load_dotenv()

# --- Configuration ---
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or st.secrets.get("OPENAI_API_KEY")

client = OpenAI(api_key=OPENAI_API_KEY)
csv_path = "asig_sales_31012025.csv"

if not os.path.exists(csv_path):
    st.error(f"Error: CSV file '{csv_path}' not found.")
    st.stop()


def get_csv_sample(csv_path, sample_size=5):
    """Reads a CSV file and returns column info, a sample, and the DataFrame."""
    df = pd.read_csv(csv_path)
    sample_df = df.sample(n=min(sample_size, len(df)), random_state=42)
    return df.dtypes.to_string(), sample_df.to_string(index=False), df


column_info, sample_str, _ = get_csv_sample(csv_path)


# @observe()
def chat(response_text):
    return json.loads(response_text)  # Directly parse the JSON


def generate_code(question, column_info, sample_str, csv_path, model_name="gpt-4o"):
    """Asks OpenAI to generate Pandas code for a given question."""
    prompt = f"""You are a highly skilled Python data analyst with expert-level proficiency in Pandas. Your task is to write **concise, correct, and efficient** Pandas code to answer a specific question about data contained within a CSV file. The code you generate must be self-contained, directly executable, and produce the correct numerical output or DataFrame structure.

**CSV File Information:**

* **Path:** '{csv_path}'
* **Column Information:** (This tells you the names and data types of the columns)
    ```
    {column_info}
    ```
* **Sample Data:** (This gives you a glimpse of the data's structure. Note the European date format DD/MM/YYYY)
    ```
    {sample_str}
    ```

**Strict Requirements (Follow these EXACTLY):**
0. **Multi-part Questions:**
    * If the user asks a multi-part question, **reformat it** to process each part correctly while maintaining the original meaning. **Do not change the intent** of the question.
    * **For multi-part questions**, the code should reflect how each part of the question is handled. You must ensure that each part is processed and combined correctly at the end.
    * **Print a statement** explaining how you processed the multi-part question, e.g., `print("Question was split into parts for processing.")`.

1. **Load Data and Parse Dates:** Your code *MUST* begin with the following lines to load the data, correctly parsing *ALL* potential date columns:
    ```python
    import pandas as pd
    df = pd.read_csv('{csv_path}', parse_dates=['HIST_DATE', 'DATA_SEM_OFERTA', 'DATA_STARE_CERERE', 'DATA_IN_OFERTA', 'CTR_DATA_START', 'CTR_DATA_STATUS'], dayfirst=True)
    ```
    Do *NOT* modify these lines. The `parse_dates` argument is *critical* for correct date handling, and `dayfirst=True` is absolutely required because dates are in European DD/MM/YYYY format.

2. **Imports:** Do *NOT* import any libraries other than pandas (which is already imported as `pd`). Do *NOT* use `numpy` or `datetime` directly, unless it is used within the context of parsing in read_csv. Pandas is sufficient for all tasks.

3. **Output:**
    * Store your final answer in a variable named `result`.
    * Print the `result` variable using `print(result)`.
    * Do *NOT* use `display()`.
    * The output must be a Pandas DataFrame, Series, or a single value, as appropriate for the question. If it's a DataFrame or Series, ensure the index is reset where appropriate (e.g., after a `groupby()` followed by `.size()`).

4. **Conciseness and Style:**
    * Write the *most concise* and efficient Pandas code possible.
    * Use method chaining (e.g., `df.groupby(...).sum().sort_values().head()`) whenever possible and appropriate.
    * Avoid unnecessary intermediate variables unless they *significantly* improve readability.
    * Use clear and understandable variable names for filtered dataframes (for example: df_2010, df_filtered, etc.).
    * If calculating a percentage or distribution, combine operations efficiently, ideally in a single chained expression.

5. **Correctness:** Your code *MUST* be syntactically correct Python and *MUST* produce the correct answer to the question. Double-check your logic, especially when grouping and aggregating. Pay close attention to the wording of the question.

6. **Date and Time Conditions (Implicit Filtering):**
    * **Any question that refers to dates, time periods, months, years, or uses phrases like "issued in," "policies from," "between [dates]," etc., *MUST* filter the data using the `DATA_SEM_OFERTA` column.** This is the *implied* date column for policy issuance. Do *NOT* ask the user which column to use; assume `DATA_SEM_OFERTA`.
    * When filtering dates, use combined boolean conditions for efficiency, e.g., `df[(df['DATA_SEM_OFERTA'].dt.year == 2010) & (df['DATA_SEM_OFERTA'].dt.month == 12)]` rather than separate filtering steps.

7. **Column Names:** Use the *exact* column names provided in the "CSV Column Information." Pay close attention to capitalization, spaces, and any special characters.

8. **No Explanations:** Output *ONLY* the Python code. Do *NOT* include any comments, explanations, surrounding text, or markdown formatting (like ```python). Just the code.

9. **Aggregation (VERY IMPORTANT):** When the question asks for:
    * "top N" or "first N"
    * "most frequent"
    * "highest/lowest" (after grouping)
    * "average/sum/count per [group]"
    * **Calculate Percentage**: when a percentage is asked for, compute the correct percentage value

    You *MUST* perform a `groupby()` operation *BEFORE* sorting or selecting the top N values. The correct order is:
    1. Filter the DataFrame (if needed, using boolean indexing).
    2. Group by the appropriate column(s) using `.groupby()`.
    3. Apply an aggregation function (e.g., `.sum()`, `.mean()`, `.size()`, `.count()`, `.median()`).
    4. *Then*, sort (if needed) using `.sort_values()` and/or select the top N (if needed) using `.nlargest()` or `.head()`.

10. **Error Handling:** Assume the CSV file exists and is correctly formatted. You do *not* need to write any explicit error handling code.

11. **Clarity:** Use clear and meaningful variable names if you create intermediate dataframes, but prioritize conciseness.

12. **Romanian Terms:** "primele" means `.nlargest` and "ultimele" means `.nsmallest`.

**Column Usage Guidance:**
    * Use `CTR_STATUS` when a concise or coded representation of the contract status is needed (e.g., for technical filtering or matching with system data).
    * Use `CTR_DESCRIERE_STATUS` when a human-readable description is required (e.g., for distributions, summaries, or grouping by status type, such as "Activ", "Reziliat"). Default to `CTR_DESCRIERE_STATUS` for questions involving totals, distributions, or descriptive analysis unless the question specifies a coded status.
    * Use `COD_SUCURSALA` for numerical branch identification (e.g., filtering or joining with other datasets); use `DENUMIRE_SUCURSALA` for human-readable branch names (e.g., grouping or summarizing by branch name).
    * Use `COD_AGENTIE` for numerical agency identification; use `DENUMIRE_AGENTIE` for human-readable agency names, preferring the latter for summaries or rankings.
    * Use `DATA_SEM_OFERTA` as the implied date column for policy issuance or time-based filtering (e.g., "issued in", "per month"), unless the question specifies another date column.
    * Use `PBA_BAZA`, `PBA_ASIG_SUPLIM`, `PBA_TOTAL_SEMNARE_CERERE`, and `PBA_TOTAL_EMITERE_CERERE` for financial aggregations (e.g., sum, mean) based on the specific PBA type mentioned in the question.

**Question:**
{question}
"""

    response = client.chat.completions.create(
        model=model_name,
        temperature=0,  # Keep temperature at 0 for consistent, deterministic code
        messages=[
            {"role": "system", "content": "You are a helpful assistant that generates Python code."},
            {"role": "user", "content": prompt}
        ])

    code_to_execute = response.choices[0].message.content.strip()
    code_to_execute = code_to_execute.replace("```python", "").replace("```", "").strip()

    return code_to_execute
def execute_code(generated_code, csv_path):
    """Executes the generated Pandas code and captures the output."""
    local_vars = {"pd": pd, "__file__": csv_path}
    exec(generated_code, {}, local_vars)
    return local_vars.get("result")


def fig_to_base64(fig):
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    buf.seek(0)
    img_str = base64.b64encode(buf.getvalue()).decode("utf-8")
    buf.close()
    return img_str


def plotly_to_base64(fig):
    img_bytes = fig.to_image(format="png", scale=2)
    img_str = base64.b64encode(img_bytes).decode("utf-8")
    return img_str


def generate_plots(metadata, categories, values):
    # Filter numeric values and categories
    numeric_values = [v for v in values if isinstance(v, (int, float))]
    numeric_categories = [c for c, v in zip(categories, values) if isinstance(v, (int, float))]

    if not numeric_values:
        st.warning("No numeric data to plot for this query.")
        return []

    sorted_categories, sorted_values = zip(*sorted(zip(numeric_categories, numeric_values), key=lambda x: x[1], reverse=True))
    plots = []

    if all(isinstance(c, str) for c in categories) and all(isinstance(v, (int, float)) for v in values):
        sorted_categories, sorted_values = zip(*sorted(zip(categories, values), key=lambda x: x[1], reverse=True))

        # Bar Plot (Main plot for string categories and numeric values)
        fig_bar = px.bar(x=sorted_values, y=sorted_categories, orientation="h",
                         labels={"x": "Value", "y": "Category"},
                         title=f"{metadata['query']} (Bar Chart)",
                         color=sorted_values, color_continuous_scale="blues")
        fig_bar.update_layout(yaxis=dict(categoryorder="total ascending"))
        st.plotly_chart(fig_bar)
        plots.append(("Bar Chart (Plotly)", plotly_to_base64(fig_bar)))

    # Numeric plots (only if there are numeric values)
    if any(isinstance(v, (int, float)) for v in values):
        numeric_values = [v for v in values if isinstance(v, (int, float))]
        numeric_categories = [c for c, v in zip(categories, values) if isinstance(v, (int, float))]

        if numeric_values:
            sorted_categories, sorted_values = zip(*sorted(zip(numeric_categories, numeric_values), key=lambda x: x[1], reverse=True))

            # Bar Plot (Plotly)
            fig1 = px.bar(x=sorted_categories, y=sorted_values, labels={"x": "Category", "y": metadata.get("unit", "Value")},
                          title=f"{metadata['query']} (Plotly Bar)", color=sorted_values, color_continuous_scale="blues")
            st.plotly_chart(fig1)
            plots.append(("Bar Plot (Plotly)", plotly_to_base64(fig1)))

            # Pie Chart
            fig2, ax2 = plt.subplots(figsize=(10, 8))
            cmap = plt.get_cmap("tab20c")
            colors = [cmap(i) for i in range(len(sorted_categories))]
            wedges, texts = ax2.pie(sorted_values, labels=None, autopct=None, startangle=140, colors=colors, wedgeprops=dict(width=0.4))
            legend_labels = [f"{cat} ({val / sum(sorted_values):.1%})" for cat, val in zip(sorted_categories, sorted_values)]
            ax2.legend(wedges, legend_labels, title="Categories", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1), fontsize=10)
            ax2.axis("equal")
            ax2.set_title(f"{metadata['query']} (Pie)", fontsize=16)
            st.pyplot(fig2)
            plots.append(("Pie Chart", fig_to_base64(fig2)))
            plt.close(fig2)

            # Histogram
            fig3, ax3 = plt.subplots(figsize=(10, 6))
            ax3.hist(sorted_values, bins=10, color="skyblue", edgecolor="black")
            ax3.set_title(f"Distribution of {metadata['query']} (Histogram)", fontsize=16)
            st.pyplot(fig3)
            plots.append(("Histogram", fig_to_base64(fig3)))
            plt.close(fig3)

            # Heatmap
            fig4, ax4 = plt.subplots(figsize=(10, 6))
            data_matrix = pd.DataFrame({metadata.get("unit", "Value"): sorted_values}, index=sorted_categories)
            sns.heatmap(data_matrix, annot=True, cmap="Blues", ax=ax4, fmt=".1f")
            ax4.set_title(f"{metadata['query']} (Heatmap)", fontsize=16)
            st.pyplot(fig4)
            plots.append(("Heatmap", fig_to_base64(fig4)))
            plt.close(fig4)

            # Scatter Plot
            fig5 = px.scatter(x=sorted_categories, y=sorted_values, title=f"{metadata['query']} (Scatter Plot)",
                              labels={"x": "Category", "y": metadata.get("unit", "Value")})
            st.plotly_chart(fig5)
            plots.append(("Scatter Plot (Plotly)", plotly_to_base64(fig5)))

            # Line Plot
            fig6 = px.line(x=sorted_categories, y=sorted_values, title=f"{metadata['query']} (Line Plot)",
                           labels={"x": "Category", "y": metadata.get("unit", "Value")})
            st.plotly_chart(fig6)
            plots.append(("Line Plot (Plotly)", plotly_to_base64(fig6)))

            # Box Plot
            fig7, ax7 = plt.subplots(figsize=(10, 6))
            ax7.boxplot(sorted_values, vert=False, tick_labels=["Data"], patch_artist=True)
            ax7.set_title(f"{metadata['query']} (Box Plot)", fontsize=16)
            st.pyplot(fig7)
            plots.append(("Box Plot", fig_to_base64(fig7)))
            plt.close(fig7)

            # Violin Plot
            fig8, ax8 = plt.subplots(figsize=(10, 6))
            ax8.violinplot(sorted_values, vert=False, showmeans=True, showextrema=True)
            ax8.set_title(f"{metadata['query']} (Violin Plot)", fontsize=16)
            st.pyplot(fig8)
            plots.append(("Violin Plot", fig_to_base64(fig8)))
            plt.close(fig8)

            # Area Chart
            fig9 = px.area(x=sorted_categories, y=sorted_values, title=f"{metadata['query']} (Area Chart)", labels={"x": "Category", "y": metadata.get("unit", "Value")})
            st.plotly_chart(fig9)
            plots.append(("Area Chart (Plotly)", plotly_to_base64(fig9)))

            # Radar Chart
            fig10 = go.Figure(data=go.Scatterpolar(r=sorted_values, theta=sorted_categories, fill='toself', name=metadata['query']))
            fig10.update_layout(polar=dict(radialaxis=dict(visible=True)), showlegend=True, title=f"{metadata['query']} (Radar Chart)")
            st.plotly_chart(fig10)
            plots.append(("Radar Chart (Plotly)", plotly_to_base64(fig10)))

    else:
        st.warning("No numeric data to plot for this query.")

    return plots
def sanitize_filename(filename):
    return re.sub(r'[^a-zA-Z0-9]', '_', filename)


def generate_pdf(query, response_text, chat_response, plots):
    query = html.unescape(query)
    response_text = html.unescape(response_text)
    escaped_query = html.escape(query)
    escaped_response_text = html.escape(response_text)

    html_content = f"""
    <!DOCTYPE html>
    <html lang="ro">
    <head>
    <title>Data Analysis Report</title>
    <meta charset="UTF-8">
    <style>
        body {{ font-family: Arial, sans-serif; margin: 20px; background-color: #f9f9f9; color: #333; }}
        h1 {{ color: #1f77b4; text-align: center; }}
        h3 {{ color: #2c3e50; border-bottom: 2px solid #ddd; padding-bottom: 5px; }}
        h4 {{ color: #2980b9; }}
        p {{ line-height: 1.6; background-color: #fff; padding: 10px; border-radius: 5px; box-shadow: 0 1px 3px rgba(0,0,0,0.1); }}
        pre {{ background-color: #ecf0f1; padding: 10px; border-radius: 5px; font-size: 12px; }}
        table {{ border-collapse: collapse; width: 100%; margin: 10px 0; page-break-inside: avoid; }}
        th, td {{ border: 1px solid #bdc3c7; padding: 10px; text-align: left; }}
        th {{ background-color: #3498db; color: white; }}
        td {{ background-color: #fff; }}
        img {{ max-width: 100%; height: auto; margin: 10px 0; page-break-inside: avoid; }}
        .section {{ margin-bottom: 20px; }}
        .no-break {{ page-break-inside: avoid; }}
        .powered-by {{ text-align: center; margin-top: 20px; font-size: 10px; color: #777; }}
        .logo {{ height: 100px; }}
    </style>
    </head>
    <body>
    <h1>Data Analysis Agent Interface</h1>
    <div class="section no-break"><h3>Query</h3><p>{escaped_query}</p></div>
    <div class="section no-break"><h3>Response</h3><p>{escaped_response_text}</p></div>
    <div class="section no-break">
        <h3>Raw Structured Response</h3>
        <h4>Metadata</h4><pre>{json.dumps(chat_response["metadata"], indent=2, ensure_ascii=False)}</pre>
        <h4>Data</h4>{pd.DataFrame(chat_response["data"]).to_html(index=False, classes="no-break", escape=False)}
    </div>
    <div class="section"><h3>Plots</h3>{"".join([f'<div class="no-break"><h4>{name}</h4><img src="data:image/png;base64,{img_b64}"/></div>' for name, img_b64 in plots])}</div>
    <div class="powered-by">Powered by <img src="data:image/png;base64,{get_zega_logo_base64()}" class="logo"></div>
    </body></html>
    """

    html_file = "temp.html"
    sanitized_query = sanitize_filename(query)
    os.makedirs("./exported_pdfs", exist_ok=True)
    pdf_file = f"./exported_pdfs/{sanitized_query}.pdf"

    with open(html_file, "w", encoding="utf-8") as f:
        f.write(html_content)
    options = {'encoding': "UTF-8", 'custom-header': [('Content-Type', 'text/html; charset=UTF-8')], 'no-outline': None}
    pdfkit.from_file(html_file, pdf_file, options=options)
    os.remove(html_file)
    return pdf_file


def get_zega_logo_base64():
    with open("zega_logo.png", "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
# Streamlit Interface
st.title("Data Analysis Agent Interface")

st.sidebar.markdown(
    f"""
    <div style="text-align: center;">
        Powered by <img src="data:image/png;base64,{get_zega_logo_base64()}" style="height: 100px;">
    </div>
    """,
    unsafe_allow_html=True,
)
st.sidebar.header("Sample Questions")

sample_questions = [
    "Da-mi top cinci sucursale cu vânzări în perioada 01.03.2024-01.04.2024.",
    "Da-mi vânzările defalcate pe produse pentru top cinci sucursale cu vânzări în perioada 01.03.2024-01.04.2024.",
    "Da-mi vânzările defalcate pe pachete pentru top cinci sucursale cu vânzări în perioada 01.03.2024-01.04.2024.",
]

selected_question = st.sidebar.selectbox("Select a sample question:", sample_questions)
user_query = st.text_area("Please write one question at a time.", value=selected_question, height=100)


def process_query():
    try:
        generated_code = generate_code(user_query, column_info, sample_str, csv_path)
        result = execute_code(generated_code, csv_path)

        if isinstance(result, pd.DataFrame):
            chat_response = {
                "metadata": {"query": user_query, "unit": "", "plot_types": []},
                "data": result.to_dict(orient='records'),
                "csv_data": result.to_dict(orient='records'),
            }

        elif isinstance(result, pd.Series):
            result = result.reset_index()
            chat_response = {
                "metadata": {"query": user_query, "unit": "", "plot_types": []},
                "data": result.to_dict(orient='records'),
                "csv_data": result.to_dict(orient='records'),
            }

        elif isinstance(result, list):
            if all(isinstance(item, (int, float)) for item in result):
                chat_response = {
                    "metadata": {"query": user_query, "unit": "", "plot_types": []},
                    "data": [{"category": str(i), "value": v} for i, v in enumerate(result)],
                    "csv_data": [{"category": str(i), "value": v} for i, v in enumerate(result)],
                }
            elif all(isinstance(item, dict) for item in result):
                chat_response = {
                    "metadata": {"query": user_query, "unit": "", "plot_types": []},
                    "data": result,
                    "csv_data": result,
                }
            else:
                st.warning("Result is a list with mixed data types. Please inspect.")
                return

        else:
            chat_response = {
                "metadata": {"query": user_query, "unit": "", "plot_types": []},
                "data": [{"category": "Result", "value": result}],
                "csv_data": [{"category": "Result", "value": result}],
            }

        st.markdown("<h3 style='color: #2e86de;'>Question:</h3>", unsafe_allow_html=True)
        st.markdown(f"<p style='color: #2e86de;'>{user_query}</p>", unsafe_allow_html=True)
        st.write("-" * 200)

        # Initially hide the code.
        with st.expander("Show the code"):
            st.code(generated_code, language="python")
        st.write("-" * 200)

        st.markdown("### Data:")
        st.dataframe(pd.DataFrame(chat_response["data"]))

        metadata = chat_response["metadata"]
        data = chat_response["data"]

        if data and isinstance(data, list) and isinstance(data[0], dict):
            if len(data[0]) == 1:
                categories = [item[list(item.keys())[0]] for item in data]
                values = categories
            else:
                categories = list(data[0].keys())
                if len(categories) == 1:
                    values = [item[categories[0]] for item in data]
                    categories = values
                else:
                    prioritized_columns = ["DENUMIRE_SUCURSALA", "NUMAR_CERERE", "size", "HIST_DATE", "COD_SUCURSALA", "COD_AGENTIE",
                                           "DENUMIRE_AGENTIE", "PRODUS", "DATA_SEM_OFERTA", "DATA_STARE_CERERE", "STATUS_CERERE",
                                           "DESCRIERE_STARE_CERERE", "DATA_IN_OFERTA", "PBA_BAZA", "PBA_ASIG_SUM",
                                           "PBA_TOTAL_SEMNARE_CERERE", "PBA_CTR_ASOC", "PBA_TOTAL_EMITERE_CERERE", "FRECVENTA_PLATA"]

                    for col in prioritized_columns:
                        if all(col in item for item in data):
                            categories = [str(item[col]) for item in data]
                            if col != "NUMAR_CERERE" and col != "size":
                                if all("NUMAR_CERERE" in item for item in data):
                                    values = [item.get("NUMAR_CERERE", 0) for item in data]
                                elif all("size" in item for item in data):
                                    values = [item.get("size", 0) for item in data]
                            else:
                                numeric_col = next((c for c in data[0] if isinstance(data[0][c], (int, float))), None)
                                if numeric_col:
                                    values = [item.get(numeric_col, 0) for item in data]
                                else:
                                    values = [str(list(item.values())[1]) for item in data]
                            break
                    else:
                        values = [str(list(item.values())[1]) for item in data]

        elif isinstance(data, list) and all(isinstance(item, (int, float)) for item in data):
            categories = list(range(len(data)))
            values = data
        elif isinstance(data, (int, float, str)):
            categories = ["Result"]
            values = [data]
        else:
            categories = []
            values = []
            st.warning("Unexpected data format. Check the query and data.")

        plots = generate_plots(metadata, categories, values)

        st.session_state["query"] = user_query
        st.session_state["response_text"] = str(result)
        st.session_state["chat_response"] = chat_response
        st.session_state["plots"] = plots
        st.session_state["generated_code"] = generated_code  # Store the generated code

    except Exception as e:
        st.error(f"An error occurred: {e}")


if st.button("Submit"):
    with st.spinner("Processing query..."):
        try:
            process_query()
        except Exception as e:
            st.error(f"An error occurred: {e}")

if "chat_response" in st.session_state:
|
| 492 |
+
if st.button("Download PDF"):
|
| 493 |
+
with st.spinner("Generating PDF..."):
|
| 494 |
+
try:
|
| 495 |
+
pdf_file = generate_pdf(
|
| 496 |
+
st.session_state["query"],
|
| 497 |
+
st.session_state["response_text"],
|
| 498 |
+
st.session_state["chat_response"],
|
| 499 |
+
st.session_state["plots"]
|
| 500 |
+
)
|
| 501 |
+
with open(pdf_file, "rb") as f:
|
| 502 |
+
pdf_data = f.read()
|
| 503 |
+
sanitized_query = sanitize_filename(st.session_state["query"])
|
| 504 |
+
st.download_button(
|
| 505 |
+
label="Click Here to Download PDF",
|
| 506 |
+
data=pdf_data,
|
| 507 |
+
file_name=f"{sanitized_query}.pdf",
|
| 508 |
+
mime="application/pdf",
|
| 509 |
+
)
|
| 510 |
+
except Exception as e:
|
| 511 |
+
st.error(f"PDF generation failed: {e}")
|
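The chart-data extraction above leans on Python's `for`/`else`: walk a prioritized column list, take the first column present in every row as the category axis, and fall back only when no prioritized column matches. A simplified, self-contained sketch of that idea (the helper name and the numeric-column fallback are illustrative, not the app's exact logic):

```python
def extract_chart_data(data, prioritized_columns):
    """Return (categories, values) for plotting from a list of row dicts."""
    for col in prioritized_columns:
        if all(col in row for row in data):
            categories = [str(row[col]) for row in data]
            # Use the first numeric column of the first row as the value axis.
            numeric_col = next(
                (c for c in data[0] if c != col and isinstance(data[0][c], (int, float))),
                None,
            )
            values = [row.get(numeric_col, 0) for row in data] if numeric_col else categories
            break
    else:  # no prioritized column was present in every row
        categories = [str(list(row.values())[0]) for row in data]
        values = categories
    return categories, values


rows = [
    {"PRODUS": "A", "NUMAR_CERERE": 10},
    {"PRODUS": "B", "NUMAR_CERERE": 7},
]
print(extract_chart_data(rows, ["DENUMIRE_SUCURSALA", "PRODUS"]))
# → (['A', 'B'], [10, 7])
```

Note the `else` belongs to the `for`, not an `if`: it runs only when the loop finishes without hitting `break`, which is exactly the "no prioritized column matched" case.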
packages.txt
ADDED

@@ -0,0 +1 @@

```text
wkhtmltopdf
```
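On Hugging Face Spaces, `packages.txt` lists Debian packages installed via `apt` at build time; `wkhtmltopdf` is the system binary that the `pdfkit` package (pinned in `requirements.txt`) shells out to when rendering the PDF report. A minimal sketch of that wiring, assuming the binary is on the default `PATH` (the explicit path below is illustrative, not taken from this repo):

```python
import pdfkit

# pdfkit finds wkhtmltopdf via PATH by default; pass an explicit
# configuration only when the binary lives somewhere non-standard.
config = pdfkit.configuration(wkhtmltopdf="/usr/bin/wkhtmltopdf")  # illustrative path
pdfkit.from_string("<h1>Report</h1>", "report.pdf", configuration=config)
```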
requirements.txt
ADDED

@@ -0,0 +1,235 @@

```text
aiohappyeyeballs==2.4.8
aiohttp==3.11.13
aiosignal==1.3.2
altair==5.5.0
annotated-types==0.7.0
anyio==4.8.0
attrs==25.1.0
backoff==2.2.1
blinker==1.9.0
cachetools==5.5.2
certifi==2025.1.31
charset-normalizer==3.4.1
click==8.1.8
contourpy==1.3.1
cycler==0.12.1
dataclasses-json==0.6.7
distro==1.9.0
dotenv==0.9.9
fonttools==4.56.0
frozenlist==1.5.0
gitdb==4.0.12
GitPython==3.1.44
greenlet==3.1.1
h11==0.14.0
httpcore==1.0.7
httpx==0.28.1
httpx-sse==0.4.0
idna==3.10
Jinja2==3.1.5
jiter==0.8.2
jsonpatch==1.33
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
kaleido==0.2.1
kiwisolver==1.4.8
langchain==0.3.19
langchain-community==0.3.18
langchain-core==0.3.40
langchain-experimental==0.3.4
langchain-openai==0.3.7
langchain-text-splitters==0.3.6
langfuse==2.59.7
langsmith==0.3.11
markdown-it-py==3.0.0
MarkupSafe==3.0.2
marshmallow==3.26.1
matplotlib==3.10.1
mdurl==0.1.2
multidict==6.1.0
mypy-extensions==1.0.0
narwhals==1.29.0
numpy==2.2.3
openai==1.65.2
orjson==3.10.15
packaging==24.2
pandas==2.2.3
pdfkit==1.0.0
pillow==11.1.0
plotly==6.0.0
propcache==0.3.0
protobuf==5.29.3
pyarrow==19.0.1
pydantic==2.10.6
pydantic-settings==2.8.1
pydantic_core==2.27.2
pydeck==0.9.1
Pygments==2.19.1
pyparsing==3.2.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2025.1
PyYAML==6.0.2
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
requests-toolbelt==1.0.0
rich==13.9.4
rpds-py==0.23.1
seaborn==0.13.2
setuptools==75.8.0
six==1.17.0
smmap==5.0.2
sniffio==1.3.1
SQLAlchemy==2.0.38
streamlit==1.42.2
tabulate==0.9.0
tenacity==9.0.0
tiktoken==0.9.0
toml==0.10.2
tornado==6.4.2
tqdm==4.67.1
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
watchdog==6.0.0
wheel==0.45.1
wkhtmltopdf==0.2
wrapt==1.17.2
yarl==1.18.3
zstandard==0.23.0
aiohappyeyeballs==2.4.8
aiohttp==3.11.13
aiosignal==1.3.2
altair==5.5.0
annotated-types==0.7.0
anyio==4.8.0
attrs==25.1.0
backoff==2.2.1
blinker==1.9.0
cachetools==5.5.2
certifi==2025.1.31
charset-normalizer==3.4.1
click==8.1.8
contourpy==1.3.1
cycler==0.12.1
dataclasses-json==0.6.7
distro==1.9.0
dotenv==0.9.9
fonttools==4.56.0
frozenlist==1.5.0
gitdb==4.0.12
GitPython==3.1.44
greenlet==3.1.1
grpcio==1.70.0
grpcio-tools==1.70.0
h11==0.14.0
h2==4.2.0
hpack==4.1.0
httpcore==1.0.7
httpx==0.28.1
httpx-sse==0.4.0
huggingface-hub==0.26.2
hyperframe==6.1.0
idna==3.10
Jinja2==3.1.5
jiter==0.8.2
jsonpatch==1.33
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
kaleido==0.2.1
kiwisolver==1.4.8
kornia==0.7.4
kornia_rs==0.1.7
langchain==0.3.19
langchain-community==0.3.18
langchain-core==0.3.40
langchain-experimental==0.3.4
langchain-openai==0.3.7
langchain-text-splitters==0.3.6
langfuse==2.59.7
langsmith==0.3.11
markdown-it-py==3.0.0
MarkupSafe==3.0.2
marshmallow==3.26.1
matplotlib==3.10.1
mdurl==0.1.2
multidict==6.1.0
mypy-extensions==1.0.0
narwhals==1.29.0
numpy==2.2.3
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
ollama==0.4.7
openai==1.65.2
orjson==3.10.15
packaging==24.2
pandas==2.2.3
pdfkit==1.0.0
pillow==11.1.0
plotly==6.0.0
portalocker==2.10.1
propcache==0.3.0
protobuf==5.29.3
pyarrow==19.0.1
pydantic==2.10.6
pydantic-settings==2.8.1
pydantic_core==2.27.2
pydeck==0.9.1
Pygments==2.19.1
pyparsing==3.2.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2025.1
PyYAML==6.0.2
qdrant-client==1.13.2
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
requests-toolbelt==1.0.0
rich==13.9.4
rpds-py==0.23.1
safetensors==0.4.5
seaborn==0.13.2
sentencepiece==0.2.0
setuptools==75.8.0
six==1.17.0
smmap==5.0.2
sniffio==1.3.1
soundfile==0.12.1
spandrel==0.4.0
SQLAlchemy==2.0.38
streamlit==1.42.2
sympy==1.13.1
tabulate==0.9.0
tenacity==9.0.0
tiktoken==0.9.0
toml==0.10.2
torchsde==0.2.6
tornado==6.4.2
tqdm==4.67.1
trampoline==0.1.2
triton==3.1.0
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
watchdog==6.0.0
wheel==0.45.1
wkhtmltopdf==0.2
wrapt==1.17.2
yarl==1.18.3
zstandard==0.23.0
```
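This `requirements.txt` concatenates two dependency freezes, so most pins appear twice (and the second copy pulls in extra packages such as `grpcio` and the `nvidia-*` wheels). Repeated pins are easy to flag before a build with a few lines of stdlib Python; a quick sketch (the helper name is illustrative):

```python
from collections import Counter


def duplicate_pins(lines):
    """Return package names that are pinned more than once (case-insensitive)."""
    names = [
        line.split("==")[0].strip().lower()
        for line in lines
        if "==" in line and not line.lstrip().startswith("#")
    ]
    return sorted(name for name, count in Counter(names).items() if count > 1)


print(duplicate_pins(["pandas==2.2.3", "numpy==2.2.3", "pandas==2.2.3"]))
# → ['pandas']
```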
zega_logo.png
ADDED