Vlad Bastina committed
Commit 1d55012 · 0 parents

program files
Files changed (10)
  1. .gitattributes +1 -0
  2. .gitignore +2 -0
  3. .streamlit/config.toml +2 -0
  4. README.md +110 -0
  5. SalesData.csv +3 -0
  6. app_generated.py +481 -0
  7. app_hardcoded.py +511 -0
  8. packages.txt +1 -0
  9. requirements.txt +235 -0
  10. zega_logo.png +0 -0
.gitattributes ADDED
@@ -0,0 +1 @@
+ *.csv filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,2 @@
+ .env
+ .streamlit/secrets.toml
.streamlit/config.toml ADDED
@@ -0,0 +1,2 @@
+ [theme]
+ base="light"
README.md ADDED
@@ -0,0 +1,110 @@
+ # Data Analysis Agent Interface with Streamlit
+
+ This Streamlit application provides an interface for interacting with a data analysis agent powered by OpenAI's language models. It allows users to ask questions about data in a CSV file and receive answers in the form of Pandas code, data tables, and visualizations. The application also supports generating a PDF report of the analysis.
+
+ ## Features
+
+ * **Natural Language Queries:** Ask questions in plain English (or Romanian) about the data.
+ * **Automatic Code Generation:** The agent generates Pandas code to answer the query.
+ * **Data Display:** Results are displayed as interactive DataFrames.
+ * **Visualization:** Generates various plots (bar charts, pie charts, histograms, heatmaps, scatter plots, line plots, box plots, violin plots, area charts, and radar charts) based on the query and data.
+ * **PDF Report Generation:** Download a PDF report containing the query, generated code, data table, and plots.
+ * **Syntax-Highlighted Code:** The generated Python code is displayed in a scrollable, syntax-highlighted code block for easy readability.
+ * **Collapsible Code Display:** The generated code is hidden by default, with an expander to reveal it on demand.
+ * **Sample Questions:** Provides a set of sample questions to get started.
+ * **Powered by ZEGA.ai:** Includes ZEGA.ai branding.
+
+ ## Getting Started
+
+ ### Prerequisites
+
+ * Python 3.7+
+ * An OpenAI API key
+ * pdfkit: requires `wkhtmltopdf` to be installed on your system.
+     * **Windows**: Download and install from [wkhtmltopdf.org](https://wkhtmltopdf.org/downloads.html). Add the `wkhtmltopdf/bin` directory to your system's PATH.
+     * **macOS**: `brew install wkhtmltopdf`
+     * **Linux (Debian/Ubuntu)**: `sudo apt-get install wkhtmltopdf`
+     * **Linux (CentOS/RHEL)**: `sudo yum install wkhtmltopdf`
+
+ ### Installation
+
+ 1. **Clone the repository:**
+
+     ```bash
+     git clone <your_repository_url>
+     cd <your_repository_directory>
+     ```
+
+ 2. **Install dependencies:**
+
+     ```bash
+     pip install -r requirements.txt
+     ```
+
+     If `requirements.txt` does not already exist, create it with the following contents:
+
+     ```
+     streamlit
+     pandas
+     matplotlib
+     plotly
+     python-dotenv
+     langchain
+     langchain-experimental
+     langchain-openai
+     seaborn
+     pdfkit
+     openai
+     ```
+
+ 3. **Create a `.env` file:**
+
+     Create a file named `.env` in the root directory of your project. Add your OpenAI API key to this file:
+
+     ```
+     OPENAI_API_KEY=your_openai_api_key_here
+     ```
+
+     Replace `your_openai_api_key_here` with your actual API key.
+
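For reference, a minimal sketch of how the key can be picked up at runtime. The app itself uses `python-dotenv` plus `os.getenv`; the placeholder value below is purely illustrative, not a real key:

```python
import os

# Stand-in for what load_dotenv() would put into the environment;
# "sk-example-placeholder" is NOT a real key, just a demo value.
os.environ["OPENAI_API_KEY"] = "sk-example-placeholder"

api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")
print("key loaded:", bool(api_key))
```

The fail-fast check mirrors what the app relies on implicitly: an unset key would otherwise surface later as an opaque OpenAI client error.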
+ 4. **Place the CSV data file:**
+
+     Place the `asig_sales_31012025.csv` file in the same directory as your script. If you use a different CSV file, update the `csv_path` variable in the script.
+
+ 5. **Place the Zega logo:**
+
+     Place the `zega_logo.png` file in the same directory.
+
+ ### Usage
+
+ 1. **Run the Streamlit app:**
+
+     ```bash
+     streamlit run your_script_name.py
+     ```
+
+     Replace `your_script_name.py` with the name of your Python script.
+
+ 2. **Interact with the app:**
+
+     * Select a sample question from the sidebar or enter your own question in the text area. Ask only one question at a time.
+     * Click the "Submit" button.
+     * The results (data table and plots) will be displayed.
+     * Click the "Show the code" expander to view the generated Pandas code.
+     * Click the "Download PDF" button to generate a PDF report.
+
+ ## File Structure
+
+ * **`your_script_name.py`:** The main Streamlit application script.
+ * **`.env`:** Contains your OpenAI API key (should *not* be committed to Git).
+ * **`requirements.txt`:** Lists the required Python packages.
+ * **`asig_sales_31012025.csv`:** The CSV data file (or your custom data file).
+ * **`zega_logo.png`:** The Zega logo.
+ * **`exported_pdfs/`:** A directory (created automatically) where generated PDF reports are saved.
+ * **`README.md`:** This file.
+
+ ## Important Notes
+
+ * **Date Format:** The script is specifically configured to handle dates in the European DD/MM/YYYY format. Ensure your CSV data uses this format. The `parse_dates` argument in `pd.read_csv` is crucial for correct date handling.
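To illustrate the date-format note, here is a minimal, self-contained sketch of DD/MM/YYYY parsing with pandas. The inline CSV and column names are illustrative, not the project's real data:

```python
import io
import pandas as pd

# Two rows of European-style DD/MM/YYYY dates (illustrative data).
csv_text = "Order Date,Sales\n31/01/2025,100\n01/02/2025,250\n"

# dayfirst=True makes pandas read 31/01/2025 as 31 January 2025
# instead of failing to find a month "31".
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Order Date"], dayfirst=True)

print(df["Order Date"].dt.strftime("%Y-%m-%d").tolist())
```

Without `dayfirst=True` (or an explicit `date_format`), ambiguous dates such as 01/02/2025 would be parsed as January 2nd rather than February 1st.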
+ * **OpenAI API Key:** Keep your OpenAI API key secure. Do *not* commit the `.env` file to your Git repository. Add `.env` to your `.gitignore` file.
+ * **Error Handling:** The script includes basic error handling (checking for the CSV file), but you might want to add more robust error handling for production use.
+ * **wkhtmltopdf:** Ensure `wkhtmltopdf` is correctly installed and accessible in your system's PATH for PDF generation to work.
+ * **Prompt Engineering:** The quality of the generated code depends heavily on the prompt used in the `generate_code` function. The provided prompt is highly detailed and includes specific instructions for the agent. You may need to adjust the prompt if you encounter issues or use a different CSV file with different column names or data structures.
+ * **One Question:** The app is designed to process one question at a time. Asking multiple questions in a single input may lead to unexpected behavior.
+
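As a sketch of the kind of code the agent is prompted to produce (filter, then `groupby()`, then aggregate, then select the top N, in that order), assuming a SalesData-style file with `City` and `Sales` columns; the inline data is illustrative:

```python
import io
import pandas as pd

# Illustrative stand-in for the real CSV (column names assumed from the README).
csv_text = (
    "City,Sales\n"
    "Boston,100\nBoston,50\nAustin,120\nDallas,30\nAustin,10\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Group first, aggregate, then select the top N -- the ordering the prompt enforces.
result = df.groupby('City')['Sales'].sum().nlargest(2).reset_index()
print(result)
```

Sorting or taking `head()` before the `groupby()` would rank individual rows rather than city totals, which is exactly the mistake the prompt's aggregation rules guard against.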
SalesData.csv ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:05349b4feb225c6d0f0899ab7465d9346c052de0e21f07bec7b56bb6c4b27565
+ size 22441174
app_generated.py ADDED
@@ -0,0 +1,481 @@
+ import streamlit as st
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ import plotly.express as px
+ from dotenv import load_dotenv
+ from langchain.agents.agent_types import AgentType
+ from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
+ from langchain_openai import ChatOpenAI
+ import os
+ import seaborn as sns
+ import plotly.graph_objects as go
+ import json
+ import pdfkit
+ import io
+ import base64
+ from matplotlib.backends.backend_agg import FigureCanvasAgg
+ import html
+ import re
+ from openai import OpenAI
+ from io import StringIO
+
+ load_dotenv()
+
+ # --- Configuration ---
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or st.secrets.get("OPENAI_API_KEY")
+
+ client = OpenAI(api_key=OPENAI_API_KEY)
+ csv_path = "SalesData.csv"
+
+ if not os.path.exists(csv_path):
+     print(f"Error: CSV file '{csv_path}' not found.")
+     exit(1)
+
+ def get_csv_sample(csv_path, sample_size=5):
+     """Reads a CSV file and returns column info, a sample, and the DataFrame."""
+     df = pd.read_csv(csv_path)
+     sample_df = df.sample(n=min(sample_size, len(df)), random_state=42)
+     return df.dtypes.to_string(), sample_df.to_string(index=False), df
+
+ column_info, sample_str, _ = get_csv_sample(csv_path)
+
+ # @observe()
+ def chat(response_text):
+     return json.loads(response_text)  # Directly parse the JSON
+
+ def generate_code(question, column_info, sample_str, csv_path, model_name="gpt-4o"):
+     """Asks OpenAI to generate Pandas code for a given question."""
+     prompt = f"""You are a highly skilled Python data analyst with expert-level proficiency in Pandas. Your task is to write **concise, correct, and efficient** Pandas code to answer a specific question about data contained within a CSV file. The code you generate must be self-contained, directly executable, and produce the correct numerical output or DataFrame structure.
+
+ **CSV File Information:**
+
+ * **Path:** '{csv_path}'
+ * **Column Information:** (This tells you the names and data types of the columns)
+     ```
+     {column_info}
+     ```
+ * **Sample Data:** (This gives you a glimpse of the data's structure. Note the European date format DD/MM/YYYY)
+     ```
+     {sample_str}
+     ```
+
+ **Strict Requirements (Follow these EXACTLY):**
+
+ 0. **Multi-part Questions:**
+     * If the user asks a multi-part question, **reformat it** to process each part correctly while maintaining the original meaning. **Do not change the intent** of the question.
+     * **For multi-part questions**, the code should reflect how each part of the question is handled. You must ensure that each part is processed and combined correctly at the end.
+     * **Print a statement** explaining how you processed the multi-part question, e.g., `print("Question was split into parts for processing.")`.
+
+ 1. **Load Data and Parse Dates:** Your code *MUST* begin with the following lines to load the data, correctly parsing *ALL* potential date columns:
+     ```python
+     import pandas as pd
+     df = pd.read_csv('{csv_path}', parse_dates=['Order Date'])
+     ```
+     Do *NOT* modify these lines. The `parse_dates` argument is *critical* for correct date handling.
+
+ 2. **Imports:** Do *NOT* import any libraries other than pandas (which is already imported as `pd`). Do *NOT* use `numpy` or `datetime` directly, unless used within the context of parsing in `read_csv`. Pandas is sufficient for all tasks.
+
+ 3. **Output:**
+     * Store your final answer in a variable named `result`.
+     * Print the `result` variable using `print(result)`.
+     * Do *NOT* use `display()`.
+     * The output must be a Pandas DataFrame, Series, or a single value, as appropriate for the question. If it's a DataFrame or Series, ensure the index is reset where appropriate (e.g., after a `groupby()` followed by `.size()`).
+
+ 4. **Conciseness and Style:**
+     * Write the *most concise* and efficient Pandas code possible.
+     * Use method chaining (e.g., `df.groupby(...).sum().sort_values().head()`) whenever possible and appropriate.
+     * Avoid unnecessary intermediate variables unless they *significantly* improve readability.
+     * Use clear and understandable variable names for filtered dataframes (for example: `df_2019`, `df_filtered`, etc.).
+     * If calculating a percentage or distribution, combine operations efficiently, ideally in a single chained expression.
+
+ 5. **Correctness:** Your code *MUST* be syntactically correct Python and *MUST* produce the correct answer to the question. Double-check your logic, especially when grouping and aggregating. Pay close attention to the wording of the question.
+
+ 6. **Date and Time Conditions (Implicit Filtering):**
+     * **Any question that refers to dates, time periods, months, years, or uses phrases like "sales in," "orders from," "between [dates]," etc., *MUST* filter the data using the `Order Date` column.** This is the *implied* date column. Do *NOT* ask the user which column to use; assume `Order Date`.
+     * When filtering dates, use combined boolean conditions for efficiency, e.g., `df[(df['Order Date'].dt.year == 2019) & (df['Order Date'].dt.month == 12)]` rather than separate filtering steps.
+
+ 7. **Column Names:** Use the *exact* column names provided in the "CSV Column Information." Pay close attention to capitalization, spaces, and any special characters.
+
+ 8. **No Explanations:** Output *ONLY* the Python code. Do *NOT* include any comments, explanations, surrounding text, or markdown formatting (like ```python). Just the code.
+
+ 9. **Aggregation (VERY IMPORTANT):** When the question asks for:
+     * "top N" or "first N"
+     * "most frequent"
+     * "highest/lowest" (after grouping)
+     * "average/sum/count per [group]"
+     * **Calculate Percentage**: When a percentage is asked for, compute the correct percentage value.
+
+     You *MUST* perform a `groupby()` operation *BEFORE* sorting or selecting the top N values. The correct order is:
+     1. Filter the DataFrame (if needed, using boolean indexing).
+     2. Group by the appropriate column(s) using `.groupby()`.
+     3. Apply an aggregation function (e.g., `.sum()`, `.mean()`, `.size()`, `.count()`, `.median()`).
+     4. *Then*, sort (if needed) using `.sort_values()` and/or select the top N (if needed) using `.nlargest()` or `.head()`.
+
+ 10. **Error Handling:** Assume the CSV file exists and is correctly formatted. You do *not* need to write any explicit error handling code.
+
+ 11. **Clarity:** Use clear and meaningful variable names if you create intermediate dataframes, but prioritize conciseness.
+
+ 12. **Romanian Keywords:** "primele" (first/top) means `.nlargest`, and "ultimele" (last/bottom) means `.nsmallest`.
+
+ **Column Usage Guidance:**
+
+ * Use *Product* when referring to specific items sold (e.g., "most popular product," "top-selling product").
+ * Use *City* when grouping or summarizing sales by location (e.g., "which city had the highest revenue?").
+ * Use *Order Date* for any time-based filtering (e.g., "sales in December," "transactions between January and March").
+ * Use *Sales* for financial aggregations (e.g., total revenue, average sale per transaction).
+ * Use *Quantity Ordered* when analyzing product demand (e.g., "most sold product in terms of units").
+ * Use *Hour* to analyze time-based trends (e.g., "which hour has the highest number of purchases?").
+
+ **Question:**
+ {question}
+ """
+
+     response = client.chat.completions.create(
+         model=model_name,
+         temperature=0,  # Keep temperature at 0 for consistent, deterministic code
+         messages=[
+             {"role": "system", "content": "You are a helpful assistant that generates Python code."},
+             {"role": "user", "content": prompt},
+         ],
+     )
+
+     code_to_execute = response.choices[0].message.content.strip()
+     code_to_execute = code_to_execute.replace("```python", "").replace("```", "").strip()
+
+     return code_to_execute
+
+ def execute_code(generated_code, csv_path):
+     """Executes the generated Pandas code and captures the output."""
+     local_vars = {"pd": pd, "__file__": csv_path}
+     exec(generated_code, {}, local_vars)
+     return local_vars.get("result")
+
+ def generate_plot_code(question, dataframe, model_name="gpt-4o"):
+     """Asks OpenAI to generate plotting code based on the question and dataframe."""
+     # Convert dataframe to string representations
+     df_str = dataframe.to_string(index=False)
+     df_json = dataframe.to_json(orient="records")
+
+     prompt = f"""You are a data visualization expert. Create Python code to visualize the data below based on the user's question. The visualizations must comprehensively represent *all* the information returned by the query to effectively answer the question.
+
+ **User Question:**
+ {question}
+
+ **Data (first few rows):**
+ ```
+ {df_str}
+ ```
+
+ **Data (JSON format):**
+ ```json
+ {df_json}
+ ```
+
+ **Requirements:**
+ 1. Create 4-7 different, meaningful visualizations that collectively represent all aspects of the data returned by the query, ensuring no key information is omitted.
+ 2. Ensure each visualization is simple, clear, and directly tied to a specific part of the data or question, while together they cover the full scope of the result.
+ 3. Use ONLY Matplotlib and Seaborn (avoid Plotly to prevent compatibility issues).
+ 4. Include proper titles, labels, and legends for clarity, reflecting the specific data being visualized.
+ 5. Use appropriate color schemes that are visually appealing and accessible (e.g., colorblind-friendly palettes like Seaborn's 'colorblind').
+ 6. Return a list of tuples containing the plot title and the base64-encoded image.
+ 7. Make sure to close all plt figures with plt.close() after adding each to the plots list to prevent memory issues.
+ 8. If the data includes categories (e.g., sucursale (branches), produse (products), pachete (packages)), ensure these are fully represented across the plots (e.g., bar charts, pie charts, or grouped visuals).
+ 9. If the data includes numerical values (e.g., sales, totals), use appropriate plot types (e.g., bar, line, or scatter) to show trends, comparisons, or distributions.
+ 10. If the question involves time periods, ensure at least one visualization reflects the temporal aspect using the relevant date information.
+
+ **Output Format:**
+ Your code should ONLY include a function called `create_plots(data)` that takes a pandas DataFrame as input and returns a list of tuples containing the plot titles and the base64-encoded images.
+
+ Return only the function definition without any explanations, imports, or additional code. Do NOT include any Streamlit-specific code.
+ """
+
+     response = client.chat.completions.create(
+         model=model_name,
+         temperature=0.2,  # Slightly higher temperature for creative visualizations
+         messages=[
+             {"role": "system", "content": "You are a data visualization expert who creates Python code for plotting data."},
+             {"role": "user", "content": prompt},
+         ],
+     )
+
+     plot_code = response.choices[0].message.content.strip()
+     plot_code = plot_code.replace("```python", "").replace("```", "").strip()
+
+     return plot_code
+
+ def execute_plot_code(plot_code, result_df):
+     """Executes the generated plotting code and captures the outputs."""
+     try:
+         # Create a dictionary with all the necessary imports
+         globals_dict = {
+             "pd": pd,
+             "plt": plt,
+             "px": px,
+             "sns": sns,
+             "go": go,
+             "io": io,
+             "base64": base64,
+             "np": __import__('numpy'),
+             "plotly": __import__('plotly'),
+         }
+
+         # Create a local variables dictionary with the data
+         local_vars = {
+             "data": result_df,
+         }
+
+         # Define the helper functions first
+         helper_code = """
+ def fig_to_base64(fig):
+     buf = io.BytesIO()
+     fig.savefig(buf, format="png", bbox_inches="tight")
+     buf.seek(0)
+     img_str = base64.b64encode(buf.getvalue()).decode("utf-8")
+     buf.close()
+     return img_str
+
+ def plotly_to_base64(fig):
+     # For Plotly figures, convert to image bytes and then to base64
+     img_bytes = fig.to_image(format="png", scale=2)
+     img_str = base64.b64encode(img_bytes).decode("utf-8")
+     return img_str
+ """
+
+         # Execute the helpers into globals_dict so that functions defined by
+         # plot_code can resolve them through their module globals
+         exec(helper_code, globals_dict)
+
+         # Then execute the plot code
+         exec(plot_code, globals_dict, local_vars)
+
+         # Get the plots from the create_plots function
+         if "create_plots" in local_vars:
+             plots = local_vars["create_plots"](result_df)
+             return plots
+         elif "plots" in local_vars:
+             return local_vars["plots"]
+         else:
+             return []
+     except Exception as e:
+         st.error(f"Error executing plot code: {str(e)}")
+         import traceback
+         st.error(traceback.format_exc())
+         return []
+
+ def sanitize_filename(filename):
+     return re.sub(r'[^a-zA-Z0-9]', '_', filename)
+
+ def generate_pdf(query, response_text, chat_response, plots):
+     query = html.unescape(query)
+     response_text = html.unescape(response_text)
+     escaped_query = html.escape(query)
+     escaped_response_text = html.escape(response_text)
+
+     html_content = f"""
+ <!DOCTYPE html>
+ <html lang="ro">
+ <head>
+ <title>Data Analysis Report</title>
+ <meta charset="UTF-8">
+ <style>
+ body {{ font-family: Arial, sans-serif; margin: 20px; background-color: #f9f9f9; color: #333; }}
+ h1 {{ color: #1f77b4; text-align: center; }}
+ h3 {{ color: #2c3e50; border-bottom: 2px solid #ddd; padding-bottom: 5px; }}
+ h4 {{ color: #2980b9; }}
+ p {{ line-height: 1.6; background-color: #fff; padding: 10px; border-radius: 5px; box-shadow: 0 1px 3px rgba(0,0,0,0.1); }}
+ pre {{ background-color: #ecf0f1; padding: 10px; border-radius: 5px; font-size: 12px; }}
+ table {{ border-collapse: collapse; width: 100%; margin: 10px 0; page-break-inside: avoid; }}
+ th, td {{ border: 1px solid #bdc3c7; padding: 10px; text-align: left; }}
+ th {{ background-color: #3498db; color: white; }}
+ td {{ background-color: #fff; }}
+ img {{ max-width: 100%; height: auto; margin: 10px 0; page-break-inside: avoid; }}
+ .section {{ margin-bottom: 20px; }}
+ .no-break {{ page-break-inside: avoid; }}
+ .powered-by {{ text-align: center; margin-top: 20px; font-size: 10px; color: #777; }}
+ .logo {{ height: 100px; }}
+ </style>
+ </head>
+ <body>
+ <h1>Data Analysis Agent Interface</h1>
+ <div class="section no-break"><h3>Query</h3><p>{escaped_query}</p></div>
+ <div class="section no-break"><h3>Response</h3><p>{escaped_response_text}</p></div>
+ <div class="section no-break">
+ <h3>Raw Structured Response</h3>
+ <h4>Metadata</h4><pre>{json.dumps(chat_response["metadata"], indent=2, ensure_ascii=False)}</pre>
+ <h4>Data</h4>{pd.DataFrame(chat_response["data"]).to_html(index=False, classes="no-break", escape=False)}
+ </div>
+ <div class="section"><h3>Plots</h3>{"".join([f'<div class="no-break"><h4>{name}</h4><img src="data:image/png;base64,{img_b64}"/></div>' for name, img_b64 in plots])}</div>
+ <div class="powered-by">Powered by <img src="data:image/png;base64,{get_zega_logo_base64()}" class="logo"></div>
+ </body></html>
+ """
+
+     html_file = "temp.html"
+     sanitized_query = sanitize_filename(query)
+     os.makedirs("./exported_pdfs", exist_ok=True)
+     pdf_file = f"./exported_pdfs/{sanitized_query}.pdf"
+
+     with open(html_file, "w", encoding="utf-8") as f:
+         f.write(html_content)
+     options = {'encoding': "UTF-8", 'custom-header': [('Content-Type', 'text/html; charset=UTF-8')], 'no-outline': None}
+     pdfkit.from_file(html_file, pdf_file, options=options)
+     os.remove(html_file)
+     return pdf_file
+
+ def get_zega_logo_base64():
+     with open("zega_logo.png", "rb") as image_file:
+         return base64.b64encode(image_file.read()).decode("utf-8")
+
+ # Streamlit Interface
+ st.title("Data Analysis Agent Interface")
+
+ st.sidebar.markdown(
+     f"""
+     <div style="text-align: center;">
+         Powered by <img src="data:image/png;base64,{get_zega_logo_base64()}" style="height: 100px;">
+     </div>
+     """,
+     unsafe_allow_html=True,
+ )
+ st.sidebar.header("Sample Questions")
+
+ sample_questions = [
+     "Top 5 cities with the highest sales?",
+     "Bottom 3 products by total sales?",
+     "Top 10 products by number of items sold?",
+     "Top 10 products by total sales value?",
+ ]
+
+ selected_question = st.sidebar.selectbox("Select a sample question:", sample_questions)
+ user_query = st.text_area("Please write one question at a time.", value=selected_question, height=100)
+
+ def process_query():
+     try:
+         # Step 1: Generate and execute code to get the data
+         generated_code = generate_code(user_query, column_info, sample_str, csv_path)
+         result = execute_code(generated_code, csv_path)
+
+         # Convert result to DataFrame if it's not already
+         if isinstance(result, pd.DataFrame):
+             result_df = result
+         elif isinstance(result, pd.Series):
+             result_df = result.reset_index()
+         elif isinstance(result, list):
+             if all(isinstance(item, dict) for item in result):
+                 result_df = pd.DataFrame(result)
+             else:
+                 result_df = pd.DataFrame({"value": result})
+         else:
+             result_df = pd.DataFrame({"value": [result]})
+
+         # Step 2: Generate and execute plotting code
+         plot_code = generate_plot_code(user_query, result_df)
+         plots = execute_plot_code(plot_code, result_df)
+
+         # Prepare the chat response
+         if isinstance(result, pd.DataFrame):
+             chat_response = {
+                 "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                 "data": result.to_dict(orient='records'),
+                 "csv_data": result.to_dict(orient='records'),
+             }
+         elif isinstance(result, pd.Series):
+             result = result.reset_index()
+             chat_response = {
+                 "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                 "data": result.to_dict(orient='records'),
+                 "csv_data": result.to_dict(orient='records'),
+             }
+         elif isinstance(result, list):
+             if all(isinstance(item, (int, float)) for item in result):
+                 chat_response = {
+                     "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                     "data": [{"category": str(i), "value": v} for i, v in enumerate(result)],
+                     "csv_data": [{"category": str(i), "value": v} for i, v in enumerate(result)],
+                 }
+             elif all(isinstance(item, dict) for item in result):
+                 chat_response = {
+                     "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                     "data": result,
+                     "csv_data": result,
+                 }
+             else:
+                 st.warning("Result is a list with mixed data types. Please inspect.")
+                 return
+         else:
+             chat_response = {
+                 "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                 "data": [{"category": "Result", "value": result}],
+                 "csv_data": [{"category": "Result", "value": result}],
+             }
+
+         # Display the query and data
+         st.markdown("<h3 style='color: #2e86de;'>Question:</h3>", unsafe_allow_html=True)
+         st.markdown(f"<p style='color: #2e86de;'>{user_query}</p>", unsafe_allow_html=True)
+         st.write("-" * 200)
+
+         # Initially hide the code
+         with st.expander("Show the generated data code"):
+             st.code(generated_code, language="python")
+
+         with st.expander("Show the generated plotting code"):
+             st.code(plot_code, language="python")
+
+         st.write("-" * 200)
+
+         # Display the data
+         st.markdown("### Data:")
+         st.dataframe(result_df)
+         st.write("-" * 200)
+
+         # Display the plots
+         st.markdown("### Visualizations:")
+         for name, base64_img in plots:
+             st.markdown(f"#### {name}")
+             st.markdown(f'<img src="data:image/png;base64,{base64_img}" style="max-width:100%">', unsafe_allow_html=True)
+             st.write("-" * 100)
+
+         # Store the data for PDF generation
+         st.session_state["query"] = user_query
+         st.session_state["response_text"] = str(result)
+         st.session_state["chat_response"] = chat_response
+         st.session_state["plots"] = plots
+         st.session_state["generated_code"] = generated_code
+         st.session_state["plot_code"] = plot_code
+
+     except Exception as e:
+         st.error(f"An error occurred: {e}")
+         import traceback
+         st.error(traceback.format_exc())
+
+ if st.button("Submit"):
+     with st.spinner("Processing query..."):
+         process_query()
+
+ if "chat_response" in st.session_state:
+     if st.button("Download PDF"):
+         with st.spinner("Generating PDF..."):
+             try:
+                 pdf_file = generate_pdf(
+                     st.session_state["query"],
+                     st.session_state["response_text"],
+                     st.session_state["chat_response"],
+                     st.session_state["plots"],
+                 )
+                 with open(pdf_file, "rb") as f:
+                     pdf_data = f.read()
+                 sanitized_query = sanitize_filename(st.session_state["query"])
+                 st.download_button(
+                     label="Click Here to Download PDF",
+                     data=pdf_data,
+                     file_name=f"{sanitized_query}.pdf",
+                     mime="application/pdf",
+                 )
+             except Exception as e:
+                 st.error(f"PDF generation failed: {e}")
app_hardcoded.py ADDED
@@ -0,0 +1,511 @@
1
+ import streamlit as st
2
+ import pandas as pd
3
+ import matplotlib.pyplot as plt
4
+ import plotly.express as px
5
+ from dotenv import load_dotenv
6
+ from langchain.agents.agent_types import AgentType
7
+ from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
8
+ from langchain_openai import ChatOpenAI
9
+ import os
10
+ import seaborn as sns
11
+ import plotly.graph_objects as go
12
+ import json
13
+ import pdfkit
14
+ import io
15
+ import base64
16
+ from matplotlib.backends.backend_agg import FigureCanvasAgg
17
+ import html
18
+ import re
19
+ from openai import OpenAI
20
+ from io import StringIO
21
+
22
+ load_dotenv()
23
+
24
+ # --- Configuration ---
25
+ OPENAI_API_KEY=os.getenv("OPENAI_API_KEY") or st.secrets.get("OPENAI_API_KEY")
26
+
27
+ client = OpenAI(api_key=OPENAI_API_KEY)
28
+ csv_path = "asig_sales_31012025.csv"
29
+
30
+ if not os.path.exists(csv_path):
31
+ print(f"Error: CSV file '{csv_path}' not found.")
32
+ exit(1)
33
+
+ def get_csv_sample(csv_path, sample_size=5):
+     """Reads a CSV file and returns column info, a sample, and the DataFrame."""
+     df = pd.read_csv(csv_path)
+     sample_df = df.sample(n=min(sample_size, len(df)), random_state=42)
+     return df.dtypes.to_string(), sample_df.to_string(index=False), df
+
+ column_info, sample_str, _ = get_csv_sample(csv_path)
+
+ # @observe()
+ def chat(response_text):
+     return json.loads(response_text)  # Directly parse the JSON
+
+ def generate_code(question, column_info, sample_str, csv_path, model_name="gpt-4o"):
+     """Asks OpenAI to generate Pandas code for a given question."""
+     prompt = f"""You are a highly skilled Python data analyst with expert-level proficiency in Pandas. Your task is to write **concise, correct, and efficient** Pandas code to answer a specific question about data contained within a CSV file. The code you generate must be self-contained, directly executable, and produce the correct numerical output or DataFrame structure.
+
+ **CSV File Information:**
+
+ * **Path:** '{csv_path}'
+ * **Column Information:** (This tells you the names and data types of the columns)
+ ```
+ {column_info}
+ ```
+ * **Sample Data:** (This gives you a glimpse of the data's structure. Note the European date format DD/MM/YYYY.)
+ ```
+ {sample_str}
+ ```
+
+ **Strict Requirements (Follow these EXACTLY):**
+ 0. **Multi-part Questions:**
+     * If the user asks a multi-part question, **reformat it** to process each part correctly while maintaining the original meaning. **Do not change the intent** of the question.
+     * **For multi-part questions**, the code should reflect how each part of the question is handled. You must ensure that each part is processed and combined correctly at the end.
+     * **Print a statement** explaining how you processed the multi-part question, e.g., `print("Question was split into parts for processing.")`.
+
+ 1. **Load Data and Parse Dates:** Your code *MUST* begin with the following lines to load the data, correctly parsing *ALL* potential date columns:
+     ```python
+     import pandas as pd
+     df = pd.read_csv('{csv_path}', parse_dates=['HIST_DATE', 'DATA_SEM_OFERTA', 'DATA_STARE_CERERE', 'DATA_IN_OFERTA', 'CTR_DATA_START', 'CTR_DATA_STATUS'], dayfirst=True)
+     ```
+     Do *NOT* modify these lines. The `parse_dates` argument is *critical* for correct date handling, and `dayfirst=True` is absolutely required because dates are in European DD/MM/YYYY format.
+
+ 2. **Imports:** Do *NOT* import any libraries other than pandas (which is already imported as `pd`). Do *NOT* use `numpy` or `datetime` directly, unless it is used within the context of parsing in `read_csv`. Pandas is sufficient for all tasks.
+
+ 3. **Output:**
+     * Store your final answer in a variable named `result`.
+     * Print the `result` variable using `print(result)`.
+     * Do *NOT* use `display()`.
+     * The output must be a Pandas DataFrame, Series, or a single value, as appropriate for the question. If it's a DataFrame or Series, ensure the index is reset where appropriate (e.g., after a `groupby()` followed by `.size()`).
+
+ 4. **Conciseness and Style:**
+     * Write the *most concise* and efficient Pandas code possible.
+     * Use method chaining (e.g., `df.groupby(...).sum().sort_values().head()`) whenever possible and appropriate.
+     * Avoid unnecessary intermediate variables unless they *significantly* improve readability.
+     * Use clear and understandable variable names for filtered DataFrames (for example: `df_2010`, `df_filtered`, etc.).
+     * If calculating a percentage or distribution, combine operations efficiently, ideally in a single chained expression.
+
+ 5. **Correctness:** Your code *MUST* be syntactically correct Python and *MUST* produce the correct answer to the question. Double-check your logic, especially when grouping and aggregating. Pay close attention to the wording of the question.
+
+ 6. **Date and Time Conditions (Implicit Filtering):**
+     * **Any question that refers to dates, time periods, months, years, or uses phrases like "issued in," "policies from," "between [dates]," etc., *MUST* filter the data using the `DATA_SEM_OFERTA` column.** This is the *implied* date column for policy issuance. Do *NOT* ask the user which column to use; assume `DATA_SEM_OFERTA`.
+     * When filtering dates, use combined boolean conditions for efficiency, e.g., `df[(df['DATA_SEM_OFERTA'].dt.year == 2010) & (df['DATA_SEM_OFERTA'].dt.month == 12)]` rather than separate filtering steps.
+
+ 7. **Column Names:** Use the *exact* column names provided in the "CSV Column Information." Pay close attention to capitalization, spaces, and any special characters.
+
+ 8. **No Explanations:** Output *ONLY* the Python code. Do *NOT* include any comments, explanations, surrounding text, or markdown formatting (like ```python). Just the code.
+
+ 9. **Aggregation (VERY IMPORTANT):** When the question asks for:
+     * "top N" or "first N"
+     * "most frequent"
+     * "highest/lowest" (after grouping)
+     * "average/sum/count per [group]"
+     * **Calculate Percentage**: When a percentage is asked for, compute the correct percentage value.
+
+     You *MUST* perform a `groupby()` operation *BEFORE* sorting or selecting the top N values. The correct order is:
+     1. Filter the DataFrame (if needed, using boolean indexing).
+     2. Group by the appropriate column(s) using `.groupby()`.
+     3. Apply an aggregation function (e.g., `.sum()`, `.mean()`, `.size()`, `.count()`, `.median()`).
+     4. *Then*, sort (if needed) using `.sort_values()` and/or select the top N (if needed) using `.nlargest()` or `.head()`.
+
+ 10. **Error Handling:** Assume the CSV file exists and is correctly formatted. You do *not* need to write any explicit error handling code.
+
+ 11. **Clarity:** Use clear and meaningful variable names if you create intermediate DataFrames, but prioritize conciseness.
+
+ 12. **Terminology:** "primele" means `.nlargest` and "ultimele" means `.nsmallest`.
+
+ **Column Usage Guidance:**
+ * Use `CTR_STATUS` when a concise or coded representation of the contract status is needed (e.g., for technical filtering or matching with system data).
+ * Use `CTR_DESCRIERE_STATUS` when a human-readable description is required (e.g., for distributions, summaries, or grouping by status type, such as "Activ", "Reziliat"). Default to `CTR_DESCRIERE_STATUS` for questions involving totals, distributions, or descriptive analysis unless the question specifies a coded status.
+ * Use `COD_SUCURSALA` for numerical branch identification (e.g., filtering or joining with other datasets); use `DENUMIRE_SUCURSALA` for human-readable branch names (e.g., grouping or summarizing by branch name).
+ * Use `COD_AGENTIE` for numerical agency identification; use `DENUMIRE_AGENTIE` for human-readable agency names, preferring the latter for summaries or rankings.
+ * Use `DATA_SEM_OFERTA` as the implied date column for policy issuance or time-based filtering (e.g., "issued in", "per month"), unless the question specifies another date column.
+ * Use `PBA_BAZA`, `PBA_ASIG_SUPLIM`, `PBA_TOTAL_SEMNARE_CERERE`, and `PBA_TOTAL_EMITERE_CERERE` for financial aggregations (e.g., sum, mean) based on the specific PBA type mentioned in the question.
+
+ **Question:**
+ {question}
+ """
+
+     response = client.chat.completions.create(
+         model=model_name,
+         temperature=0,  # Keep temperature at 0 for consistent, deterministic code
+         messages=[
+             {"role": "system", "content": "You are a helpful assistant that generates Python code."},
+             {"role": "user", "content": prompt},
+         ],
+     )
+
+     code_to_execute = response.choices[0].message.content.strip()
+     code_to_execute = code_to_execute.replace("```python", "").replace("```", "").strip()
+
+     return code_to_execute
+
+
+ def execute_code(generated_code, csv_path):
+     """Executes the generated Pandas code and returns its `result` variable."""
+     # Run the code in a single shared namespace: passing separate globals and
+     # locals dicts to exec() breaks name resolution inside comprehensions and
+     # functions defined by the generated code.
+     namespace = {"pd": pd, "__file__": csv_path}
+     exec(generated_code, namespace)
+     return namespace.get("result")
+
+ def fig_to_base64(fig):
+     buf = io.BytesIO()
+     fig.savefig(buf, format="png", bbox_inches="tight")
+     buf.seek(0)
+     img_str = base64.b64encode(buf.getvalue()).decode("utf-8")
+     buf.close()
+     return img_str
+
+ def plotly_to_base64(fig):
+     img_bytes = fig.to_image(format="png", scale=2)  # static export requires the `kaleido` package
+     return base64.b64encode(img_bytes).decode("utf-8")
+
+ def generate_plots(metadata, categories, values):
+     # Keep only the numeric values (and their categories) for plotting.
+     numeric_pairs = [(c, v) for c, v in zip(categories, values) if isinstance(v, (int, float))]
+
+     if not numeric_pairs:
+         st.warning("No numeric data to plot for this query.")
+         return []
+
+     plots = []
+
+     if all(isinstance(c, str) for c in categories) and all(isinstance(v, (int, float)) for v in values):
+         sorted_categories, sorted_values = zip(*sorted(numeric_pairs, key=lambda x: x[1], reverse=True))
+
+         # Horizontal Bar Chart (main plot for string categories and numeric values)
+         fig_bar = px.bar(x=sorted_values, y=sorted_categories, orientation="h",
+                          labels={"x": "Value", "y": "Category"},
+                          title=f"{metadata['query']} (Bar Chart)",
+                          color=sorted_values, color_continuous_scale="blues")
+         fig_bar.update_layout(yaxis=dict(categoryorder="total ascending"))
+         st.plotly_chart(fig_bar)
+         plots.append(("Bar Chart (Plotly)", plotly_to_base64(fig_bar)))
+
+         # Vertical Bar Plot (Plotly)
+         fig1 = px.bar(x=sorted_categories, y=sorted_values, labels={"x": "Category", "y": metadata.get("unit", "Value")},
+                       title=f"{metadata['query']} (Plotly Bar)", color=sorted_values, color_continuous_scale="blues")
+         st.plotly_chart(fig1)
+         plots.append(("Bar Plot (Plotly)", plotly_to_base64(fig1)))
+
+         # Pie Chart
+         fig2, ax2 = plt.subplots(figsize=(10, 8))
+         cmap = plt.get_cmap("tab20c")
+         colors = [cmap(i) for i in range(len(sorted_categories))]
+         wedges, texts = ax2.pie(sorted_values, labels=None, autopct=None, startangle=140, colors=colors, wedgeprops=dict(width=0.4))
+         legend_labels = [f"{cat} ({val / sum(sorted_values):.1%})" for cat, val in zip(sorted_categories, sorted_values)]
+         ax2.legend(wedges, legend_labels, title="Categories", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1), fontsize=10)
+         ax2.axis("equal")
+         ax2.set_title(f"{metadata['query']} (Pie)", fontsize=16)
+         st.pyplot(fig2)
+         plots.append(("Pie Chart", fig_to_base64(fig2)))
+         plt.close(fig2)
+
+         # Histogram
+         fig3, ax3 = plt.subplots(figsize=(10, 6))
+         ax3.hist(sorted_values, bins=10, color="skyblue", edgecolor="black")
+         ax3.set_title(f"Distribution of {metadata['query']} (Histogram)", fontsize=16)
+         st.pyplot(fig3)
+         plots.append(("Histogram", fig_to_base64(fig3)))
+         plt.close(fig3)
+
+         # Heatmap
+         fig4, ax4 = plt.subplots(figsize=(10, 6))
+         data_matrix = pd.DataFrame({metadata.get("unit", "Value"): sorted_values}, index=sorted_categories)
+         sns.heatmap(data_matrix, annot=True, cmap="Blues", ax=ax4, fmt=".1f")
+         ax4.set_title(f"{metadata['query']} (Heatmap)", fontsize=16)
+         st.pyplot(fig4)
+         plots.append(("Heatmap", fig_to_base64(fig4)))
+         plt.close(fig4)
+
+         # Scatter Plot
+         fig5 = px.scatter(x=sorted_categories, y=sorted_values, title=f"{metadata['query']} (Scatter Plot)",
+                           labels={"x": "Category", "y": metadata.get("unit", "Value")})
+         st.plotly_chart(fig5)
+         plots.append(("Scatter Plot (Plotly)", plotly_to_base64(fig5)))
+
+         # Line Plot
+         fig6 = px.line(x=sorted_categories, y=sorted_values, title=f"{metadata['query']} (Line Plot)",
+                        labels={"x": "Category", "y": metadata.get("unit", "Value")})
+         st.plotly_chart(fig6)
+         plots.append(("Line Plot (Plotly)", plotly_to_base64(fig6)))
+
+         # Box Plot
+         fig7, ax7 = plt.subplots(figsize=(10, 6))
+         ax7.boxplot(sorted_values, vert=False, tick_labels=["Data"], patch_artist=True)
+         ax7.set_title(f"{metadata['query']} (Box Plot)", fontsize=16)
+         st.pyplot(fig7)
+         plots.append(("Box Plot", fig_to_base64(fig7)))
+         plt.close(fig7)
+
+         # Violin Plot
+         fig8, ax8 = plt.subplots(figsize=(10, 6))
+         ax8.violinplot(sorted_values, vert=False, showmeans=True, showextrema=True)
+         ax8.set_title(f"{metadata['query']} (Violin Plot)", fontsize=16)
+         st.pyplot(fig8)
+         plots.append(("Violin Plot", fig_to_base64(fig8)))
+         plt.close(fig8)
+
+         # Area Chart
+         fig9 = px.area(x=sorted_categories, y=sorted_values, title=f"{metadata['query']} (Area Chart)",
+                        labels={"x": "Category", "y": metadata.get("unit", "Value")})
+         st.plotly_chart(fig9)
+         plots.append(("Area Chart (Plotly)", plotly_to_base64(fig9)))
+
+         # Radar Chart
+         fig10 = go.Figure(data=go.Scatterpolar(r=sorted_values, theta=sorted_categories, fill='toself', name=metadata['query']))
+         fig10.update_layout(polar=dict(radialaxis=dict(visible=True)), showlegend=True, title=f"{metadata['query']} (Radar Chart)")
+         st.plotly_chart(fig10)
+         plots.append(("Radar Chart (Plotly)", plotly_to_base64(fig10)))
+
+     else:
+         st.warning("No numeric data to plot for this query.")
+
+     return plots
+
+ def sanitize_filename(filename):
+     return re.sub(r'[^a-zA-Z0-9]', '_', filename)
+
+ def generate_pdf(query, response_text, chat_response, plots):
+     query = html.unescape(query)
+     # `response_text` may be a DataFrame or scalar; coerce to str before escaping.
+     response_text = html.unescape(str(response_text))
+     escaped_query = html.escape(query)
+     escaped_response_text = html.escape(response_text)
+
+     html_content = f"""
+ <!DOCTYPE html>
+ <html lang="ro">
+ <head>
+ <title>Data Analysis Report</title>
+ <meta charset="UTF-8">
+ <style>
+ body {{ font-family: Arial, sans-serif; margin: 20px; background-color: #f9f9f9; color: #333; }}
+ h1 {{ color: #1f77b4; text-align: center; }}
+ h3 {{ color: #2c3e50; border-bottom: 2px solid #ddd; padding-bottom: 5px; }}
+ h4 {{ color: #2980b9; }}
+ p {{ line-height: 1.6; background-color: #fff; padding: 10px; border-radius: 5px; box-shadow: 0 1px 3px rgba(0,0,0,0.1); }}
+ pre {{ background-color: #ecf0f1; padding: 10px; border-radius: 5px; font-size: 12px; }}
+ table {{ border-collapse: collapse; width: 100%; margin: 10px 0; page-break-inside: avoid; }}
+ th, td {{ border: 1px solid #bdc3c7; padding: 10px; text-align: left; }}
+ th {{ background-color: #3498db; color: white; }}
+ td {{ background-color: #fff; }}
+ img {{ max-width: 100%; height: auto; margin: 10px 0; page-break-inside: avoid; }}
+ .section {{ margin-bottom: 20px; }}
+ .no-break {{ page-break-inside: avoid; }}
+ .powered-by {{ text-align: center; margin-top: 20px; font-size: 10px; color: #777; }}
+ .logo {{ height: 100px; }}
+ </style>
+ </head>
+ <body>
+ <h1>Data Analysis Agent Interface</h1>
+ <div class="section no-break"><h3>Query</h3><p>{escaped_query}</p></div>
+ <div class="section no-break"><h3>Response</h3><p>{escaped_response_text}</p></div>
+ <div class="section no-break">
+ <h3>Raw Structured Response</h3>
+ <h4>Metadata</h4><pre>{json.dumps(chat_response["metadata"], indent=2, ensure_ascii=False)}</pre>
+ <h4>Data</h4>{pd.DataFrame(chat_response["data"]).to_html(index=False, classes="no-break", escape=False)}
+ </div>
+ <div class="section"><h3>Plots</h3>{"".join([f'<div class="no-break"><h4>{name}</h4><img src="data:image/png;base64,{img_b64}"/></div>' for name, img_b64 in plots])}</div>
+ <div class="powered-by">Powered by <img src="data:image/png;base64,{get_zega_logo_base64()}" class="logo"></div>
+ </body></html>
+ """
+
+     html_file = "temp.html"
+     sanitized_query = sanitize_filename(query)
+     os.makedirs("./exported_pdfs", exist_ok=True)
+     pdf_file = f"./exported_pdfs/{sanitized_query}.pdf"
+
+     try:
+         with open(html_file, "w", encoding="utf-8") as f:
+             f.write(html_content)
+         options = {'encoding': "UTF-8", 'custom-header': [('Content-Type', 'text/html; charset=UTF-8')], 'no-outline': None}
+         pdfkit.from_file(html_file, pdf_file, options=options)
+     finally:
+         # Clean up the temporary HTML file even if the conversion fails.
+         if os.path.exists(html_file):
+             os.remove(html_file)
+     return pdf_file
+
+ def get_zega_logo_base64():
+     with open("zega_logo.png", "rb") as image_file:
+         return base64.b64encode(image_file.read()).decode("utf-8")
+
+ # Streamlit Interface
+ st.title("Data Analysis Agent Interface")
+
+ st.sidebar.markdown(
+     f"""
+     <div style="text-align: center;">
+         Powered by <img src="data:image/png;base64,{get_zega_logo_base64()}" style="height: 100px;">
+     </div>
+     """,
+     unsafe_allow_html=True,
+ )
+ st.sidebar.header("Sample Questions")
+
+ sample_questions = [
+     "Da-mi top cinci sucursale cu vânzări în perioada 01.03.2024-01.04.2024.",
+     "Da-mi vânzările defalcate pe produse pentru top cinci sucursale cu vânzări în perioada 01.03.2024-01.04.2024.",
+     "Da-mi vânzările defalcate pe pachete pentru top cinci sucursale cu vânzări în perioada 01.03.2024-01.04.2024.",
+ ]
+
+ selected_question = st.sidebar.selectbox("Select a sample question:", sample_questions)
+ user_query = st.text_area("Please write one question at a time.", value=selected_question, height=100)
+
+ def process_query():
+     try:
+         generated_code = generate_code(user_query, column_info, sample_str, csv_path)
+         result = execute_code(generated_code, csv_path)
+
+         # A Series is normalized to a DataFrame so both cases share one branch.
+         if isinstance(result, pd.Series):
+             result = result.reset_index()
+
+         if isinstance(result, pd.DataFrame):
+             records = result.to_dict(orient='records')
+             chat_response = {
+                 "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                 "data": records,
+                 "csv_data": records,
+             }
+         elif isinstance(result, list):
+             if all(isinstance(item, (int, float)) for item in result):
+                 records = [{"category": str(i), "value": v} for i, v in enumerate(result)]
+                 chat_response = {
+                     "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                     "data": records,
+                     "csv_data": records,
+                 }
+             elif all(isinstance(item, dict) for item in result):
+                 chat_response = {
+                     "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                     "data": result,
+                     "csv_data": result,
+                 }
+             else:
+                 st.warning("Result is a list with mixed data types. Please inspect.")
+                 return
+         else:
+             chat_response = {
+                 "metadata": {"query": user_query, "unit": "", "plot_types": []},
+                 "data": [{"category": "Result", "value": result}],
+                 "csv_data": [{"category": "Result", "value": result}],
+             }
+
+         st.markdown("<h3 style='color: #2e86de;'>Question:</h3>", unsafe_allow_html=True)
+         st.markdown(f"<p style='color: #2e86de;'>{user_query}</p>", unsafe_allow_html=True)
+         st.write("-" * 200)
+
+         # Initially hide the code.
+         with st.expander("Show the code"):
+             st.code(generated_code, language="python")
+         st.write("-" * 200)
+
+         st.markdown("### Data:")
+         st.dataframe(pd.DataFrame(chat_response["data"]))
+
+         metadata = chat_response["metadata"]
+         data = chat_response["data"]
+
+         if data and isinstance(data, list) and isinstance(data[0], dict):
+             if len(data[0]) == 1:
+                 categories = [item[list(item.keys())[0]] for item in data]
+                 values = categories
+             else:
+                 categories = list(data[0].keys())
+                 if len(categories) == 1:
+                     values = [item[categories[0]] for item in data]
+                     categories = values
+                 else:
+                     prioritized_columns = ["DENUMIRE_SUCURSALA", "NUMAR_CERERE", "size", "HIST_DATE", "COD_SUCURSALA", "COD_AGENTIE",
+                                            "DENUMIRE_AGENTIE", "PRODUS", "DATA_SEM_OFERTA", "DATA_STARE_CERERE", "STATUS_CERERE",
+                                            "DESCRIERE_STARE_CERERE", "DATA_IN_OFERTA", "PBA_BAZA", "PBA_ASIG_SUM",
+                                            "PBA_TOTAL_SEMNARE_CERERE", "PBA_CTR_ASOC", "PBA_TOTAL_EMITERE_CERERE", "FRECVENTA_PLATA"]
+
+                     for col in prioritized_columns:
+                         if all(col in item for item in data):
+                             categories = [str(item[col]) for item in data]
+                             # Pick a matching numeric column for the values, so
+                             # `values` is always bound before the break.
+                             if all("NUMAR_CERERE" in item for item in data):
+                                 values = [item.get("NUMAR_CERERE", 0) for item in data]
+                             elif all("size" in item for item in data):
+                                 values = [item.get("size", 0) for item in data]
+                             else:
+                                 numeric_col = next((c for c in data[0] if isinstance(data[0][c], (int, float))), None)
+                                 if numeric_col:
+                                     values = [item.get(numeric_col, 0) for item in data]
+                                 else:
+                                     values = [str(list(item.values())[1]) for item in data]
+                             break
+                     else:
+                         values = [str(list(item.values())[1]) for item in data]
+
+         elif isinstance(data, list) and all(isinstance(item, (int, float)) for item in data):
+             categories = list(range(len(data)))
+             values = data
+         elif isinstance(data, (int, float, str)):
+             categories = ["Result"]
+             values = [data]
+         else:
+             categories = []
+             values = []
+             st.warning("Unexpected data format. Check the query and data.")
+
+         plots = generate_plots(metadata, categories, values)
+
+         st.session_state["query"] = user_query
+         st.session_state["response_text"] = result
+         st.session_state["chat_response"] = chat_response
+         st.session_state["plots"] = plots
+         st.session_state["generated_code"] = generated_code  # Store the generated code
+
+     except Exception as e:
+         st.error(f"An error occurred: {e}")
+
+ if st.button("Submit"):
+     with st.spinner("Processing query..."):
+         try:
+             process_query()
+         except Exception as e:
+             st.error(f"An error occurred: {e}")
+
+ if "chat_response" in st.session_state:
+     if st.button("Download PDF"):
+         with st.spinner("Generating PDF..."):
+             try:
+                 pdf_file = generate_pdf(
+                     st.session_state["query"],
+                     st.session_state["response_text"],
+                     st.session_state["chat_response"],
+                     st.session_state["plots"],
+                 )
+                 with open(pdf_file, "rb") as f:
+                     pdf_data = f.read()
+                 sanitized_query = sanitize_filename(st.session_state["query"])
+                 st.download_button(
+                     label="Click Here to Download PDF",
+                     data=pdf_data,
+                     file_name=f"{sanitized_query}.pdf",
+                     mime="application/pdf",
+                 )
+             except Exception as e:
+                 st.error(f"PDF generation failed: {e}")
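Reviewer note on `execute_code`: the prompt forces the generated snippets to end with `print(result)`, but `exec` only exposes the `result` variable to the app; anything printed (including the "multi-part question" explanation the prompt asks for) lands in the server's stdout and is lost to the UI. A minimal sketch of capturing both, assuming `contextlib.redirect_stdout` suffices (the helper name `run_generated` is illustrative, not part of this commit):

```python
import contextlib
import io


def run_generated(code: str, namespace: dict):
    """Exec `code` in `namespace` and return (result, captured_stdout)."""
    buf = io.StringIO()
    # Redirect stdout so print(result) in the generated code is captured
    # instead of disappearing into the server logs.
    with contextlib.redirect_stdout(buf):
        exec(code, namespace)
    return namespace.get("result"), buf.getvalue()


result, printed = run_generated("result = 2 + 3\nprint(result)", {})
```

The captured text could then be shown with `st.text(printed)` alongside the DataFrame. Note that `exec` of model-generated code remains arbitrary code execution; this sketch does not sandbox it.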
packages.txt ADDED
@@ -0,0 +1 @@
+ wkhtmltopdf
requirements.txt ADDED
@@ -0,0 +1,235 @@
+ aiohappyeyeballs==2.4.8
+ aiohttp==3.11.13
+ aiosignal==1.3.2
+ altair==5.5.0
+ annotated-types==0.7.0
+ anyio==4.8.0
+ attrs==25.1.0
+ backoff==2.2.1
+ blinker==1.9.0
+ cachetools==5.5.2
+ certifi==2025.1.31
+ charset-normalizer==3.4.1
+ click==8.1.8
+ contourpy==1.3.1
+ cycler==0.12.1
+ dataclasses-json==0.6.7
+ distro==1.9.0
+ dotenv==0.9.9
+ fonttools==4.56.0
+ frozenlist==1.5.0
+ gitdb==4.0.12
+ GitPython==3.1.44
+ greenlet==3.1.1
+ grpcio==1.70.0
+ grpcio-tools==1.70.0
+ h11==0.14.0
+ h2==4.2.0
+ hpack==4.1.0
+ httpcore==1.0.7
+ httpx==0.28.1
+ httpx-sse==0.4.0
+ huggingface-hub==0.26.2
+ hyperframe==6.1.0
+ idna==3.10
+ Jinja2==3.1.5
+ jiter==0.8.2
+ jsonpatch==1.33
+ jsonpointer==3.0.0
+ jsonschema==4.23.0
+ jsonschema-specifications==2024.10.1
+ kaleido==0.2.1
+ kiwisolver==1.4.8
+ kornia==0.7.4
+ kornia_rs==0.1.7
+ langchain==0.3.19
+ langchain-community==0.3.18
+ langchain-core==0.3.40
+ langchain-experimental==0.3.4
+ langchain-openai==0.3.7
+ langchain-text-splitters==0.3.6
+ langfuse==2.59.7
+ langsmith==0.3.11
+ markdown-it-py==3.0.0
+ MarkupSafe==3.0.2
+ marshmallow==3.26.1
+ matplotlib==3.10.1
+ mdurl==0.1.2
+ multidict==6.1.0
+ mypy-extensions==1.0.0
+ narwhals==1.29.0
+ numpy==2.2.3
+ nvidia-cublas-cu12==12.4.5.8
+ nvidia-cuda-cupti-cu12==12.4.127
+ nvidia-cuda-nvrtc-cu12==12.4.127
+ nvidia-cuda-runtime-cu12==12.4.127
+ nvidia-cudnn-cu12==9.1.0.70
+ nvidia-cufft-cu12==11.2.1.3
+ nvidia-curand-cu12==10.3.5.147
+ nvidia-cusolver-cu12==11.6.1.9
+ nvidia-cusparse-cu12==12.3.1.170
+ nvidia-nccl-cu12==2.21.5
+ nvidia-nvjitlink-cu12==12.4.127
+ nvidia-nvtx-cu12==12.4.127
+ ollama==0.4.7
+ openai==1.65.2
+ orjson==3.10.15
+ packaging==24.2
+ pandas==2.2.3
+ pdfkit==1.0.0
+ pillow==11.1.0
+ plotly==6.0.0
+ portalocker==2.10.1
+ propcache==0.3.0
+ protobuf==5.29.3
+ pyarrow==19.0.1
+ pydantic==2.10.6
+ pydantic-settings==2.8.1
+ pydantic_core==2.27.2
+ pydeck==0.9.1
+ Pygments==2.19.1
+ pyparsing==3.2.1
+ python-dateutil==2.9.0.post0
+ python-dotenv==1.0.1
+ pytz==2025.1
+ PyYAML==6.0.2
+ qdrant-client==1.13.2
+ referencing==0.36.2
+ regex==2024.11.6
+ requests==2.32.3
+ requests-toolbelt==1.0.0
+ rich==13.9.4
+ rpds-py==0.23.1
+ safetensors==0.4.5
+ seaborn==0.13.2
+ sentencepiece==0.2.0
+ setuptools==75.8.0
+ six==1.17.0
+ smmap==5.0.2
+ sniffio==1.3.1
+ soundfile==0.12.1
+ spandrel==0.4.0
+ SQLAlchemy==2.0.38
+ streamlit==1.42.2
+ sympy==1.13.1
+ tabulate==0.9.0
+ tenacity==9.0.0
+ tiktoken==0.9.0
+ toml==0.10.2
+ torchsde==0.2.6
+ tornado==6.4.2
+ tqdm==4.67.1
+ trampoline==0.1.2
+ triton==3.1.0
+ typing-inspect==0.9.0
+ typing_extensions==4.12.2
+ tzdata==2025.1
+ urllib3==2.3.0
+ watchdog==6.0.0
+ wheel==0.45.1
+ wkhtmltopdf==0.2
+ wrapt==1.17.2
+ yarl==1.18.3
+ zstandard==0.23.0
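A pinned list this long drifts easily, and a package pinned twice can trip pip's resolver at install time. Duplicate pins are easy to catch mechanically before committing; a small sketch (the helper name `duplicate_pins` is illustrative):

```python
from collections import Counter


def duplicate_pins(lines):
    """Return package names that appear more than once in a requirements list."""
    names = [line.split("==")[0].strip().lower() for line in lines if "==" in line]
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Example: `pandas` is pinned twice below, so it is reported.
dupes = duplicate_pins(["pandas==2.2.3", "numpy==2.2.3", "pandas==2.2.3"])
```

Running this over a `requirements.txt` (e.g., `duplicate_pins(open("requirements.txt"))`) before each commit keeps the file installable.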
zega_logo.png ADDED