k-pavlo/excel-ai-analyzer

Pavlo Kostianov committed on Sep 14

Commit c74321c (1 parent: bfac968)

Improve Budget sheet handling, schema and prompt.

Files changed (1)

app.py +39 -12

@@ -42,8 +42,14 @@ for sheet, raw_df in df1_all.items():
         )
         # Part 2: summary section (rows 21–33)
-        budget_summary = budget_raw.iloc[21:33].reset_index(drop=True)
+        headers = budget_raw.iloc[2, 4:].tolist()   # row3, cols E+ as headers
+        metrics = budget_raw.iloc[20:33, 1].tolist()  # colB = row labels
+        values = budget_raw.iloc[20:33, 4:].reset_index(drop=True)
+        values.columns = headers
+        values.insert(0, "Metric", metrics)
+        budget_summary = values
         # Add into df1 dict
         df1["Budget_Details"] = budget_details
         df1["Budget_Summary"] = budget_summary
@@ -80,7 +86,16 @@ for sheet, raw_df in df2_all.items():
 def get_schema_info():
     lines = ["Report 1 - OC Onboarding Information:"]
     for sheet, df in df1.items():
-        lines.append(f"Sheet: {sheet}, Columns: {list(df.columns)}")
+        if sheet == "Budget_Details":
+            lines.append("Sheet: Budget_Details (line-item breakdown by channel/segment)")
+            lines.append(f"Columns: {list(df.columns)}")
+        elif sheet == "Budget_Summary":
+            lines.append("Sheet: Budget_Summary (metrics by month; first column = 'Metric')")
+            lines.append(f"Metrics available: {df['Metric'].tolist()}")
+            lines.append(f"Month columns: {[c for c in df.columns if c != 'Metric']}")
+        else:
+            lines.append(f"Sheet: {sheet}, Columns: {list(df.columns)}")
+        # Add one example row for context
         try:
             sample = df.head(1).to_dict(orient="records")[0]
             lines.append(f"Example row: {sample}")
@@ -149,35 +164,44 @@ The reports are loaded as dictionaries of DataFrames:
 - Access Report 1 with df1['SheetName']
 - Access Report 2 with df2['SheetName']
 Do not reload Excel files with pandas.
-If unsure is the question relevant, try to reason using columns available.
+If unsure whether the question is relevant, try to reason using the available columns.
 If absolutely no relation to provided sheets, respond with:
 "I can only answer questions about the provided Excel reports."
-The reports are OC Onboarding Information and The Alex Ideas Report.
-OC Onboarding Information (df1) is the initial hotel data including a budget set by the hotel for the year.
-- "Budget_Details": line-item breakdown (channels, rates, revenue, etc.)
-- "Budget_Summary": summary rows (total occupancy, RevPar for each month, etc.)
-The Alex Ideas Report (df2) is the real reports about revenue of the hotel for the year.
-They have the following schema (sheet names, columns, and example rows, but Budget_Summary sheet is fully loaded):
+The reports are:
+- OC Onboarding Information (df1): planned hotel data including a budget set by the hotel for the year.
+    - "Budget_Details": line-item breakdown (channels, segments, rates, rooms, revenue, ADR).
+    - "Budget_Summary": pivot-style table with 'Metric' as the first column
+      (e.g. Total, Occupancy %, RevPar, Capacity), and the other columns representing monthly values
+      such as 'Jan Rooms', 'Jan Rooms Revenue', 'Jan ADR', 'Feb Rooms', etc.
+- The Alex Ideas Report (df2): actual hotel revenue and performance data.
+They have the following schema (sheet names, columns, and example rows):
 {schema_info}
 The user asked:
 {message}
+The history of the conversation is:
+{history}
 Rules:
 - Use ONLY pandas, df1, df2, and Python built-ins.
 - Do NOT write import statements (pandas is already imported as pd).
 - Access all dataframes ONLY as df1["SheetName"] or df2["SheetName"].
-  Never assign them to new variables like budget_df, property_df, etc.
-- ALWAYS put the answer in a variable named `result`.
+  Never assign them to new variables (e.g. budget_df, property_df).
+- Do not create aliases for dataframes. Always reference them directly.
+- When using df1["Budget_Summary"], always filter by 'Metric' first,
+  then select the appropriate month column.
+- ALWAYS put the final answer in a variable named `result`.
 - Return ONLY Python code, nothing else.
 - If multiple values are tied for the maximum, include all of them in a list.
 - If result is numeric, round to 2 decimal places.
 - If result is a list, return the full list (not just the first element).
-- If a column is missing, return a clear error string in `result`, do not crash.
+- If a required column or metric is missing, return a clear error string in `result` instead of crashing.
     """
     try:
         # Ask OpenAI
         response = client.chat.completions.create(
@@ -232,6 +256,9 @@ Rules:
             "round": round,
             "KeyError": KeyError,
             "ValueError": ValueError,
+            "sorted": sorted,
+            "enumerate": enumerate,
+            "zip": zip,
         }
         # Execute code
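
The Budget_Summary change in this commit reshapes a raw sheet block into a table with a leading 'Metric' column, and the new prompt rule asks generated code to filter by 'Metric' first and then pick a month column. A minimal sketch of that pattern follows. It assumes a tiny synthetic sheet with made-up numbers instead of the real workbook, and `build_summary` and `lookup` are hypothetical helper names that do not appear in app.py.

```python
import pandas as pd


def build_summary(raw, header_row, label_col, first_value_col, start, stop):
    """Reshape a raw sheet block into a table with a leading 'Metric' column.

    Hypothetical helper: it mirrors the slicing steps in the commit, but takes
    the row and column positions as parameters instead of hard-coding them.
    """
    headers = raw.iloc[header_row, first_value_col:].tolist()   # column names
    metrics = raw.iloc[start:stop, label_col].tolist()          # row labels
    values = raw.iloc[start:stop, first_value_col:].reset_index(drop=True)
    values.columns = headers
    values.insert(0, "Metric", metrics)
    return values


def lookup(table, metric, column):
    """Filter by 'Metric' first, then select the month column.

    Returns an error string instead of raising when something is missing.
    """
    if column not in table.columns:
        return f"Column not found: {column}"
    rows = table.loc[table["Metric"] == metric, column]
    if rows.empty:
        return f"Metric not found: {metric}"
    return round(float(rows.iloc[0]), 2)


# Synthetic stand-in for a sheet read without a header row (made-up values).
raw = pd.DataFrame([
    [None, None, "Jan Rooms", "Jan ADR"],
    [None, "Total", 310, 120.0],
    [None, "RevPar", 95.5, 96.25],
])

summary = build_summary(raw, header_row=0, label_col=1,
                        first_value_col=2, start=1, stop=3)
result = lookup(summary, "RevPar", "Jan ADR")
print(result)
```

Returning a string for a missing metric or column matches the prompt rule about not crashing, at the cost of `result` having a mixed type that the caller has to check.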