Pavlo Kostianov committed on
Commit c74321c · 1 Parent(s): bfac968

Improve Budget sheet handling, schema and prompt.

Files changed (1)
  1. app.py +39 -12
app.py CHANGED
@@ -42,8 +42,14 @@ for sheet, raw_df in df1_all.items():
  )

  # Part 2: summary section (rows 21–33)
- budget_summary = budget_raw.iloc[21:33].reset_index(drop=True)

  # Add into df1 dict
  df1["Budget_Details"] = budget_details
  df1["Budget_Summary"] = budget_summary
@@ -80,7 +86,16 @@ for sheet, raw_df in df2_all.items():
  def get_schema_info():
      lines = ["Report 1 - OC Onboarding Information:"]
      for sheet, df in df1.items():
-         lines.append(f"Sheet: {sheet}, Columns: {list(df.columns)}")
          try:
              sample = df.head(1).to_dict(orient="records")[0]
              lines.append(f"Example row: {sample}")
@@ -149,35 +164,44 @@ The reports are loaded as dictionaries of DataFrames:
  - Access Report 1 with df1['SheetName']
  - Access Report 2 with df2['SheetName']
  Do not reload Excel files with pandas.
- If unsure is the question relevant, try to reason using columns available.
  If absolutely no relation to provided sheets, respond with:
  "I can only answer questions about the provided Excel reports."

- The reports are OC Onboarding Information and The Alex Ideas Report.
- OC Onboarding Information (df1) is the initial hotel data including a budget set by the hotel for the year.
- - "Budget_Details": line-item breakdown (channels, rates, revenue, etc.)
- - "Budget_Summary": summary rows (total occupancy, RevPar for each month, etc.)
- The Alex Ideas Report (df2) is the real reports about revenue of the hotel for the year.

- They have the following schema (sheet names, columns, and example rows, but Budget_Summary sheet is fully loaded):
  {schema_info}

  The user asked:
  {message}

  Rules:
  - Use ONLY pandas, df1, df2, and Python built-ins.
  - Do NOT write import statements (pandas is already imported as pd).
  - Access all dataframes ONLY as df1["SheetName"] or df2["SheetName"].
- Never assign them to new variables like budget_df, property_df, etc.
- - ALWAYS put the answer in a variable named `result`.
  - Return ONLY Python code, nothing else.
  - If multiple values are tied for the maximum, include all of them in a list.
  - If result is numeric, round to 2 decimal places.
  - If result is a list, return the full list (not just the first element).
- - If a column is missing, return a clear error string in `result`, do not crash.
  """

  try:
  # Ask OpenAI
  response = client.chat.completions.create(
@@ -232,6 +256,9 @@ Rules:
  "round": round,
  "KeyError": KeyError,
  "ValueError": ValueError,
  }

  # Execute code
 
@@ -42,8 +42,14 @@ for sheet, raw_df in df1_all.items():
  )

  # Part 2: summary section (rows 21–33)
+ headers = budget_raw.iloc[2, 4:].tolist() # row3, cols E+ as headers
+ metrics = budget_raw.iloc[20:33, 1].tolist() # colB = row labels
+ values = budget_raw.iloc[20:33, 4:].reset_index(drop=True)
+ values.columns = headers
+ values.insert(0, "Metric", metrics)

+ budget_summary = values
+
  # Add into df1 dict
  df1["Budget_Details"] = budget_details
  df1["Budget_Summary"] = budget_summary
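
Reviewer note: the reshaping in this hunk, together with the new prompt rule "filter by 'Metric' first, then select the appropriate month column", can be tried in isolation. The following is a minimal sketch under stated assumptions, not the code from app.py: `make_raw_budget`, `build_budget_summary`, and `lookup` are hypothetical helper names, and the toy sheet layout (headers in row 3 from column E, labels in column B, summary in rows 21-33) simply mirrors the comments in the diff.

```python
import pandas as pd


def make_raw_budget(n_rows=33, n_cols=7):
    """Hypothetical stand-in for the Budget sheet read without a header row."""
    rows = [[None] * n_cols for _ in range(n_rows)]
    rows[2][4:] = ["Jan Rooms", "Jan Rooms Revenue", "Jan ADR"]  # row 3, cols E+
    rows[20][1] = "Total"                                         # col B label
    rows[20][4:] = [100, 12000.0, 120.0]
    rows[21][1] = "Occupancy %"
    rows[21][4:] = [0.8, None, None]
    return pd.DataFrame(rows)


def build_budget_summary(budget_raw):
    """Same slicing as the diff: header row 2, label column 1, rows 20..32."""
    headers = budget_raw.iloc[2, 4:].tolist()
    metrics = budget_raw.iloc[20:33, 1].tolist()
    values = budget_raw.iloc[20:33, 4:].reset_index(drop=True)
    values.columns = headers
    values.insert(0, "Metric", metrics)
    return values


def lookup(summary, metric, column):
    """Filter by 'Metric' first, then pick the month column; never raise."""
    if column not in summary.columns:
        return f"Column not found: {column}"
    rows = summary.loc[summary["Metric"] == metric, column]
    if rows.empty:
        return f"Metric not found: {metric}"
    return rows.iloc[0]


summary = build_budget_summary(make_raw_budget())
print(lookup(summary, "Total", "Jan ADR"))
print(lookup(summary, "RevPar", "Jan ADR"))
```

Returning an error string instead of raising matches the prompt rule about missing columns or metrics. In the real sheet the header row may contain blanks or duplicates, which this sketch does not handle.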
 
@@ -80,7 +86,16 @@ for sheet, raw_df in df2_all.items():
  def get_schema_info():
      lines = ["Report 1 - OC Onboarding Information:"]
      for sheet, df in df1.items():
+         if sheet == "Budget_Details":
+             lines.append("Sheet: Budget_Details (line-item breakdown by channel/segment)")
+             lines.append(f"Columns: {list(df.columns)}")
+         elif sheet == "Budget_Summary":
+             lines.append("Sheet: Budget_Summary (metrics by month; first column = 'Metric')")
+             lines.append(f"Metrics available: {df['Metric'].tolist()}")
+             lines.append(f"Month columns: {[c for c in df.columns if c != 'Metric']}")
+         else:
+             lines.append(f"Sheet: {sheet}, Columns: {list(df.columns)}")
+         # Add one example row for context
          try:
              sample = df.head(1).to_dict(orient="records")[0]
              lines.append(f"Example row: {sample}")
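
Reviewer note: to see what the per-sheet schema text looks like, the branching above can be run on toy data. This is a hedged sketch rather than the function from app.py: `describe_sheets` is a hypothetical name, it takes the sheet dictionary as an argument instead of reading the global `df1`, and it guards on the 'Metric' column being present, which the diff does not do.

```python
import pandas as pd


def describe_sheets(sheets):
    """Build a schema description, special-casing the pivot-style summary sheet."""
    lines = []
    for name, df in sheets.items():
        if name == "Budget_Summary" and "Metric" in df.columns:
            lines.append(f"Sheet: {name} (metrics by month; first column = 'Metric')")
            lines.append(f"Metrics available: {df['Metric'].tolist()}")
            lines.append(f"Month columns: {[c for c in df.columns if c != 'Metric']}")
        else:
            lines.append(f"Sheet: {name}, Columns: {list(df.columns)}")
        # One example row for context, skipped for empty sheets.
        if not df.empty:
            lines.append(f"Example row: {df.head(1).to_dict(orient='records')[0]}")
    return "\n".join(lines)


toy_sheets = {
    "Budget_Summary": pd.DataFrame({"Metric": ["Total", "RevPar"], "Jan Rooms": [100, 80]}),
    "Other": pd.DataFrame({"a": [1]}),
}
print(describe_sheets(toy_sheets))
```

Listing the metric labels explicitly gives the model the exact strings it must match when filtering on 'Metric'.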
 
@@ -149,35 +164,44 @@ The reports are loaded as dictionaries of DataFrames:
  - Access Report 1 with df1['SheetName']
  - Access Report 2 with df2['SheetName']
  Do not reload Excel files with pandas.
+ If unsure whether the question is relevant, try to reason using the available columns.
  If absolutely no relation to provided sheets, respond with:
  "I can only answer questions about the provided Excel reports."

+ The reports are:
+ - OC Onboarding Information (df1): planned hotel data including a budget set by the hotel for the year.
+ - "Budget_Details": line-item breakdown (channels, segments, rates, rooms, revenue, ADR).
+ - "Budget_Summary": pivot-style table with 'Metric' as the first column
+ (e.g. Total, Occupancy %, RevPar, Capacity), and the other columns representing monthly values
+ such as 'Jan Rooms', 'Jan Rooms Revenue', 'Jan ADR', 'Feb Rooms', etc.
+ - The Alex Ideas Report (df2): actual hotel revenue and performance data.

+ They have the following schema (sheet names, columns, and example rows):
  {schema_info}

  The user asked:
  {message}

+ The history of the conversation is:
+ {history}
+
  Rules:
  - Use ONLY pandas, df1, df2, and Python built-ins.
  - Do NOT write import statements (pandas is already imported as pd).
  - Access all dataframes ONLY as df1["SheetName"] or df2["SheetName"].
+ Never assign them to new variables (e.g. budget_df, property_df).
+ - Do not create aliases for dataframes. Always reference them directly.
+ - When using df1["Budget_Summary"], always filter by 'Metric' first,
+ then select the appropriate month column.
+ - ALWAYS put the final answer in a variable named `result`.
  - Return ONLY Python code, nothing else.
  - If multiple values are tied for the maximum, include all of them in a list.
  - If result is numeric, round to 2 decimal places.
  - If result is a list, return the full list (not just the first element).
+ - If a required column or metric is missing, return a clear error string in `result` instead of crashing.
  """

+
  try:
  # Ask OpenAI
  response = client.chat.completions.create(
 
@@ -232,6 +256,9 @@ Rules:
  "round": round,
  "KeyError": KeyError,
  "ValueError": ValueError,
+ "sorted": sorted,
+ "enumerate": enumerate,
+ "zip": zip,
  }

  # Execute code
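
Reviewer note: the last hunk extends a whitelist of names exposed to the generated code. The diff does not show how that dictionary is used, so the sketch below is an assumption: it passes the whitelist as `__builtins__` to `exec` and reads the `result` variable afterwards. `SAFE_BUILTINS` and `run_generated` are hypothetical names, and the entries `len`, `sum`, `min`, and `max` are illustrative additions that do not appear in the visible diff. Restricting `__builtins__` this way limits accidental misuse but is not a security sandbox.

```python
import pandas as pd

# Whitelist of names visible to generated code (partly assumed, see note above).
SAFE_BUILTINS = {
    "len": len, "sum": sum, "min": min, "max": max,
    "round": round, "sorted": sorted, "enumerate": enumerate, "zip": zip,
    "KeyError": KeyError, "ValueError": ValueError,
}


def run_generated(code, df1, df2):
    """Execute model-written code and return `result`, or an error string."""
    env = {"__builtins__": SAFE_BUILTINS, "pd": pd, "df1": df1, "df2": df2}
    try:
        exec(code, env)
    except Exception as exc:  # report the failure instead of crashing the app
        return f"Error: {type(exc).__name__}: {exc}"
    return env.get("result", "No `result` variable was set.")


toy_df1 = {"Budget_Summary": pd.DataFrame({"Metric": ["Total"], "Jan ADR": [120.456]})}
generated = (
    'result = round(df1["Budget_Summary"].loc['
    'df1["Budget_Summary"]["Metric"] == "Total", "Jan ADR"].iloc[0], 2)'
)
print(run_generated(generated, toy_df1, {}))
print(run_generated("import os", toy_df1, {}))
```

Because `__import__` is not in the whitelist, an `import` statement in the generated code fails and is reported as an error string, which is consistent with the prompt rule "Do NOT write import statements".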