VEDAGI1 committed on
Commit c6e557d · verified · 1 Parent(s): 641f14c

Update app.py

Files changed (1)
  1. app.py +2 -1
app.py CHANGED
@@ -717,13 +717,14 @@ You have dataframes in a list `dfs`.
  CRITICAL RULES:
  1. **DO NOT READ FILES:** You MUST NOT include `pd.read_csv`. The data is ALREADY loaded in the `dfs` variable. You MUST use this variable. Failure to do so will cause a fatal error.
  2. **JSON OUTPUT ONLY:** Your script's ONLY output must be a single JSON object printed to stdout containing the raw data findings.
- 3. **BE PRECISE:** Use the exact, case-sensitive column names from the schema and robustly clean strings (`re.sub()`) before converting to numbers.
+ 3. **BE PRECISE:** Use ONLY the exact column names shown in the schema. NEVER guess or invent column names. If the schema shows columns like 'Indicator' and 'Value', do NOT try to access columns like 'Startup cost per client' directly - instead filter rows where Indicator matches the desired label.
  4. **JSON SERIALIZATION:** For DataFrame-to-dict conversion, use `json.loads(df.to_json(orient='records'))` which handles type conversion automatically. For single numeric values, use `.item()`. Avoid manual type conversion loops which are error-prone.
  5. **SINGLE JSON OUTPUT:** Print exactly ONE JSON object at the end of your script. Do not print debug statements or multiple JSON objects.
  6. **VALID JSON STRUCTURE:** The output MUST be a dictionary/object, not an array or primitive value.
  7. **SAFE DATA JOINING:** When joining/merging dataframes or looking up values across dataframes, ALWAYS check if matches exist before accessing with `.iloc[0]`. Use `.merge()` with `how='left'` or check `len(filtered_df) > 0` before accessing rows. Never assume keys will match exactly between dataframes.
  8. **CONSISTENT COLUMN NAMES:** After aggregation or renaming, always use the NEW column names in subsequent operations. Track renamed columns carefully - if you rename 'Zone' to 'zone', use 'zone' everywhere after.
  9. **SAFE ITERATION:** When iterating over mixed data structures, always check types before accessing attributes. Not all list items are dicts (some may be strings), not all values have `.items()`.
+ 10. **KEY-VALUE DATA PATTERN:** Many healthcare datasets use key-value format (e.g., columns: 'Indicator'/'Value' or 'Metric'/'Amount'). To extract a specific value, filter rows by the key column, then access the value column: `df.loc[df['Indicator'] == 'Cost per client', 'Value'].iloc[0]`
 
  --- USER'S SCENARIO ---
  {user_scenario}
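
Note on rule 10: the inline example calls `.iloc[0]` directly, which raises `IndexError` when no row matches, while rule 7 asks for an existence check first. A minimal sketch of how a generated script could combine the two rules is shown below. The helper name `lookup_value`, the sample data, and the output keys are hypothetical and are not part of `app.py`; `dfs` here is a stand-in for the preloaded list.

```python
import json

import pandas as pd


def lookup_value(df, key, key_col="Indicator", value_col="Value"):
    """Return the first value whose key column equals `key`, or None if absent.

    Hypothetical helper: filters the key column (rule 10) and checks that a
    match exists before touching .iloc[0] (rule 7).
    """
    matches = df.loc[df[key_col] == key, value_col]
    if len(matches) == 0:
        return None
    value = matches.iloc[0]
    # NumPy scalars expose .item(); use it to get a plain Python value (rule 4).
    return value.item() if hasattr(value, "item") else value


# Stand-in for the preloaded `dfs` list (sample data is invented for illustration).
dfs = [
    pd.DataFrame(
        {
            "Indicator": ["Cost per client", "Startup cost per client"],
            "Value": [120.5, 300.0],
        }
    )
]

result = {
    "cost_per_client": lookup_value(dfs[0], "Cost per client"),
    "missing_metric": lookup_value(dfs[0], "No such indicator"),
}

# Exactly one JSON object on stdout (rules 2, 5 and 6).
print(json.dumps(result))
```

With the sample data above, a missing indicator yields `null` in the JSON output instead of aborting the script.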