Update app.py

app.py
CHANGED
@@ -100,12 +100,12 @@ def safe_log(event_name: str, meta: dict | None = None):
 
 def _create_python_script(user_scenario: str, schema_context: str) -> str:
     """
-    IMPROVED: Generates a Python script using a "Plan
-    The AI first
-    This ensures the
+    IMPROVED: Generates a Python script using a universal "Map, Plan, Execute" approach.
+    The AI first maps user concepts to data columns, then plans and executes the analysis.
+    This ensures the logic is robust, dynamic, and not hardcoded to a specific dataset.
     """
     prompt_for_coder = f"""\
-You are an expert-level Python data scientist
+You are an expert-level, universal Python data scientist. Your task is to dynamically analyze any provided dataset(s) to answer a user's business request.
 
 --- USER'S SCENARIO ---
 {user_scenario}
@@ -115,32 +115,40 @@ You are an expert-level Python data scientist acting as a consultant. Your task
 {schema_context}
 --- END DATA SCHEMA ---
 
-You must follow a rigorous
+You must follow a rigorous three-step "Map, Plan, Execute" process:
 
-**Step 1:
-First,
-The plan must identify the key metrics, necessary data manipulations (cleaning, grouping, aggregation), and the final outputs required.
-- **CRITICAL for aggregation:** If the user asks for analysis by category (e.g., "specialty," "department"), you MUST identify the correct high-level categorical column for grouping. DO NOT aggregate by granular, free-text procedure descriptions unless explicitly asked. Your goal is to find meaningful, strategic trends.
+**Step 1: Map Concepts to Data.**
+First, analyze the user's scenario and the provided data schemas. Identify the key business concepts (e.g., "hospitals", "sales", "regions") and metrics (e.g., "wait times", "revenue", "population"). Then, create a logical mapping from these concepts to the actual column names in the provided DataFrames. State these mappings clearly. This is the most critical step to ensure your analysis is relevant.
 
-**Step 2:
-Based on your
+**Step 2: Create a Detailed Analysis Plan.**
+Based on your mapping, formulate a step-by-step plan. Describe the data cleaning, merging, grouping, and aggregation steps needed to answer the user's request using the columns you identified.
+
+**Step 3: Write the Python Script.**
+Based on your plan, write a complete Python script that performs the analysis.
 
 CRITICAL SCRIPTING RULES:
-1. **
-2. **
-3. **
-4. **JSON
+1. **DYNAMIC DATAFRAME IDENTIFICATION:** The order of DataFrames in the `dfs` list is NOT guaranteed. Your script MUST identify the correct DataFrame to use for each part of the analysis by checking for the presence of the columns you mapped in Step 1. Do NOT use hardcoded indices like `dfs[0]`.
+2. **VERIFY COLUMN EXISTENCE:** Only use columns that you have explicitly identified and mapped in your plan. This will prevent `KeyError`.
+3. **NO FILE READING:** The data is already in the `dfs` list.
+4. **STRICTLY JSON OUTPUT:** The script's ONLY output must be a single JSON object.
+5. **ROBUST & GENERIC:** Write robust code that can handle potential missing data (`errors='coerce'`, checking for `None`) and is not hardcoded to specific values from this single request.
 
 Now, provide your response in the following format:
 
 **ANALYSIS PLAN:**
 ```text
-1.
-
-
-
-
-
+**1. Concept-to-Column Mapping:**
+- Concept: [e.g., 'Hospitals' or 'Facilities'] -> Mapped Column: [e.g., `Facility` from the wait times dataframe]
+- Concept: [e.g., 'Surgical Wait Time' Metric] -> Mapped Column: [e.g., `Surgery_Median` from the wait times dataframe]
+- Concept: [e.g., 'Geographic Locations'] -> Mapped Columns: [e.g., `latitude`, `longitude` from the facilities dataframe]
+
+**2. Step-by-Step Analysis:**
+1. **Data Identification:** Identify the necessary dataframes by checking for the mapped columns (e.g., find the DF with 'Surgery_Median', find the DF with 'facility_name').
+2. **Data Cleaning:** [Describe steps, e.g., "Convert metric columns to numeric using `pd.to_numeric`..."]
+3. **Analysis Step A:** [e.g., "Group the primary dataframe by the 'Facility' column and calculate the mean of the 'Surgery_Median' column..."]
+4. ...
+
+the final JSON object]
 
 # Your complete Python script starts here
 import pandas as pd
@@ -150,8 +158,8 @@ import re
 # Main analysis logic...
 # ...
 # Final print statement
-print(json.dumps(final_data_structure, indent=4))
-
+print(json.dumps(final_data_structure, indent=4))
+"""
 generated_text = cohere_chat(prompt_for_coder)
 # This regex is more robust for extracting the final code block
 match = re2.search(r"PYTHON SCRIPT:\s*```python\n(.*?)```", generated_text, re2.DOTALL)
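As a sanity check on the new scripting rules, here is a minimal sketch of the pattern the updated prompt asks the model to produce: select a DataFrame from `dfs` by checking its columns rather than by index, coerce messy values with `errors='coerce'`, and emit a single JSON object. This is illustrative only and not part of app.py; the `dfs` contents, `pick_df` helper, and column names (`Facility`, `Surgery_Median`, `facility_name`) are hypothetical.

```python
import json

import pandas as pd


def pick_df(dfs, required_cols):
    """Return the first DataFrame containing all required columns, else None."""
    for df in dfs:
        if all(col in df.columns for col in required_cols):
            return df
    return None


# Hypothetical inputs: the order of DataFrames in `dfs` is not guaranteed.
dfs = [
    pd.DataFrame({"facility_name": ["A", "B"], "latitude": [49.2, 50.1]}),
    pd.DataFrame({"Facility": ["A", "B"], "Surgery_Median": ["12", "not available"]}),
]

# Dynamic identification: find the wait-times DataFrame by its mapped columns,
# never via a hardcoded index like dfs[0].
waits = pick_df(dfs, ["Facility", "Surgery_Median"])

# Robust cleaning: coerce non-numeric entries to NaN instead of raising.
waits["Surgery_Median"] = pd.to_numeric(waits["Surgery_Median"], errors="coerce")

# Aggregate by the high-level categorical column, dropping groups with no data.
result = waits.groupby("Facility")["Surgery_Median"].mean().dropna().to_dict()

# Strictly JSON output: the script's only output is one JSON object.
print(json.dumps(result, indent=4))
```

Identifying the frame by column membership is what lets the same generated script survive callers that shuffle the `dfs` list.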