VEDAGI1 committed on
Commit e9d0d37 · verified · 1 Parent(s): 750060d

Update app.py

Files changed (1)
  1. app.py +62 -9
app.py CHANGED
@@ -672,14 +672,37 @@ def execute_in_sandbox(script: str, dataframes: List[Any]) -> str:
  def _create_python_script(user_scenario: str, schema_context: str) -> str:
  EXPERT_ANALYTICAL_GUIDELINES = """
  --- EXPERT ANALYTICAL GUIDELINES ---
- When writing your script, you MUST follow these expert business rules:
- 1. **Linking Datasets Rule:** If you need to connect facilities to health zones when the 'zone' column is not in the facility list,
- you must first identify the high-priority zone from the beds data, then find the major city (by facility count) in the facility list,
- and *then* assess that city's capacity. Do not try to filter the facility list by a 'zone' column if it does not exist in the schema.
- 2. **Prioritization Rule:** To prioritize locations, you MUST combine the most recent population data with specific high-risk health indicators
- to create a multi-factor risk score.
- 3. **Capacity Calculation Rule:** For capacity over a 3-month window, assume **60 working days**.
- 4. **Cost Calculation Rule:** Sum 'Startup cost' and 'Ongoing cost' per person before multiplying.
+ When writing your script, you MUST follow these expert analytical principles:
+
+ **DATA INTEGRATION & LINKING:**
+ 1. When linking datasets, identify the correct join keys by examining column names and values. Never assume column names match across datasets.
+ 2. If a required column doesn't exist in a dataset, derive it from related data or clearly note its absence in the output.
+ 3. Use the most recent/relevant time period data when multiple periods exist (e.g., prefer 2021 over 2013 census data if both available).
+
+ **AGGREGATION & GROUPING:**
+ 4. When asked about "specialties," "categories," or "types," group by the broadest categorical column first (e.g., 'Specialty' not 'Procedure').
+ 5. When asked about specific items, use the most granular level (e.g., specific procedures, individual facilities).
+ 6. Always verify the appropriate level of aggregation matches the user's question.
+
+ **PRIORITIZATION & RANKING:**
+ 7. To prioritize locations/facilities, create a composite risk score combining: (a) population/volume, (b) relevant health indicators, and (c) recency of data.
+ 8. When ranking, consider both absolute values AND relative performance against benchmarks (provincial/national averages).
+ 9. Include sample sizes/record counts alongside rankings to indicate statistical reliability.
+
+ **CALCULATIONS & ESTIMATES:**
+ 10. For time-based capacity calculations, use standard assumptions: 60 working days per 3-month period, 5 days/week, unless data specifies otherwise.
+ 11. For cost calculations, always separate and sum component costs (startup + ongoing + variable) before multiplying by volume.
+ 12. When extracting numeric values from text fields, use robust parsing: strip currency symbols, handle ranges (take midpoint), convert percentages.
+
+ **UNITS & VALIDATION:**
+ 13. Preserve and label units correctly: percentages (%), currency (CAD/USD), time (days/weeks), clinical measures (mmHg for BP, % for A1c, kg/m² for BMI).
+ 14. Validate calculated values against reasonable ranges (e.g., A1c typically 4-14%, BP typically 60-200 mmHg).
+ 15. Flag outliers or unexpected values in the output for human review.
+
+ **OUTPUT COMPLETENESS:**
+ 16. For each evaluation question, ensure the JSON output contains all data needed to answer it fully.
+ 17. Include both raw values AND calculated metrics (averages, percentages, rankings).
+ 18. When comparing to benchmarks, include both the benchmark value and the comparison result.
  """
  prompt_for_coder = f"""\
  You are an expert Python data scientist. Your job is to write a script to extract the data needed to answer the user's request.
@@ -732,7 +755,7 @@ def _generate_long_report(prompt: str) -> str:
 
  def _generate_final_report(user_scenario: str, validated_json_str: str) -> str:
  prompt_for_writer = f"""\
- You are an expert management consultant and data analyst.
+ You are an expert healthcare management consultant and data analyst.
  A data science script has run to extract key findings. You have the user's original request and the validated JSON data.
 
  Your task is to synthesize these validated findings into a single, comprehensive, and professional report that directly answers all of the user's questions with detailed justifications.
@@ -745,11 +768,41 @@ Your task is to synthesize these validated findings into a single, comprehensive
  {validated_json_str}
  --- END VALIDATED DATA ---
 
+ --- ANALYTICAL INTERPRETATION GUIDELINES ---
+ When writing your report, follow these principles:
+
+ **ACCURACY & UNITS:**
+ - Report numerical values with appropriate precision (1-2 decimal places for percentages, whole numbers for counts).
+ - Always include correct units: % for percentages, days for wait times, $ for costs, mmHg for blood pressure, % for A1c, kg/m² for BMI.
+ - Verify that values make clinical/operational sense before reporting (e.g., A1c should be 4-14%, not measured in mmHg).
+
+ **CONTEXT & BENCHMARKS:**
+ - Compare findings against relevant benchmarks (provincial averages, national standards, historical baselines).
+ - Explain what "good" vs "poor" performance means in context.
+ - Quantify differences (e.g., "50 days above average" not just "higher than average").
+
+ **CAUSATION & INTERPRETATION:**
+ - Distinguish correlation from causation; avoid overstating causal claims.
+ - Consider confounding factors (case complexity, patient demographics, resource constraints).
+ - Acknowledge data limitations and uncertainty.
+
+ **RECOMMENDATIONS:**
+ - Make recommendations specific, actionable, and tied directly to the data findings.
+ - Prioritize recommendations by impact and feasibility.
+ - Include implementation considerations (resources needed, timeline, risks).
+ - Suggest metrics for monitoring success.
+
+ **COMPLETENESS:**
+ - Address EVERY evaluation question explicitly.
+ - If data is insufficient to answer a question fully, state what's missing and provide the best available answer.
+ - Cross-reference related findings to provide a coherent narrative.
+
  Now, write the final, polished report. The report MUST:
  1. Follow the "Expected Output Format" requested by the user.
  2. Use tables, bullet points, and DETAILED narrative justifications for each recommendation.
  3. Synthesize the validated data into actionable insights. Do not just copy the raw numbers; interpret them.
  4. Ensure you fully address ALL evaluation questions, especially the final recommendations.
+ 5. Verify all units and values are clinically/operationally plausible before including them.
  """
  return _generate_long_report(prompt_for_writer)
 
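Guidelines 10, 11, 12, and 14 in the new EXPERT_ANALYTICAL_GUIDELINES describe behaviour that the generated script is expected to implement, but the commit itself contains only prompt text. The following is a minimal sketch of what such helpers might look like. None of these names (`parse_numeric`, `is_plausible`, `quarterly_capacity`, `total_cost`, `PLAUSIBLE_RANGES`) appear in app.py; they are hypothetical. The sketch also assumes that percentages stay in percent units (so "45%" becomes 45.0) and that values are non-negative.

```python
import re
from typing import Optional

# Hypothetical plausibility ranges, copied from the examples in guideline 14.
PLAUSIBLE_RANGES = {
    "a1c_percent": (4.0, 14.0),
    "bp_mmhg": (60.0, 200.0),
}

# Guideline 10: 60 working days per 3-month period unless the data says otherwise.
WORKING_DAYS_PER_QUARTER = 60

# "10-20", "10 \u2013 20" (en dash), or "10 to 20"; signs are ignored (assumption).
_RANGE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:-|\u2013|to)\s*(\d+(?:\.\d+)?)")
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_numeric(text: str) -> Optional[float]:
    """Guideline 12: extract a number from a text field.

    Thousands separators are dropped, currency and percent symbols are
    ignored, and a range is replaced by its midpoint. Returns None when
    the text contains no digits.
    """
    cleaned = text.replace(",", "")
    range_match = _RANGE.search(cleaned)
    if range_match:
        low = float(range_match.group(1))
        high = float(range_match.group(2))
        return (low + high) / 2
    single = _NUMBER.search(cleaned)
    return float(single.group()) if single else None


def is_plausible(value: float, measure: str) -> bool:
    """Guideline 14: check a value against the expected range for its measure."""
    low, high = PLAUSIBLE_RANGES[measure]
    return low <= value <= high


def quarterly_capacity(people_per_day: float,
                       working_days: int = WORKING_DAYS_PER_QUARTER) -> float:
    """Guideline 10: capacity over a 3-month window."""
    return people_per_day * working_days


def total_cost(people: float, *per_person_costs: float) -> float:
    """Guideline 11: sum the per-person cost components, then multiply by volume."""
    return sum(per_person_costs) * people


if __name__ == "__main__":
    print(parse_numeric("$1,200"))               # 1200.0
    print(parse_numeric("10-20"))                # 15.0
    print(is_plausible(120.0, "a1c_percent"))    # False, so flag for review
    print(total_cost(100, 50.0, 20.0))           # 7000.0
```

Treating a range only when two numbers are joined by a hyphen, en dash, or "to" keeps strings such as "2 visits over 12 weeks" from being averaged by accident; a real script would likely need column-specific handling beyond this.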