Spaces:
Sleeping
Sleeping
Update app.py
Browse files
app.py
CHANGED
|
@@ -672,14 +672,37 @@ def execute_in_sandbox(script: str, dataframes: List[Any]) -> str:
|
|
| 672 |
def _create_python_script(user_scenario: str, schema_context: str) -> str:
|
| 673 |
EXPERT_ANALYTICAL_GUIDELINES = """
|
| 674 |
--- EXPERT ANALYTICAL GUIDELINES ---
|
| 675 |
-
When writing your script, you MUST follow these expert
|
| 676 |
-
|
| 677 |
-
|
| 678 |
-
|
| 679 |
-
2.
|
| 680 |
-
|
| 681 |
-
|
| 682 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 683 |
"""
|
| 684 |
prompt_for_coder = f"""\
|
| 685 |
You are an expert Python data scientist. Your job is to write a script to extract the data needed to answer the user's request.
|
|
@@ -732,7 +755,7 @@ def _generate_long_report(prompt: str) -> str:
|
|
| 732 |
|
| 733 |
def _generate_final_report(user_scenario: str, validated_json_str: str) -> str:
|
| 734 |
prompt_for_writer = f"""\
|
| 735 |
-
You are an expert management consultant and data analyst.
|
| 736 |
A data science script has run to extract key findings. You have the user's original request and the validated JSON data.
|
| 737 |
|
| 738 |
Your task is to synthesize these validated findings into a single, comprehensive, and professional report that directly answers all of the user's questions with detailed justifications.
|
|
@@ -745,11 +768,41 @@ Your task is to synthesize these validated findings into a single, comprehensive
|
|
| 745 |
{validated_json_str}
|
| 746 |
--- END VALIDATED DATA ---
|
| 747 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 748 |
Now, write the final, polished report. The report MUST:
|
| 749 |
1. Follow the "Expected Output Format" requested by the user.
|
| 750 |
2. Use tables, bullet points, and DETAILED narrative justifications for each recommendation.
|
| 751 |
3. Synthesize the validated data into actionable insights. Do not just copy the raw numbers; interpret them.
|
| 752 |
4. Ensure you fully address ALL evaluation questions, especially the final recommendations.
|
|
|
|
| 753 |
"""
|
| 754 |
return _generate_long_report(prompt_for_writer)
|
| 755 |
|
|
|
|
| 672 |
def _create_python_script(user_scenario: str, schema_context: str) -> str:
|
| 673 |
EXPERT_ANALYTICAL_GUIDELINES = """
|
| 674 |
--- EXPERT ANALYTICAL GUIDELINES ---
|
| 675 |
+
When writing your script, you MUST follow these expert analytical principles:
|
| 676 |
+
|
| 677 |
+
**DATA INTEGRATION & LINKING:**
|
| 678 |
+
1. When linking datasets, identify the correct join keys by examining column names and values. Never assume column names match across datasets.
|
| 679 |
+
2. If a required column doesn't exist in a dataset, derive it from related data or clearly note its absence in the output.
|
| 680 |
+
3. Use the most recent/relevant time period data when multiple periods exist (e.g., prefer 2021 over 2013 census data if both available).
|
| 681 |
+
|
| 682 |
+
**AGGREGATION & GROUPING:**
|
| 683 |
+
4. When asked about "specialties," "categories," or "types," group by the broadest categorical column first (e.g., 'Specialty' not 'Procedure').
|
| 684 |
+
5. When asked about specific items, use the most granular level (e.g., specific procedures, individual facilities).
|
| 685 |
+
6. Always verify the appropriate level of aggregation matches the user's question.
|
| 686 |
+
|
| 687 |
+
**PRIORITIZATION & RANKING:**
|
| 688 |
+
7. To prioritize locations/facilities, create a composite risk score combining: (a) population/volume, (b) relevant health indicators, and (c) recency of data.
|
| 689 |
+
8. When ranking, consider both absolute values AND relative performance against benchmarks (provincial/national averages).
|
| 690 |
+
9. Include sample sizes/record counts alongside rankings to indicate statistical reliability.
|
| 691 |
+
|
| 692 |
+
**CALCULATIONS & ESTIMATES:**
|
| 693 |
+
10. For time-based capacity calculations, use standard assumptions: 60 working days per 3-month period, 5 days/week, unless data specifies otherwise.
|
| 694 |
+
11. For cost calculations, always separate and sum component costs (startup + ongoing + variable) before multiplying by volume.
|
| 695 |
+
12. When extracting numeric values from text fields, use robust parsing: strip currency symbols, handle ranges (take midpoint), convert percentages.
|
| 696 |
+
|
| 697 |
+
**UNITS & VALIDATION:**
|
| 698 |
+
13. Preserve and label units correctly: percentages (%), currency (CAD/USD), time (days/weeks), clinical measures (mmHg for BP, % for A1c, kg/m² for BMI).
|
| 699 |
+
14. Validate calculated values against reasonable ranges (e.g., A1c typically 4-14%, BP typically 60-200 mmHg).
|
| 700 |
+
15. Flag outliers or unexpected values in the output for human review.
|
| 701 |
+
|
| 702 |
+
**OUTPUT COMPLETENESS:**
|
| 703 |
+
16. For each evaluation question, ensure the JSON output contains all data needed to answer it fully.
|
| 704 |
+
17. Include both raw values AND calculated metrics (averages, percentages, rankings).
|
| 705 |
+
18. When comparing to benchmarks, include both the benchmark value and the comparison result.
|
| 706 |
"""
|
| 707 |
prompt_for_coder = f"""\
|
| 708 |
You are an expert Python data scientist. Your job is to write a script to extract the data needed to answer the user's request.
|
|
|
|
| 755 |
|
| 756 |
def _generate_final_report(user_scenario: str, validated_json_str: str) -> str:
|
| 757 |
prompt_for_writer = f"""\
|
| 758 |
+
You are an expert healthcare management consultant and data analyst.
|
| 759 |
A data science script has run to extract key findings. You have the user's original request and the validated JSON data.
|
| 760 |
|
| 761 |
Your task is to synthesize these validated findings into a single, comprehensive, and professional report that directly answers all of the user's questions with detailed justifications.
|
|
|
|
| 768 |
{validated_json_str}
|
| 769 |
--- END VALIDATED DATA ---
|
| 770 |
|
| 771 |
+
--- ANALYTICAL INTERPRETATION GUIDELINES ---
|
| 772 |
+
When writing your report, follow these principles:
|
| 773 |
+
|
| 774 |
+
**ACCURACY & UNITS:**
|
| 775 |
+
- Report numerical values with appropriate precision (1-2 decimal places for percentages, whole numbers for counts).
|
| 776 |
+
- Always include correct units: % for percentages, days for wait times, $ for costs, mmHg for blood pressure, % for A1c, kg/m² for BMI.
|
| 777 |
+
- Verify that values make clinical/operational sense before reporting (e.g., A1c should be 4-14%, not measured in mmHg).
|
| 778 |
+
|
| 779 |
+
**CONTEXT & BENCHMARKS:**
|
| 780 |
+
- Compare findings against relevant benchmarks (provincial averages, national standards, historical baselines).
|
| 781 |
+
- Explain what "good" vs "poor" performance means in context.
|
| 782 |
+
- Quantify differences (e.g., "50 days above average" not just "higher than average").
|
| 783 |
+
|
| 784 |
+
**CAUSATION & INTERPRETATION:**
|
| 785 |
+
- Distinguish correlation from causation; avoid overstating causal claims.
|
| 786 |
+
- Consider confounding factors (case complexity, patient demographics, resource constraints).
|
| 787 |
+
- Acknowledge data limitations and uncertainty.
|
| 788 |
+
|
| 789 |
+
**RECOMMENDATIONS:**
|
| 790 |
+
- Make recommendations specific, actionable, and tied directly to the data findings.
|
| 791 |
+
- Prioritize recommendations by impact and feasibility.
|
| 792 |
+
- Include implementation considerations (resources needed, timeline, risks).
|
| 793 |
+
- Suggest metrics for monitoring success.
|
| 794 |
+
|
| 795 |
+
**COMPLETENESS:**
|
| 796 |
+
- Address EVERY evaluation question explicitly.
|
| 797 |
+
- If data is insufficient to answer a question fully, state what's missing and provide the best available answer.
|
| 798 |
+
- Cross-reference related findings to provide a coherent narrative.
|
| 799 |
+
|
| 800 |
Now, write the final, polished report. The report MUST:
|
| 801 |
1. Follow the "Expected Output Format" requested by the user.
|
| 802 |
2. Use tables, bullet points, and DETAILED narrative justifications for each recommendation.
|
| 803 |
3. Synthesize the validated data into actionable insights. Do not just copy the raw numbers; interpret them.
|
| 804 |
4. Ensure you fully address ALL evaluation questions, especially the final recommendations.
|
| 805 |
+
5. Verify all units and values are clinically/operationally plausible before including them.
|
| 806 |
"""
|
| 807 |
return _generate_long_report(prompt_for_writer)
|
| 808 |
|