| ENHANCED_SYSTEM_PROMPT = """You are a data analysis assistant. Respond ONLY in valid JSON format. | |
| 🔴 ABSOLUTE RULE - NO VISUALIZATIONS BY DEFAULT 🔴 | |
| STEP 1: Read the user's question | |
| STEP 2: Does it contain the EXACT word "chart" OR "plot" OR "graph" OR "visualize"? | |
| STEP 3: If NO -> STOP. Set "plot": null. End of story. | |
| STEP 4: If YES -> Only then consider creating a visualization | |
| 🚫 BANNED WORDS THAT DO NOT MEAN VISUALIZATION 🚫 | |
| These words DO NOT mean create a chart: | |
| - "describe" -> NO CHART | |
| - "show" -> NO CHART (unless "show me a chart") | |
| - "analyze" -> NO CHART | |
| - "patterns" -> NO CHART | |
| - "distribution" -> NO CHART | |
| - "insights" -> NO CHART | |
| - "what" -> NO CHART | |
| - "which" -> NO CHART | |
| - "how many" -> NO CHART | |
| - "tell me" -> NO CHART | |
| ONLY THESE MEAN VISUALIZATION | |
| - "show me a CHART" | |
| - "create a PLOT" | |
| - "draw a GRAPH" | |
| - "VISUALIZE this" | |
| - Must contain the actual word: chart/plot/graph/visualize | |
| DEFAULT RESPONSE FORMAT | |
| For 99% of questions, use this format: | |
| { | |
| "type": "describe" or "analysis", | |
| "operations": [appropriate operations], | |
| "plot": null, | |
| "narrative": "3-5 sentences with findings", | |
| "insights_needed": true or false | |
| } | |
| RESPONSE LENGTH: | |
| - Narrative: 3-5 clear sentences | |
| - Be informative and complete | |
| - Explain findings thoroughly | |
| RULES: | |
| 1. Process dataset questions: patterns, trends, insights, statistics, correlations, distributions, summaries, comparisons | |
| 2. Reject: general knowledge, current events, personal questions, unrelated definitions | |
| 3. Parse queries for: column names, values, conditions, operations | |
| 4. Use exact column names from dataset | |
| 5. For multi-condition queries: chain operations | |
| 🔴 REMEMBER: "plot": null is the DEFAULT for ALL questions unless the word "chart", "plot", "graph", or "visualize" appears 🔴 | |
| OPERATIONS: | |
| - filter: Use "expr" for conditions like "column_name > 100" or "column" + "value" for exact matches | |
| - count: Count specific values in columns | |
| - describe: Statistical summary | |
| - groupby: Group and aggregate data | |
| - calculate: Mathematical operations | |
| FOR MULTI-CONDITION QUERIES: | |
| - Step 1: Filter data based on conditions | |
| - Step 2: Perform count/analysis on filtered data | |
| CHART CREATION RULES: | |
| MEMORIZE THIS: "plot": null is the DEFAULT for 99% of queries | |
| ONLY create a plot object if the query contains the word "chart", "plot", "graph", or "visualize", as in these phrases: | |
| - "plot this" | |
| - "chart this" | |
| - "graph this" | |
| - "visualize this" | |
| - "show me a chart" | |
| - "show me a graph" | |
| - "show me a plot" | |
| - "create a chart" | |
| - "create a graph" | |
| - "draw a chart" | |
| DO NOT create charts for: | |
| - "describe" (even with "distribution", "patterns") | |
| - "analyze" | |
| - "show me" (unless followed by "chart/plot/graph") | |
| - "what are" | |
| - "tell me about" | |
| - "insights" | |
| - "patterns" | |
| - "trends" | |
| IF IN DOUBT -> "plot": null | |
| RESPONSE FORMATS: | |
| 1. INFORMATIONAL (no visualization - DEFAULT for most questions): | |
| { | |
| "type": "explain", | |
| "operations": [], | |
| "plot": null, | |
| "narrative": "3-5 sentences explaining the findings with proper context and details", | |
| "insights_needed": false | |
| } | |
| 2. STATISTICAL DESCRIPTION: | |
| { | |
| "type": "describe", | |
| "operations": [{"op": "describe", "columns": ["col1", "col2"]}], | |
| "plot": null, | |
| "narrative": "3-5 sentences summarizing key statistical findings", | |
| "insights_needed": false | |
| } | |
| 3. VISUALIZATION REQUEST (ONLY when user explicitly asks for chart/plot): | |
| { | |
| "type": "analysis", | |
| "operations": [ | |
| {"op": "groupby", "columns": ["category"], "agg_col": "value", "agg_func": "sum"} | |
| ], | |
| "plot": { | |
| "type": "bar|line|pie|hist|scatter", | |
| "x": "category", | |
| "y": "sum_value", | |
| "title": "Chart Title" | |
| }, | |
| "narrative": "3-5 sentences explaining what the visualization shows", | |
| "insights_needed": true | |
| } | |
| 4. FILTERING: | |
| { | |
| "type": "analysis", | |
| "operations": [{"op": "filter", "column": "column_name", "value": "specific_value"}], | |
| "plot": null, | |
| "narrative": "3-5 sentences explaining the filtered results and what was found", | |
| "insights_needed": false | |
| } | |
| 5. CALCULATIONS: | |
| { | |
| "type": "analysis", | |
| "operations": [{"op": "calculate", "expr": "Col1 * Col2", "new_col": "Product"}], | |
| "plot": null, | |
| "narrative": "3-5 sentences explaining the calculation and its results", | |
| "insights_needed": false | |
| } | |
| 6. COUNT VALUES (USE FOR "WHICH", "WHAT", "HOW MANY" QUESTIONS - NO VISUALIZATION): | |
| { | |
| "type": "analysis", | |
| "operations": [{"op": "count", "column": "column_name", "value": "specific_value"}], | |
| "plot": null, | |
| "narrative": "3-5 sentences explaining the count results and what they mean", | |
| "insights_needed": false | |
| } | |
| 7. SHOW ALL VALUES: | |
| { | |
| "type": "analysis", | |
| "operations": [{"op": "count", "column": "column_name"}], | |
| "plot": null, | |
| "narrative": "3-5 sentences describing the unique values found", | |
| "insights_needed": false | |
| } | |
| 8. MULTI-CONDITION QUERIES: | |
| { | |
| "type": "analysis", | |
| "operations": [ | |
| {"op": "filter", "expr": "column_name > value"}, | |
| {"op": "count", "column": "another_column", "value": "target_value"} | |
| ], | |
| "plot": null, | |
| "narrative": "3-5 sentences explaining the filtered count and what it means", | |
| "insights_needed": false | |
| } | |
| CHART TYPES (use ONLY when visualization is explicitly requested): | |
| - "bar": For categorical comparisons | |
| - "line": For trends over time/sequence | |
| - "pie": For proportions/percentages | |
| - "hist": For distributions | |
| - "scatter": For correlations | |
| Always ensure column names exist in the dataset before referencing them. | |
| """ | |
| INSIGHTS_SYSTEM_PROMPT = "Analyze the data and provide 4-6 clear, informative bullet points about key patterns, trends, and findings. Each bullet should be 1-2 sentences that explain important insights from the data. Focus on actionable information and meaningful observations." | |
| SAMPLE_QUESTIONS = [ | |
|     "What are the key patterns in this dataset?", | |
|     "Show me insights about this data", | |
|     "What is the average value in the main columns?", | |
|     "How many unique values are in each column?", | |
|     "Summarize the key statistics of this data" | |
| ] | |
| def get_chart_prompt(question, columns, data_sample): | |
|     # Strict visualization keyword check | |
|     question_lower = question.lower() | |
|     # Check for explicit visualization keywords | |
|     has_chart = "chart" in question_lower | |
|     has_plot = "plot" in question_lower | |
|     has_graph = "graph" in question_lower | |
|     has_visualize = "visualiz" in question_lower  # catches visualize/visualization | |
|     should_create_viz = has_chart or has_plot or has_graph or has_visualize | |
|     # Build warning message | |
|     if should_create_viz: | |
|         viz_instruction = "⚠️ VISUALIZATION DETECTED - You may create a plot object" | |
|     else: | |
|         viz_instruction = "🔴 NO VISUALIZATION KEYWORDS - YOU MUST SET 'plot': null - DO NOT CREATE ANY CHART" | |
|     return f""" | |
| User Query: "{question}" | |
| Available Columns: {', '.join(columns)} | |
| Sample Data: {data_sample} | |
| {viz_instruction} | |
| VISUALIZATION KEYWORD CHECK: | |
| - Contains "chart"? {has_chart} | |
| - Contains "plot"? {has_plot} | |
| - Contains "graph"? {has_graph} | |
| - Contains "visualize"? {has_visualize} | |
| - Should create visualization? {should_create_viz} | |
| 🔴🔴🔴 CRITICAL INSTRUCTION 🔴🔴🔴 | |
| Based on the check above: | |
| {"You MAY create a plot object since visualization keywords were found." if should_create_viz else 'You MUST set "plot": null. DO NOT create any visualization. NO CHARTS. NO PLOTS. NO GRAPHS.'} | |
| RESPONSE TEMPLATE TO USE: | |
| FOR QUESTIONS WITHOUT "chart"/"plot"/"graph"/"visualize" (MOST COMMON): | |
| {{ | |
| "type": "describe", | |
| "operations": [{{"op": "describe", "columns": ["column1", "column2"]}}], | |
| "plot": null, | |
| "narrative": "Based on the analysis, [finding 1]. The data shows [finding 2]. Key insights include [finding 3].", | |
| "insights_needed": true | |
| }} | |
| FOR "WHICH/WHAT/HOW MANY" QUESTIONS (NO VISUALIZATION): | |
| {{ | |
| "type": "analysis", | |
| "operations": [ | |
| {{"op": "groupby", "columns": ["group_column"], "agg_col": "value_column", "agg_func": "count"}} | |
| ], | |
| "plot": null, | |
| "narrative": "The analysis reveals that [specific answer]. This group has [details]. The findings indicate [conclusion].", | |
| "insights_needed": false | |
| }} | |
| ONLY USE THIS IF QUESTION CONTAINS "chart"/"plot"/"graph"/"visualize": | |
| {{ | |
| "type": "analysis", | |
| "operations": [{{"op": "groupby", "columns": ["x_col"], "agg_col": "y_col", "agg_func": "sum"}}], | |
| "plot": {{ | |
| "type": "bar", | |
| "x": "x_col", | |
| "y": "sum_y_col", | |
| "title": "Chart Title" | |
| }}, | |
| "narrative": "The visualization displays [what the chart shows].", | |
| "insights_needed": true | |
| }} | |
| COLUMN NAMES AVAILABLE: {', '.join(columns)} | |
| 🔴 FINAL REMINDER 🔴 | |
| Question: "{question}" | |
| Contains visualization keywords: {should_create_viz} | |
| Your response MUST have: "plot": {"{...}" if should_create_viz else "null"} | |
| If you create a plot object when should_create_viz is False, you are making a CRITICAL ERROR. | |
| """ | |
| def validate_plot_spec(plot_spec, available_columns): | |
|     """Validate a plot spec: swap an unknown x/y column for the first dataset column whose name suggests a match. | |
|     The spec is modified in place; a column with no plausible match is left unchanged.""" | |
|     if not plot_spec: | |
|         return plot_spec | |
|     x_col = plot_spec.get('x') | |
|     y_col = plot_spec.get('y') | |
|     # Unknown x column: fall back to the first categorical-looking column | |
|     if x_col and x_col not in available_columns: | |
|         for col in available_columns: | |
|             if any(keyword in col.lower() for keyword in ['name', 'category', 'type', 'group']): | |
|                 plot_spec['x'] = col | |
|                 break | |
|     # Unknown y column: fall back to the first numeric-looking column | |
|     if y_col and y_col not in available_columns: | |
|         for col in available_columns: | |
|             if any(keyword in col.lower() for keyword in ['value', 'amount', 'count', 'price', 'sales']): | |
|                 plot_spec['y'] = col | |
|                 break | |
|     return plot_spec | |
| def should_allow_visualization(question): | |
|     """ | |
|     CRITICAL VALIDATION: Check if visualization should be allowed | |
|     Returns True only if the question contains a visualization keyword (substring match) | |
|     """ | |
|     question_lower = question.lower() | |
|     # Same stems as get_chart_prompt(); "visualiz" covers visualize/visualization/visualizing | |
|     viz_keywords = ["chart", "plot", "graph", "visualiz"] | |
|     for keyword in viz_keywords: | |
|         if keyword in question_lower: | |
|             return True | |
|     return False | |
| def validate_ai_response(question, ai_response_dict): | |
|     """ | |
|     SAFETY CHECK: Remove plot object if question doesn't ask for visualization | |
|     Call this function AFTER getting AI response and BEFORE using it | |
|     This is a CRITICAL function - it ensures NO visualizations appear unless explicitly requested | |
|     """ | |
|     if not should_allow_visualization(question): | |
|         # Force remove plot if it shouldn't exist | |
|         if "plot" in ai_response_dict and ai_response_dict["plot"] is not None: | |
|             print(f"⚠️ WARNING: Removing unwanted visualization from AI response for question: '{question}'") | |
|             ai_response_dict["plot"] = None | |
|     return ai_response_dict | |
| def final_plot_validation(question, plot_object): | |
|     """ | |
|     LAST LINE OF DEFENSE: Call this right before returning/displaying plot | |
|     Returns None if visualization should not be shown | |
|     """ | |
|     if plot_object is None: | |
|         return None | |
|     if not should_allow_visualization(question): | |
|         print(f"🚫 BLOCKED: Preventing visualization for non-visualization query: '{question}'") | |
|         return None | |
|     return plot_object | |
| def get_insights_prompt(context_parts, narrative): | |
|     insights_context = "\n".join(context_parts) | |
|     return f"""Based on the analysis results, provide 4-6 informative bullet points that explain key insights, patterns, and findings from the data. | |
| Analysis Context: | |
| {insights_context} | |
| Original Question Context: | |
| {narrative} | |
| Guidelines: | |
| - Each bullet point should be 1-2 complete sentences | |
| - Focus on meaningful patterns and trends | |
| - Provide actionable insights | |
| - Be clear and specific about what the data shows""" | |
| """ | |
| ⚠️⚠️⚠️ CRITICAL: HOW TO USE THESE FUNCTIONS IN YOUR MAIN APP ⚠️⚠️⚠️ | |
| In your main application (the file that calls the AI), you MUST use these validation functions: | |
| STEP 1: After getting AI response, validate it immediately: | |
| ```python | |
| import json | |
| # After getting response from AI | |
| ai_response = json.loads(response_text) | |
| # CRITICAL: Validate the response to remove unwanted plots | |
| ai_response = validate_ai_response(user_question, ai_response) | |
| ``` | |
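The `json.loads(response_text)` call in STEP 1 assumes the model returned bare JSON. A minimal sketch of a more forgiving parser follows, assuming the model sometimes wraps its answer in a Markdown fence or surrounding prose. `parse_ai_json` is a hypothetical helper, not part of this module:

```python
import json


def parse_ai_json(response_text):
    # Hypothetical helper: return the parsed dict, or None when no JSON object is found.
    text = response_text.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, which skips code fences and prose.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            return None
        try:
            parsed = json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            return None
    # The validation functions expect a dict, so reject lists and scalars.
    return parsed if isinstance(parsed, dict) else None
```

A caller could treat a `None` result as "no usable response" and skip `validate_ai_response()` entirely.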
| STEP 2: Before displaying/returning plot, do final validation: | |
| ```python | |
| # Get plot from validated response | |
| plot_object = ai_response.get("plot") | |
| # FINAL CHECK: Validate plot before using it | |
| plot_object = final_plot_validation(user_question, plot_object) | |
| # Now use plot_object (it will be None if visualization wasn't requested) | |
| if plot_object is not None: | |
|     # Create and display the visualization | |
|     pass | |
| else: | |
|     # No visualization to display | |
|     pass | |
| ``` | |
| EXAMPLE INTEGRATION: | |
| ```python | |
| def process_question(question, df): | |
|     # Get AI response | |
|     response = call_ai_model(question, df) | |
|     ai_response = json.loads(response) | |
|     # STEP 1: Validate response | |
|     ai_response = validate_ai_response(question, ai_response) | |
|     # Process operations | |
|     results = process_operations(ai_response["operations"], df) | |
|     # STEP 2: Validate plot before using | |
|     plot_spec = final_plot_validation(question, ai_response.get("plot")) | |
|     # Only create visualization if plot_spec is not None | |
|     if plot_spec: | |
|         chart = create_chart(plot_spec, results) | |
|         return results, chart, ai_response["narrative"] | |
|     else: | |
|         return results, None, ai_response["narrative"] | |
| ``` | |
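To see the effect of the checks without a model in the loop, the sketch below feeds a hand-written response through compact stand-ins for `should_allow_visualization()` and `validate_ai_response()`. The stand-ins follow the same substring approach but are re-declared here only so the snippet runs on its own. The names `allow_viz` and `strip_unrequested_plot`, the sample questions, and the response dict are all made up for illustration:

```python
VIZ_KEYWORDS = ("chart", "plot", "graph", "visualiz")


def allow_viz(question):
    # Stand-in for should_allow_visualization(): plain substring match.
    q = question.lower()
    return any(keyword in q for keyword in VIZ_KEYWORDS)


def strip_unrequested_plot(question, response):
    # Stand-in for validate_ai_response(): null the plot unless it was requested.
    if not allow_viz(question) and response.get("plot") is not None:
        response["plot"] = None
    return response


fake_response = {"type": "analysis", "plot": {"type": "bar", "x": "region", "y": "sales"}}

# dict(...) makes a shallow copy so each call starts from the same response.
kept = strip_unrequested_plot("Show me a chart of sales by region", dict(fake_response))
dropped = strip_unrequested_plot("Describe sales by region", dict(fake_response))
print(kept["plot"] is not None, dropped["plot"])  # True None
```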
| 🔴 KEY POINTS: | |
| 1. Call `validate_ai_response()` immediately after parsing AI response | |
| 2. Call `final_plot_validation()` before creating/displaying any chart | |
| 3. These functions will automatically block visualizations when keywords are missing | |
| 4. No changes needed to prompts - validation handles everything | |
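Point 3 relies on substring matching, so a question such as "Which demographic is largest?" passes the keyword check because "demographic" contains "graph". One possible whole-word variant is sketched below. `VIZ_WORDS` and `mentions_visualization` are hypothetical names, and the word list is an assumption to tune for your users:

```python
import re

# Assumed word forms; extend as needed.
VIZ_WORDS = {
    "chart", "charts", "plot", "plots", "graph", "graphs",
    "visualize", "visualise", "visualization", "visualisation",
}


def mentions_visualization(question):
    # Split into lowercase alphabetic tokens and look for an exact word match.
    tokens = re.findall("[a-z]+", question.lower())
    return any(token in VIZ_WORDS for token in tokens)
```

The trade-off is that inflected forms missing from the set (for example "plotted" or "charting") no longer count as requests, so this variant is stricter than the substring check rather than a drop-in equivalent.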
| """ |