ENHANCED_SYSTEM_PROMPT = """You are a data analysis assistant. Respond ONLY in valid JSON format.
🔴 ABSOLUTE RULE - NO VISUALIZATIONS BY DEFAULT 🔴
STEP 1: Read the user's question
STEP 2: Does it contain the EXACT word "chart" OR "plot" OR "graph" OR "visualize"?
STEP 3: If NO → STOP. Set "plot": null. End of story.
STEP 4: If YES → Only then consider creating a visualization
🚫 BANNED WORDS THAT DO NOT MEAN VISUALIZATION 🚫
These words DO NOT mean create a chart:
- "describe" ❌ NO CHART
- "show" ❌ NO CHART (unless "show me a chart")
- "analyze" ❌ NO CHART
- "patterns" ❌ NO CHART
- "distribution" ❌ NO CHART
- "insights" ❌ NO CHART
- "what" ❌ NO CHART
- "which" ❌ NO CHART
- "how many" ❌ NO CHART
- "tell me" ❌ NO CHART
✅ ONLY THESE MEAN VISUALIZATION ✅
- "show me a CHART"
- "create a PLOT"
- "draw a GRAPH"
- "VISUALIZE this"
- Must contain the actual word: chart/plot/graph/visualize
📋 DEFAULT RESPONSE FORMAT 📋
For 99% of questions, use this format:
{
"type": "describe" or "analysis",
"operations": [appropriate operations],
"plot": null,
"narrative": "3-5 sentences with findings",
"insights_needed": true or false
}
RESPONSE LENGTH:
- Narrative: 3-5 clear sentences
- Be informative and complete
- Explain findings thoroughly
RULES:
1. Process dataset questions: patterns, trends, insights, statistics, correlations, distributions, summaries, comparisons
2. Reject: general knowledge, current events, personal questions, unrelated definitions
3. Parse queries for: column names, values, conditions, operations
4. Use exact column names from dataset
5. For multi-condition queries: chain operations
🔴 REMEMBER: "plot": null is the DEFAULT for ALL questions unless the word "chart", "plot", "graph", or "visualize" appears 🔴
OPERATIONS:
- filter: Use "expr" for conditions like "column_name > 100" or "column" + "value" for exact matches
- count: Count specific values in columns
- describe: Statistical summary
- groupby: Group and aggregate data
- calculate: Mathematical operations
FOR MULTI-CONDITION QUERIES:
- Step 1: Filter data based on conditions
- Step 2: Perform count/analysis on filtered data
CHART CREATION RULES:
MEMORIZE THIS: "plot": null is the DEFAULT for 99% of queries
ONLY create a plot object if the query explicitly asks for one, with a phrase such as:
- "plot this"
- "chart this"
- "graph this"
- "visualize this"
- "show me a chart"
- "show me a graph"
- "show me a plot"
- "create a chart"
- "create a graph"
- "draw a chart"
DO NOT create charts for:
- "describe" (even with "distribution", "patterns")
- "analyze"
- "show me" (unless followed by "chart/plot/graph")
- "what are"
- "tell me about"
- "insights"
- "patterns"
- "trends"
IF IN DOUBT → "plot": null
RESPONSE FORMATS:
1. INFORMATIONAL (no visualization - DEFAULT for most questions):
{
"type": "explain",
"operations": [],
"plot": null,
"narrative": "3-5 sentences explaining the findings with proper context and details",
"insights_needed": false
}
2. STATISTICAL DESCRIPTION:
{
"type": "describe",
"operations": [{"op": "describe", "columns": ["col1", "col2"]}],
"plot": null,
"narrative": "3-5 sentences summarizing key statistical findings",
"insights_needed": false
}
3. VISUALIZATION REQUEST (ONLY when user explicitly asks for chart/plot):
{
"type": "analysis",
"operations": [
{"op": "groupby", "columns": ["category"], "agg_col": "value", "agg_func": "sum"}
],
"plot": {
"type": "bar|line|pie|hist|scatter",
"x": "category",
"y": "sum_value",
"title": "Chart Title"
},
"narrative": "3-5 sentences explaining what the visualization shows",
"insights_needed": true
}
4. FILTERING:
{
"type": "analysis",
"operations": [{"op": "filter", "column": "column_name", "value": "specific_value"}],
"plot": null,
"narrative": "3-5 sentences explaining the filtered results and what was found",
"insights_needed": false
}
5. CALCULATIONS:
{
"type": "analysis",
"operations": [{"op": "calculate", "expr": "Col1 * Col2", "new_col": "Product"}],
"plot": null,
"narrative": "3-5 sentences explaining the calculation and its results",
"insights_needed": false
}
6. COUNT VALUES (USE FOR "WHICH", "WHAT", "HOW MANY" QUESTIONS - NO VISUALIZATION):
{
"type": "analysis",
"operations": [{"op": "count", "column": "column_name", "value": "specific_value"}],
"plot": null,
"narrative": "3-5 sentences explaining the count results and what they mean",
"insights_needed": false
}
7. SHOW ALL VALUES:
{
"type": "analysis",
"operations": [{"op": "count", "column": "column_name"}],
"plot": null,
"narrative": "3-5 sentences describing the unique values found",
"insights_needed": false
}
8. MULTI-CONDITION QUERIES:
{
"type": "analysis",
"operations": [
{"op": "filter", "expr": "column_name > value"},
{"op": "count", "column": "another_column", "value": "target_value"}
],
"plot": null,
"narrative": "3-5 sentences explaining the filtered count and what it means",
"insights_needed": false
}
CHART TYPES (use ONLY when visualization is explicitly requested):
- "bar": For categorical comparisons
- "line": For trends over time/sequence
- "pie": For proportions/percentages
- "hist": For distributions
- "scatter": For correlations
Always ensure column names exist in the dataset before referencing them.
"""
INSIGHTS_SYSTEM_PROMPT = "Analyze the data and provide 4-6 clear, informative bullet points about key patterns, trends, and findings. Each bullet should be 1-2 sentences that explain important insights from the data. Focus on actionable information and meaningful observations."
SAMPLE_QUESTIONS = [
    "What are the key patterns in this dataset?",
    "Show me insights about this data",
    "What is the average value in the main columns?",
    "How many unique values are in each column?",
    "Summarize the key statistics of this data",
]
def get_chart_prompt(question, columns, data_sample):
    # Strict visualization keyword check (case-insensitive substring match)
    question_lower = question.lower()
    # Check for explicit visualization keywords
    has_chart = "chart" in question_lower
    has_plot = "plot" in question_lower
    has_graph = "graph" in question_lower
    has_visualize = "visualiz" in question_lower  # catches visualize/visualization
    should_create_viz = has_chart or has_plot or has_graph or has_visualize
    # Build warning message
    if should_create_viz:
        viz_instruction = "⚠️ VISUALIZATION DETECTED - You may create a plot object"
    else:
        viz_instruction = "🔴 NO VISUALIZATION KEYWORDS - YOU MUST SET 'plot': null - DO NOT CREATE ANY CHART"
    return f"""
User Query: "{question}"
Available Columns: {', '.join(columns)}
Sample Data: {data_sample}
{viz_instruction}
VISUALIZATION KEYWORD CHECK:
- Contains "chart"? {has_chart}
- Contains "plot"? {has_plot}
- Contains "graph"? {has_graph}
- Contains "visualize"? {has_visualize}
- Should create visualization? {should_create_viz}
🔴🔴🔴 CRITICAL INSTRUCTION 🔴🔴🔴
Based on the check above:
{"You MAY create a plot object since visualization keywords were found." if should_create_viz else 'You MUST set "plot": null. DO NOT create any visualization. NO CHARTS. NO PLOTS. NO GRAPHS.'}
RESPONSE TEMPLATE TO USE:
FOR QUESTIONS WITHOUT "chart"/"plot"/"graph"/"visualize" (MOST COMMON):
{{
"type": "describe",
"operations": [{{"op": "describe", "columns": ["column1", "column2"]}}],
"plot": null,
"narrative": "Based on the analysis, [finding 1]. The data shows [finding 2]. Key insights include [finding 3].",
"insights_needed": true
}}
FOR "WHICH/WHAT/HOW MANY" QUESTIONS (NO VISUALIZATION):
{{
"type": "analysis",
"operations": [
{{"op": "groupby", "columns": ["group_column"], "agg_col": "value_column", "agg_func": "count"}}
],
"plot": null,
"narrative": "The analysis reveals that [specific answer]. This group has [details]. The findings indicate [conclusion].",
"insights_needed": false
}}
ONLY USE THIS IF QUESTION CONTAINS "chart"/"plot"/"graph"/"visualize":
{{
"type": "analysis",
"operations": [{{"op": "groupby", "columns": ["x_col"], "agg_col": "y_col", "agg_func": "sum"}}],
"plot": {{
"type": "bar",
"x": "x_col",
"y": "sum_y_col",
"title": "Chart Title"
}},
"narrative": "The visualization displays [what the chart shows].",
"insights_needed": true
}}
COLUMN NAMES AVAILABLE: {', '.join(columns)}
🔴 FINAL REMINDER 🔴
Question: "{question}"
Contains visualization keywords: {should_create_viz}
Your response MUST have: "plot": {"{...}" if should_create_viz else "null"}
If you create a plot object when should_create_viz is False, you are making a CRITICAL ERROR.
"""
def validate_plot_spec(plot_spec, available_columns):
    """Validate a plot spec, replacing unknown x/y columns with a likely match when one exists."""
    if not plot_spec:
        return plot_spec
    x_col = plot_spec.get('x')
    y_col = plot_spec.get('y')
    # Unknown x: fall back to the first column whose name suggests a category axis
    if x_col and x_col not in available_columns:
        for col in available_columns:
            if any(keyword in col.lower() for keyword in ['name', 'category', 'type', 'group']):
                plot_spec['x'] = col
                break
    # Unknown y: fall back to the first column whose name suggests a numeric measure
    if y_col and y_col not in available_columns:
        for col in available_columns:
            if any(keyword in col.lower() for keyword in ['value', 'amount', 'count', 'price', 'sales']):
                plot_spec['y'] = col
                break
    # If no fallback matched, the unknown name is left unchanged
    return plot_spec
def should_allow_visualization(question):
    """
    CRITICAL VALIDATION: Check if visualization should be allowed
    Returns True only if question explicitly contains visualization keywords
    """
    question_lower = question.lower()
    # Substring match, using the same "visualiz" stem as get_chart_prompt so both checks agree
    viz_keywords = ["chart", "plot", "graph", "visualiz"]
    return any(keyword in question_lower for keyword in viz_keywords)
def validate_ai_response(question, ai_response_dict):
    """
    SAFETY CHECK: Remove plot object if question doesn't ask for visualization
    Call this function AFTER getting AI response and BEFORE using it
    This is a CRITICAL function - it ensures NO visualizations appear unless explicitly requested
    """
    if not should_allow_visualization(question):
        # Force remove plot if it shouldn't exist
        if "plot" in ai_response_dict and ai_response_dict["plot"] is not None:
            print(f"⚠️ WARNING: Removing unwanted visualization from AI response for question: '{question}'")
            ai_response_dict["plot"] = None
    return ai_response_dict
def final_plot_validation(question, plot_object):
    """
    LAST LINE OF DEFENSE: Call this right before returning/displaying plot
    Returns None if visualization should not be shown
    """
    if plot_object is None:
        return None
    if not should_allow_visualization(question):
        print(f"🚫 BLOCKED: Preventing visualization for non-visualization query: '{question}'")
        return None
    return plot_object
def get_insights_prompt(context_parts, narrative):
    insights_context = "\n".join(context_parts)
    return f"""Based on the analysis results, provide 4-6 informative bullet points that explain key insights, patterns, and findings from the data.
Analysis Context:
{insights_context}
Original Question Context:
{narrative}
Guidelines:
- Each bullet point should be 1-2 complete sentences
- Focus on meaningful patterns and trends
- Provide actionable insights
- Be clear and specific about what the data shows"""
"""
⚠️⚠️⚠️ CRITICAL: HOW TO USE THESE FUNCTIONS IN YOUR MAIN APP ⚠️⚠️⚠️
In your main application (the file that calls the AI), you MUST use these validation functions:
STEP 1: After getting AI response, validate it immediately:
```python
import json

# After getting response from AI
ai_response = json.loads(response_text)
# CRITICAL: Validate the response to remove unwanted plots
ai_response = validate_ai_response(user_question, ai_response)
```
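`json.loads` raises an exception when the model returns text that is not valid JSON, a case STEP 1 does not cover. The following is a minimal sketch of a tolerant parser, not part of this module: `parse_ai_response` and its fallback dictionary are hypothetical names chosen for illustration, and the fallback mirrors the INFORMATIONAL response format.

```python
import json


def parse_ai_response(response_text):
    # Hypothetical helper: return the parsed dict, or a plot-free fallback on failure.
    fallback = {
        "type": "explain",
        "operations": [],
        "plot": None,
        "narrative": "",
        "insights_needed": False,
    }
    try:
        parsed = json.loads(response_text)
    except (json.JSONDecodeError, TypeError):
        return fallback
    # The prompt asks for a JSON object; any other JSON value is treated as a failure.
    return parsed if isinstance(parsed, dict) else fallback
```

The returned dictionary can then be passed to `validate_ai_response()` as in STEP 1.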
STEP 2: Before displaying/returning plot, do final validation:
```python
# Get plot from validated response
plot_object = ai_response.get("plot")
# FINAL CHECK: Validate plot before using it
plot_object = final_plot_validation(user_question, plot_object)
# Now use plot_object (it will be None if visualization wasn't requested)
if plot_object is not None:
    # Create and display the visualization
    pass
else:
    # No visualization to display
    pass
```
EXAMPLE INTEGRATION:
```python
import json

def process_question(question, df):
    # Get AI response (call_ai_model, process_operations and create_chart are your app's own functions)
    response = call_ai_model(question, df)
    ai_response = json.loads(response)
    # STEP 1: Validate response
    ai_response = validate_ai_response(question, ai_response)
    # Process operations
    results = process_operations(ai_response["operations"], df)
    # STEP 2: Validate plot before using
    plot_spec = final_plot_validation(question, ai_response.get("plot"))
    # Only create visualization if plot_spec is not None
    if plot_spec:
        chart = create_chart(plot_spec, results)
        return results, chart, ai_response["narrative"]
    else:
        return results, None, ai_response["narrative"]
```
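The example above does not confirm that the plot's `x` and `y` columns exist in the results before `create_chart` is called. A minimal sketch of such a check follows, assuming every chart type needs both axes; `plot_columns_exist` is a hypothetical helper, not part of this module.

```python
def plot_columns_exist(plot_spec, available_columns):
    # Hypothetical helper: True only if both axes name columns that are present.
    if not plot_spec:
        return False
    return all(plot_spec.get(axis) in available_columns for axis in ("x", "y"))
```

One possible use is to skip chart creation when it returns False, for example after an aggregation has renamed the value column.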
🔴 KEY POINTS:
1. Call `validate_ai_response()` immediately after parsing AI response
2. Call `final_plot_validation()` before creating/displaying any chart
3. These functions will automatically block visualizations when keywords are missing
4. No changes needed to prompts - validation handles everything
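Note that the keyword checks use substring matching, so a question mentioning "demographic" or "paragraph" contains "graph" and is treated as a visualization request. A sketch of a whole-word alternative follows, assuming such questions should not produce a chart; `has_viz_word` and `VIZ_WORDS` are hypothetical names, not part of this module.

```python
import re

# Assumed word list: plural and British spellings are included by choice.
VIZ_WORDS = {
    "chart", "charts", "plot", "plots", "graph", "graphs",
    "visualize", "visualise", "visualization", "visualisation",
}


def has_viz_word(question):
    # Split into lowercase alphabetic tokens and look for an exact word match.
    tokens = set(re.findall("[a-z]+", question.lower()))
    return not tokens.isdisjoint(VIZ_WORDS)
```

With this check, "Show me a chart of sales" still qualifies, while "Summarize the demographic columns" does not.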
"""