SPARKNOVA / prompts.py

Tamannathakur

Update prompts.py

e47a949 verified 4 months ago


	ENHANCED_SYSTEM_PROMPT = """You are a data analysis assistant. Respond ONLY in valid JSON format.

	🔴 ABSOLUTE RULE - NO VISUALIZATIONS BY DEFAULT 🔴

	STEP 1: Read the user's question
	STEP 2: Does it contain the EXACT word "chart" OR "plot" OR "graph" OR "visualize"?
	STEP 3: If NO → STOP. Set "plot": null. End of story.
	STEP 4: If YES → Only then consider creating a visualization

	🚫 BANNED WORDS THAT DO NOT MEAN VISUALIZATION 🚫
	These words DO NOT mean create a chart:
	- "describe" ❌ NO CHART
	- "show" ❌ NO CHART (unless "show me a chart")
	- "analyze" ❌ NO CHART
	- "patterns" ❌ NO CHART
	- "distribution" ❌ NO CHART
	- "insights" ❌ NO CHART
	- "what" ❌ NO CHART
	- "which" ❌ NO CHART
	- "how many" ❌ NO CHART
	- "tell me" ❌ NO CHART

	✅ ONLY THESE MEAN VISUALIZATION ✅
	- "show me a CHART"
	- "create a PLOT"
	- "draw a GRAPH"
	- "VISUALIZE this"
	- Must contain the actual word: chart/plot/graph/visualize

	📋 DEFAULT RESPONSE FORMAT 📋
	For 99% of questions, use this format:
	{
	"type": "describe" or "analysis",
	"operations": [appropriate operations],
	"plot": null,
	"narrative": "3-5 sentences with findings",
	"insights_needed": true or false
	}

	RESPONSE LENGTH:
	- Narrative: 3-5 clear sentences
	- Be informative and complete
	- Explain findings thoroughly

	RULES:
	1. Process dataset questions: patterns, trends, insights, statistics, correlations, distributions, summaries, comparisons
	2. Reject: general knowledge, current events, personal questions, unrelated definitions
	3. Parse queries for: column names, values, conditions, operations
	4. Use exact column names from dataset
	5. For multi-condition queries: chain operations

	🔴 REMEMBER: "plot": null is the DEFAULT for ALL questions unless the word "chart", "plot", "graph", or "visualize" appears 🔴
	OPERATIONS:
	- filter: Use "expr" for conditions like "column_name > 100" or "column" + "value" for exact matches
	- count: Count specific values in columns
	- describe: Statistical summary
	- groupby: Group and aggregate data
	- calculate: Mathematical operations
	FOR MULTI-CONDITION QUERIES:
	- Step 1: Filter data based on conditions
	- Step 2: Perform count/analysis on filtered data
	CHART CREATION RULES:
	MEMORIZE THIS: "plot": null is the DEFAULT for 99% of queries
	ONLY create plot object if query contains these EXACT PHRASES:
	- "plot this"
	- "chart this"
	- "graph this"
	- "visualize this"
	- "show me a chart"
	- "show me a graph"
	- "show me a plot"
	- "create a chart"
	- "create a graph"
	- "draw a chart"

	DO NOT create charts for:
	- "describe" (even with "distribution", "patterns")
	- "analyze"
	- "show me" (unless followed by "chart/plot/graph")
	- "what are"
	- "tell me about"
	- "insights"
	- "patterns"
	- "trends"

	IF IN DOUBT → "plot": null
	RESPONSE FORMATS:
	1. INFORMATIONAL (no visualization - DEFAULT for most questions):
	{
	"type": "explain",
	"operations": [],
	"plot": null,
	"narrative": "3-5 sentences explaining the findings with proper context and details",
	"insights_needed": false
	}
	2. STATISTICAL DESCRIPTION:
	{
	"type": "describe",
	"operations": [{"op": "describe", "columns": ["col1", "col2"]}],
	"plot": null,
	"narrative": "3-5 sentences summarizing key statistical findings",
	"insights_needed": false
	}
	3. VISUALIZATION REQUEST (ONLY when user explicitly asks for chart/plot):
	{
	"type": "analysis",
	"operations": [
	{"op": "groupby", "columns": ["category"], "agg_col": "value", "agg_func": "sum"}
	],
	"plot": {
	"type": "bar|line|pie|hist|scatter",
	"x": "category",
	"y": "sum_value",
	"title": "Chart Title"
	},
	"narrative": "3-5 sentences explaining what the visualization shows",
	"insights_needed": true
	}
	4. FILTERING:
	{
	"type": "analysis",
	"operations": [{"op": "filter", "column": "column_name", "value": "specific_value"}],
	"plot": null,
	"narrative": "3-5 sentences explaining the filtered results and what was found",
	"insights_needed": false
	}
	5. CALCULATIONS:
	{
	"type": "analysis",
	"operations": [{"op": "calculate", "expr": "Col1 * Col2", "new_col": "Product"}],
	"plot": null,
	"narrative": "3-5 sentences explaining the calculation and its results",
	"insights_needed": false
	}
	6. COUNT VALUES (USE FOR "WHICH", "WHAT", "HOW MANY" QUESTIONS - NO VISUALIZATION):
	{
	"type": "analysis",
	"operations": [{"op": "count", "column": "column_name", "value": "specific_value"}],
	"plot": null,
	"narrative": "3-5 sentences explaining the count results and what they mean",
	"insights_needed": false
	}
	7. SHOW ALL VALUES:
	{
	"type": "analysis",
	"operations": [{"op": "count", "column": "column_name"}],
	"plot": null,
	"narrative": "3-5 sentences describing the unique values found",
	"insights_needed": false
	}
	8. MULTI-CONDITION QUERIES:
	{
	"type": "analysis",
	"operations": [
	{"op": "filter", "expr": "column_name > value"},
	{"op": "count", "column": "another_column", "value": "target_value"}
	],
	"plot": null,
	"narrative": "3-5 sentences explaining the filtered count and what it means",
	"insights_needed": false
	}
	CHART TYPES (use ONLY when visualization is explicitly requested):
	- "bar": For categorical comparisons
	- "line": For trends over time/sequence
	- "pie": For proportions/percentages
	- "hist": For distributions
	- "scatter": For correlations
	Always ensure column names exist in the dataset before referencing them.
	"""

	INSIGHTS_SYSTEM_PROMPT = "Analyze the data and provide 4-6 clear, informative bullet points about key patterns, trends, and findings. Each bullet should be 1-2 sentences that explain important insights from the data. Focus on actionable information and meaningful observations."

	SAMPLE_QUESTIONS = [
	    "What are the key patterns in this dataset?",
	    "Show me insights about this data",
	    "What is the average value in the main columns?",
	    "How many unique values are in each column?",
	    "Summarize the key statistics of this data",
	]

	def get_chart_prompt(question, columns, data_sample):
	    # Strict visualization keyword check (case-insensitive substring match)
	    question_lower = question.lower()

	    # Check for explicit visualization keywords
	    has_chart = "chart" in question_lower
	    has_plot = "plot" in question_lower
	    has_graph = "graph" in question_lower
	    has_visualize = "visualiz" in question_lower  # catches visualize/visualization

	    should_create_viz = has_chart or has_plot or has_graph or has_visualize

	    # Build the instruction line that is embedded in the prompt
	    if should_create_viz:
	        viz_instruction = "⚠️ VISUALIZATION DETECTED - You may create a plot object"
	    else:
	        viz_instruction = "🔴 NO VISUALIZATION KEYWORDS - YOU MUST SET 'plot': null - DO NOT CREATE ANY CHART"

	    return f"""
	User Query: "{question}"
	Available Columns: {', '.join(columns)}
	Sample Data: {data_sample}

	{viz_instruction}

	VISUALIZATION KEYWORD CHECK:
	- Contains "chart"? {has_chart}
	- Contains "plot"? {has_plot}
	- Contains "graph"? {has_graph}
	- Contains "visualize"? {has_visualize}
	- Should create visualization? {should_create_viz}

	🔴🔴🔴 CRITICAL INSTRUCTION 🔴🔴🔴
	Based on the check above:
	{"You MAY create a plot object since visualization keywords were found." if should_create_viz else 'You MUST set "plot": null. DO NOT create any visualization. NO CHARTS. NO PLOTS. NO GRAPHS.'}

	RESPONSE TEMPLATE TO USE:

	FOR QUESTIONS WITHOUT "chart"/"plot"/"graph"/"visualize" (MOST COMMON):
	{{
	"type": "describe",
	"operations": [{{"op": "describe", "columns": ["column1", "column2"]}}],
	"plot": null,
	"narrative": "Based on the analysis, [finding 1]. The data shows [finding 2]. Key insights include [finding 3].",
	"insights_needed": true
	}}

	FOR "WHICH/WHAT/HOW MANY" QUESTIONS (NO VISUALIZATION):
	{{
	"type": "analysis",
	"operations": [
	{{"op": "groupby", "columns": ["group_column"], "agg_col": "value_column", "agg_func": "count"}}
	],
	"plot": null,
	"narrative": "The analysis reveals that [specific answer]. This group has [details]. The findings indicate [conclusion].",
	"insights_needed": false
	}}

	ONLY USE THIS IF QUESTION CONTAINS "chart"/"plot"/"graph"/"visualize":
	{{
	"type": "analysis",
	"operations": [{{"op": "groupby", "columns": ["x_col"], "agg_col": "y_col", "agg_func": "sum"}}],
	"plot": {{
	"type": "bar",
	"x": "x_col",
	"y": "sum_y_col",
	"title": "Chart Title"
	}},
	"narrative": "The visualization displays [what the chart shows].",
	"insights_needed": true
	}}

	COLUMN NAMES AVAILABLE: {', '.join(columns)}

	🔴 FINAL REMINDER 🔴
	Question: "{question}"
	Contains visualization keywords: {should_create_viz}
	Your response MUST have: "plot": {"{...}" if should_create_viz else "null"}

	If you create a plot object when should_create_viz is False, you are making a CRITICAL ERROR.
	"""

	def validate_plot_spec(plot_spec, available_columns):
	    """Remap plot axes that reference unknown columns to a likely match.

	    The spec is modified in place. If no available column matches the
	    keyword heuristics, the unknown column name is left unchanged.
	    """
	    if not plot_spec:
	        return plot_spec

	    x_col = plot_spec.get('x')
	    y_col = plot_spec.get('y')

	    # Unknown x: fall back to the first column that looks categorical
	    if x_col and x_col not in available_columns:
	        for col in available_columns:
	            if any(keyword in col.lower() for keyword in ['name', 'category', 'type', 'group']):
	                plot_spec['x'] = col
	                break

	    # Unknown y: fall back to the first column that looks numeric
	    if y_col and y_col not in available_columns:
	        for col in available_columns:
	            if any(keyword in col.lower() for keyword in ['value', 'amount', 'count', 'price', 'sales']):
	                plot_spec['y'] = col
	                break

	    return plot_spec

	def should_allow_visualization(question):
	    """
	    CRITICAL VALIDATION: Check if visualization should be allowed
	    Returns True only if the question contains a visualization keyword
	    """
	    question_lower = question.lower()

	    # Same substring stems as get_chart_prompt, so both checks agree
	    # ("visualiz" covers visualize/visualizing/visualization)
	    viz_keywords = ["chart", "plot", "graph", "visualiz"]

	    return any(keyword in question_lower for keyword in viz_keywords)

	def validate_ai_response(question, ai_response_dict):
	    """
	    SAFETY CHECK: Remove plot object if question doesn't ask for visualization
	    Call this function AFTER getting AI response and BEFORE using it
	    This is a CRITICAL function - it ensures NO visualizations appear unless explicitly requested
	    """
	    if not should_allow_visualization(question):
	        # Force remove plot if it shouldn't exist
	        if "plot" in ai_response_dict and ai_response_dict["plot"] is not None:
	            print(f"⚠️ WARNING: Removing unwanted visualization from AI response for question: '{question}'")
	            ai_response_dict["plot"] = None

	    return ai_response_dict

	def final_plot_validation(question, plot_object):
	    """
	    LAST LINE OF DEFENSE: Call this right before returning/displaying plot
	    Returns None if visualization should not be shown
	    """
	    if plot_object is None:
	        return None

	    if not should_allow_visualization(question):
	        print(f"🚫 BLOCKED: Preventing visualization for non-visualization query: '{question}'")
	        return None

	    return plot_object

	def get_insights_prompt(context_parts, narrative):
	    insights_context = "\n".join(context_parts)
	    return f"""Based on the analysis results, provide 4-6 informative bullet points that explain key insights, patterns, and findings from the data.

	Analysis Context:
	{insights_context}

	Original Question Context:
	{narrative}

	Guidelines:
	- Each bullet point should be 1-2 complete sentences
	- Focus on meaningful patterns and trends
	- Provide actionable insights
	- Be clear and specific about what the data shows"""

	"""
	⚠️⚠️⚠️ CRITICAL: HOW TO USE THESE FUNCTIONS IN YOUR MAIN APP ⚠️⚠️⚠️

	In your main application (the file that calls the AI), you MUST use these validation functions:

	STEP 1: After getting AI response, validate it immediately:
	```python
	import json

	# After getting response from AI
	ai_response = json.loads(response_text)

	# CRITICAL: Validate the response to remove unwanted plots
	ai_response = validate_ai_response(user_question, ai_response)
	```
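
`json.loads` raises `json.JSONDecodeError` when the model wraps its JSON in extra text. A more defensive version of STEP 1 might look like the sketch below. `parse_model_json` and the fallback dictionary are assumptions made for illustration and are not part of this module.

```python
import json


def parse_model_json(response_text):
    # Hypothetical helper: try strict parsing first.
    try:
        return json.loads(response_text)
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span, if there is one.
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(response_text[start:end + 1])
        except json.JSONDecodeError:
            pass
    # Assumed fallback shape: no operations and no plot.
    return {"type": "explain", "operations": [], "plot": None,
            "narrative": "", "insights_needed": False}


ai_response = parse_model_json('Here you go: {"type": "explain", "plot": null}')
print(ai_response["plot"])  # None
```

The result can then be passed to `validate_ai_response()` exactly as in STEP 1.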

	STEP 2: Before displaying/returning plot, do final validation:
	```python
	# Get plot from validated response
	plot_object = ai_response.get("plot")

	# FINAL CHECK: Validate plot before using it
	plot_object = final_plot_validation(user_question, plot_object)

	# Now use plot_object (it will be None if visualization wasn't requested)
	if plot_object is not None:
	    # Create and display the visualization
	    pass
	else:
	    # No visualization to display
	    pass
	```

	EXAMPLE INTEGRATION:
	```python
	import json

	def process_question(question, df):
	    # Get AI response
	    response = call_ai_model(question, df)
	    ai_response = json.loads(response)

	    # STEP 1: Validate response
	    ai_response = validate_ai_response(question, ai_response)

	    # Process operations
	    results = process_operations(ai_response["operations"], df)

	    # STEP 2: Validate plot before using
	    plot_spec = final_plot_validation(question, ai_response.get("plot"))

	    # Only create visualization if plot_spec is not None
	    if plot_spec:
	        chart = create_chart(plot_spec, results)
	        return results, chart, ai_response["narrative"]
	    else:
	        return results, None, ai_response["narrative"]
	```
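
`process_operations` is not defined in this file. As a rough illustration of how an operations list in the FILTERING and COUNT VALUES formats could be applied, here is a sketch over a list of row dictionaries. The name `apply_operations`, the row representation, and the choice to support only these two ops are all assumptions. A real implementation would presumably work on the DataFrame and also cover `expr` filters, `groupby`, `describe`, and `calculate`.

```python
from collections import Counter


def apply_operations(operations, rows):
    # rows: list of dicts, one per record (assumed stand-in for the dataset).
    result = rows
    for op in operations:
        if op["op"] == "filter" and "column" in op:
            # Exact-match filter: {"op": "filter", "column": ..., "value": ...}
            result = [r for r in result if r.get(op["column"]) == op["value"]]
        elif op["op"] == "count":
            counts = Counter(r.get(op["column"]) for r in result)
            # With "value", return one count; otherwise counts per unique value.
            result = counts[op["value"]] if "value" in op else dict(counts)
        else:
            raise ValueError("unsupported operation in this sketch: " + str(op))
    return result


rows = [
    {"city": "Pune", "status": "open"},
    {"city": "Pune", "status": "closed"},
    {"city": "Delhi", "status": "open"},
]
ops = [
    {"op": "filter", "column": "city", "value": "Pune"},
    {"op": "count", "column": "status", "value": "open"},
]
print(apply_operations(ops, rows))  # 1
```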

	🔴 KEY POINTS:
	1. Call `validate_ai_response()` immediately after parsing AI response
	2. Call `final_plot_validation()` before creating/displaying any chart
	3. These functions will automatically block visualizations when keywords are missing
	4. No changes needed to prompts - validation handles everything
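
One caveat on point 3: `should_allow_visualization()` matches substrings, so a question such as "Summarize the demographic columns" contains "graph" and would be allowed a plot. A stricter check could compare word prefixes instead. The following is only a sketch under that assumption. `has_viz_word` and `VIZ_STEMS` are hypothetical names, not part of this module.

```python
import re

# Hypothetical stems; a word counts if it starts with one of them,
# so "charts", "plotting" and "visualization" still qualify.
VIZ_STEMS = ("chart", "plot", "graph", "visualiz", "visualis")


def has_viz_word(question):
    # Split into lowercase alphabetic words, then test each word's prefix.
    words = re.findall("[a-z]+", question.lower())
    return any(word.startswith(VIZ_STEMS) for word in words)


print(has_viz_word("Summarize the demographic columns"))  # False
print(has_viz_word("Plot sales by region"))               # True
```

Prefix matching still accepts words such as "graphic" or "charter", so it narrows the false positives rather than eliminating them.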
	"""