Feb sprint: vertical×function matrix, structured outliers, unified prompts
- demo_personas.py: VERTICALS, FUNCTIONS, MATRIX_OVERRIDES dicts with get_use_case_config() and parse_use_case()
- outlier_system.py: OutlierPattern/OutlierConfig dataclasses, OUTLIER_CONFIGS with Retail Sales patterns
- prompts.py: build_prompt() composable system, STAGE_TEMPLATES for research/ddl/liveboard/demo_notes
- chat_interface.py: research cache fix (absolute paths), auto-use cache, DDL failure guard
- liveboard_creator.py: _clean_viz_title() helper, revert MCP to working stdio/npx approach
- smart_data_adjuster.py: multi-LLM support (Claude + OpenAI) via _call_llm()
- thoughtspot_deployer.py: fix model validation by keeping FK/PK columns, tag debug logging
- CLAUDE.md/PROJECT_STATUS.md: simplify liveboard docs to unified process
- demo_prep.py: remove unsupported max_lines from gr.Code()
Co-authored-by: Cursor <cursoragent@cursor.com>
- CLAUDE.md +21 -58
- PROJECT_STATUS.md +8 -11
- chat_interface.py +129 -41
- demo_personas.py +368 -15
- demo_prep.py +2 -4
- liveboard_creator.py +125 -38
- outlier_system.py +197 -5
- prompts.py +156 -1
- smart_data_adjuster.py +52 -10
- sprint_2026_02.md +145 -0
- supabase_client.py +0 -1
- thoughtspot_deployer.py +17 -8
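The multi-LLM change routes every model call in smart_data_adjuster.py through a single `_call_llm()` dispatcher. A toy sketch of that pattern; the class name, model-name prefixes, and stubbed provider methods are illustrative assumptions, with real SDK calls replaced by strings:

```python
# Toy sketch of a single-dispatch _call_llm(); class name, prefixes, and
# the stubbed provider methods are assumptions, not the file's actual code.
class LLMCaller:
    def __init__(self, llm_model="claude-sonnet-4"):
        self.llm_model = llm_model

    def _call_llm(self, prompt, max_tokens=1000):
        """Route one prompt to the right provider based on the model name."""
        if self.llm_model.startswith("claude"):
            return self._call_claude(prompt, max_tokens)
        if self.llm_model.startswith(("gpt", "o1", "o3")):
            return self._call_openai(prompt, max_tokens)
        raise ValueError(f"Unsupported model: {self.llm_model}")

    def _call_claude(self, prompt, max_tokens):
        # Real code would call the Anthropic SDK here
        return f"[claude:{self.llm_model}] {prompt[:40]}"

    def _call_openai(self, prompt, max_tokens):
        # Real code would call the OpenAI SDK here
        return f"[openai:{self.llm_model}] {prompt[:40]}"
```

Callers only ever see `_call_llm()`, so adding a third provider means one new branch rather than changes at every call site.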
CLAUDE.md (+21 -58):

@@ -111,7 +111,7 @@ Example:
 
 ```bash
 # Run the app properly
-source ./demoprep/bin/activate && python
+source ./demoprep/bin/activate && python chat_interface.py
 
 # Check git changes
 git diff --stat

@@ -201,51 +201,21 @@ DO NOT use create_visualization_tml() directly - that's internal low-level code
 
 ---
 
-## Liveboard Creation
- ...
-| **TML** | ~20s | High (with tuning) | Full | Precise control, debugging |
-| **MCP** | ~60s | Basic | None | Quick prototypes |
-| **HYBRID** | ~90s | Best | Via post-processing | Production demos |
-
-### TML Method (Template-Based)
-- Builds ThoughtSpot Modeling Language (YAML) structures directly
-- Full control over chart types, layout, colors
-- REST API with token auth
-- **Main function:** `create_liveboard_from_model()` in liveboard_creator.py
-- **Class:** `LiveboardCreator`
-
-### MCP Method (AI-Driven)
-- Uses Model Context Protocol with ThoughtSpot's agent.thoughtspot.app
-- Leverages ThoughtSpot's AI for smart question generation
-- Natural language questions → ThoughtSpot creates visualizations
-- OAuth authentication, requires npx/Node.js
-- **Main function:** `create_liveboard_from_model_mcp()` in liveboard_creator.py
-
-### HYBRID Method (Recommended)
-- **Step 1:** MCP creates liveboard quickly with AI-driven questions
-- **Step 2:** TML post-processing enhances with:
-  - Groups (tabs) for organization
-  - KPI sparkline fixes
-  - Brand color styling
-- **Main functions:**
-  - `create_liveboard_from_model_mcp()` for creation
-  - `enhance_mcp_liveboard()` for post-processing
-
-### enhance_mcp_liveboard() Function
-Located in `liveboard_creator.py`, this function:
+## Liveboard Creation
+
+Liveboard creation is a single unified process with two phases:
+
+1. **MCP Creation** - Uses ThoughtSpot's AI (via Model Context Protocol at `agent.thoughtspot.app`) to generate smart visualizations from natural language questions
+2. **TML Post-Processing** - Enhances the AI-created liveboard with groups, KPI sparklines, brand colors, and layout refinement
+
+These are implemented as separate functions but are **one process** - do NOT treat them as separate "methods" or offer the user a choice between them.
+
+### Key Functions (liveboard_creator.py)
+- **`create_liveboard_from_model_mcp()`** - Main entry point. Handles MCP creation.
+- **`enhance_mcp_liveboard()`** - Post-processing. Exports TML, enhances, re-imports.
+- **`LiveboardCreator` class** - TML utilities used during post-processing.
+
+### enhance_mcp_liveboard() Details
 1. Exports the MCP-created liveboard TML
 2. Classifies visualizations by type (KPI, trend, categorical)
 3. Adds Groups (tabs) to organize by type

@@ -253,18 +223,16 @@ Located in `liveboard_creator.py`, this function:
 5. Applies brand colors to groups and tiles
 6. Re-imports the enhanced TML
 
-### KPI Requirements
+### KPI Requirements
 - **For sparklines and percent change comparisons:**
 - Must include time dimension (date column)
 - Must specify granularity (daily, weekly, monthly, quarterly, yearly)
 - Example: `[Total_revenue] [Order_date].monthly`
-
-- **TML:** Search query must have `[measure] [date_column].granularity`
-- **HYBRID:** Post-processing adds sparkline settings automatically
+- Post-processing adds sparkline settings automatically
 
 ### Terminology (Important!)
-- **Outliers** = Interesting data points in existing data
-- **Data Adjuster** = Modifying data values (
+- **Outliers** = Interesting data points in existing data
+- **Data Adjuster** = Modifying data values (needs Snowflake views)
 
 ### Golden Demo Structure
 **Location:** `dev_notes/liveboard_demogold2/🏬 Global Retail Apparel Sales (New).liveboard.tml`

@@ -273,11 +241,6 @@ Located in `liveboard_creator.py`, this function:
 - Brand colors via style_properties (GBC_A-J for groups, TBC_A-J for tiles)
 - KPI structure: `[sales] [date].weekly [date].'last 8 quarters'`
 
-### Testing Strategy
-- Test all three methods when changing shared code
-- HYBRID should be the default for most testing
-- Use TML for debugging visualization issues
-
 ---
 
 ## Frustration Points (AVOID)

@@ -301,5 +264,5 @@ User gets frustrated when you:
 
 ---
 
-*Last Updated:
+*Last Updated: February 4, 2026*
 *This is the source of truth - update rules here, not in .cursorrules*
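The enhance_mcp_liveboard() pipeline classifies visualizations as KPI, trend, or categorical and then adds Groups (tabs) by type. A minimal sketch of that classify-then-group pass; only the step order and the three type names come from the docs, while the keyword heuristics and field names are assumptions:

```python
# Sketch of the classify-and-group steps of enhance_mcp_liveboard().
# The keyword heuristics and dict keys are assumptions, not the real code.
def classify_viz(title, chart_type):
    """Bucket one visualization into KPI / trend / categorical."""
    if chart_type == "KPI":
        return "KPI"
    if any(w in title.lower() for w in ("trend", "over time", "monthly", "weekly")):
        return "trend"
    return "categorical"

def group_visualizations(vizzes):
    """Collect visualization ids into tab groups keyed by type."""
    groups = {"KPI": [], "trend": [], "categorical": []}
    for viz in vizzes:
        groups[classify_viz(viz["title"], viz["chart_type"])].append(viz["id"])
    # Drop empty groups so no empty tab is created
    return {name: ids for name, ids in groups.items() if ids}

print(group_visualizations([
    {"id": "v1", "title": "Total Revenue", "chart_type": "KPI"},
    {"id": "v2", "title": "Sales trend by month", "chart_type": "LINE"},
    {"id": "v3", "title": "Revenue by Region", "chart_type": "BAR"},
]))
```

The grouped ids would then be written back into the liveboard TML before re-import, along with the sparkline and brand-color fixes.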
PROJECT_STATUS.md (+8 -11):

@@ -40,15 +40,13 @@ An AI-powered demo builder for ThoughtSpot that automatically creates
 
 **Working:**
 - End-to-end demo creation via chat interface
- ...
-- HYBRID method: MCP creates + TML post-processing for Groups, KPIs, colors
-- Settings UI for method selection
+- Liveboard creation: MCP creates visualizations + TML post-processing for Groups, KPIs, colors
 - LegitData for realistic data generation
 - Supabase settings persistence
 - ThoughtSpot authentication and deployment
 
 **Needs Work:**
-- Outliers
+- Outliers need better integration into liveboard creation
 - Data adjuster has column matching issues
 - Tags not assigning to objects
 

@@ -56,10 +54,9 @@ An AI-powered demo builder for ThoughtSpot that automatically creates
 
 ## Key Technical Decisions
 
-**Liveboard Creation**:
- ...
-- HYBRID (default): MCP creates + TML post-processing (recommended)
+**Liveboard Creation**: MCP creation + TML post-processing
+- MCP (via `agent.thoughtspot.app`) generates AI-driven visualizations
+- TML post-processing adds Groups, KPI sparklines, brand colors, layout refinement
 
 **Data Generation**: LegitData
 - Uses AI + web search for realistic data

@@ -79,10 +76,10 @@ An AI-powered demo builder for ThoughtSpot that automatically creates
 
 ## Sprint History
 
-- **Sprint
+- **Sprint Feb 2026**: Current - see `sprint_2026_02.md` in root
+- **Sprint Jan 2026**: Closed - see `sprint_2026_01.md` in root
 - *(Previous sprints archived in dev_notes/archive/)*
-- *(Sprint files are gitignored - local working docs)*
 
 ---
 
-*Last Updated:
+*Last Updated: February 4, 2026*
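The key decision above, MCP creation followed by TML post-processing as one process, can be sketched as a single driver function. Only the two function names come from liveboard_creator.py; the signatures, stub bodies, and return shapes are assumptions:

```python
# Stubs standing in for liveboard_creator.py; only the two function names
# are real, their signatures and return values here are assumptions.
def create_liveboard_from_model_mcp(model_id, liveboard_name):
    # Phase 1: ThoughtSpot's MCP agent creates the liveboard
    return f"guid-for-{model_id}"

def enhance_mcp_liveboard(liveboard_guid):
    # Phase 2: export TML, add groups/sparklines/brand colors, re-import
    return {"guid": liveboard_guid, "enhanced": True}

def build_liveboard(model_id, liveboard_name):
    """One process, two phases; never offered to the user as alternatives."""
    guid = create_liveboard_from_model_mcp(model_id, liveboard_name)
    return enhance_mcp_liveboard(guid)

print(build_liveboard("m1", "Acme - Retail Sales"))
```

Wrapping both phases in one function is what keeps callers from treating them as selectable "methods".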
chat_interface.py (+129 -41):

@@ -9,11 +9,21 @@ warnings.filterwarnings('ignore', message='.*tuples.*format.*chatbot.*deprecated
 import gradio as gr
 import os
 import sys
+import json
+import time
+import glob
 from dotenv import load_dotenv
 from demo_builder_class import DemoBuilder
 from supabase_client import load_gradio_settings
 from main_research import MultiLLMResearcher, Website
-from demo_personas import
+from demo_personas import (
+    build_company_analysis_prompt,
+    build_industry_research_prompt,
+    VERTICALS,
+    FUNCTIONS,
+    get_use_case_config,
+    parse_use_case
+)
 from demo_prep import map_llm_display_to_provider
 
 load_dotenv(override=True)

@@ -497,6 +507,13 @@ Watch the AI Feedback tab for real-time progress!"""
 
         # Auto-create DDL
         ddl_response, ddl_code = self.run_ddl_creation()
+
+        # Check if DDL creation failed
+        if not ddl_code or ddl_code.strip() == "":
+            chat_history[-1] = (message, f"{ddl_response}\n\n❌ **Cannot proceed without valid DDL.** Please fix the error and try again.")
+            yield chat_history, current_stage, current_model, company, use_case, ""
+            return
+
         chat_history[-1] = (message, f"✅ DDL Created\n\n🚀 **Deploying to Snowflake...**")
         yield chat_history, current_stage, current_model, company, use_case, ""

@@ -1402,6 +1419,10 @@ To change settings, use:
         use_case: Use case name
         generic_context: Additional context provided by user for generic use cases
     """
+    print(f"\n\n[CACHE DEBUG] === run_research_streaming called ===")
+    print(f"[CACHE DEBUG] company: {company}")
+    print(f"[CACHE DEBUG] use_case: {use_case}\n\n")
+
     import time
     import os
     from main_research import ResultsManager

@@ -1454,25 +1475,41 @@ To change settings, use:
     use_case_safe = use_case.lower().replace(' ', '_').replace('/', '_')
 
     # Try new format first (with use case)
+    # Use absolute path to ensure we find cache regardless of CWD
+    script_dir = os.path.dirname(os.path.abspath(__file__))
+    results_dir = os.path.join(script_dir, "results")
     cache_filename = f"{safe_domain}_{use_case_safe}.json"
-    cache_filepath = os.path.join(
+    cache_filepath = os.path.join(results_dir, cache_filename)
 
-    # If
+    # If exact match doesn't exist, try fuzzy matching for similar use cases
     if not os.path.exists(cache_filepath):
-        ...
+        import glob
+        print(f"[CACHE DEBUG] Current working directory: {os.getcwd()}")
+        print(f"[CACHE DEBUG] Script directory: {script_dir}")
+        print(f"[CACHE DEBUG] Results directory: {results_dir}")
+        similar_files = glob.glob(os.path.join(results_dir, f"{safe_domain}_*.json"))
+        print(f"[CACHE DEBUG] Exact file {cache_filepath} not found")
+        print(f"[CACHE DEBUG] Glob pattern: {results_dir}/{safe_domain}_*.json")
+        print(f"[CACHE DEBUG] Similar files found: {similar_files}")
+        if similar_files:
+            # Found similar cache files for this company
+            cache_filepath = similar_files[0]  # Use the first one found
+            cache_filename = os.path.basename(cache_filepath)
+            print(f"[CACHE DEBUG] Using similar file: {cache_filename}")
+            self.log_feedback(f"📋 Found similar cache file: {cache_filename}")
+        elif not os.path.exists(cache_filepath):
+            # Try old format (without use case)
+            old_cache_filename = f"research_{safe_domain}.json"
+            old_cache_filepath = os.path.join(results_dir, old_cache_filename)
+            if os.path.exists(old_cache_filepath):
+                cache_filename = old_cache_filename
+                cache_filepath = old_cache_filepath
 
     cached_results = None
     cache_age_hours = None
 
-    #
-
-    # self.log_feedback(f"🔄 Generic use case detected - skipping cache, running fresh research")
-    # progress_message += f"🔄 **Generic use case** - running fresh research for custom context...\n"
-    # yield progress_message
+    # Check for cached research and use automatically if valid
+    print(f"[CACHE DEBUG] Final cache_filepath: {cache_filepath}, exists: {os.path.exists(cache_filepath)}")
     if os.path.exists(cache_filepath):
         try:
             # Check cache age (5 day expiry)

@@ -1481,25 +1518,38 @@ To change settings, use:
             cache_age_hours = cache_age / 3600  # Convert to hours
 
             if cache_age_hours <= 120:  # Cache valid for 5 days (120 hours)
-                self.log_feedback(f"📋
-                progress_message += f"📋 **
-                progress_message += f"**Age:** {cache_age_hours:.1f} hours old\n"
-                progress_message += f"**Company:** {domain}\n"
-                progress_message += f"**Use Case:** {use_case}\n\n"
-                progress_message += "**Would you like to use the cached results?**\n"
-                progress_message += "- Type 'yes' to use cache (instant)\n"
-                progress_message += "- Type 'no' to run fresh research (2-3 minutes)\n"
+                self.log_feedback(f"📋 Using cached research (age: {cache_age_hours:.1f} hours)")
+                progress_message += f"📋 **Using Cached Research** ({cache_age_hours:.1f} hours old)\n\n"
 
-                #
-                ...
+                # Load cached results automatically
+                with open(cache_filepath, 'r') as f:
+                    cached_data = json.load(f)
+
+                self.demo_builder.company_analysis_results = cached_data.get('company_summary', '')
+                self.demo_builder.industry_research_results = cached_data.get('research_paper', '')
+                self.demo_builder.combined_research_results = self.demo_builder.get_research_context()
+                self.demo_builder.company_url = cached_data.get('url', url)
+                self.demo_builder.advance_stage()
+
+                progress_message += "✅ **Research loaded from cache!**\n\n"
+                progress_message += "Proceeding to DDL generation...\n"
+
+                self.log_feedback("✅ Research loaded from cache, generating DDL")
                 yield progress_message
-
+
+                # Automatically trigger DDL generation
+                try:
+                    response, ddl_code = self.run_ddl_creation()
+                    yield response
+                except Exception as e:
+                    import traceback
+                    error_msg = f"❌ DDL generation failed: {str(e)}\n{traceback.format_exc()}"
+                    self.log_feedback(error_msg)
+                    yield error_msg
+                return
             else:
-                self.log_feedback(f"📋
-                progress_message += f"📋 Cache
+                self.log_feedback(f"📋 Cache too old ({cache_age_hours:.1f} hours), running fresh research")
+                progress_message += f"📋 Cache expired ({cache_age_hours:.1f} hours old), running fresh research...\n"
                 yield progress_message
         except Exception as e:
             self.log_feedback(f"⚠️ Could not load cache: {str(e)}")

@@ -1659,8 +1709,8 @@ To change settings, use:
                 'use_case': use_case,
                 'generated_at': datetime.now().isoformat(),
             }
-            os.makedirs(
-            ResultsManager.save_results(research_results, cache_filename,
+            os.makedirs(results_dir, exist_ok=True)
+            ResultsManager.save_results(research_results, cache_filename, results_dir)
             progress_message += "💾 Cached research results for future use!\n\n"
             yield progress_message
         except Exception as e:

@@ -2076,6 +2126,10 @@ Generate complete CREATE TABLE statements with proper Snowflake syntax and
         self.log_feedback("Generating DDL...")
         ddl_result = researcher.make_request(messages, temperature=0.2, max_tokens=4000, stream=False)
 
+        # Validate DDL result
+        if not ddl_result or not isinstance(ddl_result, str) or 'CREATE TABLE' not in ddl_result.upper():
+            raise Exception(f"DDL generation failed or produced invalid output. Result: {ddl_result[:200] if ddl_result else 'None'}")
+
         # Store in demo_builder
         self.demo_builder.schema_generation_results = ddl_result
         self.ddl_code = ddl_result

@@ -2104,6 +2158,9 @@ Generate complete CREATE TABLE statements with proper Snowflake syntax and
             import traceback
             error_msg = f"❌ DDL creation failed: {str(e)}\n{traceback.format_exc()}"
             self.log_feedback(error_msg)
+            # Set schema_generation_results to empty string so it's not None
+            self.demo_builder.schema_generation_results = ""
+            self.ddl_code = ""
             return error_msg, ""
 
     def get_fallback_population_code(self, schema_info, fact_rows=10000, dim_rows=100):

@@ -2475,19 +2532,28 @@ Generate complete CREATE TABLE statements with proper Snowflake syntax and
         self.log_feedback("🔢 Starting data population...")
 
         try:
-            from demo_personas import get_persona_config
             from schema_utils import parse_ddl_schema, generate_schema_constrained_prompt
             import re
 
-            ...
+            # Parse use case into vertical and function
+            vertical, function = parse_use_case(self.demo_builder.use_case)
+            config = get_use_case_config(vertical or "Generic", function or "Generic")
 
             # Build business context for population
+            # Handle both new config structure and backward compatibility
+            target_persona = config.get('target_persona', 'Business Leader')
+            business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+            demo_objectives = config.get('demo_objectives', 'Show self-service analytics and business insights')
+
+            # For generic cases, use the use_case_name
+            use_case_display = config.get('use_case_name', self.demo_builder.use_case)
+
             business_context = f"""
 BUSINESS CONTEXT:
-- Use Case: {
-- Target Persona: {
-- Business Problem: {
-- Demo Objectives: {
+- Use Case: {use_case_display}
+- Target Persona: {target_persona}
+- Business Problem: {business_problem}
+- Demo Objectives: {demo_objectives}
 
 MANDATORY CONNECTION CODE (MUST BE COMPLETE):
 ```python

@@ -2649,6 +2715,14 @@ LegitData will generate realistic, AI-powered data.
                 self.demo_builder.schema_generation_results
             )
 
+            # DEBUG: Log what was passed
+            ddl_passed = self.demo_builder.schema_generation_results
+            log_progress(f"[DEBUG] DDL type passed to deployer: {type(ddl_passed)}")
+            log_progress(f"[DEBUG] DDL is None: {ddl_passed is None}")
+            if ddl_passed:
+                log_progress(f"[DEBUG] DDL length: {len(ddl_passed)}")
+                log_progress(f"[DEBUG] DDL first 100 chars: {ddl_passed[:100]}")
+
             if not success:
                 log_progress(f"[ERROR] DDL Deployment failed!")
                 raise Exception(f"Schema deployment failed: {deploy_message}")

@@ -2706,8 +2780,17 @@ LegitData will generate realistic, AI-powered data.
 
             def run_population():
                 try:
+                    # Validate DDL before passing to legitdata
+                    ddl = self.demo_builder.schema_generation_results
+                    if not ddl or not isinstance(ddl, str):
+                        raise Exception(f"DDL is invalid (type: {type(ddl)}). Cannot populate data. Please regenerate DDL.")
+
+                    # Check if DDL contains the word "None" which would indicate AI generated bad SQL
+                    if ddl == "None" or ddl.strip() == "None":
+                        raise Exception("DDL generation returned 'None'. Please regenerate DDL with a different prompt or model.")
+
                     success, message, results = populate_demo_data(
-                        ddl_content=
+                        ddl_content=ddl,
                         company_url=self.demo_builder.company_url,
                         use_case=self.demo_builder.use_case,
                         schema_name=schema_name,

@@ -2865,11 +2948,14 @@ Tables: Created and populated
         ts_secret = os.getenv('THOUGHTSPOT_SECRET_KEY')
 
         liveboard_method = self.settings.get('liveboard_method', 'HYBRID')
-        ...
+
+        # Clean company name for display (strip .com, .org, etc)
+        clean_company = company.split('.')[0].title() if '.' in company else company
+        liveboard_name = self.settings.get('liveboard_name', '') or f"{clean_company} - {use_case}"
 
         # Get company data for liveboard
         company_data = {
-            'name':
+            'name': clean_company,
             'url': getattr(self.demo_builder, 'company_url', company),
             'logo_url': getattr(self.demo_builder, 'logo_url', None),
             'primary_color': getattr(self.demo_builder, 'primary_color', '#3498db'),

@@ -3231,7 +3317,9 @@ Ask these questions to showcase ThoughtSpot's AI capabilities:
         try:
             from smart_data_adjuster import SmartDataAdjuster
 
-            ...
+            # Pass the selected LLM model to the adjuster
+            llm_model = self.settings.get('model', 'claude-sonnet-4')
+            adjuster = SmartDataAdjuster(database, schema_name, liveboard_guid, llm_model=llm_model)
            adjuster.connect()
 
             if adjuster.load_liveboard_context():

@@ -4243,7 +4331,7 @@ if __name__ == "__main__":
 
     app.launch(
         server_name="0.0.0.0",
-        server_port=
+        server_port=7863,  # Different port from main app (7860) and old chat (7861)
         share=False,
         inbrowser=True,
         debug=True,
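Condensed, the cache lookup introduced in run_research_streaming resolves in three steps: the exact `{domain}_{use_case}.json` name under an absolute `results` directory, then a `{domain}_*.json` glob for a similar use case, then the legacy `research_{domain}.json` name. A self-contained sketch of that order; the helper name is hypothetical:

```python
import glob
import os

def find_cached_research(results_dir, safe_domain, use_case_safe):
    """Resolve a research cache file: exact new-format name first, then any
    cached file for the same domain, then the legacy pre-use-case name."""
    exact = os.path.join(results_dir, f"{safe_domain}_{use_case_safe}.json")
    if os.path.exists(exact):
        return exact
    # Fuzzy fallback: any cached use case for this domain
    similar = glob.glob(os.path.join(results_dir, f"{safe_domain}_*.json"))
    if similar:
        return similar[0]
    # Legacy format from before the use case was part of the filename
    legacy = os.path.join(results_dir, f"research_{safe_domain}.json")
    return legacy if os.path.exists(legacy) else None
```

Anchoring `results_dir` to `os.path.dirname(os.path.abspath(__file__))`, as the diff does, is what makes the lookup independent of the current working directory.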
@@ -5,6 +5,276 @@ All persona data and prompt templates for use case-driven demo preparation
|
|
| 5 |
|
| 6 |
from schema_utils import extract_key_business_terms
|
| 7 |
|
|
  # Use Case Persona Configurations
  USE_CASE_PERSONAS = {
      "Merchandising": {
@@ -613,9 +883,41 @@ def get_persona_config(use_case):

  def build_company_analysis_prompt(use_case, website_title, website_url, website_content, css_count, logo_candidates):
      """Build dynamic company analysis prompt based on use case"""
-

-     system_prompt = COMPANY_ANALYSIS_TEMPLATE.format(

      # Extract key business terms instead of raw content dump
      key_terms = extract_key_business_terms(website_content, max_chars=1000)

@@ -629,36 +931,87 @@ VISUAL ASSETS SUMMARY:

  CSS Resources: {css_count} stylesheets detected
  Logo Assets: {len(logo_candidates)} logo variations found

- Conduct analysis specifically for {
- Extract specific, quantifiable information wherever possible that relates to {
      return system_prompt, user_prompt

  def build_industry_research_prompt(use_case, company_analysis_results):
      """Build dynamic industry research prompt based on use case and company analysis"""
-

-     #
-

-     system_prompt = INDUSTRY_RESEARCH_TEMPLATE.format(
-         use_case=use_case,
-         research_focus_formatted=research_focus_formatted,
-         **config
-     )

-     user_prompt = f"""Conduct comprehensive {

  COMPANY ANALYSIS RESULTS:
  {company_analysis_results}

- Focus specifically on creating realistic demo scenarios that showcase how ThoughtSpot's {

  Provide specific recommendations for:
  1. Database schemas and table structures
  2. Realistic data patterns and volumes
  3. Compelling outlier scenarios
- 4. Success metrics that prove ROI: {

      return system_prompt, user_prompt

  from schema_utils import extract_key_business_terms

+ # ============================================================================
+ # VERTICAL × FUNCTION MATRIX SYSTEM (Phase 1 - February 2026)
+ # ============================================================================
+ # New composable system replacing flat USE_CASE_PERSONAS
+ # Keep USE_CASE_PERSONAS below for backward compatibility during transition
+ # ============================================================================
+
+ # VERTICALS: Industry-specific context
+ VERTICALS = {
+     "Retail": {
+         "typical_entities": ["Store", "Product", "Category", "Region", "Customer"],
+         "industry_terms": ["SKU", "basket", "shrink", "markdown", "comp sales", "footfall"],
+         "data_patterns": ["seasonality", "holiday_spikes", "weather_impact", "back_to_school"],
+     },
+     "Banking": {
+         "typical_entities": ["Account", "Customer", "Branch", "Product", "Loan"],
+         "industry_terms": ["AUM", "NIM", "deposits", "charge-off", "delinquency", "APR"],
+         "data_patterns": ["month_end_spikes", "rate_sensitivity", "quarter_close"],
+     },
+     "Software": {
+         "typical_entities": ["Account", "User", "Subscription", "Feature", "License"],
+         "industry_terms": ["ARR", "MRR", "churn", "NRR", "seats", "expansion"],
+         "data_patterns": ["renewal_cycles", "usage_spikes", "trial_conversion"],
+     },
+     "Manufacturing": {
+         "typical_entities": ["Plant", "Line", "Product", "Supplier", "Shift"],
+         "industry_terms": ["OEE", "yield", "scrap", "downtime", "throughput", "WIP"],
+         "data_patterns": ["shift_patterns", "maintenance_cycles", "supply_disruptions"],
+     },
+ }
+
+ # FUNCTIONS: Department-specific KPIs, visualizations, and patterns
+ FUNCTIONS = {
+     "Sales": {
+         "kpis": ["Dollar Sales", "Unit Sales", "ASP"],
+         "kpi_definitions": {
+             "Dollar Sales": "Total revenue ($)",
+             "Unit Sales": "Total units sold",
+             "ASP": "Dollar Sales ÷ Unit Sales (Average Selling Price)",
+         },
+         "viz_types": ["KPI_sparkline", "trend", "by_region", "by_product", "vs_target"],
+         "outlier_categories": ["surge", "decline", "pricing_anomaly", "regional_variance"],
+         "spotter_templates": [
+             "Which {entity} had the highest {kpi} last {period}?",
+             "Show me {kpi} trend by {dimension}",
+             "Why did {kpi} drop last month?",
+             "Compare {kpi} across {dimension}",
+         ],
+     },
+     "Supply Chain": {
+         "kpis": ["Avg Inventory", "OTIF", "Days on Hand", "Stockout Rate"],
+         "kpi_definitions": {
+             "Avg Inventory": "(Beginning Inventory + Ending Inventory) ÷ 2",
+             "OTIF": "On-Time In-Full delivery rate",
+             "Days on Hand": "Inventory ÷ Daily Usage",
+             "Stockout Rate": "% of SKUs with zero inventory",
+         },
+         "viz_types": ["inventory_levels", "stockout_risk", "supplier_perf", "trend"],
+         "outlier_categories": ["stockout", "overstock", "lead_time_spike", "supplier_issue"],
+         "spotter_templates": [
+             "Which {entity} is at risk of stockout?",
+             "Show inventory levels by {dimension}",
+             "Which suppliers have the longest lead times?",
+         ],
+     },
+     "Marketing": {
+         "kpis": ["CTR", "Bounce Rate", "Fill Rate", "Approval Rate"],
+         "kpi_definitions": {
+             "CTR": "Clicks ÷ Impressions (Click-Through Rate)",
+             "Bounce Rate": "% leaving landing page without action",
+             "Fill Rate": "% completing application/form",
+             "Approval Rate": "% of applications approved",
+         },
+         "viz_types": ["funnel", "channel_comparison", "trend", "by_campaign"],
+         "outlier_categories": ["conversion_drop", "channel_spike", "cost_anomaly"],
+         "spotter_templates": [
+             "What is our conversion rate by {channel}?",
+             "Show me the funnel for {campaign}",
+             "Which channel has the highest CTR?",
+         ],
+     },
+ }
+
+ # MATRIX_OVERRIDES: Specific Vertical × Function combinations
+ # Only specify what differs from the base vertical + function merge
+ MATRIX_OVERRIDES = {
+     ("Retail", "Sales"): {
+         "add_kpis": ["Basket Size", "Items per Transaction"],
+         "add_kpi_definitions": {
+             "Basket Size": "Dollar Sales ÷ Transactions",
+             "Items per Transaction": "Unit Sales ÷ Transactions",
+         },
+         "add_viz": ["by_store", "by_category"],
+         "target_persona": "VP Merchandising, Retail Sales Leader",
+         "business_problem": "$1T lost annually to stockouts and overstock",
+     },
+     ("Banking", "Marketing"): {
+         "add_kpis": ["Application Fill Rate", "Cost per Acquisition"],
+         "add_kpi_definitions": {
+             "Application Fill Rate": "% completing loan/account application",
+             "Cost per Acquisition": "Marketing spend ÷ New customers acquired",
+         },
+         "rename_kpis": {"CTR": "Click-through Rate"},
+         "target_persona": "CMO, VP Digital Marketing",
+         "business_problem": "High cost per acquisition, low funnel conversion",
+     },
+     ("Software", "Sales"): {
+         "add_kpis": ["ARR", "Net Revenue Retention", "Pipeline Coverage"],
+         "add_kpi_definitions": {
+             "ARR": "Annual Recurring Revenue",
+             "Net Revenue Retention": "(Starting ARR + Expansion - Churn) ÷ Starting ARR",
+             "Pipeline Coverage": "Pipeline value ÷ Quota",
+         },
+         "add_viz": ["by_segment", "by_rep"],
+         "target_persona": "CRO, VP Sales",
+     },
+ }
+
+
+ def parse_use_case(user_input: str) -> tuple[str | None, str | None]:
+     """
+     Parse user input string like "Retail Sales" into (vertical, function) tuple.
+
+     Checks for known patterns by testing against VERTICALS.keys() and FUNCTIONS.keys().
+     Handles case-insensitive matching.
+
+     Args:
+         user_input: User input string like "Retail Sales", "Banking Marketing", etc.
+
+     Returns:
+         Tuple of (vertical, function) like ("Retail", "Sales")
+         Returns (None, None) for unclear inputs
+     """
+     if not user_input or not user_input.strip():
+         return (None, None)
+
+     user_input_lower = user_input.strip().lower()
+
+     # Try to find both vertical and function in the input
+     found_vertical = None
+     found_function = None
+
+     # Check for known verticals (case-insensitive)
+     for vertical in VERTICALS.keys():
+         if vertical.lower() in user_input_lower:
+             found_vertical = vertical
+             break
+
+     # Check for known functions (case-insensitive)
+     for function in FUNCTIONS.keys():
+         if function.lower() in user_input_lower:
+             found_function = function
+             break
+
+     # If we found both, return them
+     if found_vertical and found_function:
+         return (found_vertical, found_function)
+
+     # If we found only one, return it with None for the other
+     if found_vertical:
+         return (found_vertical, None)
+     if found_function:
+         return (None, found_function)
+
+     # If we found neither, return (None, None)
+     return (None, None)
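The matching rule in `parse_use_case()` is a case-insensitive substring test against the dict keys. A condensed sketch of that logic, with `VERTICALS`/`FUNCTIONS` shrunk to two entries each purely for illustration:

```python
# Illustrative subset of the real dicts; values are irrelevant to matching
VERTICALS = {"Retail": {}, "Banking": {}}
FUNCTIONS = {"Sales": {}, "Marketing": {}}

def parse_use_case(user_input):
    """Return (vertical, function); either side is None when not recognized."""
    if not user_input or not user_input.strip():
        return (None, None)
    text = user_input.strip().lower()
    # First key whose lowercase form appears as a substring wins
    vertical = next((v for v in VERTICALS if v.lower() in text), None)
    function = next((f for f in FUNCTIONS if f.lower() in text), None)
    return (vertical, function)

print(parse_use_case("retail sales"))    # ('Retail', 'Sales')
print(parse_use_case("Banking"))         # ('Banking', None)
print(parse_use_case("HR Operations"))   # (None, None)
```

Because matching is substring-based, "Retail Sales", "retail sales analytics", and "our Retail Sales team" all resolve to the same tuple.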
+
+
+ def get_use_case_config(vertical: str, function: str) -> dict:
+     """
+     Merge vertical + function + overrides into final configuration.
+     Handles known combinations, partial matches, and fully generic cases.
+
+     Args:
+         vertical: Industry vertical (e.g., "Retail")
+         function: Functional department (e.g., "Sales")
+
+     Returns:
+         Complete configuration dict with all fields merged
+     """
+     v = VERTICALS.get(vertical, {})
+     f = FUNCTIONS.get(function, {})
+     override = MATRIX_OVERRIDES.get((vertical, function), {})
+
+     # Determine if this is a known, partial, or generic case
+     is_known_vertical = vertical in VERTICALS
+     is_known_function = function in FUNCTIONS
+
+     # Build base config
+     config = {
+         # Metadata
+         "vertical": vertical,
+         "function": function,
+         "use_case_name": f"{vertical} {function}",
+
+         # From vertical
+         "entities": v.get("typical_entities", []).copy(),
+         "industry_terms": v.get("industry_terms", []).copy(),
+         "data_patterns": v.get("data_patterns", []).copy(),
+
+         # From function (copy to allow modification)
+         "kpis": f.get("kpis", []).copy(),
+         "kpi_definitions": f.get("kpi_definitions", {}).copy(),
+         "viz_types": f.get("viz_types", []).copy(),
+         "outlier_categories": f.get("outlier_categories", []).copy(),
+         "spotter_templates": f.get("spotter_templates", []).copy(),
+
+         # Flags
+         "is_generic": False,
+         "ai_should_determine": [],
+     }
+
+     # Apply overrides
+     if override.get("add_kpis"):
+         config["kpis"].extend(override["add_kpis"])
+     if override.get("add_kpi_definitions"):
+         config["kpi_definitions"].update(override["add_kpi_definitions"])
+     if override.get("add_viz"):
+         config["viz_types"].extend(override["add_viz"])
+     if override.get("rename_kpis"):
+         for old, new in override["rename_kpis"].items():
+             if old in config["kpis"]:
+                 idx = config["kpis"].index(old)
+                 config["kpis"][idx] = new
+     if override.get("target_persona"):
+         config["target_persona"] = override["target_persona"]
+     if override.get("business_problem"):
+         config["business_problem"] = override["business_problem"]
+
+     # Handle generic cases
+     if not is_known_vertical and not is_known_function:
+         # Fully generic
+         config["is_generic"] = True
+         config["ai_should_determine"] = ["entities", "industry_terms", "kpis", "viz_types", "outliers"]
+         config["prompt_user_for"] = ["key_metrics", "target_persona", "business_questions"]
+     elif not is_known_vertical:
+         # Known function, unknown vertical
+         config["is_generic"] = True
+         config["ai_should_determine"] = ["entities", "industry_terms", "data_patterns"]
+     elif not is_known_function:
+         # Known vertical, unknown function
+         config["is_generic"] = True
+         config["ai_should_determine"] = ["kpis", "viz_types", "outliers"]
+
+     # Add legacy fields for backward compatibility with existing prompts
+     if "demo_objectives" not in config:
+         config["demo_objectives"] = f"Demonstrate {function} analytics capabilities with {vertical}-specific insights"
+     if "key_metrics" not in config:
+         config["key_metrics"] = ", ".join(config["kpis"][:5]) if config["kpis"] else "revenue, growth, efficiency"
+     if "research_focus" not in config:
+         config["research_focus"] = config["industry_terms"][:5] if config["industry_terms"] else []
+     if "thoughtspot_solution" not in config:
+         config["thoughtspot_solution"] = f"Self-service analytics for {vertical} {function} teams"
+     if "persona_focus" not in config:
+         config["persona_focus"] = f"{function} optimization and decision-making"
+     if "cost_impact" not in config:
+         config["cost_impact"] = "Significant business impact through data-driven decisions"
+     if "success_outcomes" not in config:
+         config["success_outcomes"] = f"Improved {function.lower()} performance and faster insights"
+
+     return config
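The override step is the subtle part of `get_use_case_config()`: lists are copied before mutation so the shared `FUNCTIONS` dict is never touched, additions are appended, and renames are done in place by index. A condensed sketch of just that step, with a trimmed base list for illustration:

```python
# Trimmed base KPIs (as defined for FUNCTIONS["Sales"]) and a hypothetical override
base_kpis = ["Dollar Sales", "Unit Sales", "ASP"]
override = {"add_kpis": ["Basket Size"], "rename_kpis": {"ASP": "Average Selling Price"}}

kpis = base_kpis.copy()                      # copy so the shared dict is untouched
kpis.extend(override.get("add_kpis", []))    # additive override
for old, new in override.get("rename_kpis", {}).items():
    if old in kpis:
        kpis[kpis.index(old)] = new          # rename in place, order preserved

print(kpis)  # ['Dollar Sales', 'Unit Sales', 'Average Selling Price', 'Basket Size']
```

Without the `.copy()`, the first `get_use_case_config("Retail", "Sales")` call would permanently append "Basket Size" to every later "Sales" configuration.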
+
+
+ # ============================================================================
+ # LEGACY USE CASE PERSONAS (Backward Compatibility)
+ # ============================================================================
+ # Keep for backward compatibility during transition
+ # New code should use get_use_case_config() instead
+ # ============================================================================
+
  # Use Case Persona Configurations
  USE_CASE_PERSONAS = {
      "Merchandising": {

  def build_company_analysis_prompt(use_case, website_title, website_url, website_content, css_count, logo_candidates):
      """Build dynamic company analysis prompt based on use case"""
+     # Parse use case into vertical and function
+     vertical, function = parse_use_case(use_case)
+
+     # Get config from new system, fallback to legacy if needed
+     if vertical or function:
+         config = get_use_case_config(vertical or "Generic", function or "Generic")
+         # Map new config fields to legacy template fields
+         use_case_display = config.get('use_case_name', use_case)
+         target_persona = config.get('target_persona', 'Business Leader')
+         business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+         # Convert KPIs list to key_metrics string
+         kpis = config.get('kpis', [])
+         key_metrics = ', '.join(kpis) if kpis else 'key operational metrics'
+         # Use function as persona_focus, or derive from vertical
+         persona_focus = function or vertical or 'operational efficiency, data-driven decisions'
+     else:
+         # Fallback to legacy system for unrecognized use cases
+         config = get_persona_config(use_case)
+         use_case_display = use_case
+         target_persona = config.get('target_persona', 'Business Leader')
+         business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+         key_metrics = config.get('key_metrics', 'key operational metrics')
+         persona_focus = config.get('persona_focus', 'operational efficiency, data-driven decisions')
+
+     # Build template dict with mapped fields
+     template_dict = {
+         'use_case': use_case_display,
+         'target_persona': target_persona,
+         'business_problem': business_problem,
+         'key_metrics': key_metrics,
+         'persona_focus': persona_focus,
+         'cost_impact': config.get('cost_impact', 'Lost opportunities from data bottlenecks'),
+     }

+     system_prompt = COMPANY_ANALYSIS_TEMPLATE.format(**template_dict)

      # Extract key business terms instead of raw content dump
      key_terms = extract_key_business_terms(website_content, max_chars=1000)

  CSS Resources: {css_count} stylesheets detected
  Logo Assets: {len(logo_candidates)} logo variations found

+ Conduct analysis specifically for {use_case_display} use case targeting {target_persona} who needs to solve: {business_problem}

+ Extract specific, quantifiable information wherever possible that relates to {key_metrics} and {persona_focus}."""

      return system_prompt, user_prompt
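Both prompt builders now funnel every mapped field through one `template_dict` and a single `.format(**template_dict)` call, instead of the old scattered keyword arguments. A minimal sketch of that pattern; the template string here is a stand-in, not the real `COMPANY_ANALYSIS_TEMPLATE`:

```python
# Hypothetical template string; only the {placeholder} names mirror the diff
TEMPLATE = "Analyze {use_case} for {target_persona}; key metrics: {key_metrics}."

# The builders assemble this dict from either the matrix config or the legacy persona
template_dict = {
    "use_case": "Retail Sales",
    "target_persona": "VP Merchandising, Retail Sales Leader",
    "key_metrics": "Dollar Sales, Unit Sales, ASP",
}

# ** unpacks the dict into keyword arguments for str.format()
print(TEMPLATE.format(**template_dict))
# Analyze Retail Sales for VP Merchandising, Retail Sales Leader; key metrics: Dollar Sales, Unit Sales, ASP.
```

One advantage of the dict form: a missing key fails loudly with `KeyError` at the `.format()` call, so a template/config mismatch surfaces immediately in both builders.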

  def build_industry_research_prompt(use_case, company_analysis_results):
      """Build dynamic industry research prompt based on use case and company analysis"""
+     # Parse use case into vertical and function
+     vertical, function = parse_use_case(use_case)
+
+     # Get config from new system, fallback to legacy if needed
+     if vertical or function:
+         config = get_use_case_config(vertical or "Generic", function or "Generic")
+         # Map new config fields to legacy template fields
+         use_case_display = config.get('use_case_name', use_case)
+         target_persona = config.get('target_persona', 'Business Leader')
+         business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+         # Convert KPIs list to key_metrics string
+         kpis = config.get('kpis', [])
+         key_metrics = ', '.join(kpis) if kpis else 'key operational metrics'
+         # Use function as persona_focus, or derive from vertical
+         persona_focus = function or vertical or 'operational efficiency, data-driven decisions'
+         # Build research focus from entities, industry_terms, and data_patterns
+         entities = config.get('entities', [])
+         industry_terms = config.get('industry_terms', [])
+         data_patterns = config.get('data_patterns', [])
+         research_focus_list = []
+         if entities:
+             research_focus_list.append(f"Core entities: {', '.join(entities[:5])}")
+         if industry_terms:
+             research_focus_list.append(f"Industry terminology: {', '.join(industry_terms[:5])}")
+         if data_patterns:
+             research_focus_list.append(f"Data patterns: {', '.join(data_patterns[:3])}")
+         if not research_focus_list:
+             research_focus_list = ["core business processes", "key operational metrics", "competitive positioning"]
+         research_focus_formatted = "\n".join([f"- {focus}" for focus in research_focus_list])
+         # Default values for fields not in new system
+         thoughtspot_solution = f"AI-powered analytics for {use_case_display}"
+         success_outcomes = "Faster insights, improved decision making, operational efficiency gains"
+         demo_objectives = f"Show self-service analytics for {use_case_display}"
+     else:
+         # Fallback to legacy system for unrecognized use cases
+         config = get_persona_config(use_case)
+         use_case_display = use_case
+         target_persona = config.get('target_persona', 'Business Leader')
+         business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+         key_metrics = config.get('key_metrics', 'key operational metrics')
+         persona_focus = config.get('persona_focus', 'operational efficiency, data-driven decisions')
+         research_focus_formatted = "\n".join([f"- {focus}" for focus in config.get('research_focus', [])])
+         thoughtspot_solution = config.get('thoughtspot_solution', 'Self-service analytics platform')
+         success_outcomes = config.get('success_outcomes', 'Faster insights, improved decision making')
+         demo_objectives = config.get('demo_objectives', 'Show self-service analytics')

+     # Build template dict with mapped fields
+     template_dict = {
+         'use_case': use_case_display,
+         'target_persona': target_persona,
+         'business_problem': business_problem,
+         'key_metrics': key_metrics,
+         'persona_focus': persona_focus,
+         'research_focus_formatted': research_focus_formatted,
+         'thoughtspot_solution': thoughtspot_solution,
+         'success_outcomes': success_outcomes,
+         'demo_objectives': demo_objectives,
+         'cost_impact': config.get('cost_impact', 'Lost opportunities from data bottlenecks'),
+     }

+     system_prompt = INDUSTRY_RESEARCH_TEMPLATE.format(**template_dict)

+     user_prompt = f"""Conduct comprehensive {use_case_display} research based on this company analysis:

  COMPANY ANALYSIS RESULTS:
  {company_analysis_results}

+ Focus specifically on creating realistic demo scenarios that showcase how ThoughtSpot's {thoughtspot_solution} solves {business_problem} for {target_persona}.

  Provide specific recommendations for:
  1. Database schemas and table structures
  2. Realistic data patterns and volumes
  3. Compelling outlier scenarios
+ 4. Success metrics that prove ROI: {success_outcomes}"""

      return system_prompt, user_prompt
@@ -2548,8 +2548,7 @@ Schema Validation: Will be checked next...

      value="*Database schema will appear here after Create stage*",
      language="sql",
      interactive=False,
-     lines=20
-     max_lines=30
+     lines=20
  )
  with gr.Column(scale=1):
      edit_ddl_btn = gr.Button("🔍 DDL", elem_classes=["edit-btn"])

@@ -2579,8 +2578,7 @@ Schema Validation: Will be checked next...

      value="Generated Python code will appear here after population step",
      language="python",
      interactive=False,
-     lines=10
-     max_lines=15,
+     lines=10
  )

  with gr.Row():
@@ -27,6 +27,61 @@ _direct_api_token = None

  _direct_api_session = None

  def _get_direct_api_session():
      """
      Get or create an authenticated session for direct ThoughtSpot API calls.
|
@@ -2815,49 +2870,21 @@ def create_liveboard_from_model_mcp(
|
|
| 2815 |
print(f"[MCP] Starting async MCP liveboard creation...")
|
| 2816 |
try:
|
| 2817 |
print(f"[MCP] Importing MCP modules...")
|
| 2818 |
-
from mcp import ClientSession
|
| 2819 |
-
from mcp.client.
|
| 2820 |
print(f"[MCP] MCP modules imported successfully")
|
| 2821 |
|
| 2822 |
-
#
|
| 2823 |
-
# This
|
| 2824 |
-
print(f"[MCP]
|
| 2825 |
-
|
| 2826 |
-
|
| 2827 |
-
|
| 2828 |
-
|
| 2829 |
-
print(f"[MCP ERROR] Failed to get auth token for bearer auth")
|
| 2830 |
-
return {
|
| 2831 |
-
'success': False,
|
| 2832 |
-
'error': 'Failed to authenticate for MCP bearer auth'
|
| 2833 |
-
}
|
| 2834 |
-
|
| 2835 |
-
ts_host = os.getenv('THOUGHTSPOT_URL', '').rstrip('/').replace('https://', '').replace('http://', '')
|
| 2836 |
-
bearer_token = _direct_api_token
|
| 2837 |
-
|
| 2838 |
-
# Bearer auth format: "Bearer {token}@{host}"
|
| 2839 |
-
# This is ThoughtSpot's MCP server format for bearer endpoint
|
| 2840 |
-
auth_header = f"Bearer {bearer_token}@{ts_host}"
|
| 2841 |
-
# Use /bearer/mcp endpoint (Streamable HTTP transport, not SSE)
|
| 2842 |
-
mcp_endpoint = "https://agent.thoughtspot.app/bearer/mcp"
|
| 2843 |
-
|
| 2844 |
-
print(f"[MCP] Bearer endpoint: {mcp_endpoint}")
|
| 2845 |
-
print(f"[MCP] Host: {ts_host}")
|
| 2846 |
-
print(f"[MCP] Token: {bearer_token[:20]}...")
|
| 2847 |
-
|
| 2848 |
-
# Use Streamable HTTP client with bearer auth headers
|
| 2849 |
-
# This bypasses OAuth and uses our trusted auth token directly
|
| 2850 |
-
headers = {"Authorization": auth_header}
|
| 2851 |
|
| 2852 |
-
|
| 2853 |
-
async with streamablehttp_client(mcp_endpoint, headers=headers) as (read, write, _get_session_id):
|
| 2854 |
-
print(f"DEBUG: Streamable HTTP client context established")
|
| 2855 |
-
print(f"DEBUG: Creating ClientSession...")
|
| 2856 |
async with ClientSession(read, write) as session:
|
| 2857 |
-
print(f"DEBUG: ClientSession context entered")
|
| 2858 |
-
print(f"DEBUG: Calling session.initialize()...")
|
| 2859 |
await session.initialize()
|
| 2860 |
-
print(f"DEBUG: session.initialize() completed")
|
| 2861 |
|
| 2862 |
# Verify connection with ping
|
| 2863 |
print(f"Pinging MCP server...")
|
|
@@ -2955,6 +2982,9 @@ def create_liveboard_from_model_mcp(

      # Use direct ThoughtSpot API (bypasses MCP proxy issues)
      answer_data = _get_answer_direct(question_text, model_id)
      if answer_data:
          print(f" 🔍 DEBUG: Direct API answer keys: {list(answer_data.keys())}")
          answers.append(answer_data)
          print(f" ✅ Answer retrieved (direct API)", flush=True)

@@ -2966,6 +2996,9 @@ def create_liveboard_from_model_mcp(

          "datasourceId": model_id
      })
      answer_data = json.loads(answer_result.content[0].text)
      answers.append(answer_data)
      print(f" ✅ Answer retrieved (MCP fallback)", flush=True)
  else:

@@ -2982,6 +3015,9 @@ def create_liveboard_from_model_mcp(

      # Parse answer data
      answer_data = json.loads(answer_result.content[0].text)
      print(f" 🔍 DEBUG: Answer keys: {list(answer_data.keys())}")
      answers.append(answer_data)
      print(f" ✅ Answer retrieved", flush=True)
@@ -3243,6 +3279,57 @@ def create_liveboard_from_model_mcp(

      })
      print(f" ✓ Added dark theme style to Viz_1")

      # Re-import fixed TML using authenticated session
      import_response = ts_client.session.post(
          f"{ts_base_url}/api/rest/2.0/metadata/tml/import",

  _direct_api_session = None

+ def _clean_viz_title(title: str) -> str:
+     """
+     Clean up visualization titles to be more readable.
+
+     Examples:
+         'shipping_cost by month last 18 months' → 'Shipping Cost by Month'
+         'Top 15 product_name by quantity_shipped' → 'Top 15 Products by Quantity Shipped'
+         'total_revenue weekly' → 'Total Revenue Weekly'
+     """
+     if not title:
+         return title
+
+     # Remove date filter suffixes
+     date_filters = [
+         ' last 18 months', ' last 12 months', ' last 2 years', ' last year',
+         ' last 6 months', ' last 3 months', ' last 30 days', ' last 90 days'
+     ]
+     for filter_str in date_filters:
+         if title.lower().endswith(filter_str):
+             title = title[:-len(filter_str)]
+
+     # Replace underscores with spaces
+     title = title.replace('_', ' ')
+
+     # Clean up common column name patterns
+     replacements = {
+         'product name': 'Products',
+         'supplier name': 'Suppliers',
+         'warehouse name': 'Warehouses',
+         'customer name': 'Customers',
+         'brand name': 'Brands',
+         'store name': 'Stores',
+         'category name': 'Categories',
+         'region name': 'Regions',
+     }
+     title_lower = title.lower()
+     for old, new in replacements.items():
+         if old in title_lower:
+             # Case-insensitive replace
+             import re
+             title = re.sub(re.escape(old), new, title, flags=re.IGNORECASE)
+
+     # Title case the result, but preserve words like 'by', 'vs', 'and'
+     words = title.split()
+     result = []
+     small_words = {'by', 'vs', 'and', 'or', 'the', 'a', 'an', 'of', 'in', 'on', 'to'}
+     for i, word in enumerate(words):
+         if i == 0 or word.lower() not in small_words:
+             result.append(word.capitalize())
+         else:
+             result.append(word.lower())
+
+     return ' '.join(result)
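The docstring examples can be verified against a condensed version of the same pipeline (strip date-filter suffix, replace underscores, map column-name patterns, title-case everything except small words). Only a subset of the suffix and replacement tables is reproduced here:

```python
import re

def clean_title(title):
    """Condensed sketch of the _clean_viz_title() steps (subset of tables)."""
    for suffix in (' last 18 months', ' last 12 months'):
        if title.lower().endswith(suffix):
            title = title[:-len(suffix)]
    title = title.replace('_', ' ')
    # One representative column-name replacement from the full table
    title = re.sub(re.escape('product name'), 'Products', title, flags=re.IGNORECASE)
    small = {'by', 'vs', 'and', 'or', 'the', 'a', 'an', 'of', 'in', 'on', 'to'}
    return ' '.join(w.capitalize() if i == 0 or w.lower() not in small else w.lower()
                    for i, w in enumerate(title.split()))

print(clean_title('shipping_cost by month last 18 months'))    # Shipping Cost by Month
print(clean_title('Top 15 product_name by quantity_shipped'))  # Top 15 Products by Quantity Shipped
```

Note the ordering matters: the suffix check runs before underscore replacement, so it only matches filters that arrive space-separated, exactly as in the diff.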
+
+
  def _get_direct_api_session():
      """
      Get or create an authenticated session for direct ThoughtSpot API calls.
| 2870 |
print(f"[MCP] Starting async MCP liveboard creation...")
|
| 2871 |
try:
|
| 2872 |
print(f"[MCP] Importing MCP modules...")
|
| 2873 |
+
from mcp import ClientSession, StdioServerParameters
|
| 2874 |
+
from mcp.client.stdio import stdio_client
|
| 2875 |
print(f"[MCP] MCP modules imported successfully")
|
| 2876 |
|
| 2877 |
+
# Use stdio client with npx mcp-remote proxy
|
| 2878 |
+
# This connects to ThoughtSpot's public MCP endpoint via npx proxy
|
| 2879 |
+
print(f"[MCP] Initializing stdio connection via npx mcp-remote...")
|
| 2880 |
+
server_params = StdioServerParameters(
|
| 2881 |
+
command="npx",
|
| 2882 |
+
args=["mcp-remote@latest", "https://agent.thoughtspot.app/mcp"]
|
| 2883 |
+
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 2884 |
|
| 2885 |
+
async with stdio_client(server_params) as (read, write):
|
|
|
|
|
|
|
|
|
|
| 2886 |
async with ClientSession(read, write) as session:
|
|
|
|
|
|
|
| 2887 |
await session.initialize()
|
|
|
|
| 2888 |
|
| 2889 |
# Verify connection with ping
|
| 2890 |
print(f"Pinging MCP server...")
|
|
|
|
     # Use direct ThoughtSpot API (bypasses MCP proxy issues)
     answer_data = _get_answer_direct(question_text, model_id)
     if answer_data:
+        # Clean up the viz title
+        if 'question' in answer_data:
+            answer_data['question'] = _clean_viz_title(answer_data['question'])
         print(f" 🔍 DEBUG: Direct API answer keys: {list(answer_data.keys())}")
         answers.append(answer_data)
         print(f" ✅ Answer retrieved (direct API)", flush=True)

         "datasourceId": model_id
     })
     answer_data = json.loads(answer_result.content[0].text)
+    # Clean up the viz title
+    if 'question' in answer_data:
+        answer_data['question'] = _clean_viz_title(answer_data['question'])
     answers.append(answer_data)
     print(f" ✅ Answer retrieved (MCP fallback)", flush=True)
 else:

     # Parse answer data
     answer_data = json.loads(answer_result.content[0].text)
+    # Clean up the viz title
+    if 'question' in answer_data:
+        answer_data['question'] = _clean_viz_title(answer_data['question'])
     print(f" 🔍 DEBUG: Answer keys: {list(answer_data.keys())}")
     answers.append(answer_data)
     print(f" ✅ Answer retrieved", flush=True)
     })
     print(f" ✓ Added dark theme style to Viz_1")

+    # Convert time-series visualizations to KPIs with sparklines
+    print(f" 🔄 Converting time-series charts to KPIs...")
+    kpi_count = 0
+    for viz in visualizations:
+        if viz.get('id') == 'Viz_1':
+            continue  # Skip note tile
+
+        answer = viz.get('answer', {})
+        viz_name = answer.get('name', '').lower()
+        search_query = answer.get('search_query', '').lower()
+
+        # Check if this is a time-series viz (weekly, monthly, daily patterns)
+        time_patterns = ['weekly', 'monthly', 'daily', 'quarterly', 'yearly', '.week', '.month', '.day', '.quarter', '.year']
+        is_time_series = any(p in viz_name or p in search_query for p in time_patterns)
+
+        if is_time_series and 'chart' in answer:
+            # Convert to KPI
+            answer['chart']['type'] = 'KPI'
+
+            # Add KPI-specific settings for sparkline and comparison
+            kpi_settings = {
+                "showLabel": True,
+                "showComparison": True,
+                "showSparkline": True,
+                "showAnomalies": False,
+                "showBounds": False,
+                "customCompare": "PREV_AVAILABLE",
+                "showOnlyLatestAnomaly": False
+            }
+
+            # Update client_state_v2 with KPI settings
+            import json as json_module
+            client_state = answer['chart'].get('client_state_v2', '{}')
+            try:
+                cs = json_module.loads(client_state) if client_state else {}
+                if 'chartProperties' not in cs:
+                    cs['chartProperties'] = {}
+                if 'chartSpecific' not in cs['chartProperties']:
+                    cs['chartProperties']['chartSpecific'] = {}
+                cs['chartProperties']['chartSpecific']['customProps'] = json_module.dumps(kpi_settings)
+                cs['chartProperties']['chartSpecific']['dataFieldArea'] = 'column'
+                answer['chart']['client_state_v2'] = json_module.dumps(cs)
+            except Exception:
+                pass  # Keep existing if parsing fails
+
+            kpi_count += 1
+            print(f" ✓ Converted '{answer.get('name', '?')}' to KPI")
+
+    if kpi_count > 0:
+        print(f" ✅ Converted {kpi_count} visualizations to KPIs with sparklines")
+
     # Re-import fixed TML using authenticated session
     import_response = ts_client.session.post(
         f"{ts_base_url}/api/rest/2.0/metadata/tml/import",
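The merge into `client_state_v2` preserves whatever state the chart already had, and note the double encoding: `customProps` is itself a JSON string stored inside the JSON state. A minimal sketch of the same merge, using `setdefault` in place of the explicit key checks (the payload shape is assumed for illustration):

```python
import json

# Hypothetical client_state_v2 payload as stored on a chart (structure
# assumed from the conversion loop, not from ThoughtSpot documentation)
client_state = '{"chartProperties": {"version": 1}}'

kpi_settings = {"showSparkline": True, "customCompare": "PREV_AVAILABLE"}

cs = json.loads(client_state) if client_state else {}
# Ensure the nested dicts exist without clobbering existing keys
cs.setdefault('chartProperties', {}).setdefault('chartSpecific', {})
# customProps is double-encoded: a JSON string inside the JSON state
cs['chartProperties']['chartSpecific']['customProps'] = json.dumps(kpi_settings)
cs['chartProperties']['chartSpecific']['dataFieldArea'] = 'column'

print(json.dumps(cs))
```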
|
|
@@ -25,13 +25,205 @@ Usage:
 import re
 import os
 from typing import Dict, List, Optional, Tuple
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from datetime import datetime


+# ============================================================================
+# Phase 1: New Structured Outlier System (February 2026 Sprint)
+# ============================================================================
+
 @dataclass
 class OutlierPattern:
-    """
+    """
+    Defines a single outlier pattern that serves three purposes:
+    1. Liveboard visualizations
+    2. Spotter questions
+    3. Demo talking points
+    """
+    # Identity
+    name: str       # "ASP Decline"
+    category: str   # "pricing", "volume", "inventory"
+
+    # For LIVEBOARD (visualization)
+    viz_type: str           # "KPI", "COLUMN", "LINE"
+    viz_question: str       # "ASP weekly"
+    viz_talking_point: str  # "ASP dropped 12% — excessive discounting"
+
+    # For SPOTTER (ad-hoc questions)
+    spotter_questions: List[str] = field(default_factory=list)
+    spotter_followups: List[str] = field(default_factory=list)
+
+    # For DATA INJECTION (SQL generation)
+    sql_template: str = ""   # "UPDATE {fact_table} SET {column} = ..."
+    affected_columns: List[str] = field(default_factory=list)
+    magnitude: str = ""      # "15% below normal"
+    target_filter: str = ""  # "WHERE REGION = 'West'"
+
+    # For DEMO NOTES
+    demo_setup: str = ""   # "Start by showing overall sales are UP"
+    demo_payoff: str = ""  # "Then reveal ASP is DOWN — 'at what cost?'"
+
+
+@dataclass
+class OutlierConfig:
+    """
+    Configuration for outliers per use case.
+    Combines required patterns, optional patterns, and AI generation guidance.
+    """
+    required: List[OutlierPattern] = field(default_factory=list)  # Always include
+    optional: List[OutlierPattern] = field(default_factory=list)  # AI picks 1-2
+    allow_ai_generated: bool = True  # AI can create 1 custom
+    ai_guidance: str = ""  # Hint for AI generation
+
+
+OUTLIER_CONFIGS = {
+    ("Retail", "Sales"): OutlierConfig(
+        required=[
+            OutlierPattern(
+                name="ASP Decline",
+                category="pricing",
+                viz_type="KPI",
+                viz_question="ASP weekly",
+                viz_talking_point="ASP dropped 12% even though revenue is up — we're discounting too heavily",
+                spotter_questions=[
+                    "Why did ASP drop last month?",
+                    "Which products have the biggest discount?",
+                    "Show me ASP by region",
+                ],
+                spotter_followups=[
+                    "Compare to same period last year",
+                    "Which stores are discounting most?",
+                ],
+                sql_template="UPDATE {fact_table} SET UNIT_PRICE = UNIT_PRICE * 0.85 WHERE REGION = 'West' AND {date_column} > '{recent_date}'",
+                affected_columns=["UNIT_PRICE", "DISCOUNT_PCT"],
+                magnitude="15% below normal",
+                target_filter="WHERE REGION = 'West'",
+                demo_setup="Start by showing overall sales are UP — everything looks good",
+                demo_payoff="Then reveal ASP is DOWN — 'but at what cost?' moment",
+            ),
+            OutlierPattern(
+                name="Regional Variance",
+                category="geographic",
+                viz_type="COLUMN",
+                viz_question="Dollar Sales by Region",
+                viz_talking_point="West region outperforming by 40% — what are they doing differently?",
+                spotter_questions=[
+                    "Which region has the highest sales?",
+                    "Compare West to East performance",
+                ],
+                spotter_followups=[
+                    "What products are driving West?",
+                    "Show me the trend for West region",
+                ],
+                sql_template="UPDATE {fact_table} SET QUANTITY = QUANTITY * 1.4 WHERE REGION = 'West'",
+                affected_columns=["QUANTITY", "REVENUE"],
+                magnitude="40% above other regions",
+                target_filter="WHERE REGION = 'West'",
+                demo_setup="Show overall sales by region",
+                demo_payoff="West is crushing it — drill in to find out why",
+            ),
+        ],
+        optional=[
+            OutlierPattern(
+                name="Seasonal Spike",
+                category="temporal",
+                viz_type="LINE",
+                viz_question="Dollar Sales trend by month",
+                viz_talking_point="Holiday surge 3x normal — were we prepared?",
+                spotter_questions=["Show me sales trend for Q4", "When was our peak sales day?"],
+                spotter_followups=[],
+                sql_template="UPDATE {fact_table} SET QUANTITY = QUANTITY * 3 WHERE MONTH IN (11, 12)",
+                affected_columns=["QUANTITY", "REVENUE"],
+                magnitude="3x normal",
+                target_filter="WHERE MONTH IN (11, 12)",
+                demo_setup="",
+                demo_payoff="",
+            ),
+            OutlierPattern(
+                name="Category Surge",
+                category="product",
+                viz_type="COLUMN",
+                viz_question="Dollar Sales by Category",
+                viz_talking_point="Electronics up 60% YoY while Apparel flat",
+                spotter_questions=["Which category grew fastest?", "Compare Electronics to Apparel"],
+                spotter_followups=[],
+                sql_template="",
+                affected_columns=[],
+                magnitude="60% YoY",
+                target_filter="",
+                demo_setup="",
+                demo_payoff="",
+            ),
+        ],
+        allow_ai_generated=True,
+        ai_guidance="If company has sustainability initiatives, create outlier around eco-friendly product sales",
+    ),
+
+    ("Banking", "Marketing"): OutlierConfig(
+        required=[
+            OutlierPattern(
+                name="Funnel Drop-off",
+                category="conversion",
+                viz_type="COLUMN",
+                viz_question="Conversion rate by funnel stage",
+                viz_talking_point="70% drop-off at application page — UX issue?",
+                spotter_questions=[
+                    "Where is our biggest funnel drop-off?",
+                    "What's our application completion rate?",
+                ],
+                spotter_followups=[],
+                sql_template="",
+                affected_columns=[],
+                magnitude="70% drop-off",
+                target_filter="",
+                demo_setup="Show the full funnel from impression to approval",
+                demo_payoff="The application page is killing conversions",
+            ),
+        ],
+        optional=[
+            OutlierPattern(
+                name="Channel Performance",
+                category="channel",
+                viz_type="COLUMN",
+                viz_question="CTR by channel",
+                viz_talking_point="Mobile CTR 2x desktop — shift budget?",
+                spotter_questions=["Which channel has the best CTR?"],
+                spotter_followups=[],
+                sql_template="",
+                affected_columns=[],
+                magnitude="2x desktop",
+                target_filter="",
+                demo_setup="",
+                demo_payoff="",
+            ),
+        ],
+        allow_ai_generated=True,
+        ai_guidance="Consider seasonal patterns in loan applications",
+    ),
+}
+
+
+def get_outliers_for_use_case(vertical: str, function: str) -> OutlierConfig:
+    """Get outlier configuration for a use case, with fallback to empty config."""
+    return OUTLIER_CONFIGS.get(
+        (vertical, function),
+        OutlierConfig(
+            required=[],
+            optional=[],
+            allow_ai_generated=True,
+            ai_guidance=f"Generate outliers appropriate for {vertical} {function}"
+        )
+    )
+
+
+# ============================================================================
+# Legacy Outlier System (existing code below)
+# ============================================================================
+
+@dataclass
+class LegacyOutlierPattern:
+    """Represents a data pattern to inject (legacy structure)."""
     title: str
     description: str
     sql_update: str

@@ -115,7 +307,7 @@ class OutlierGenerator:
             target_table, target_column, conditions, pattern_description
         )

-        return OutlierPattern(
+        return LegacyOutlierPattern(
             title=parsed.get('title', pattern_description[:50]),
             description=pattern_description,
             sql_update=sql,

@@ -422,7 +614,7 @@ WHERE product_id IN (

 def apply_outliers(
     snowflake_conn,
-    outliers: List[OutlierPattern],
+    outliers: List[LegacyOutlierPattern],
     schema_name: str,
     dry_run: bool = False
 ) -> List[Dict]:

@@ -483,7 +675,7 @@ def apply_outliers(


 def generate_demo_pack(
-    outliers: List[OutlierPattern],
+    outliers: List[LegacyOutlierPattern],
     company_name: str,
     use_case: str
 ) -> str:
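`get_outliers_for_use_case()` degrades gracefully: an unknown (vertical, function) pair yields an empty config whose `ai_guidance` tells the LLM to improvise. A self-contained sketch of that fallback, with the dataclass trimmed to two fields for illustration:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class OutlierConfig:
    required: List[str] = field(default_factory=list)
    ai_guidance: str = ""

# Known use cases get curated patterns; everything else falls through
CONFIGS = {("Retail", "Sales"): OutlierConfig(required=["ASP Decline"])}

def get_outliers_for_use_case(vertical: str, function: str) -> OutlierConfig:
    return CONFIGS.get(
        (vertical, function),
        OutlierConfig(ai_guidance=f"Generate outliers appropriate for {vertical} {function}"),
    )

print(get_outliers_for_use_case("Retail", "Sales").required)    # ['ASP Decline']
print(get_outliers_for_use_case("Healthcare", "Ops").ai_guidance)
```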
|
|
@@ -408,4 +408,159 @@ REQUIREMENTS:
 - Add data validation and error handling
 - Generate complete .env file template

-Generate executable code that creates compelling {use_case} demo data for {company_name}."""
+Generate executable code that creates compelling {use_case} demo data for {company_name}."""
+
+# ============================================================================
+# UNIFIED PROMPT BUILDING SYSTEM (Phase 1 - February 2026)
+# ============================================================================
+# New composable prompt construction system that assembles context sections
+# consistently across all stages (research, DDL, liveboard, demo notes)
+# ============================================================================
+
+def build_prompt(
+    stage: str,
+    vertical: str,
+    function: str,
+    company_context: str,
+    user_overrides: str = None,
+) -> str:
+    """
+    Build a complete prompt by assembling context sections.
+
+    Args:
+        stage: One of "research", "ddl", "liveboard", "demo_notes"
+        vertical: Industry vertical (e.g., "Retail")
+        function: Functional department (e.g., "Sales")
+        company_context: Text from website research
+        user_overrides: Optional user requirements that override defaults
+
+    Returns:
+        Complete prompt string ready for LLM
+    """
+    from demo_personas import get_use_case_config
+    from outlier_system import get_outliers_for_use_case
+
+    # Get merged configuration
+    config = get_use_case_config(vertical, function)
+    outliers = get_outliers_for_use_case(vertical, function)
+
+    # Build sections
+    sections = []
+
+    # Section A: Company Context
+    sections.append(f"""## COMPANY CONTEXT
+{company_context}""")
+
+    # Section B: Use Case Framework
+    persona = config.get("target_persona", "Business Leader")
+    problem = config.get("business_problem", "Need for faster, data-driven decisions")
+    sections.append(f"""## USE CASE
+- **Name:** {vertical} {function}
+- **Target Persona:** {persona}
+- **Business Problem:** {problem}
+- **Industry Terms:** {', '.join(config.get('industry_terms', []))}
+- **Typical Entities:** {', '.join(config.get('entities', []))}""")
+
+    # Section C: Required KPIs and Visualizations
+    kpi_text = "\n".join([f"- {kpi}: {config['kpi_definitions'].get(kpi, '')}" for kpi in config.get('kpis', [])])
+    sections.append(f"""## REQUIRED KPIs
+{kpi_text}
+
+## REQUIRED VISUALIZATIONS
+{', '.join(config.get('viz_types', []))}""")
+
+    # Section D: Outlier Patterns
+    if outliers.required:
+        outlier_text = "\n".join([f"- **{o.name}:** {o.viz_talking_point}" for o in outliers.required])
+        sections.append(f"""## DATA STORIES TO CREATE
+{outlier_text}""")
+
+    # Section E: Spotter Questions
+    spotter_qs = []
+    for o in outliers.required:
+        spotter_qs.extend(o.spotter_questions[:2])  # Top 2 from each required outlier
+    if spotter_qs:
+        sections.append(f"""## SPOTTER QUESTIONS TO ENABLE
+{chr(10).join(['- ' + q for q in spotter_qs[:6]])}""")
+
+    # Section F: User Overrides
+    if user_overrides:
+        sections.append(f"""## USER REQUIREMENTS (override defaults)
+{user_overrides}""")
+
+    # Section G: AI Guidance
+    if config.get("is_generic"):
+        ai_tasks = config.get("ai_should_determine", [])
+        sections.append(f"""## AI TASKS (Generic Use Case)
+This is a generic use case without pre-defined configuration.
+Please determine the following based on company context:
+{chr(10).join(['- ' + task for task in ai_tasks])}""")
+    else:
+        sections.append("""## AI GUIDANCE
+- Include all REQUIRED KPIs and visualizations listed above
+- You may add 2-3 additional items if valuable for this specific company
+- If you add something, briefly explain why""")

+    # Assemble final prompt
+    context_block = "\n\n---\n\n".join(sections)
+
+    # Get stage-specific template
+    template = STAGE_TEMPLATES.get(stage, DEFAULT_TEMPLATE)
+
+    return template.format(
+        context=context_block,
+        vertical=vertical,
+        function=function,
+    )
+
+
+# Stage-specific templates
+STAGE_TEMPLATES = {
+    "research": """You are a business intelligence analyst researching a company for demo preparation.
+
+{context}
+
+---
+
+Provide comprehensive research focusing on information that will help create a compelling {vertical} {function} demo.""",
+
+    "ddl": """You are a database architect creating a schema for a {vertical} {function} demo.
+
+{context}
+
+---
+
+Create Snowflake DDL that supports all the KPIs, visualizations, and data stories listed above.
+Follow star schema design with clear fact and dimension tables.""",
+
+    "liveboard": """You are creating a ThoughtSpot liveboard for a {vertical} {function} demo.
+
+{context}
+
+---
+
+Generate visualization questions that will create a compelling liveboard.
+The first two questions MUST be KPIs with sparklines (format: "{{measure}} weekly" or "{{measure}} monthly").
+Include visualizations that enable the data stories and Spotter questions listed above.""",
+
+    "demo_notes": """You are creating demo talking points for a {vertical} {function} demo.
+
+{context}
+
+---
+
+Create a bullet outline demo script with:
+- Opening hook and problem statement
+- Key visualizations to show with talking points
+- The "aha moment" reveal
+- Spotter questions to ask live
+- Closing value proposition""",
+}
+
+DEFAULT_TEMPLATE = """You are helping create a {vertical} {function} demo.
+
+{context}
+
+---
+
+Provide output appropriate for this use case."""
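The assembly at the heart of `build_prompt()` is just a list of sections joined with a divider and dropped into a stage template. A minimal self-contained sketch with hypothetical section text:

```python
# Sections are assembled independently, then joined with a "---" divider
# (the company and KPI text here are made up for illustration).
sections = [
    "## COMPANY CONTEXT\nAcme Outdoor Co. sells gear online.",
    "## REQUIRED KPIs\n- ASP: average selling price",
]
context_block = "\n\n---\n\n".join(sections)

# A trimmed-down stage template with the same placeholders as STAGE_TEMPLATES
template = "You are creating a {vertical} {function} demo.\n\n{context}"
prompt = template.format(vertical="Retail", function="Sales", context=context_block)

print(prompt.splitlines()[0])  # → You are creating a Retail Sales demo.
```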
|
|
@@ -7,7 +7,6 @@ Bundles confirmations into one step when confident.

 import os
 from typing import Dict, List, Optional, Tuple
-from openai import OpenAI
 from snowflake_auth import get_snowflake_connection
 from thoughtspot_deployer import ThoughtSpotDeployer
 import json

@@ -16,18 +15,67 @@ import json
 class SmartDataAdjuster:
     """Smart adjuster with liveboard context and conversational flow"""

-    def __init__(self, database: str, schema: str, liveboard_guid: str):
+    def __init__(self, database: str, schema: str, liveboard_guid: str, llm_model: str = None):
         self.database = database
         self.schema = schema
         self.liveboard_guid = liveboard_guid
         self.conn = None
         self.ts_client = None
-
+
+        # LLM setup - use provided model or default to Claude
+        self.llm_model = llm_model or os.getenv('DEFAULT_LLM', 'claude-sonnet-4')
+        self._llm_client = None

         # Context about the liveboard
         self.liveboard_name = None
         self.visualizations = []  # List of viz metadata

+    def _call_llm(self, prompt: str) -> str:
+        """Call the configured LLM (Anthropic or OpenAI)"""
+        # Determine provider from model name
+        model_lower = self.llm_model.lower()
+
+        if 'claude' in model_lower or 'anthropic' in model_lower:
+            # Use Anthropic
+            import anthropic
+            client = anthropic.Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY'))
+
+            # Map display names to API model names
+            model_map = {
+                'claude-sonnet-4': 'claude-sonnet-4-20250514',
+                'claude-sonnet-4.5': 'claude-sonnet-4-20250514',
+                'claude-3.5-sonnet': 'claude-3-5-sonnet-20241022',
+                'claude-3-opus': 'claude-3-opus-20240229',
+            }
+            api_model = model_map.get(self.llm_model, 'claude-sonnet-4-20250514')
+
+            response = client.messages.create(
+                model=api_model,
+                max_tokens=2000,
+                messages=[{"role": "user", "content": prompt}]
+            )
+            return response.content[0].text
+        else:
+            # Use OpenAI
+            from openai import OpenAI
+            client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
+
+            # Map display names to API model names
+            model_map = {
+                'gpt-4o': 'gpt-4o',
+                'gpt-4': 'gpt-4',
+                'gpt-4-turbo': 'gpt-4-turbo',
+                'gpt-3.5-turbo': 'gpt-3.5-turbo',
+            }
+            api_model = model_map.get(self.llm_model, 'gpt-4o')
+
+            response = client.chat.completions.create(
+                model=api_model,
+                messages=[{"role": "user", "content": prompt}],
+                temperature=0
+            )
+            return response.choices[0].message.content
+
     def connect(self):
         """Connect to Snowflake and ThoughtSpot"""
         # Snowflake

@@ -300,13 +348,7 @@ CRITICAL: target_value and percentage must be numbers, never strings.
 If unsure about ANY field, set confidence to "low" or "medium".
 """

-
-            model="gpt-4o",
-            messages=[{"role": "user", "content": prompt}],
-            temperature=0
-        )
-
-        content = response.choices[0].message.content
+        content = self._call_llm(prompt)
         if content.startswith('```'):
             lines = content.split('\n')
             content = '\n'.join(lines[1:-1])
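The provider dispatch inside `_call_llm()` can be factored into a pure function, which makes it testable without network calls or API keys. A sketch (the `resolve_model` helper is hypothetical, with the model maps trimmed):

```python
# Pure-logic slice of the _call_llm dispatch: pick a provider and an API
# model name from the display name. Unknown names fall back per provider.
CLAUDE_MAP = {'claude-sonnet-4': 'claude-sonnet-4-20250514'}
OPENAI_MAP = {'gpt-4o': 'gpt-4o', 'gpt-4-turbo': 'gpt-4-turbo'}

def resolve_model(display_name: str) -> tuple:
    lower = display_name.lower()
    if 'claude' in lower or 'anthropic' in lower:
        return ('anthropic', CLAUDE_MAP.get(display_name, 'claude-sonnet-4-20250514'))
    return ('openai', OPENAI_MAP.get(display_name, 'gpt-4o'))

print(resolve_model('claude-sonnet-4'))  # → ('anthropic', 'claude-sonnet-4-20250514')
print(resolve_model('gpt-4-turbo'))      # → ('openai', 'gpt-4-turbo')
print(resolve_model('mystery-model'))    # → ('openai', 'gpt-4o')
```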
|
|
@@ -68,6 +68,9 @@
|
|
| 68 |
## Tasks
|
| 69 |
|
| 70 |
### To Do
|
|
|
|
|
|
|
|
|
|
| 71 |
|
| 72 |
#### LegitData Improvements (from REI demo learnings)
|
| 73 |
- [ ] **Fix DAYSONHAND generation** - Currently random, needs business logic:
|
|
@@ -143,3 +146,145 @@
|
|
| 143 |
|
| 144 |
## Notes
|
| 145 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 68 |
## Tasks
|
| 69 |
|
| 70 |
### To Do
|
| 71 |
+
- [ ] **Fix tag assignment to models** - Returns 404 error, works for tables but not models
|
| 72 |
+
- [ ] **CRITICAL: Fix MCP bearer auth for on-prem deployments** - OAuth workaround works for cloud but bearer auth needed for on-prem instances that OAuth can't reach (see detailed notes below)
|
| 73 |
+
- [ ] **Fix research cache not loading** - Cache files exist but aren't found due to relative path issue (fix ready, needs restart to test)
|
| 74 |
|
| 75 |
#### LegitData Improvements (from REI demo learnings)
|
| 76 |
- [ ] **Fix DAYSONHAND generation** - Currently random, needs business logic:
|
|
|
|
## Notes

### Feb 3, 2026 - ThoughtSpot Model Validation & MCP Import Fix

**Issue 1: ThoughtSpot Model Validation Failed (Error 13124)**
- Model TML was missing ID columns (CUSTOMER_ID, STORE_ID, etc.) that were referenced in joins
- Joins validated, but the columns section didn't include the join keys

**Root Cause:**
- Code in `thoughtspot_deployer.py` (lines 976-984) was intentionally skipping FK/PK columns to "clean up" the model
- Logic: "nobody searches for customer 23455, so hide ID columns"
- But ThoughtSpot requires columns used in joins to be present in the model, even if users don't search them

**Solution:**
- Commented out the skip logic for FK/PK columns in `_create_model_with_constraints()`
- ID columns are now included in the model's `columns:` section
- Model deploys successfully with all 54 columns, including IDs

**Issue 2: MCP Import Error**
- `from mcp.client.streamable_http import streamablehttp_client` failed
- `ModuleNotFoundError: No module named 'mcp.client.streamable_http'`

**Root Cause:**
- MCP package upgraded from 0.x to 1.0.0
- Module structure changed: `streamable_http` → `sse` (Server-Sent Events)

**Solution:**
- Updated import: `from mcp.client.sse import sse_client`
- Updated client usage: `sse_client()` instead of `streamablehttp_client()`
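The root cause boils down to an invariant we can state as a small pre-deploy check. This is a hedged sketch, not code from the repo: the helper name and the join dict shape (`source_col`/`dest_col` keys) are hypothetical, but the rule it encodes is the one ThoughtSpot enforces — every column a join references must appear in the model's `columns:` section, or validation fails with error 13124.

```python
def missing_join_columns(model_columns, joins):
    """Return join-key columns that the model's columns section is missing.

    model_columns: list of column names present in the model TML.
    joins: list of dicts with 'source_col' and 'dest_col' keys (assumed shape).
    """
    present = {c.upper() for c in model_columns}
    needed = {c.upper() for j in joins for c in (j["source_col"], j["dest_col"])}
    return sorted(needed - present)


joins = [{"source_col": "CUSTOMER_ID", "dest_col": "CUSTOMER_ID"}]
# FK/PK columns stripped out -> join key is missing, model would fail validation
print(missing_join_columns(["REVENUE", "REGION"], joins))       # ['CUSTOMER_ID']
# FK/PK columns kept -> nothing missing
print(missing_join_columns(["REVENUE", "CUSTOMER_ID"], joins))  # []
```

Running a check like this before deploy would have flagged the stripped ID columns instead of surfacing as error 13124 at validation time.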

---

### Feb 3, 2026 - Supabase Compatibility Fix

**Issue:** Supabase module import failing with `ModuleNotFoundError: No module named 'websockets.asyncio'`, causing the app to not load settings and default to OpenAI (which had exceeded its quota).

**Root Cause:**
- Gradio 4.44.0 requires `websockets<13.0`
- Newer Supabase versions (2.10+) require `websockets>=11` but pull realtime 2.x, which needs `websockets.asyncio` (only in 13+)
- The version conflict prevented Supabase from loading

**Solution:** Downgraded to a compatible version set:
- `supabase==1.2.0`
- `realtime==1.0.6`
- `websockets==12.0`
- `httpx==0.24.1` (already had this)
- `gradio==4.44.0` (unchanged)

**Impact:** Settings now load properly from Supabase; the app uses the correct LLM model from user settings instead of falling back to OpenAI.
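The conflict above is unsatisfiable by construction: one side pins `websockets` below 13, the other effectively needs 13+. A tiny sketch (hypothetical helper, comparing only the major version) makes that concrete:

```python
def websockets_ok(version: str) -> dict:
    """Check a websockets version against both sides of the conflict."""
    major = int(version.split(".")[0])
    return {
        "gradio_4_44": major < 13,   # gradio 4.44.0 pins websockets<13.0
        "realtime_2x": major >= 13,  # realtime 2.x imports websockets.asyncio (13.0+)
    }


for v in ("12.0", "13.1"):
    print(v, websockets_ok(v))  # no version satisfies both constraints
```

Since no single version passes both checks, the only fix was moving one side of the conflict — here, downgrading `realtime` (via `supabase==1.2.0`) so the `websockets<13` world is consistent.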

---

### Feb 3, 2026 - MCP Bearer Auth vs OAuth Investigation

**Context:** MCP liveboard creation was working previously with on-prem ThoughtSpot instances that can't be reached via OAuth, which means bearer auth was the working solution. However, the current implementation fails with 400 Bad Request.

**Problem Statement:**
- MCP endpoint `https://agent.thoughtspot.app/bearer/mcp` returns 400 Bad Request when using SSE or streamable_http clients
- OAuth via stdio works, but only for cloud instances accessible from the internet
- Need bearer auth for on-prem deployments

**Investigation Timeline:**

1. **Initial Error (Feb 3 AM):**
   - Error: `HTTPStatusError: Client error '400 Bad Request' for url 'https://agent.thoughtspot.app/bearer/mcp'`
   - Code was using `from mcp.client.sse import sse_client` (MCP 1.0)
   - Bearer auth header format: `Bearer {token}@{host}`

2. **First Attempted Fix - Downgrade to MCP 0.9.1:**
   - Reasoning: maybe MCP 1.0's SSE client doesn't work with the bearer endpoint
   - Result: MCP 0.9.1 doesn't have a `streamable_http` module either - only `sse` and `stdio`
   - **Learning:** `streamable_http` never existed in any released MCP version we can access

3. **Git History Investigation:**
   - Commit `f10a9f5` (Jan 27): added `from mcp.client.streamable_http import streamablehttp_client` with bearer auth
   - `requirements.txt` at that time: `mcp==1.0.0`
   - But MCP 1.0.0 doesn't actually have a `streamable_http` module!
   - **Learning:** that code was committed but never successfully tested/deployed

4. **Found Working Implementation:**
   - Commit `d26f47e` (earlier): used `stdio_client` with an `npx mcp-remote` proxy
   - Code:

     ```python
     from mcp import ClientSession, StdioServerParameters
     from mcp.client.stdio import stdio_client

     server_params = StdioServerParameters(
         command="npx",
         args=["mcp-remote@latest", "https://agent.thoughtspot.app/mcp"]
     )
     async with stdio_client(server_params) as (read, write):
         async with ClientSession(read, write) as session:
             await session.initialize()
     ```

   - This approach uses OAuth but works

5. **Current Workaround (OAuth via stdio):**
   - Reverted to the stdio_client approach from commit d26f47e
   - Tested successfully: created liveboard b6cc9cad-ff91-4dd4-aec5-091984c2afd2
   - OAuth flow opens a browser for authorization
   - Works for cloud instances only

**Technical Details:**

**Bearer Auth Endpoint (Not Working):**
- URL: `https://agent.thoughtspot.app/bearer/mcp`
- Auth header: `Bearer {token}@{host}`
- Transport: unknown (streamable_http doesn't exist, SSE returns 400)
- Status: 400 Bad Request - endpoint rejects SSE connection attempts

**OAuth Endpoint (Currently Working):**
- URL: `https://agent.thoughtspot.app/mcp`
- Proxy: `npx mcp-remote@latest`
- Transport: stdio → npx → StreamableHTTPClientTransport (handled by mcp-remote)
- Auth: browser OAuth flow
- Limitation: requires an internet-accessible ThoughtSpot instance

**The Problem:**
- User confirmed it was working with on-prem instances before
- On-prem instances can't complete OAuth (not internet-accessible)
- Therefore, bearer auth must have been working at some point
- But there is no evidence of working bearer auth code in git history
- The `mcp-remote` proxy shows it connects using `StreamableHTTPClientTransport` after OAuth
- The bearer endpoint might require the same transport, but with bearer auth headers instead of OAuth

**Possible Solutions to Investigate:**
1. **Use mcp-remote with bearer auth**: see if `npx mcp-remote` supports a bearer token parameter
2. **Direct StreamableHTTPClientTransport**: find/install the transport library that mcp-remote uses internally
3. **MCP pre-1.0 version**: search for alpha/beta versions before 0.9.1 that might have `streamable_http`
4. **ThoughtSpot-specific MCP package**: check if ThoughtSpot provides its own MCP client library
5. **Raw HTTP requests**: bypass the MCP library and make direct HTTP calls to the bearer endpoint
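For option 5, a minimal request-builder sketch. Everything here is hedged: the URL and the `Bearer {token}@{host}` header format come from this investigation, but the JSON-RPC body shape and `protocolVersion` value are assumptions about what the endpoint expects — which is exactly the unknown. The helper only assembles the pieces; send them with any HTTP client (e.g. `httpx.post(url, headers=headers, json=payload)`) and inspect the response:

```python
def build_bearer_initialize(token: str, host: str):
    """Assemble a JSON-RPC initialize request for the bearer MCP endpoint (probe, not a fix)."""
    url = "https://agent.thoughtspot.app/bearer/mcp"
    headers = {
        "Authorization": f"Bearer {token}@{host}",  # format observed in our code
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed MCP protocol revision
            "capabilities": {},
            "clientInfo": {"name": "bearer-probe", "version": "0.1"},
        },
    }
    return url, headers, payload


url, headers, payload = build_bearer_initialize("TOKEN", "my-ts.company.com")
print(headers["Authorization"])  # Bearer TOKEN@my-ts.company.com
```

If the raw probe also 400s, the response body (not just the status) should say whether the problem is the transport, the auth format, or the payload.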
**Current State:**
- OAuth via stdio works for cloud instances
- Bearer auth is needed for on-prem, but the implementation is unclear
- Temporary workaround: using the OAuth approach (works for testing/development)
- **BLOCKER for on-prem deployments**

**Next Steps:**
- [ ] Contact ThoughtSpot to ask about the bearer auth implementation
- [ ] Investigate the mcp-remote source code to see how it handles StreamableHTTPClientTransport
- [ ] Test if mcp-remote accepts a bearer token as a parameter
- [ ] Look for ThoughtSpot-specific documentation on MCP bearer auth

```diff
@@ -380,7 +380,6 @@ def load_gradio_settings(email: str) -> Dict[str, Any]:
         "column_naming_style": "snake_case",  # Options: snake_case, camelCase, PascalCase, UPPER_CASE, original

         # Liveboard Creation
-        "liveboard_method": "HYBRID",
         "geo_scope": "USA Only",
         "validation_mode": "Off",

```
```diff
@@ -973,15 +973,19 @@ class ThoughtSpotDeployer:
             col_name = col['name'].upper()
             original_col_name = col.get('original_name', col['name'])  # Use original casing for display

+            # NOTE: We used to skip FK/PK columns, but ThoughtSpot requires them for joins
+            # Even though users don't search "customer 23455", the join columns must be present
+            # in the model's columns section for the joins to work properly.
+            #
             # SKIP foreign key columns - they're join keys, not analytics columns
-            if self._is_foreign_key_column(col_name, table_name_upper, foreign_keys):
-                print(f"  ⏭️ Skipping FK column: {table_name_upper}.{col_name}")
-                continue
-
+            # if self._is_foreign_key_column(col_name, table_name_upper, foreign_keys):
+            #     print(f"  ⏭️ Skipping FK column: {table_name_upper}.{col_name}")
+            #     continue
+            #
             # SKIP surrogate primary keys (numeric IDs) - nobody searches "customer 23455"
-            if self._is_surrogate_primary_key(col, col_name):
-                print(f"  ⏭️ Skipping surrogate PK: {table_name_upper}.{col_name}")
-                continue
+            # if self._is_surrogate_primary_key(col, col_name):
+            #     print(f"  ⏭️ Skipping surrogate PK: {table_name_upper}.{col_name}")
+            #     continue

             # Start with basic conflict resolution
             display_name = self._resolve_column_name_conflict(
```

```diff
@@ -1646,6 +1650,9 @@ class ThoughtSpotDeployer:
                 return True
             else:
                 print(f"[ThoughtSpot] ⚠️ Tag assignment failed: {assign_response.status_code}", flush=True)
+                print(f"[ThoughtSpot] DEBUG: Response text: {assign_response.text[:500]}", flush=True)
+                print(f"[ThoughtSpot] DEBUG: Object GUIDs: {object_guids}", flush=True)
+                print(f"[ThoughtSpot] DEBUG: Object type: {object_type}", flush=True)
                 return False

         except Exception as e:
```

```diff
@@ -2126,8 +2133,10 @@ class ThoughtSpotDeployer:

         try:
             # Build company data from parameters
+            # Clean company name for display (strip .com, .org, etc)
+            clean_company = company_name.split('.')[0].title() if company_name and '.' in company_name else (company_name or 'Demo Company')
             company_data = {
-                'name':
+                'name': clean_company,
                 'use_case': use_case or 'General Analytics'
             }
```
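The company-name cleanup in the last hunk is a dense one-liner; pulled out as a standalone function (hypothetical name, same logic) its behavior is easier to see:

```python
def clean_company_name(company_name):
    """Strip .com/.org etc. and title-case, mirroring the one-liner in the diff."""
    if company_name and '.' in company_name:
        return company_name.split('.')[0].title()
    return company_name or 'Demo Company'


print(clean_company_name("thoughtspot.com"))  # Thoughtspot
print(clean_company_name("REI"))              # REI (no dot: passed through unchanged)
print(clean_company_name(None))               # Demo Company
```

One quirk worth noting: `split('.')[0]` keeps only the first label, so a domain like `acme.co.uk` becomes `Acme` — fine for display, but worth remembering if multi-label domains show up.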