mikeboone and Cursor committed on
Commit 3b2cd7b · 1 Parent(s): b5feaff

Feb sprint: vertical×function matrix, structured outliers, unified prompts


- demo_personas.py: VERTICALS, FUNCTIONS, MATRIX_OVERRIDES dicts with get_use_case_config() and parse_use_case()
- outlier_system.py: OutlierPattern/OutlierConfig dataclasses, OUTLIER_CONFIGS with Retail Sales patterns
- prompts.py: build_prompt() composable system, STAGE_TEMPLATES for research/ddl/liveboard/demo_notes
- chat_interface.py: research cache fix (absolute paths), auto-use cache, DDL failure guard
- liveboard_creator.py: _clean_viz_title() helper, revert MCP to working stdio/npx approach
- smart_data_adjuster.py: multi-LLM support (Claude + OpenAI) via _call_llm()
- thoughtspot_deployer.py: fix model validation by keeping FK/PK columns, tag debug logging
- CLAUDE.md/PROJECT_STATUS.md: simplify liveboard docs to unified process
- demo_prep.py: remove unsupported max_lines from gr.Code()

Co-authored-by: Cursor <cursoragent@cursor.com>
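The `outlier_system.py` bullet above introduces `OutlierPattern`/`OutlierConfig` dataclasses, but their definitions are not shown anywhere in this diff. A minimal hypothetical sketch of that shape — all field names here are illustrative assumptions, not the actual `outlier_system.py` code:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: field names are illustrative guesses, not the
# actual outlier_system.py definitions (not shown on this commit page).
@dataclass
class OutlierPattern:
    name: str          # e.g. "holiday_spike"
    metric: str        # column the outlier affects
    magnitude: float   # multiplier applied to the baseline value

@dataclass
class OutlierConfig:
    use_case: str
    patterns: list = field(default_factory=list)

    def add(self, pattern: OutlierPattern) -> None:
        self.patterns.append(pattern)

# Example mirroring the "OUTLIER_CONFIGS with Retail Sales patterns" bullet
config = OutlierConfig(use_case="Retail Sales")
config.add(OutlierPattern(name="holiday_spike", metric="total_revenue", magnitude=2.5))
print(len(config.patterns))  # prints 1
```

Dataclasses fit this use well: each `(vertical, use case)` cell can carry a typed list of patterns rather than loose dicts.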

CLAUDE.md CHANGED
@@ -111,7 +111,7 @@ Example:
 
 ```bash
 # Run the app properly
- source ./demoprep/bin/activate && python demo_prep.py
 
 # Check git changes
 git diff --stat
@@ -201,51 +201,21 @@ DO NOT use create_visualization_tml() directly - that's internal low-level code
 
 ---
 
- ## Liveboard Creation - Three-Method System
-
- **PRIMARY GOAL: All three methods (TML, MCP, HYBRID) can be selected via Settings UI**
-
- ### Method Selection
- - **Settings UI:** Admin tab "Liveboard Creation Method" dropdown
- - **Environment variable:** `LIVEBOARD_METHOD=TML|MCP|HYBRID`
- - **Legacy:** `USE_MCP_LIVEBOARD=true/false` still works for backwards compatibility
- - **Default:** HYBRID (recommended)
- - **Entry point:** `thoughtspot_deployer.py` deploy_all() function
-
- ### The Three Methods
-
- | Method | Speed | Quality | Control | Best For |
- |--------|-------|---------|---------|----------|
- | **TML** | ~20s | High (with tuning) | Full | Precise control, debugging |
- | **MCP** | ~60s | Basic | None | Quick prototypes |
- | **HYBRID** | ~90s | Best | Via post-processing | Production demos |
-
- ### TML Method (Template-Based)
- - Builds ThoughtSpot Modeling Language (YAML) structures directly
- - Full control over chart types, layout, colors
- - REST API with token auth
- - **Main function:** `create_liveboard_from_model()` in liveboard_creator.py
- - **Class:** `LiveboardCreator`
-
- ### MCP Method (AI-Driven)
- - Uses Model Context Protocol with ThoughtSpot's agent.thoughtspot.app
- - Leverages ThoughtSpot's AI for smart question generation
- - Natural language questions → ThoughtSpot creates visualizations
- - OAuth authentication, requires npx/Node.js
- - **Main function:** `create_liveboard_from_model_mcp()` in liveboard_creator.py
-
- ### HYBRID Method (Recommended)
- - **Step 1:** MCP creates liveboard quickly with AI-driven questions
- - **Step 2:** TML post-processing enhances with:
- - Groups (tabs) for organization
- - KPI sparkline fixes
- - Brand color styling
- - **Main functions:**
- - `create_liveboard_from_model_mcp()` for creation
- - `enhance_mcp_liveboard()` for post-processing
-
- ### enhance_mcp_liveboard() Function
- Located in `liveboard_creator.py`, this function:
 1. Exports the MCP-created liveboard TML
 2. Classifies visualizations by type (KPI, trend, categorical)
 3. Adds Groups (tabs) to organize by type
@@ -253,18 +223,16 @@ Located in `liveboard_creator.py`, this function:
 
 5. Applies brand colors to groups and tiles
 6. Re-imports the enhanced TML
 
- ### KPI Requirements (All methods need these)
 - **For sparklines and percent change comparisons:**
 - Must include time dimension (date column)
 - Must specify granularity (daily, weekly, monthly, quarterly, yearly)
 - Example: `[Total_revenue] [Order_date].monthly`
- - **MCP:** Natural language includes time context
- - **TML:** Search query must have `[measure] [date_column].granularity`
- - **HYBRID:** Post-processing adds sparkline settings automatically
 
 ### Terminology (Important!)
- - **Outliers** = Interesting data points in existing data (works with all methods)
- - **Data Adjuster** = Modifying data values (NOT possible with MCP, needs Snowflake views)
 
 ### Golden Demo Structure
 - **Location:** `dev_notes/liveboard_demogold2/🏬 Global Retail Apparel Sales (New).liveboard.tml`
@@ -273,11 +241,6 @@ Located in `liveboard_creator.py`, this function:
 - Brand colors via style_properties (GBC_A-J for groups, TBC_A-J for tiles)
 - KPI structure: `[sales] [date].weekly [date].'last 8 quarters'`
 
- ### Testing Strategy
- - Test all three methods when changing shared code
- - HYBRID should be the default for most testing
- - Use TML for debugging visualization issues
-
 ---
 
 ## Frustration Points (AVOID)
@@ -301,5 +264,5 @@ User gets frustrated when you:
 
 ---
 
- *Last Updated: January 13, 2026*
 *This is the source of truth - update rules here, not in .cursorrules*
 
 
 ```bash
 # Run the app properly
+ source ./demoprep/bin/activate && python chat_interface.py
 
 # Check git changes
 git diff --stat
 
 
 ---
 
+ ## Liveboard Creation
+
+ Liveboard creation is a single unified process with two phases:
+
+ 1. **MCP Creation** - Uses ThoughtSpot's AI (via Model Context Protocol at `agent.thoughtspot.app`) to generate smart visualizations from natural language questions
+ 2. **TML Post-Processing** - Enhances the AI-created liveboard with groups, KPI sparklines, brand colors, and layout refinement
+
+ These are implemented as separate functions but are **one process** - do NOT treat them as separate "methods" or offer the user a choice between them.
+
+ ### Key Functions (liveboard_creator.py)
+ - **`create_liveboard_from_model_mcp()`** - Main entry point. Handles MCP creation.
+ - **`enhance_mcp_liveboard()`** - Post-processing. Exports TML, enhances, re-imports.
+ - **`LiveboardCreator` class** - TML utilities used during post-processing.
+
+ ### enhance_mcp_liveboard() Details
 1. Exports the MCP-created liveboard TML
 2. Classifies visualizations by type (KPI, trend, categorical)
 3. Adds Groups (tabs) to organize by type
 
 5. Applies brand colors to groups and tiles
 6. Re-imports the enhanced TML
 
+ ### KPI Requirements
 - **For sparklines and percent change comparisons:**
 - Must include time dimension (date column)
 - Must specify granularity (daily, weekly, monthly, quarterly, yearly)
 - Example: `[Total_revenue] [Order_date].monthly`
+ - Post-processing adds sparkline settings automatically
 
 ### Terminology (Important!)
+ - **Outliers** = Interesting data points in existing data
+ - **Data Adjuster** = Modifying data values (needs Snowflake views)
 
 ### Golden Demo Structure
 - **Location:** `dev_notes/liveboard_demogold2/🏬 Global Retail Apparel Sales (New).liveboard.tml`
 
 - Brand colors via style_properties (GBC_A-J for groups, TBC_A-J for tiles)
 - KPI structure: `[sales] [date].weekly [date].'last 8 quarters'`
 
 ---
 
 ## Frustration Points (AVOID)
 
 ---
 
+ *Last Updated: February 4, 2026*
 *This is the source of truth - update rules here, not in .cursorrules*
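The KPI requirement documented above (a measure plus a date keyword with granularity, optionally a time window) can be sketched as a small query-string builder. The function name is hypothetical; only the output format comes from the documented examples:

```python
from typing import Optional

def kpi_search_query(measure: str, date_column: str, granularity: str,
                     window: Optional[str] = None) -> str:
    """Build a ThoughtSpot-style KPI search query string.

    Output follows the documented pattern, e.g.
        [Total_revenue] [Order_date].monthly
        [sales] [date].weekly [date].'last 8 quarters'
    """
    query = f"[{measure}] [{date_column}].{granularity}"
    if window:
        # Time windows are quoted, per the golden demo KPI structure
        query += f" [{date_column}].'{window}'"
    return query

print(kpi_search_query("Total_revenue", "Order_date", "monthly"))
print(kpi_search_query("sales", "date", "weekly", window="last 8 quarters"))
```

Both printed strings reproduce the examples from the KPI Requirements and Golden Demo Structure sections above.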
PROJECT_STATUS.md CHANGED
@@ -40,15 +40,13 @@ An AI-powered demo builder for ThoughtSpot that automatically creates complete d
 
 **Working:**
 - End-to-end demo creation via chat interface
- - Three-method liveboard creation (TML, MCP, HYBRID)
- - HYBRID method: MCP creates + TML post-processing for Groups, KPIs, colors
- - Settings UI for method selection
 - LegitData for realistic data generation
 - Supabase settings persistence
 - ThoughtSpot authentication and deployment
 
 **Needs Work:**
- - Outliers not working well with MCP method
 - Data adjuster has column matching issues
 - Tags not assigning to objects
 
@@ -56,10 +54,9 @@ An AI-powered demo builder for ThoughtSpot that automatically creates complete d
 
 ## Key Technical Decisions
 
- **Liveboard Creation**: Three-method system (configurable via Settings UI)
- - TML: Template-based, full control over visualizations
- - MCP: AI-driven, fast creation, basic quality
- - HYBRID (default): MCP creates + TML post-processing (recommended)
 
 **Data Generation**: LegitData
 - Uses AI + web search for realistic data
@@ -79,10 +76,10 @@ An AI-powered demo builder for ThoughtSpot that automatically creates complete d
 
 ## Sprint History
 
- - **Sprint Jan 2026**: Making it better (current) - see `sprint_2026_01.md` in root
 - *(Previous sprints archived in dev_notes/archive/)*
- - *(Sprint files are gitignored - local working docs)*
 
 ---
 
- *Last Updated: January 12, 2026*
 
 
 **Working:**
 - End-to-end demo creation via chat interface
+ - Liveboard creation: MCP creates visualizations + TML post-processing for Groups, KPIs, colors
 - LegitData for realistic data generation
 - Supabase settings persistence
 - ThoughtSpot authentication and deployment
 
 **Needs Work:**
+ - Outliers need better integration into liveboard creation
 - Data adjuster has column matching issues
 - Tags not assigning to objects
 
 ## Key Technical Decisions
 
+ **Liveboard Creation**: MCP creation + TML post-processing
+ - MCP (via `agent.thoughtspot.app`) generates AI-driven visualizations
+ - TML post-processing adds Groups, KPI sparklines, brand colors, layout refinement
 
 **Data Generation**: LegitData
 - Uses AI + web search for realistic data
 
 ## Sprint History
 
+ - **Sprint Feb 2026**: Current - see `sprint_2026_02.md` in root
+ - **Sprint Jan 2026**: Closed - see `sprint_2026_01.md` in root
 - *(Previous sprints archived in dev_notes/archive/)*
 
 ---
 
+ *Last Updated: February 4, 2026*
chat_interface.py CHANGED
@@ -9,11 +9,21 @@ warnings.filterwarnings('ignore', message='.*tuples.*format.*chatbot.*deprecated
 import gradio as gr
 import os
 import sys
 from dotenv import load_dotenv
 from demo_builder_class import DemoBuilder
 from supabase_client import load_gradio_settings
 from main_research import MultiLLMResearcher, Website
- from demo_personas import build_company_analysis_prompt, build_industry_research_prompt
 from demo_prep import map_llm_display_to_provider
 
 load_dotenv(override=True)
@@ -497,6 +507,13 @@ Watch the AI Feedback tab for real-time progress!"""
 
 # Auto-create DDL
 ddl_response, ddl_code = self.run_ddl_creation()
 chat_history[-1] = (message, f"✅ DDL Created\n\n🚀 **Deploying to Snowflake...**")
 yield chat_history, current_stage, current_model, company, use_case, ""
 
@@ -1402,6 +1419,10 @@ To change settings, use:
 use_case: Use case name
 generic_context: Additional context provided by user for generic use cases
 """
 import time
 import os
 from main_research import ResultsManager
@@ -1454,25 +1475,41 @@ To change settings, use:
 use_case_safe = use_case.lower().replace(' ', '_').replace('/', '_')
 
 # Try new format first (with use case)
 cache_filename = f"{safe_domain}_{use_case_safe}.json"
- cache_filepath = os.path.join("results", cache_filename)
 
- # If new format doesn't exist, try old format (without use case)
 if not os.path.exists(cache_filepath):
- old_cache_filename = f"research_{safe_domain}.json"
- old_cache_filepath = os.path.join("results", old_cache_filename)
- if os.path.exists(old_cache_filepath):
- cache_filename = old_cache_filename
- cache_filepath = old_cache_filepath
 
 cached_results = None
 cache_age_hours = None
 
- # Allow cache for generic use cases during testing (was: skip cache for fresh research)
- # if self.is_generic_use_case:
- # self.log_feedback(f"🔄 Generic use case detected - skipping cache, running fresh research")
- # progress_message += f"🔄 **Generic use case** - running fresh research for custom context...\n"
- # yield progress_message
 if os.path.exists(cache_filepath):
 try:
 # Check cache age (5 day expiry)
@@ -1481,25 +1518,38 @@ To change settings, use:
 cache_age_hours = cache_age / 3600 # Convert to hours
 
 if cache_age_hours <= 120: # Cache valid for 5 days (120 hours)
- self.log_feedback(f"📋 Found cached research (age: {cache_age_hours:.1f} hours)")
- progress_message += f"📋 **Found Cached Research!**\n\n"
- progress_message += f"**Age:** {cache_age_hours:.1f} hours old\n"
- progress_message += f"**Company:** {domain}\n"
- progress_message += f"**Use Case:** {use_case}\n\n"
- progress_message += "**Would you like to use the cached results?**\n"
- progress_message += "- Type 'yes' to use cache (instant)\n"
- progress_message += "- Type 'no' to run fresh research (2-3 minutes)\n"
 
- # Store cache info for later use
- self._cached_research_path = cache_filepath
- self._cache_available = True
 
- # Yield with "yes" pre-filled
 yield progress_message
- return # Wait for user response
 else:
- self.log_feedback(f"📋 Found cached research but it's too old ({cache_age_hours:.1f} hours)")
- progress_message += f"📋 Cache too old ({cache_age_hours:.1f} hours), running fresh research...\n"
 yield progress_message
 except Exception as e:
 self.log_feedback(f"⚠️ Could not load cache: {str(e)}")
@@ -1659,8 +1709,8 @@ To change settings, use:
 'use_case': use_case,
 'generated_at': datetime.now().isoformat(),
 }
- os.makedirs("results", exist_ok=True)
- ResultsManager.save_results(research_results, cache_filename, "results")
 progress_message += "💾 Cached research results for future use!\n\n"
 yield progress_message
 except Exception as e:
@@ -2076,6 +2126,10 @@ Generate complete CREATE TABLE statements with proper Snowflake syntax and depen
 self.log_feedback("Generating DDL...")
 ddl_result = researcher.make_request(messages, temperature=0.2, max_tokens=4000, stream=False)
 
 # Store in demo_builder
 self.demo_builder.schema_generation_results = ddl_result
 self.ddl_code = ddl_result
@@ -2104,6 +2158,9 @@ Generate complete CREATE TABLE statements with proper Snowflake syntax and depen
 import traceback
 error_msg = f"❌ DDL creation failed: {str(e)}\n{traceback.format_exc()}"
 self.log_feedback(error_msg)
 return error_msg, ""
 
 def get_fallback_population_code(self, schema_info, fact_rows=10000, dim_rows=100):
@@ -2475,19 +2532,28 @@ Generate complete CREATE TABLE statements with proper Snowflake syntax and depen
 self.log_feedback("🔢 Starting data population...")
 
 try:
- from demo_personas import get_persona_config
 from schema_utils import parse_ddl_schema, generate_schema_constrained_prompt
 import re
 
- persona_config = get_persona_config(self.demo_builder.use_case)
 
 # Build business context for population
 business_context = f"""
 BUSINESS CONTEXT:
- - Use Case: {self.demo_builder.use_case}
- - Target Persona: {persona_config['target_persona']}
- - Business Problem: {persona_config['business_problem']}
- - Demo Objectives: {persona_config['demo_objectives']}
 
 MANDATORY CONNECTION CODE (MUST BE COMPLETE):
 ```python
@@ -2649,6 +2715,14 @@ LegitData will generate realistic, AI-powered data.
 self.demo_builder.schema_generation_results
 )
 
 if not success:
 log_progress(f"[ERROR] DDL Deployment failed!")
 raise Exception(f"Schema deployment failed: {deploy_message}")
@@ -2706,8 +2780,17 @@ LegitData will generate realistic, AI-powered data.
 
 def run_population():
 try:
 success, message, results = populate_demo_data(
- ddl_content=self.demo_builder.schema_generation_results,
 company_url=self.demo_builder.company_url,
 use_case=self.demo_builder.use_case,
 schema_name=schema_name,
@@ -2865,11 +2948,14 @@ Tables: Created and populated
 ts_secret = os.getenv('THOUGHTSPOT_SECRET_KEY')
 
 liveboard_method = self.settings.get('liveboard_method', 'HYBRID')
- liveboard_name = self.settings.get('liveboard_name', '') or f"{company} - {use_case}"
 
 # Get company data for liveboard
 company_data = {
- 'name': company,
 'url': getattr(self.demo_builder, 'company_url', company),
 'logo_url': getattr(self.demo_builder, 'logo_url', None),
 'primary_color': getattr(self.demo_builder, 'primary_color', '#3498db'),
@@ -3231,7 +3317,9 @@ Ask these questions to showcase ThoughtSpot's AI capabilities:
 try:
 from smart_data_adjuster import SmartDataAdjuster
 
- adjuster = SmartDataAdjuster(database, schema_name, liveboard_guid)
 adjuster.connect()
 
 if adjuster.load_liveboard_context():
@@ -4243,7 +4331,7 @@ if __name__ == "__main__":
 
 app.launch(
 server_name="0.0.0.0",
- server_port=int(os.environ.get('PORT', 7863)), # Reads from .env, defaults to 7863
 share=False,
 inbrowser=True,
 debug=True,
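The research-cache fix in this commit resolves the cache relative to the script directory and adds fuzzy fallbacks. A standalone sketch of that resolution order (exact company+use-case file, then any cache for the same company, then the legacy `research_<domain>` name), with hypothetical file names for the demo:

```python
import glob
import os
import tempfile

def resolve_cache_path(results_dir: str, safe_domain: str, use_case_safe: str) -> str:
    """Mirror the lookup order used by the commit's cache fix:
    1. exact <domain>_<use_case>.json
    2. any <domain>_*.json for the same company (fuzzy match)
    3. legacy research_<domain>.json
    Falls back to the exact path when nothing exists yet."""
    exact = os.path.join(results_dir, f"{safe_domain}_{use_case_safe}.json")
    if os.path.exists(exact):
        return exact
    similar = glob.glob(os.path.join(results_dir, f"{safe_domain}_*.json"))
    if similar:
        return similar[0]
    legacy = os.path.join(results_dir, f"research_{safe_domain}.json")
    if os.path.exists(legacy):
        return legacy
    return exact

# Demo with a temporary directory and hypothetical cache files
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "acme_com_retail_sales.json"), "w").close()
    path = resolve_cache_path(d, "acme_com", "supply_chain")
    print(os.path.basename(path))  # falls back to the similar-company cache
```

Anchoring `results_dir` to `os.path.dirname(os.path.abspath(__file__))`, as the diff does, is what makes the lookup independent of the current working directory.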
 
 import gradio as gr
 import os
 import sys
+ import json
+ import time
+ import glob
 from dotenv import load_dotenv
 from demo_builder_class import DemoBuilder
 from supabase_client import load_gradio_settings
 from main_research import MultiLLMResearcher, Website
+ from demo_personas import (
+ build_company_analysis_prompt,
+ build_industry_research_prompt,
+ VERTICALS,
+ FUNCTIONS,
+ get_use_case_config,
+ parse_use_case
+ )
 from demo_prep import map_llm_display_to_provider
 
 load_dotenv(override=True)
 
 # Auto-create DDL
 ddl_response, ddl_code = self.run_ddl_creation()
+
+ # Check if DDL creation failed
+ if not ddl_code or ddl_code.strip() == "":
+ chat_history[-1] = (message, f"{ddl_response}\n\n❌ **Cannot proceed without valid DDL.** Please fix the error and try again.")
+ yield chat_history, current_stage, current_model, company, use_case, ""
+ return
+
 chat_history[-1] = (message, f"✅ DDL Created\n\n🚀 **Deploying to Snowflake...**")
 yield chat_history, current_stage, current_model, company, use_case, ""
 
 use_case: Use case name
 generic_context: Additional context provided by user for generic use cases
 """
+ print(f"\n\n[CACHE DEBUG] === run_research_streaming called ===")
+ print(f"[CACHE DEBUG] company: {company}")
+ print(f"[CACHE DEBUG] use_case: {use_case}\n\n")
+
 import time
 import os
 from main_research import ResultsManager
 
 use_case_safe = use_case.lower().replace(' ', '_').replace('/', '_')
 
 # Try new format first (with use case)
+ # Use absolute path to ensure we find cache regardless of CWD
+ script_dir = os.path.dirname(os.path.abspath(__file__))
+ results_dir = os.path.join(script_dir, "results")
 cache_filename = f"{safe_domain}_{use_case_safe}.json"
+ cache_filepath = os.path.join(results_dir, cache_filename)
 
+ # If exact match doesn't exist, try fuzzy matching for similar use cases
 if not os.path.exists(cache_filepath):
+ import glob
+ print(f"[CACHE DEBUG] Current working directory: {os.getcwd()}")
+ print(f"[CACHE DEBUG] Script directory: {script_dir}")
+ print(f"[CACHE DEBUG] Results directory: {results_dir}")
+ similar_files = glob.glob(os.path.join(results_dir, f"{safe_domain}_*.json"))
+ print(f"[CACHE DEBUG] Exact file {cache_filepath} not found")
+ print(f"[CACHE DEBUG] Glob pattern: {results_dir}/{safe_domain}_*.json")
+ print(f"[CACHE DEBUG] Similar files found: {similar_files}")
+ if similar_files:
+ # Found similar cache files for this company
+ cache_filepath = similar_files[0] # Use the first one found
+ cache_filename = os.path.basename(cache_filepath)
+ print(f"[CACHE DEBUG] Using similar file: {cache_filename}")
+ self.log_feedback(f"📋 Found similar cache file: {cache_filename}")
+ elif not os.path.exists(cache_filepath):
+ # Try old format (without use case)
+ old_cache_filename = f"research_{safe_domain}.json"
+ old_cache_filepath = os.path.join(results_dir, old_cache_filename)
+ if os.path.exists(old_cache_filepath):
+ cache_filename = old_cache_filename
+ cache_filepath = old_cache_filepath
 
 cached_results = None
 cache_age_hours = None
 
+ # Check for cached research and use automatically if valid
+ print(f"[CACHE DEBUG] Final cache_filepath: {cache_filepath}, exists: {os.path.exists(cache_filepath)}")
 if os.path.exists(cache_filepath):
 try:
 # Check cache age (5 day expiry)
 
 cache_age_hours = cache_age / 3600 # Convert to hours
 
 if cache_age_hours <= 120: # Cache valid for 5 days (120 hours)
+ self.log_feedback(f"📋 Using cached research (age: {cache_age_hours:.1f} hours)")
+ progress_message += f"📋 **Using Cached Research** ({cache_age_hours:.1f} hours old)\n\n"
 
+ # Load cached results automatically
+ with open(cache_filepath, 'r') as f:
+ cached_data = json.load(f)
 
+ self.demo_builder.company_analysis_results = cached_data.get('company_summary', '')
+ self.demo_builder.industry_research_results = cached_data.get('research_paper', '')
+ self.demo_builder.combined_research_results = self.demo_builder.get_research_context()
+ self.demo_builder.company_url = cached_data.get('url', url)
+ self.demo_builder.advance_stage()
+
+ progress_message += "✅ **Research loaded from cache!**\n\n"
+ progress_message += "Proceeding to DDL generation...\n"
+
+ self.log_feedback("✅ Research loaded from cache, generating DDL")
 yield progress_message
+
+ # Automatically trigger DDL generation
+ try:
+ response, ddl_code = self.run_ddl_creation()
+ yield response
+ except Exception as e:
+ import traceback
+ error_msg = f"❌ DDL generation failed: {str(e)}\n{traceback.format_exc()}"
+ self.log_feedback(error_msg)
+ yield error_msg
+ return
 else:
+ self.log_feedback(f"📋 Cache too old ({cache_age_hours:.1f} hours), running fresh research")
+ progress_message += f"📋 Cache expired ({cache_age_hours:.1f} hours old), running fresh research...\n"
 yield progress_message
 except Exception as e:
 self.log_feedback(f"⚠️ Could not load cache: {str(e)}")
 
 'use_case': use_case,
 'generated_at': datetime.now().isoformat(),
 }
+ os.makedirs(results_dir, exist_ok=True)
+ ResultsManager.save_results(research_results, cache_filename, results_dir)
 progress_message += "💾 Cached research results for future use!\n\n"
 yield progress_message
 except Exception as e:
 
 self.log_feedback("Generating DDL...")
 ddl_result = researcher.make_request(messages, temperature=0.2, max_tokens=4000, stream=False)
 
+ # Validate DDL result
+ if not ddl_result or not isinstance(ddl_result, str) or 'CREATE TABLE' not in ddl_result.upper():
+ raise Exception(f"DDL generation failed or produced invalid output. Result: {ddl_result[:200] if ddl_result else 'None'}")
+
 # Store in demo_builder
 self.demo_builder.schema_generation_results = ddl_result
 self.ddl_code = ddl_result
 
 import traceback
 error_msg = f"❌ DDL creation failed: {str(e)}\n{traceback.format_exc()}"
 self.log_feedback(error_msg)
+ # Set schema_generation_results to empty string so it's not None
+ self.demo_builder.schema_generation_results = ""
+ self.ddl_code = ""
 return error_msg, ""
 
 def get_fallback_population_code(self, schema_info, fact_rows=10000, dim_rows=100):
 
 self.log_feedback("🔢 Starting data population...")
 
 try:
 from schema_utils import parse_ddl_schema, generate_schema_constrained_prompt
 import re
 
+ # Parse use case into vertical and function
+ vertical, function = parse_use_case(self.demo_builder.use_case)
+ config = get_use_case_config(vertical or "Generic", function or "Generic")
 
 # Build business context for population
+ # Handle both new config structure and backward compatibility
+ target_persona = config.get('target_persona', 'Business Leader')
+ business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+ demo_objectives = config.get('demo_objectives', 'Show self-service analytics and business insights')
+
+ # For generic cases, use the use_case_name
+ use_case_display = config.get('use_case_name', self.demo_builder.use_case)
+
 business_context = f"""
 BUSINESS CONTEXT:
+ - Use Case: {use_case_display}
+ - Target Persona: {target_persona}
+ - Business Problem: {business_problem}
+ - Demo Objectives: {demo_objectives}
 
 MANDATORY CONNECTION CODE (MUST BE COMPLETE):
 ```python
 
 self.demo_builder.schema_generation_results
 )
 
+ # DEBUG: Log what was passed
+ ddl_passed = self.demo_builder.schema_generation_results
+ log_progress(f"[DEBUG] DDL type passed to deployer: {type(ddl_passed)}")
+ log_progress(f"[DEBUG] DDL is None: {ddl_passed is None}")
+ if ddl_passed:
+ log_progress(f"[DEBUG] DDL length: {len(ddl_passed)}")
+ log_progress(f"[DEBUG] DDL first 100 chars: {ddl_passed[:100]}")
+
 if not success:
 log_progress(f"[ERROR] DDL Deployment failed!")
 raise Exception(f"Schema deployment failed: {deploy_message}")
 
 def run_population():
 try:
+ # Validate DDL before passing to legitdata
+ ddl = self.demo_builder.schema_generation_results
+ if not ddl or not isinstance(ddl, str):
+ raise Exception(f"DDL is invalid (type: {type(ddl)}). Cannot populate data. Please regenerate DDL.")
+
+ # Check if DDL contains the word "None" which would indicate AI generated bad SQL
+ if ddl == "None" or ddl.strip() == "None":
+ raise Exception("DDL generation returned 'None'. Please regenerate DDL with a different prompt or model.")
+
 success, message, results = populate_demo_data(
+ ddl_content=ddl,
 company_url=self.demo_builder.company_url,
 use_case=self.demo_builder.use_case,
 schema_name=schema_name,
 
 ts_secret = os.getenv('THOUGHTSPOT_SECRET_KEY')
 
 liveboard_method = self.settings.get('liveboard_method', 'HYBRID')
+
+ # Clean company name for display (strip .com, .org, etc)
+ clean_company = company.split('.')[0].title() if '.' in company else company
+ liveboard_name = self.settings.get('liveboard_name', '') or f"{clean_company} - {use_case}"
 
 # Get company data for liveboard
 company_data = {
+ 'name': clean_company,
 'url': getattr(self.demo_builder, 'company_url', company),
 'logo_url': getattr(self.demo_builder, 'logo_url', None),
 'primary_color': getattr(self.demo_builder, 'primary_color', '#3498db'),
 
 try:
 from smart_data_adjuster import SmartDataAdjuster
 
+ # Pass the selected LLM model to the adjuster
+ llm_model = self.settings.get('model', 'claude-sonnet-4')
+ adjuster = SmartDataAdjuster(database, schema_name, liveboard_guid, llm_model=llm_model)
 adjuster.connect()
 
 if adjuster.load_liveboard_context():
 
 app.launch(
 server_name="0.0.0.0",
+ server_port=7863, # Different port from main app (7860) and old chat (7861)
 share=False,
 inbrowser=True,
 debug=True,
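The DDL failure guards added in this commit (the empty-string check after `run_ddl_creation()`, the `CREATE TABLE` check after generation, and the `"None"` check before population) boil down to one validity predicate. A standalone sketch of that combined check:

```python
def is_valid_ddl(ddl) -> bool:
    """Validity predicate mirroring the guards added in this commit:
    reject None, non-strings, empty strings, the literal string "None"
    (an LLM failure mode), and output with no CREATE TABLE statement."""
    if not ddl or not isinstance(ddl, str):
        return False
    if ddl.strip() == "None":
        return False
    return "CREATE TABLE" in ddl.upper()

print(is_valid_ddl("create table orders (id int);"))  # True
print(is_valid_ddl(None))                             # False
print(is_valid_ddl("None"))                           # False
print(is_valid_ddl("SELECT 1"))                       # False
```

Centralizing the predicate like this would let the chat flow, DDL generator, and population thread share one definition instead of three slightly different inline checks.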
demo_personas.py CHANGED
@@ -5,6 +5,276 @@ All persona data and prompt templates for use case-driven demo preparation
 
 from schema_utils import extract_key_business_terms
 
 # Use Case Persona Configurations
 USE_CASE_PERSONAS = {
 "Merchandising": {
@@ -613,9 +883,41 @@ def get_persona_config(use_case):
 
 def build_company_analysis_prompt(use_case, website_title, website_url, website_content, css_count, logo_candidates):
 """Build dynamic company analysis prompt based on use case"""
- config = get_persona_config(use_case)
 
- system_prompt = COMPANY_ANALYSIS_TEMPLATE.format(use_case=use_case, **config)
 
 # Extract key business terms instead of raw content dump
 key_terms = extract_key_business_terms(website_content, max_chars=1000)
@@ -629,36 +931,87 @@ VISUAL ASSETS SUMMARY:
 CSS Resources: {css_count} stylesheets detected
 Logo Assets: {len(logo_candidates)} logo variations found
 
- Conduct analysis specifically for {use_case} use case targeting {config['target_persona']} who needs to solve: {config['business_problem']}
 
- Extract specific, quantifiable information wherever possible that relates to {config['key_metrics']} and {config['persona_focus']}."""
 
 return system_prompt, user_prompt
 
 def build_industry_research_prompt(use_case, company_analysis_results):
 """Build dynamic industry research prompt based on use case and company analysis"""
- config = get_persona_config(use_case)
 
- # Format research focus areas as bulleted list
- research_focus_formatted = "\n".join([f"- {focus}" for focus in config['research_focus']])
 
- system_prompt = INDUSTRY_RESEARCH_TEMPLATE.format(
- use_case=use_case,
- research_focus_formatted=research_focus_formatted,
- **config
- )
 
- user_prompt = f"""Conduct comprehensive {use_case} research based on this company analysis:
 
 COMPANY ANALYSIS RESULTS:
 {company_analysis_results}
 
- Focus specifically on creating realistic demo scenarios that showcase how ThoughtSpot's {config['thoughtspot_solution']} solves {config['business_problem']} for {config['target_persona']}.
 
 Provide specific recommendations for:
 1. Database schemas and table structures
 2. Realistic data patterns and volumes
 3. Compelling outlier scenarios
- 4. Success metrics that prove ROI: {config['success_outcomes']}"""
 
 return system_prompt, user_prompt
  from schema_utils import extract_key_business_terms

+ # ============================================================================
+ # VERTICAL × FUNCTION MATRIX SYSTEM (Phase 1 - February 2026)
+ # ============================================================================
+ # New composable system replacing flat USE_CASE_PERSONAS
+ # Keep USE_CASE_PERSONAS below for backward compatibility during transition
+ # ============================================================================
+
+ # VERTICALS: Industry-specific context
+ VERTICALS = {
+     "Retail": {
+         "typical_entities": ["Store", "Product", "Category", "Region", "Customer"],
+         "industry_terms": ["SKU", "basket", "shrink", "markdown", "comp sales", "footfall"],
+         "data_patterns": ["seasonality", "holiday_spikes", "weather_impact", "back_to_school"],
+     },
+     "Banking": {
+         "typical_entities": ["Account", "Customer", "Branch", "Product", "Loan"],
+         "industry_terms": ["AUM", "NIM", "deposits", "charge-off", "delinquency", "APR"],
+         "data_patterns": ["month_end_spikes", "rate_sensitivity", "quarter_close"],
+     },
+     "Software": {
+         "typical_entities": ["Account", "User", "Subscription", "Feature", "License"],
+         "industry_terms": ["ARR", "MRR", "churn", "NRR", "seats", "expansion"],
+         "data_patterns": ["renewal_cycles", "usage_spikes", "trial_conversion"],
+     },
+     "Manufacturing": {
+         "typical_entities": ["Plant", "Line", "Product", "Supplier", "Shift"],
+         "industry_terms": ["OEE", "yield", "scrap", "downtime", "throughput", "WIP"],
+         "data_patterns": ["shift_patterns", "maintenance_cycles", "supply_disruptions"],
+     },
+ }
+
+ # FUNCTIONS: Department-specific KPIs, visualizations, and patterns
+ FUNCTIONS = {
+     "Sales": {
+         "kpis": ["Dollar Sales", "Unit Sales", "ASP"],
+         "kpi_definitions": {
+             "Dollar Sales": "Total revenue ($)",
+             "Unit Sales": "Total units sold",
+             "ASP": "Dollar Sales ÷ Unit Sales (Average Selling Price)",
+         },
+         "viz_types": ["KPI_sparkline", "trend", "by_region", "by_product", "vs_target"],
+         "outlier_categories": ["surge", "decline", "pricing_anomaly", "regional_variance"],
+         "spotter_templates": [
+             "Which {entity} had the highest {kpi} last {period}?",
+             "Show me {kpi} trend by {dimension}",
+             "Why did {kpi} drop last month?",
+             "Compare {kpi} across {dimension}",
+         ],
+     },
+     "Supply Chain": {
+         "kpis": ["Avg Inventory", "OTIF", "Days on Hand", "Stockout Rate"],
+         "kpi_definitions": {
+             "Avg Inventory": "(Beginning Inventory + Ending Inventory) ÷ 2",
+             "OTIF": "On-Time In-Full delivery rate",
+             "Days on Hand": "Inventory ÷ Daily Usage",
+             "Stockout Rate": "% of SKUs with zero inventory",
+         },
+         "viz_types": ["inventory_levels", "stockout_risk", "supplier_perf", "trend"],
+         "outlier_categories": ["stockout", "overstock", "lead_time_spike", "supplier_issue"],
+         "spotter_templates": [
+             "Which {entity} is at risk of stockout?",
+             "Show inventory levels by {dimension}",
+             "Which suppliers have the longest lead times?",
+         ],
+     },
+     "Marketing": {
+         "kpis": ["CTR", "Bounce Rate", "Fill Rate", "Approval Rate"],
+         "kpi_definitions": {
+             "CTR": "Clicks ÷ Impressions (Click-Through Rate)",
+             "Bounce Rate": "% leaving landing page without action",
+             "Fill Rate": "% completing application/form",
+             "Approval Rate": "% of applications approved",
+         },
+         "viz_types": ["funnel", "channel_comparison", "trend", "by_campaign"],
+         "outlier_categories": ["conversion_drop", "channel_spike", "cost_anomaly"],
+         "spotter_templates": [
+             "What is our conversion rate by {channel}?",
+             "Show me the funnel for {campaign}",
+             "Which channel has the highest CTR?",
+         ],
+     },
+ }
+
+ # MATRIX_OVERRIDES: Specific Vertical × Function combinations
+ # Only specify what differs from the base vertical + function merge
+ MATRIX_OVERRIDES = {
+     ("Retail", "Sales"): {
+         "add_kpis": ["Basket Size", "Items per Transaction"],
+         "add_kpi_definitions": {
+             "Basket Size": "Dollar Sales ÷ Transactions",
+             "Items per Transaction": "Unit Sales ÷ Transactions",
+         },
+         "add_viz": ["by_store", "by_category"],
+         "target_persona": "VP Merchandising, Retail Sales Leader",
+         "business_problem": "$1T lost annually to stockouts and overstock",
+     },
+     ("Banking", "Marketing"): {
+         "add_kpis": ["Application Fill Rate", "Cost per Acquisition"],
+         "add_kpi_definitions": {
+             "Application Fill Rate": "% completing loan/account application",
+             "Cost per Acquisition": "Marketing spend ÷ New customers acquired",
+         },
+         "rename_kpis": {"CTR": "Click-through Rate"},
+         "target_persona": "CMO, VP Digital Marketing",
+         "business_problem": "High cost per acquisition, low funnel conversion",
+     },
+     ("Software", "Sales"): {
+         "add_kpis": ["ARR", "Net Revenue Retention", "Pipeline Coverage"],
+         "add_kpi_definitions": {
+             "ARR": "Annual Recurring Revenue",
+             "Net Revenue Retention": "(Starting ARR + Expansion - Churn) ÷ Starting ARR",
+             "Pipeline Coverage": "Pipeline value ÷ Quota",
+         },
+         "add_viz": ["by_segment", "by_rep"],
+         "target_persona": "CRO, VP Sales",
+     },
+ }
+
+
+ def parse_use_case(user_input: str) -> tuple[str | None, str | None]:
+     """
+     Parse user input string like "Retail Sales" into (vertical, function) tuple.
+
+     Checks for known patterns by testing against VERTICALS.keys() and FUNCTIONS.keys().
+     Handles case-insensitive matching.
+
+     Args:
+         user_input: User input string like "Retail Sales", "Banking Marketing", etc.
+
+     Returns:
+         Tuple of (vertical, function) like ("Retail", "Sales")
+         Returns (None, None) for unclear inputs
+     """
+     if not user_input or not user_input.strip():
+         return (None, None)
+
+     user_input_lower = user_input.strip().lower()
+
+     # Try to find both vertical and function in the input
+     found_vertical = None
+     found_function = None
+
+     # Check for known verticals (case-insensitive)
+     for vertical in VERTICALS.keys():
+         if vertical.lower() in user_input_lower:
+             found_vertical = vertical
+             break
+
+     # Check for known functions (case-insensitive)
+     for function in FUNCTIONS.keys():
+         if function.lower() in user_input_lower:
+             found_function = function
+             break
+
+     # If we found both, return them
+     if found_vertical and found_function:
+         return (found_vertical, found_function)
+
+     # If we found only one, return it with None for the other
+     if found_vertical:
+         return (found_vertical, None)
+     if found_function:
+         return (None, found_function)
+
+     # If we found neither, return (None, None)
+     return (None, None)
+
+
+ def get_use_case_config(vertical: str, function: str) -> dict:
+     """
+     Merge vertical + function + overrides into final configuration.
+     Handles known combinations, partial matches, and fully generic cases.
+
+     Args:
+         vertical: Industry vertical (e.g., "Retail")
+         function: Functional department (e.g., "Sales")
+
+     Returns:
+         Complete configuration dict with all fields merged
+     """
+     v = VERTICALS.get(vertical, {})
+     f = FUNCTIONS.get(function, {})
+     override = MATRIX_OVERRIDES.get((vertical, function), {})
+
+     # Determine if this is a known, partial, or generic case
+     is_known_vertical = vertical in VERTICALS
+     is_known_function = function in FUNCTIONS
+
+     # Build base config
+     config = {
+         # Metadata
+         "vertical": vertical,
+         "function": function,
+         "use_case_name": f"{vertical} {function}",
+
+         # From vertical
+         "entities": v.get("typical_entities", []).copy(),
+         "industry_terms": v.get("industry_terms", []).copy(),
+         "data_patterns": v.get("data_patterns", []).copy(),
+
+         # From function (copy to allow modification)
+         "kpis": f.get("kpis", []).copy(),
+         "kpi_definitions": f.get("kpi_definitions", {}).copy(),
+         "viz_types": f.get("viz_types", []).copy(),
+         "outlier_categories": f.get("outlier_categories", []).copy(),
+         "spotter_templates": f.get("spotter_templates", []).copy(),
+
+         # Flags
+         "is_generic": False,
+         "ai_should_determine": [],
+     }
+
+     # Apply overrides
+     if override.get("add_kpis"):
+         config["kpis"].extend(override["add_kpis"])
+     if override.get("add_kpi_definitions"):
+         config["kpi_definitions"].update(override["add_kpi_definitions"])
+     if override.get("add_viz"):
+         config["viz_types"].extend(override["add_viz"])
+     if override.get("rename_kpis"):
+         for old, new in override["rename_kpis"].items():
+             if old in config["kpis"]:
+                 idx = config["kpis"].index(old)
+                 config["kpis"][idx] = new
+     if override.get("target_persona"):
+         config["target_persona"] = override["target_persona"]
+     if override.get("business_problem"):
+         config["business_problem"] = override["business_problem"]
+
+     # Handle generic cases
+     if not is_known_vertical and not is_known_function:
+         # Fully generic
+         config["is_generic"] = True
+         config["ai_should_determine"] = ["entities", "industry_terms", "kpis", "viz_types", "outliers"]
+         config["prompt_user_for"] = ["key_metrics", "target_persona", "business_questions"]
+     elif not is_known_vertical:
+         # Known function, unknown vertical
+         config["is_generic"] = True
+         config["ai_should_determine"] = ["entities", "industry_terms", "data_patterns"]
+     elif not is_known_function:
+         # Known vertical, unknown function
+         config["is_generic"] = True
+         config["ai_should_determine"] = ["kpis", "viz_types", "outliers"]
+
+     # Add legacy fields for backward compatibility with existing prompts
+     if "demo_objectives" not in config:
+         config["demo_objectives"] = f"Demonstrate {function} analytics capabilities with {vertical}-specific insights"
+     if "key_metrics" not in config:
+         config["key_metrics"] = ", ".join(config["kpis"][:5]) if config["kpis"] else "revenue, growth, efficiency"
+     if "research_focus" not in config:
+         config["research_focus"] = config["industry_terms"][:5] if config["industry_terms"] else []
+     if "thoughtspot_solution" not in config:
+         config["thoughtspot_solution"] = f"Self-service analytics for {vertical} {function} teams"
+     if "persona_focus" not in config:
+         config["persona_focus"] = f"{function} optimization and decision-making"
+     if "cost_impact" not in config:
+         config["cost_impact"] = "Significant business impact through data-driven decisions"
+     if "success_outcomes" not in config:
+         config["success_outcomes"] = f"Improved {function.lower()} performance and faster insights"
+
+     return config
+
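The merge order in `get_use_case_config()` is vertical base, then function base, then the `(vertical, function)` override on top. A trimmed, self-contained sketch of that precedence (toy dicts stand in for the full tables above):

```python
# Toy stand-ins for the VERTICALS / FUNCTIONS / MATRIX_OVERRIDES tables above.
VERTICALS = {"Retail": {"industry_terms": ["SKU", "basket", "shrink"]}}
FUNCTIONS = {"Sales": {"kpis": ["Dollar Sales", "Unit Sales", "ASP"]}}
MATRIX_OVERRIDES = {("Retail", "Sales"): {"add_kpis": ["Basket Size"]}}

def merge_config(vertical: str, function: str) -> dict:
    # Same precedence as get_use_case_config(): vertical, then function,
    # then the matrix override; unknown keys fall back to empty values.
    cfg = {
        "industry_terms": list(VERTICALS.get(vertical, {}).get("industry_terms", [])),
        "kpis": list(FUNCTIONS.get(function, {}).get("kpis", [])),
    }
    cfg["kpis"].extend(MATRIX_OVERRIDES.get((vertical, function), {}).get("add_kpis", []))
    return cfg

print(merge_config("Retail", "Sales")["kpis"])
# ['Dollar Sales', 'Unit Sales', 'ASP', 'Basket Size']
```

An unknown pairing like `("Banking", "Sales")` still returns the function's base KPIs, which is what lets the real function degrade gracefully to generic configs.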
+ # ============================================================================
+ # LEGACY USE CASE PERSONAS (Backward Compatibility)
+ # ============================================================================
+ # Keep for backward compatibility during transition
+ # New code should use get_use_case_config() instead
+ # ============================================================================
+
  # Use Case Persona Configurations
  USE_CASE_PERSONAS = {
      "Merchandising": {
 
  def build_company_analysis_prompt(use_case, website_title, website_url, website_content, css_count, logo_candidates):
      """Build dynamic company analysis prompt based on use case"""
+     # Parse use case into vertical and function
+     vertical, function = parse_use_case(use_case)
+
+     # Get config from new system, fallback to legacy if needed
+     if vertical or function:
+         config = get_use_case_config(vertical or "Generic", function or "Generic")
+         # Map new config fields to legacy template fields
+         use_case_display = config.get('use_case_name', use_case)
+         target_persona = config.get('target_persona', 'Business Leader')
+         business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+         # Convert KPIs list to key_metrics string
+         kpis = config.get('kpis', [])
+         key_metrics = ', '.join(kpis) if kpis else 'key operational metrics'
+         # Use function as persona_focus, or derive from vertical
+         persona_focus = function or vertical or 'operational efficiency, data-driven decisions'
+     else:
+         # Fallback to legacy system for unrecognized use cases
+         config = get_persona_config(use_case)
+         use_case_display = use_case
+         target_persona = config.get('target_persona', 'Business Leader')
+         business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+         key_metrics = config.get('key_metrics', 'key operational metrics')
+         persona_focus = config.get('persona_focus', 'operational efficiency, data-driven decisions')
+
+     # Build template dict with mapped fields
+     template_dict = {
+         'use_case': use_case_display,
+         'target_persona': target_persona,
+         'business_problem': business_problem,
+         'key_metrics': key_metrics,
+         'persona_focus': persona_focus,
+         'cost_impact': config.get('cost_impact', 'Lost opportunities from data bottlenecks'),
+     }
+
+     system_prompt = COMPANY_ANALYSIS_TEMPLATE.format(**template_dict)

      # Extract key business terms instead of raw content dump
      key_terms = extract_key_business_terms(website_content, max_chars=1000)

  CSS Resources: {css_count} stylesheets detected
  Logo Assets: {len(logo_candidates)} logo variations found

+ Conduct analysis specifically for {use_case_display} use case targeting {target_persona} who needs to solve: {business_problem}

+ Extract specific, quantifiable information wherever possible that relates to {key_metrics} and {persona_focus}."""

      return system_prompt, user_prompt

  def build_industry_research_prompt(use_case, company_analysis_results):
      """Build dynamic industry research prompt based on use case and company analysis"""
+     # Parse use case into vertical and function
+     vertical, function = parse_use_case(use_case)
+
+     # Get config from new system, fallback to legacy if needed
+     if vertical or function:
+         config = get_use_case_config(vertical or "Generic", function or "Generic")
+         # Map new config fields to legacy template fields
+         use_case_display = config.get('use_case_name', use_case)
+         target_persona = config.get('target_persona', 'Business Leader')
+         business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+         # Convert KPIs list to key_metrics string
+         kpis = config.get('kpis', [])
+         key_metrics = ', '.join(kpis) if kpis else 'key operational metrics'
+         # Use function as persona_focus, or derive from vertical
+         persona_focus = function or vertical or 'operational efficiency, data-driven decisions'
+         # Build research focus from entities, industry_terms, and data_patterns
+         entities = config.get('entities', [])
+         industry_terms = config.get('industry_terms', [])
+         data_patterns = config.get('data_patterns', [])
+         research_focus_list = []
+         if entities:
+             research_focus_list.append(f"Core entities: {', '.join(entities[:5])}")
+         if industry_terms:
+             research_focus_list.append(f"Industry terminology: {', '.join(industry_terms[:5])}")
+         if data_patterns:
+             research_focus_list.append(f"Data patterns: {', '.join(data_patterns[:3])}")
+         if not research_focus_list:
+             research_focus_list = ["core business processes", "key operational metrics", "competitive positioning"]
+         research_focus_formatted = "\n".join([f"- {focus}" for focus in research_focus_list])
+         # Default values for fields not in new system
+         thoughtspot_solution = f"AI-powered analytics for {use_case_display}"
+         success_outcomes = "Faster insights, improved decision making, operational efficiency gains"
+         demo_objectives = f"Show self-service analytics for {use_case_display}"
+     else:
+         # Fallback to legacy system for unrecognized use cases
+         config = get_persona_config(use_case)
+         use_case_display = use_case
+         target_persona = config.get('target_persona', 'Business Leader')
+         business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+         key_metrics = config.get('key_metrics', 'key operational metrics')
+         persona_focus = config.get('persona_focus', 'operational efficiency, data-driven decisions')
+         research_focus_formatted = "\n".join([f"- {focus}" for focus in config.get('research_focus', [])])
+         thoughtspot_solution = config.get('thoughtspot_solution', 'Self-service analytics platform')
+         success_outcomes = config.get('success_outcomes', 'Faster insights, improved decision making')
+         demo_objectives = config.get('demo_objectives', 'Show self-service analytics')

+     # Build template dict with mapped fields
+     template_dict = {
+         'use_case': use_case_display,
+         'target_persona': target_persona,
+         'business_problem': business_problem,
+         'key_metrics': key_metrics,
+         'persona_focus': persona_focus,
+         'research_focus_formatted': research_focus_formatted,
+         'thoughtspot_solution': thoughtspot_solution,
+         'success_outcomes': success_outcomes,
+         'demo_objectives': demo_objectives,
+         'cost_impact': config.get('cost_impact', 'Lost opportunities from data bottlenecks'),
+     }

+     system_prompt = INDUSTRY_RESEARCH_TEMPLATE.format(**template_dict)

+     user_prompt = f"""Conduct comprehensive {use_case_display} research based on this company analysis:

  COMPANY ANALYSIS RESULTS:
  {company_analysis_results}

+ Focus specifically on creating realistic demo scenarios that showcase how ThoughtSpot's {thoughtspot_solution} solves {business_problem} for {target_persona}.

  Provide specific recommendations for:
  1. Database schemas and table structures
  2. Realistic data patterns and volumes
  3. Compelling outlier scenarios
+ 4. Success metrics that prove ROI: {success_outcomes}"""

      return system_prompt, user_prompt
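Both builders dispatch on `parse_use_case()`, which is plain case-insensitive substring matching against the known vertical and function keys. A condensed sketch of that behavior (key lists abbreviated):

```python
KNOWN_VERTICALS = ("Retail", "Banking", "Software", "Manufacturing")
KNOWN_FUNCTIONS = ("Sales", "Supply Chain", "Marketing")

def parse_use_case(user_input):
    # Case-insensitive substring match, mirroring the full implementation
    # above; word order in the input does not matter.
    text = (user_input or "").strip().lower()
    if not text:
        return (None, None)
    vertical = next((v for v in KNOWN_VERTICALS if v.lower() in text), None)
    function = next((f for f in KNOWN_FUNCTIONS if f.lower() in text), None)
    return (vertical, function)

print(parse_use_case("retail SALES"))   # ('Retail', 'Sales')
print(parse_use_case("Banking"))        # ('Banking', None)
print(parse_use_case("HR Analytics"))   # (None, None)
```

A partial match (vertical only or function only) still routes into the new system with a "Generic" placeholder for the missing half; only a full miss falls back to `get_persona_config()`.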
demo_prep.py CHANGED
@@ -2548,8 +2548,7 @@ Schema Validation: Will be checked next...
      value="*Database schema will appear here after Create stage*",
      language="sql",
      interactive=False,
-     lines=20,
-     max_lines=30
  )
  with gr.Column(scale=1):
      edit_ddl_btn = gr.Button("🔍 DDL", elem_classes=["edit-btn"])
@@ -2579,8 +2578,7 @@ Schema Validation: Will be checked next...
      value="Generated Python code will appear here after population step",
      language="python",
      interactive=False,
-     lines=10,
-     max_lines=15,
  )

  with gr.Row():
 
      value="*Database schema will appear here after Create stage*",
      language="sql",
      interactive=False,
+     lines=20
  )
  with gr.Column(scale=1):
      edit_ddl_btn = gr.Button("🔍 DDL", elem_classes=["edit-btn"])

      value="Generated Python code will appear here after population step",
      language="python",
      interactive=False,
+     lines=10
  )

  with gr.Row():
liveboard_creator.py CHANGED
@@ -27,6 +27,61 @@ _direct_api_token = None
  _direct_api_session = None

  def _get_direct_api_session():
      """
      Get or create an authenticated session for direct ThoughtSpot API calls.
@@ -2815,49 +2870,21 @@ def create_liveboard_from_model_mcp(
      print(f"[MCP] Starting async MCP liveboard creation...")
      try:
          print(f"[MCP] Importing MCP modules...")
-         from mcp import ClientSession
-         from mcp.client.streamable_http import streamablehttp_client
          print(f"[MCP] MCP modules imported successfully")

-         # ALWAYS use bearer auth with trusted auth token
-         # This ensures MCP uses the same org as our table/model deployment
-         print(f"[MCP] Using bearer auth (same org as trusted auth)")
-
-         # Get auth token from our direct API session (trusted auth)
-         session_obj = _get_direct_api_session()
-         if not session_obj or not _direct_api_token:
-             print(f"[MCP ERROR] Failed to get auth token for bearer auth")
-             return {
-                 'success': False,
-                 'error': 'Failed to authenticate for MCP bearer auth'
-             }
-
-         ts_host = os.getenv('THOUGHTSPOT_URL', '').rstrip('/').replace('https://', '').replace('http://', '')
-         bearer_token = _direct_api_token
-
-         # Bearer auth format: "Bearer {token}@{host}"
-         # This is ThoughtSpot's MCP server format for bearer endpoint
-         auth_header = f"Bearer {bearer_token}@{ts_host}"
-         # Use /bearer/mcp endpoint (Streamable HTTP transport, not SSE)
-         mcp_endpoint = "https://agent.thoughtspot.app/bearer/mcp"
-
-         print(f"[MCP] Bearer endpoint: {mcp_endpoint}")
-         print(f"[MCP] Host: {ts_host}")
-         print(f"[MCP] Token: {bearer_token[:20]}...")
-
-         # Use Streamable HTTP client with bearer auth headers
-         # This bypasses OAuth and uses our trusted auth token directly
-         headers = {"Authorization": auth_header}

-         print(f"[MCP] Starting Streamable HTTP client with bearer auth...")
-         async with streamablehttp_client(mcp_endpoint, headers=headers) as (read, write, _get_session_id):
-             print(f"DEBUG: Streamable HTTP client context established")
-             print(f"DEBUG: Creating ClientSession...")
              async with ClientSession(read, write) as session:
-                 print(f"DEBUG: ClientSession context entered")
-                 print(f"DEBUG: Calling session.initialize()...")
                  await session.initialize()
-                 print(f"DEBUG: session.initialize() completed")

                  # Verify connection with ping
                  print(f"Pinging MCP server...")
@@ -2955,6 +2982,9 @@ def create_liveboard_from_model_mcp(
                  # Use direct ThoughtSpot API (bypasses MCP proxy issues)
                  answer_data = _get_answer_direct(question_text, model_id)
                  if answer_data:
                      print(f"    🔍 DEBUG: Direct API answer keys: {list(answer_data.keys())}")
                      answers.append(answer_data)
                      print(f"    ✅ Answer retrieved (direct API)", flush=True)
@@ -2966,6 +2996,9 @@ def create_liveboard_from_model_mcp(
                      "datasourceId": model_id
                  })
                  answer_data = json.loads(answer_result.content[0].text)
                  answers.append(answer_data)
                  print(f"    ✅ Answer retrieved (MCP fallback)", flush=True)
              else:
@@ -2982,6 +3015,9 @@ def create_liveboard_from_model_mcp(

                  # Parse answer data
                  answer_data = json.loads(answer_result.content[0].text)
                  print(f"    🔍 DEBUG: Answer keys: {list(answer_data.keys())}")
                  answers.append(answer_data)
                  print(f"    ✅ Answer retrieved", flush=True)
@@ -3243,6 +3279,57 @@ def create_liveboard_from_model_mcp(
      })
      print(f"  ✓ Added dark theme style to Viz_1")

      # Re-import fixed TML using authenticated session
      import_response = ts_client.session.post(
          f"{ts_base_url}/api/rest/2.0/metadata/tml/import",
 
  _direct_api_session = None

+ def _clean_viz_title(title: str) -> str:
+     """
+     Clean up visualization titles to be more readable.
+
+     Examples:
+         'shipping_cost by month last 18 months' → 'Shipping Cost by Month'
+         'Top 15 product_name by quantity_shipped' → 'Top 15 Products by Quantity Shipped'
+         'total_revenue weekly' → 'Total Revenue Weekly'
+     """
+     if not title:
+         return title
+
+     # Remove date filter suffixes
+     date_filters = [
+         ' last 18 months', ' last 12 months', ' last 2 years', ' last year',
+         ' last 6 months', ' last 3 months', ' last 30 days', ' last 90 days'
+     ]
+     for filter_str in date_filters:
+         if title.lower().endswith(filter_str):
+             title = title[:-len(filter_str)]
+
+     # Replace underscores with spaces
+     title = title.replace('_', ' ')
+
+     # Clean up common column name patterns
+     replacements = {
+         'product name': 'Products',
+         'supplier name': 'Suppliers',
+         'warehouse name': 'Warehouses',
+         'customer name': 'Customers',
+         'brand name': 'Brands',
+         'store name': 'Stores',
+         'category name': 'Categories',
+         'region name': 'Regions',
+     }
+     title_lower = title.lower()
+     for old, new in replacements.items():
+         if old in title_lower:
+             # Case-insensitive replace
+             import re
+             title = re.sub(re.escape(old), new, title, flags=re.IGNORECASE)
+
+     # Title case the result, but preserve words like 'by', 'vs', 'and'
+     words = title.split()
+     result = []
+     small_words = {'by', 'vs', 'and', 'or', 'the', 'a', 'an', 'of', 'in', 'on', 'to'}
+     for i, word in enumerate(words):
+         if i == 0 or word.lower() not in small_words:
+             result.append(word.capitalize())
+         else:
+             result.append(word.lower())
+
+     return ' '.join(result)
+
+
  def _get_direct_api_session():
      """
      Get or create an authenticated session for direct ThoughtSpot API calls.
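The title cleanup above is three passes: strip a trailing date filter, de-snake, then title-case everything except small words. A condensed re-implementation for illustration (suffix and replacement lists abbreviated, behavior otherwise matching the helper above):

```python
def clean_viz_title(title):
    # Condensed sketch of _clean_viz_title(): strip date-filter suffixes,
    # replace underscores, then title-case except connector words.
    for suffix in (" last 18 months", " last 12 months", " last 30 days"):
        if title.lower().endswith(suffix):
            title = title[: -len(suffix)]
    title = title.replace("_", " ")
    small = {"by", "vs", "and", "or", "the", "of", "in", "on", "to"}
    words = title.split()
    return " ".join(
        w.capitalize() if i == 0 or w.lower() not in small else w.lower()
        for i, w in enumerate(words)
    )

print(clean_viz_title("shipping_cost by month last 18 months"))  # Shipping Cost by Month
print(clean_viz_title("total_revenue weekly"))                   # Total Revenue Weekly
```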
 
      print(f"[MCP] Starting async MCP liveboard creation...")
      try:
          print(f"[MCP] Importing MCP modules...")
+         from mcp import ClientSession, StdioServerParameters
+         from mcp.client.stdio import stdio_client
          print(f"[MCP] MCP modules imported successfully")

+         # Use stdio client with npx mcp-remote proxy
+         # This connects to ThoughtSpot's public MCP endpoint via npx proxy
+         print(f"[MCP] Initializing stdio connection via npx mcp-remote...")
+         server_params = StdioServerParameters(
+             command="npx",
+             args=["mcp-remote@latest", "https://agent.thoughtspot.app/mcp"]
+         )

+         async with stdio_client(server_params) as (read, write):
              async with ClientSession(read, write) as session:
                  await session.initialize()

                  # Verify connection with ping
                  print(f"Pinging MCP server...")

                  # Use direct ThoughtSpot API (bypasses MCP proxy issues)
                  answer_data = _get_answer_direct(question_text, model_id)
                  if answer_data:
+                     # Clean up the viz title
+                     if 'question' in answer_data:
+                         answer_data['question'] = _clean_viz_title(answer_data['question'])
                      print(f"    🔍 DEBUG: Direct API answer keys: {list(answer_data.keys())}")
                      answers.append(answer_data)
                      print(f"    ✅ Answer retrieved (direct API)", flush=True)

                      "datasourceId": model_id
                  })
                  answer_data = json.loads(answer_result.content[0].text)
+                 # Clean up the viz title
+                 if 'question' in answer_data:
+                     answer_data['question'] = _clean_viz_title(answer_data['question'])
                  answers.append(answer_data)
                  print(f"    ✅ Answer retrieved (MCP fallback)", flush=True)
              else:

                  # Parse answer data
                  answer_data = json.loads(answer_result.content[0].text)
+                 # Clean up the viz title
+                 if 'question' in answer_data:
+                     answer_data['question'] = _clean_viz_title(answer_data['question'])
                  print(f"    🔍 DEBUG: Answer keys: {list(answer_data.keys())}")
                  answers.append(answer_data)
                  print(f"    ✅ Answer retrieved", flush=True)
 
      })
      print(f"  ✓ Added dark theme style to Viz_1")

+     # Convert time-series visualizations to KPIs with sparklines
+     print(f"  🔄 Converting time-series charts to KPIs...")
+     kpi_count = 0
+     for viz in visualizations:
+         if viz.get('id') == 'Viz_1':
+             continue  # Skip note tile
+
+         answer = viz.get('answer', {})
+         viz_name = answer.get('name', '').lower()
+         search_query = answer.get('search_query', '').lower()
+
+         # Check if this is a time-series viz (weekly, monthly, daily patterns)
+         time_patterns = ['weekly', 'monthly', 'daily', 'quarterly', 'yearly', '.week', '.month', '.day', '.quarter', '.year']
+         is_time_series = any(p in viz_name or p in search_query for p in time_patterns)
+
+         if is_time_series and 'chart' in answer:
+             # Convert to KPI
+             answer['chart']['type'] = 'KPI'
+
+             # Add KPI-specific settings for sparkline and comparison
+             kpi_settings = {
+                 "showLabel": True,
+                 "showComparison": True,
+                 "showSparkline": True,
+                 "showAnomalies": False,
+                 "showBounds": False,
+                 "customCompare": "PREV_AVAILABLE",
+                 "showOnlyLatestAnomaly": False
+             }
+
+             # Update client_state_v2 with KPI settings
+             import json as json_module
+             client_state = answer['chart'].get('client_state_v2', '{}')
+             try:
+                 cs = json_module.loads(client_state) if client_state else {}
+                 if 'chartProperties' not in cs:
+                     cs['chartProperties'] = {}
+                 if 'chartSpecific' not in cs['chartProperties']:
+                     cs['chartProperties']['chartSpecific'] = {}
+                 cs['chartProperties']['chartSpecific']['customProps'] = json_module.dumps(kpi_settings)
+                 cs['chartProperties']['chartSpecific']['dataFieldArea'] = 'column'
+                 answer['chart']['client_state_v2'] = json_module.dumps(cs)
+             except:
+                 pass  # Keep existing if parsing fails
+
+             kpi_count += 1
+             print(f"  ✓ Converted '{answer.get('name', '?')}' to KPI")
+
+     if kpi_count > 0:
+         print(f"  ✅ Converted {kpi_count} visualizations to KPIs with sparklines")
+
      # Re-import fixed TML using authenticated session
      import_response = ts_client.session.post(
          f"{ts_base_url}/api/rest/2.0/metadata/tml/import",
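The `client_state_v2` edit in the KPI conversion above is JSON-in-JSON: the field is a serialized string, and its `customProps` entry holds a second serialized string. A minimal sketch of that nesting (dict shapes follow the conversion code above):

```python
import json

def to_kpi(chart):
    # Mirrors the conversion above: flip the chart type, then write the
    # KPI settings into client_state_v2 (a JSON string whose customProps
    # field holds another JSON string).
    kpi_settings = {"showSparkline": True, "showComparison": True,
                    "customCompare": "PREV_AVAILABLE"}
    chart["type"] = "KPI"
    cs = json.loads(chart.get("client_state_v2") or "{}")
    spec = cs.setdefault("chartProperties", {}).setdefault("chartSpecific", {})
    spec["customProps"] = json.dumps(kpi_settings)
    spec["dataFieldArea"] = "column"
    chart["client_state_v2"] = json.dumps(cs)
    return chart

chart = to_kpi({"type": "LINE", "client_state_v2": "{}"})
print(chart["type"])  # KPI
```

Reading the settings back therefore needs two `json.loads` calls, which is why the real code guards the round trip in a try/except and keeps the existing state on parse failure.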
outlier_system.py CHANGED
@@ -25,13 +25,205 @@ Usage:
  import re
  import os
  from typing import Dict, List, Optional, Tuple
- from dataclasses import dataclass
  from datetime import datetime

  @dataclass
  class OutlierPattern:
-     """Represents a data pattern to inject."""
      title: str
      description: str
      sql_update: str
@@ -115,7 +307,7 @@ class OutlierGenerator:
          target_table, target_column, conditions, pattern_description
      )

-     return OutlierPattern(
          title=parsed.get('title', pattern_description[:50]),
          description=pattern_description,
          sql_update=sql,
@@ -422,7 +614,7 @@ WHERE product_id IN (

  def apply_outliers(
      snowflake_conn,
-     outliers: List[OutlierPattern],
      schema_name: str,
      dry_run: bool = False
  ) -> List[Dict]:
@@ -483,7 +675,7 @@ def apply_outliers(

  def generate_demo_pack(
-     outliers: List[OutlierPattern],
      company_name: str,
      use_case: str
  ) -> str:
 
  import re
  import os
  from typing import Dict, List, Optional, Tuple
+ from dataclasses import dataclass, field
  from datetime import datetime

+ # ============================================================================
+ # Phase 1: New Structured Outlier System (February 2026 Sprint)
+ # ============================================================================
+
  @dataclass
  class OutlierPattern:
+     """
+     Defines a single outlier pattern that serves three purposes:
+     1. Liveboard visualizations
+     2. Spotter questions
+     3. Demo talking points
+     """
+     # Identity
+     name: str        # "ASP Decline"
+     category: str    # "pricing", "volume", "inventory"
+
+     # For LIVEBOARD (visualization)
+     viz_type: str            # "KPI", "COLUMN", "LINE"
+     viz_question: str        # "ASP weekly"
+     viz_talking_point: str   # "ASP dropped 12% — excessive discounting"
+
+     # For SPOTTER (ad-hoc questions)
+     spotter_questions: List[str] = field(default_factory=list)
+     spotter_followups: List[str] = field(default_factory=list)
+
+     # For DATA INJECTION (SQL generation)
+     sql_template: str = ""   # "UPDATE {fact_table} SET {column} = ..."
+     affected_columns: List[str] = field(default_factory=list)
+     magnitude: str = ""      # "15% below normal"
+     target_filter: str = ""  # "WHERE REGION = 'West'"
+
+     # For DEMO NOTES
+     demo_setup: str = ""     # "Start by showing overall sales are UP"
+     demo_payoff: str = ""    # "Then reveal ASP is DOWN — 'at what cost?'"
+
+
+ @dataclass
+ class OutlierConfig:
+     """
+     Configuration for outliers per use case.
+     Combines required patterns, optional patterns, and AI generation guidance.
+     """
+     required: List[OutlierPattern] = field(default_factory=list)  # Always include
+     optional: List[OutlierPattern] = field(default_factory=list)  # AI picks 1-2
+     allow_ai_generated: bool = True                               # AI can create 1 custom
+     ai_guidance: str = ""                                         # Hint for AI generation
+
+
+ OUTLIER_CONFIGS = {
+     ("Retail", "Sales"): OutlierConfig(
+         required=[
+             OutlierPattern(
+                 name="ASP Decline",
+ category="pricing",
86
+ viz_type="KPI",
87
+ viz_question="ASP weekly",
88
+ viz_talking_point="ASP dropped 12% even though revenue is up — we're discounting too heavily",
89
+ spotter_questions=[
90
+ "Why did ASP drop last month?",
91
+ "Which products have the biggest discount?",
92
+ "Show me ASP by region",
93
+ ],
94
+ spotter_followups=[
95
+ "Compare to same period last year",
96
+ "Which stores are discounting most?",
97
+ ],
98
+ sql_template="UPDATE {fact_table} SET UNIT_PRICE = UNIT_PRICE * 0.85 WHERE REGION = 'West' AND {date_column} > '{recent_date}'",
99
+ affected_columns=["UNIT_PRICE", "DISCOUNT_PCT"],
100
+ magnitude="15% below normal",
101
+ target_filter="WHERE REGION = 'West'",
102
+ demo_setup="Start by showing overall sales are UP — everything looks good",
103
+ demo_payoff="Then reveal ASP is DOWN — 'but at what cost?' moment",
104
+ ),
105
+ OutlierPattern(
106
+ name="Regional Variance",
107
+ category="geographic",
108
+ viz_type="COLUMN",
109
+ viz_question="Dollar Sales by Region",
110
+ viz_talking_point="West region outperforming by 40% — what are they doing differently?",
111
+ spotter_questions=[
112
+ "Which region has the highest sales?",
113
+ "Compare West to East performance",
114
+ ],
115
+ spotter_followups=[
116
+ "What products are driving West?",
117
+ "Show me the trend for West region",
118
+ ],
119
+ sql_template="UPDATE {fact_table} SET QUANTITY = QUANTITY * 1.4 WHERE REGION = 'West'",
120
+ affected_columns=["QUANTITY", "REVENUE"],
121
+ magnitude="40% above other regions",
122
+ target_filter="WHERE REGION = 'West'",
123
+ demo_setup="Show overall sales by region",
124
+ demo_payoff="West is crushing it — drill in to find out why",
125
+ ),
126
+ ],
127
+ optional=[
128
+ OutlierPattern(
129
+ name="Seasonal Spike",
130
+ category="temporal",
131
+ viz_type="LINE",
132
+ viz_question="Dollar Sales trend by month",
133
+ viz_talking_point="Holiday surge 3x normal — were we prepared?",
134
+ spotter_questions=["Show me sales trend for Q4", "When was our peak sales day?"],
135
+ spotter_followups=[],
136
+ sql_template="UPDATE {fact_table} SET QUANTITY = QUANTITY * 3 WHERE MONTH IN (11, 12)",
137
+ affected_columns=["QUANTITY", "REVENUE"],
138
+ magnitude="3x normal",
139
+ target_filter="WHERE MONTH IN (11, 12)",
140
+ demo_setup="",
141
+ demo_payoff="",
142
+ ),
143
+ OutlierPattern(
144
+ name="Category Surge",
145
+ category="product",
146
+ viz_type="COLUMN",
147
+ viz_question="Dollar Sales by Category",
148
+ viz_talking_point="Electronics up 60% YoY while Apparel flat",
149
+ spotter_questions=["Which category grew fastest?", "Compare Electronics to Apparel"],
150
+ spotter_followups=[],
151
+ sql_template="",
152
+ affected_columns=[],
153
+ magnitude="60% YoY",
154
+ target_filter="",
155
+ demo_setup="",
156
+ demo_payoff="",
157
+ ),
158
+ ],
159
+ allow_ai_generated=True,
160
+ ai_guidance="If company has sustainability initiatives, create outlier around eco-friendly product sales",
161
+ ),
162
+
163
+ ("Banking", "Marketing"): OutlierConfig(
164
+ required=[
165
+ OutlierPattern(
166
+ name="Funnel Drop-off",
167
+ category="conversion",
168
+ viz_type="COLUMN",
169
+ viz_question="Conversion rate by funnel stage",
170
+ viz_talking_point="70% drop-off at application page — UX issue?",
171
+ spotter_questions=[
172
+ "Where is our biggest funnel drop-off?",
173
+ "What's our application completion rate?",
174
+ ],
175
+ spotter_followups=[],
176
+ sql_template="",
177
+ affected_columns=[],
178
+ magnitude="70% drop-off",
179
+ target_filter="",
180
+ demo_setup="Show the full funnel from impression to approval",
181
+ demo_payoff="The application page is killing conversions",
182
+ ),
183
+ ],
184
+ optional=[
185
+ OutlierPattern(
186
+ name="Channel Performance",
187
+ category="channel",
188
+ viz_type="COLUMN",
189
+ viz_question="CTR by channel",
190
+ viz_talking_point="Mobile CTR 2x desktop — shift budget?",
191
+ spotter_questions=["Which channel has the best CTR?"],
192
+ spotter_followups=[],
193
+ sql_template="",
194
+ affected_columns=[],
195
+ magnitude="2x desktop",
196
+ target_filter="",
197
+ demo_setup="",
198
+ demo_payoff="",
199
+ ),
200
+ ],
201
+ allow_ai_generated=True,
202
+ ai_guidance="Consider seasonal patterns in loan applications",
203
+ ),
204
+ }
205
+
206
+
207
+ def get_outliers_for_use_case(vertical: str, function: str) -> OutlierConfig:
208
+ """Get outlier configuration for a use case, with fallback to empty config."""
209
+ return OUTLIER_CONFIGS.get(
210
+ (vertical, function),
211
+ OutlierConfig(
212
+ required=[],
213
+ optional=[],
214
+ allow_ai_generated=True,
215
+ ai_guidance=f"Generate outliers appropriate for {vertical} {function}"
216
+ )
217
+ )
218
+
219
+
220
+ # ============================================================================
221
+ # Legacy Outlier System (existing code below)
222
+ # ============================================================================
223
+
224
+ @dataclass
225
+ class LegacyOutlierPattern:
226
+ """Represents a data pattern to inject (legacy structure)."""
227
  title: str
228
  description: str
229
  sql_update: str
 
307
  target_table, target_column, conditions, pattern_description
308
  )
309
 
310
+ return LegacyOutlierPattern(
311
  title=parsed.get('title', pattern_description[:50]),
312
  description=pattern_description,
313
  sql_update=sql,
 
614
 
615
  def apply_outliers(
616
  snowflake_conn,
617
+ outliers: List[LegacyOutlierPattern],
618
  schema_name: str,
619
  dry_run: bool = False
620
  ) -> List[Dict]:
 
675
 
676
 
677
  def generate_demo_pack(
678
+ outliers: List[LegacyOutlierPattern],
679
  company_name: str,
680
  use_case: str
681
  ) -> str:
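The `sql_template` placeholders in `OutlierPattern` are meant to be filled per-schema at injection time. A minimal, self-contained sketch of that fan-out (it re-declares a trimmed `OutlierPattern`; the `render_sql` helper and the table/column/date values are illustrative, not part of this commit):

```python
from dataclasses import dataclass, field
from typing import List

# Trimmed re-declaration of the dataclass added in this commit (illustrative only).
@dataclass
class OutlierPattern:
    name: str
    viz_question: str
    viz_talking_point: str
    spotter_questions: List[str] = field(default_factory=list)
    sql_template: str = ""

def render_sql(pattern: OutlierPattern, fact_table: str, date_column: str, recent_date: str) -> str:
    """Fill the {placeholders} in sql_template for a concrete schema."""
    return pattern.sql_template.format(
        fact_table=fact_table, date_column=date_column, recent_date=recent_date
    )

asp = OutlierPattern(
    name="ASP Decline",
    viz_question="ASP weekly",
    viz_talking_point="ASP dropped 12% even though revenue is up",
    spotter_questions=["Why did ASP drop last month?"],
    sql_template=(
        "UPDATE {fact_table} SET UNIT_PRICE = UNIT_PRICE * 0.85 "
        "WHERE REGION = 'West' AND {date_column} > '{recent_date}'"
    ),
)

# One pattern drives the injected SQL, the liveboard question, and the Spotter prompts.
sql = render_sql(asp, "SALES_FACT", "ORDER_DATE", "2026-01-01")
print(sql)
print(asp.viz_question, "|", asp.spotter_questions[0])
```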
prompts.py CHANGED
```diff
@@ -408,4 +408,159 @@ REQUIREMENTS:
 - Add data validation and error handling
 - Generate complete .env file template

-Generate executable code that creates compelling {use_case} demo data for {company_name}."""
+Generate executable code that creates compelling {use_case} demo data for {company_name}."""
+
+# ============================================================================
+# UNIFIED PROMPT BUILDING SYSTEM (Phase 1 - February 2026)
+# ============================================================================
+# New composable prompt construction system that assembles context sections
+# consistently across all stages (research, DDL, liveboard, demo notes)
+# ============================================================================
+
+def build_prompt(
+    stage: str,
+    vertical: str,
+    function: str,
+    company_context: str,
+    user_overrides: str = None,
+) -> str:
+    """
+    Build a complete prompt by assembling context sections.
+
+    Args:
+        stage: One of "research", "ddl", "liveboard", "demo_notes"
+        vertical: Industry vertical (e.g., "Retail")
+        function: Functional department (e.g., "Sales")
+        company_context: Text from website research
+        user_overrides: Optional user requirements that override defaults
+
+    Returns:
+        Complete prompt string ready for LLM
+    """
+    from demo_personas import get_use_case_config
+    from outlier_system import get_outliers_for_use_case
+
+    # Get merged configuration
+    config = get_use_case_config(vertical, function)
+    outliers = get_outliers_for_use_case(vertical, function)
+
+    # Build sections
+    sections = []
+
+    # Section A: Company Context
+    sections.append(f"""## COMPANY CONTEXT
+{company_context}""")
+
+    # Section B: Use Case Framework
+    persona = config.get("target_persona", "Business Leader")
+    problem = config.get("business_problem", "Need for faster, data-driven decisions")
+    sections.append(f"""## USE CASE
+- **Name:** {vertical} {function}
+- **Target Persona:** {persona}
+- **Business Problem:** {problem}
+- **Industry Terms:** {', '.join(config.get('industry_terms', []))}
+- **Typical Entities:** {', '.join(config.get('entities', []))}""")
+
+    # Section C: Required KPIs and Visualizations
+    kpi_text = "\n".join([f"- {kpi}: {config['kpi_definitions'].get(kpi, '')}" for kpi in config.get('kpis', [])])
+    sections.append(f"""## REQUIRED KPIs
+{kpi_text}
+
+## REQUIRED VISUALIZATIONS
+{', '.join(config.get('viz_types', []))}""")
+
+    # Section D: Outlier Patterns
+    if outliers.required:
+        outlier_text = "\n".join([f"- **{o.name}:** {o.viz_talking_point}" for o in outliers.required])
+        sections.append(f"""## DATA STORIES TO CREATE
+{outlier_text}""")
+
+    # Section E: Spotter Questions
+    spotter_qs = []
+    for o in outliers.required:
+        spotter_qs.extend(o.spotter_questions[:2])  # Top 2 from each required outlier
+    if spotter_qs:
+        sections.append(f"""## SPOTTER QUESTIONS TO ENABLE
+{chr(10).join(['- ' + q for q in spotter_qs[:6]])}""")
+
+    # Section F: User Overrides
+    if user_overrides:
+        sections.append(f"""## USER REQUIREMENTS (override defaults)
+{user_overrides}""")
+
+    # Section G: AI Guidance
+    if config.get("is_generic"):
+        ai_tasks = config.get("ai_should_determine", [])
+        sections.append(f"""## AI TASKS (Generic Use Case)
+This is a generic use case without pre-defined configuration.
+Please determine the following based on company context:
+{chr(10).join(['- ' + task for task in ai_tasks])}""")
+    else:
+        sections.append("""## AI GUIDANCE
+- Include all REQUIRED KPIs and visualizations listed above
+- You may add 2-3 additional items if valuable for this specific company
+- If you add something, briefly explain why""")
+
+    # Assemble final prompt
+    context_block = "\n\n---\n\n".join(sections)
+
+    # Get stage-specific template
+    template = STAGE_TEMPLATES.get(stage, DEFAULT_TEMPLATE)
+
+    return template.format(
+        context=context_block,
+        vertical=vertical,
+        function=function,
+    )
+
+
+# Stage-specific templates
+STAGE_TEMPLATES = {
+    "research": """You are a business intelligence analyst researching a company for demo preparation.
+
+{context}
+
+---
+
+Provide comprehensive research focusing on information that will help create a compelling {vertical} {function} demo.""",
+
+    "ddl": """You are a database architect creating a schema for a {vertical} {function} demo.
+
+{context}
+
+---
+
+Create Snowflake DDL that supports all the KPIs, visualizations, and data stories listed above.
+Follow star schema design with clear fact and dimension tables.""",
+
+    "liveboard": """You are creating a ThoughtSpot liveboard for a {vertical} {function} demo.
+
+{context}
+
+---
+
+Generate visualization questions that will create a compelling liveboard.
+The first two questions MUST be KPIs with sparklines (format: "{{measure}} weekly" or "{{measure}} monthly").
+Include visualizations that enable the data stories and Spotter questions listed above.""",
+
+    "demo_notes": """You are creating demo talking points for a {vertical} {function} demo.
+
+{context}
+
+---
+
+Create a bullet outline demo script with:
+- Opening hook and problem statement
+- Key visualizations to show with talking points
+- The "aha moment" reveal
+- Spotter questions to ask live
+- Closing value proposition""",
+}
+
+DEFAULT_TEMPLATE = """You are helping create a {vertical} {function} demo.
+
+{context}
+
+---
+
+Provide output appropriate for this use case."""
```
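The assembly logic in `build_prompt()` reduces to two steps: join the section strings with `\n\n---\n\n`, then drop the result into a stage template via `str.format`. A self-contained sketch of just that mechanism (the section text and template wording here are abbreviated placeholders, not the full strings from the commit):

```python
# Minimal sketch of build_prompt()'s section-assembly pattern.
sections = [
    "## COMPANY CONTEXT\nAcme Outdoor Co sells camping gear online and in stores.",
    "## USE CASE\n- **Name:** Retail Sales",
    "## DATA STORIES TO CREATE\n- **ASP Decline:** ASP dropped 12% even though revenue is up",
]
context_block = "\n\n---\n\n".join(sections)

# Abbreviated stand-in for STAGE_TEMPLATES["ddl"].
template = """You are a database architect creating a schema for a {vertical} {function} demo.

{context}

---

Create Snowflake DDL that supports the data stories above."""

prompt = template.format(context=context_block, vertical="Retail", function="Sales")
print(prompt)
```

Because the templates take only `{context}`, `{vertical}`, and `{function}`, every stage sees the same context block; only the framing around it changes.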
smart_data_adjuster.py CHANGED
```diff
@@ -7,7 +7,6 @@ Bundles confirmations into one step when confident.

 import os
 from typing import Dict, List, Optional, Tuple
-from openai import OpenAI
 from snowflake_auth import get_snowflake_connection
 from thoughtspot_deployer import ThoughtSpotDeployer
 import json
@@ -16,18 +15,67 @@ import json
 class SmartDataAdjuster:
     """Smart adjuster with liveboard context and conversational flow"""

-    def __init__(self, database: str, schema: str, liveboard_guid: str):
+    def __init__(self, database: str, schema: str, liveboard_guid: str, llm_model: str = None):
         self.database = database
         self.schema = schema
         self.liveboard_guid = liveboard_guid
         self.conn = None
         self.ts_client = None
-        self.openai_client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
+
+        # LLM setup - use provided model or default to Claude
+        self.llm_model = llm_model or os.getenv('DEFAULT_LLM', 'claude-sonnet-4')
+        self._llm_client = None

         # Context about the liveboard
         self.liveboard_name = None
         self.visualizations = []  # List of viz metadata

+    def _call_llm(self, prompt: str) -> str:
+        """Call the configured LLM (Anthropic or OpenAI)"""
+        # Determine provider from model name
+        model_lower = self.llm_model.lower()
+
+        if 'claude' in model_lower or 'anthropic' in model_lower:
+            # Use Anthropic
+            import anthropic
+            client = anthropic.Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY'))
+
+            # Map display names to API model names
+            model_map = {
+                'claude-sonnet-4': 'claude-sonnet-4-20250514',
+                'claude-sonnet-4.5': 'claude-sonnet-4-20250514',
+                'claude-3.5-sonnet': 'claude-3-5-sonnet-20241022',
+                'claude-3-opus': 'claude-3-opus-20240229',
+            }
+            api_model = model_map.get(self.llm_model, 'claude-sonnet-4-20250514')
+
+            response = client.messages.create(
+                model=api_model,
+                max_tokens=2000,
+                messages=[{"role": "user", "content": prompt}]
+            )
+            return response.content[0].text
+        else:
+            # Use OpenAI
+            from openai import OpenAI
+            client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
+
+            # Map display names to API model names
+            model_map = {
+                'gpt-4o': 'gpt-4o',
+                'gpt-4': 'gpt-4',
+                'gpt-4-turbo': 'gpt-4-turbo',
+                'gpt-3.5-turbo': 'gpt-3.5-turbo',
+            }
+            api_model = model_map.get(self.llm_model, 'gpt-4o')
+
+            response = client.chat.completions.create(
+                model=api_model,
+                messages=[{"role": "user", "content": prompt}],
+                temperature=0
+            )
+            return response.choices[0].message.content
+
     def connect(self):
         """Connect to Snowflake and ThoughtSpot"""
         # Snowflake
@@ -300,13 +348,7 @@ CRITICAL: target_value and percentage must be numbers, never strings.
 If unsure about ANY field, set confidence to "low" or "medium".
 """

-        response = self.openai_client.chat.completions.create(
-            model="gpt-4o",
-            messages=[{"role": "user", "content": prompt}],
-            temperature=0
-        )
-
-        content = response.choices[0].message.content
+        content = self._call_llm(prompt)
         if content.startswith('```'):
             lines = content.split('\n')
             content = '\n'.join(lines[1:-1])
```
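Two details of the `_call_llm()` change are worth pinning down: provider routing is a plain substring check on the model name, and the caller strips Markdown code fences with `lines[1:-1]`, which assumes the reply has both an opening and a closing fence. A standalone sketch of both behaviors (same logic as the diff, pulled out as free functions):

```python
def pick_provider(llm_model: str) -> str:
    """Route on substring, exactly as _call_llm does."""
    m = llm_model.lower()
    return "anthropic" if ("claude" in m or "anthropic" in m) else "openai"

def strip_code_fence(content: str) -> str:
    """Mirror the caller's post-processing: drop ```json fences around a reply."""
    if content.startswith('```'):
        lines = content.split('\n')
        # Drops the first line (```json) and the last line (```); a reply with
        # no closing fence would lose its final line, so confidence checks matter.
        content = '\n'.join(lines[1:-1])
    return content

print(pick_provider("claude-sonnet-4"))   # anthropic
print(pick_provider("gpt-4o"))            # openai
print(strip_code_fence('```json\n{"confidence": "high"}\n```'))
```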
sprint_2026_02.md CHANGED
````diff
@@ -68,6 +68,9 @@
 ## Tasks

 ### To Do
+- [ ] **Fix tag assignment to models** - Returns 404 error; works for tables but not models
+- [ ] **CRITICAL: Fix MCP bearer auth for on-prem deployments** - OAuth workaround works for cloud, but bearer auth is needed for on-prem instances that OAuth can't reach (see detailed notes below)
+- [ ] **Fix research cache not loading** - Cache files exist but aren't found due to a relative-path issue (fix ready, needs restart to test)

 #### LegitData Improvements (from REI demo learnings)
 - [ ] **Fix DAYSONHAND generation** - Currently random, needs business logic:
@@ -143,3 +146,145 @@

 ## Notes

+### Feb 3, 2026 - ThoughtSpot Model Validation & MCP Import Fix
+
+**Issue 1: ThoughtSpot Model Validation Failed (Error 13124)**
+- Model TML was missing ID columns (CUSTOMER_ID, STORE_ID, etc.) that were referenced in joins
+- Joins validated, but the columns section didn't include the join keys
+
+**Root Cause:**
+- Code in `thoughtspot_deployer.py` (lines 976-984) was intentionally skipping FK/PK columns to "clean up" the model
+- Logic: "nobody searches for customer 23455, so hide ID columns"
+- But ThoughtSpot requires columns used in joins to be present in the model, even if users don't search them
+
+**Solution:**
+- Commented out the skip logic for FK/PK columns in `_create_model_with_constraints()`
+- ID columns now included in the model's `columns:` section
+- Model deploys successfully with all 54 columns, including IDs
+
+**Issue 2: MCP Import Error**
+- `from mcp.client.streamable_http import streamablehttp_client` failed
+- `ModuleNotFoundError: No module named 'mcp.client.streamable_http'`
+
+**Root Cause:**
+- MCP package upgraded from 0.x to 1.0.0
+- Module structure changed: `streamable_http` → `sse` (Server-Sent Events)
+
+**Solution:**
+- Updated import: `from mcp.client.sse import sse_client`
+- Updated client usage: `sse_client()` instead of `streamablehttp_client()`
+
+---
+
+### Feb 3, 2026 - Supabase Compatibility Fix
+**Issue:** Supabase module import was failing with `ModuleNotFoundError: No module named 'websockets.asyncio'`, causing the app to not load settings and default to OpenAI (which had exceeded quota).
+
+**Root Cause:**
+- Gradio 4.44.0 requires `websockets<13.0`
+- Newer Supabase versions (2.10+) require `websockets>=11` but pull realtime 2.x, which needs `websockets.asyncio` (only in 13+)
+- The version conflict prevented Supabase from loading
+
+**Solution:** Downgraded to a compatible version set:
+- `supabase==1.2.0`
+- `realtime==1.0.6`
+- `websockets==12.0`
+- `httpx==0.24.1` (already had this)
+- `gradio==4.44.0` (unchanged)
+
+**Impact:** Settings now load properly from Supabase; the app uses the correct LLM model from user settings instead of falling back to OpenAI.
+
+---
+
+### Feb 3, 2026 - MCP Bearer Auth vs OAuth Investigation
+
+**Context:** MCP liveboard creation was working previously with on-prem ThoughtSpot instances that can't be reached via OAuth, which means bearer auth was the working solution. However, the current implementation fails with 400 Bad Request.
+
+**Problem Statement:**
+- MCP endpoint `https://agent.thoughtspot.app/bearer/mcp` returns 400 Bad Request when using SSE or streamable_http clients
+- OAuth via stdio works, but only for cloud instances accessible from the internet
+- Bearer auth is needed for on-prem deployments
+
+**Investigation Timeline:**
+
+1. **Initial Error (Feb 3 AM):**
+   - Error: `HTTPStatusError: Client error '400 Bad Request' for url 'https://agent.thoughtspot.app/bearer/mcp'`
+   - Code was using `from mcp.client.sse import sse_client` (MCP 1.0)
+   - Bearer auth header format: `Bearer {token}@{host}`
+
+2. **First Attempted Fix - Downgrade to MCP 0.9.1:**
+   - Reasoning: maybe MCP 1.0's SSE client doesn't work with the bearer endpoint
+   - Result: MCP 0.9.1 doesn't have a `streamable_http` module either - only `sse` and `stdio`
+   - **Learning:** `streamable_http` never existed in any released MCP version we can access
+
+3. **Git History Investigation:**
+   - Commit `f10a9f5` (Jan 27): added `from mcp.client.streamable_http import streamablehttp_client` with bearer auth
+   - requirements.txt at that time: `mcp==1.0.0`
+   - But MCP 1.0.0 doesn't actually have a `streamable_http` module!
+   - **Learning:** that code was committed but never successfully tested/deployed
+
+4. **Found Working Implementation:**
+   - Commit `d26f47e` (earlier): used `stdio_client` with an `npx mcp-remote` proxy
+   - Code:
+     ```python
+     from mcp import ClientSession, StdioServerParameters
+     from mcp.client.stdio import stdio_client

+     server_params = StdioServerParameters(
+         command="npx",
+         args=["mcp-remote@latest", "https://agent.thoughtspot.app/mcp"]
+     )
+     async with stdio_client(server_params) as (read, write):
+         async with ClientSession(read, write) as session:
+             await session.initialize()
+     ```
+   - This approach uses OAuth but works
+
+5. **Current Workaround (OAuth via stdio):**
+   - Reverted to the stdio_client approach from commit d26f47e
+   - Tested successfully: created liveboard b6cc9cad-ff91-4dd4-aec5-091984c2afd2
+   - OAuth flow opens a browser for authorization
+   - Works for cloud instances only
+
+**Technical Details:**
+
+**Bearer Auth Endpoint (Not Working):**
+- URL: `https://agent.thoughtspot.app/bearer/mcp`
+- Auth header: `Bearer {token}@{host}`
+- Transport: unknown (streamable_http doesn't exist, SSE returns 400)
+- Status: 400 Bad Request - endpoint rejects SSE connection attempts
+
+**OAuth Endpoint (Currently Working):**
+- URL: `https://agent.thoughtspot.app/mcp`
+- Proxy: `npx mcp-remote@latest`
+- Transport: stdio → npx → StreamableHTTPClientTransport (handled by mcp-remote)
+- Auth: browser OAuth flow
+- Limitation: requires an internet-accessible ThoughtSpot instance
+
+**The Problem:**
+- User confirmed it was working with on-prem instances before
+- On-prem instances can't complete OAuth (not internet-accessible)
+- Therefore, bearer auth must have been working at some point
+- But there is no evidence in git history of working bearer auth code
+- The `mcp-remote` proxy shows it connects using `StreamableHTTPClientTransport` after OAuth
+- The bearer endpoint might require the same transport, but with bearer auth headers instead of OAuth
+
+**Possible Solutions to Investigate:**
+1. **Use mcp-remote with bearer auth**: see if `npx mcp-remote` supports a bearer token parameter
+2. **Direct StreamableHTTPClientTransport**: find/install the transport library that mcp-remote uses internally
+3. **MCP pre-1.0 version**: search for alpha/beta versions before 0.9.1 that might have streamable_http
+4. **ThoughtSpot-specific MCP package**: check if ThoughtSpot provides their own MCP client library
+5. **Raw HTTP requests**: bypass the MCP library and make direct HTTP calls to the bearer endpoint
+
+**Current State:**
+- OAuth via stdio works for cloud instances
+- Bearer auth is needed for on-prem, but the implementation is unclear
+- Temporary workaround: using the OAuth approach (works for testing/development)
+- **BLOCKER for on-prem deployments**
+
+**Next Steps:**
+- [ ] Contact ThoughtSpot to ask about the bearer auth implementation
+- [ ] Investigate mcp-remote source code to see how it handles StreamableHTTPClientTransport
+- [ ] Test if mcp-remote accepts a bearer token as a parameter
+- [ ] Look for ThoughtSpot-specific documentation on MCP bearer auth
````
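The pinned version set from the Supabase compatibility fix above can be captured directly in `requirements.txt` (versions exactly as listed in the note; the surrounding entries of the real file are not shown):

```text
# Pinned for Gradio 4.44.0 <-> Supabase websockets compatibility (Feb 3, 2026 note)
supabase==1.2.0
realtime==1.0.6
websockets==12.0
httpx==0.24.1
gradio==4.44.0
```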
supabase_client.py CHANGED
```diff
@@ -380,7 +380,6 @@ def load_gradio_settings(email: str) -> Dict[str, Any]:
         "column_naming_style": "snake_case",  # Options: snake_case, camelCase, PascalCase, UPPER_CASE, original

         # Liveboard Creation
-        "liveboard_method": "HYBRID",
         "geo_scope": "USA Only",
         "validation_mode": "Off",

```
thoughtspot_deployer.py CHANGED
```diff
@@ -973,15 +973,19 @@ class ThoughtSpotDeployer:
             col_name = col['name'].upper()
             original_col_name = col.get('original_name', col['name'])  # Use original casing for display

+            # NOTE: We used to skip FK/PK columns, but ThoughtSpot requires them for joins.
+            # Even though users don't search "customer 23455", the join columns must be present
+            # in the model's columns section for the joins to work properly.
+            #
             # SKIP foreign key columns - they're join keys, not analytics columns
-            if self._is_foreign_key_column(col_name, table_name_upper, foreign_keys):
-                print(f"   ⏭️ Skipping FK column: {table_name_upper}.{col_name}")
-                continue
-
+            # if self._is_foreign_key_column(col_name, table_name_upper, foreign_keys):
+            #     print(f"   ⏭️ Skipping FK column: {table_name_upper}.{col_name}")
+            #     continue
+            #
             # SKIP surrogate primary keys (numeric IDs) - nobody searches "customer 23455"
-            if self._is_surrogate_primary_key(col, col_name):
-                print(f"   ⏭️ Skipping surrogate PK: {table_name_upper}.{col_name}")
-                continue
+            # if self._is_surrogate_primary_key(col, col_name):
+            #     print(f"   ⏭️ Skipping surrogate PK: {table_name_upper}.{col_name}")
+            #     continue

             # Start with basic conflict resolution
             display_name = self._resolve_column_name_conflict(
@@ -1646,6 +1650,9 @@ class ThoughtSpotDeployer:
                 return True
             else:
                 print(f"[ThoughtSpot] ⚠️ Tag assignment failed: {assign_response.status_code}", flush=True)
+                print(f"[ThoughtSpot] DEBUG: Response text: {assign_response.text[:500]}", flush=True)
+                print(f"[ThoughtSpot] DEBUG: Object GUIDs: {object_guids}", flush=True)
+                print(f"[ThoughtSpot] DEBUG: Object type: {object_type}", flush=True)
                 return False

         except Exception as e:
@@ -2126,8 +2133,10 @@ class ThoughtSpotDeployer:

         try:
             # Build company data from parameters
+            # Clean company name for display (strip .com, .org, etc)
+            clean_company = company_name.split('.')[0].title() if company_name and '.' in company_name else (company_name or 'Demo Company')
             company_data = {
-                'name': company_name or 'Demo Company',
+                'name': clean_company,
                 'use_case': use_case or 'General Analytics'
             }
```
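The company-name cleanup added in the last hunk is a one-line conditional expression; pulling it out as a function makes its edge cases easy to check (same logic as the diff, renamed here purely for illustration):

```python
# Re-statement of the clean_company expression from deploy_all(), as a function.
def clean_company_name(company_name) -> str:
    if company_name and '.' in company_name:
        # "rei.com" -> "Rei": keep everything before the first dot, title-case it
        return company_name.split('.')[0].title()
    return company_name or 'Demo Company'

print(clean_company_name("rei.com"))      # Rei
print(clean_company_name(None))           # Demo Company
print(clean_company_name("ThoughtSpot"))  # ThoughtSpot (no dot: passed through as-is)
```

Note the pass-through branch preserves the original casing, while the domain branch title-cases; a name containing a literal dot (e.g. "Acme Inc.") would also be truncated at the first dot.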