mikeboone and Cursor committed on
Commit 3b2cd7b · 1 Parent(s): b5feaff

Feb sprint: vertical×function matrix, structured outliers, unified prompts


- demo_personas.py: VERTICALS, FUNCTIONS, MATRIX_OVERRIDES dicts with get_use_case_config() and parse_use_case()
- outlier_system.py: OutlierPattern/OutlierConfig dataclasses, OUTLIER_CONFIGS with Retail Sales patterns
- prompts.py: build_prompt() composable system, STAGE_TEMPLATES for research/ddl/liveboard/demo_notes
- chat_interface.py: research cache fix (absolute paths), auto-use cache, DDL failure guard
- liveboard_creator.py: _clean_viz_title() helper, revert MCP to working stdio/npx approach
- smart_data_adjuster.py: multi-LLM support (Claude + OpenAI) via _call_llm()
- thoughtspot_deployer.py: fix model validation by keeping FK/PK columns, tag debug logging
- CLAUDE.md/PROJECT_STATUS.md: simplify liveboard docs to unified process
- demo_prep.py: remove unsupported max_lines from gr.Code()

Co-authored-by: Cursor <cursoragent@cursor.com>
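The `outlier_system.py` bullet above introduces `OutlierPattern`/`OutlierConfig` dataclasses, but their definitions are not shown anywhere in this diff. A minimal hypothetical sketch of that shape — all field names here are illustrative assumptions, not the actual `outlier_system.py` code:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: field names are illustrative guesses, not the
# actual outlier_system.py definitions (not shown on this commit page).
@dataclass
class OutlierPattern:
    name: str          # e.g. "holiday_spike"
    metric: str        # column the outlier affects
    magnitude: float   # multiplier applied to the baseline value

@dataclass
class OutlierConfig:
    use_case: str
    patterns: list = field(default_factory=list)

    def add(self, pattern: OutlierPattern) -> None:
        self.patterns.append(pattern)

# Example mirroring the "OUTLIER_CONFIGS with Retail Sales patterns" bullet
config = OutlierConfig(use_case="Retail Sales")
config.add(OutlierPattern(name="holiday_spike", metric="total_revenue", magnitude=2.5))
print(len(config.patterns))  # prints 1
```

Dataclasses fit this use well: each `(vertical, use case)` cell can carry a typed list of patterns rather than loose dicts.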

CLAUDE.md CHANGED
@@ -111,7 +111,7 @@ Example:
 
 ```bash
 # Run the app properly
- source ./demoprep/bin/activate && python demo_prep.py
 
 # Check git changes
 git diff --stat
@@ -201,51 +201,21 @@ DO NOT use create_visualization_tml() directly - that's internal low-level code
 
 ---
 
- ## Liveboard Creation - Three-Method System
-
- **PRIMARY GOAL: All three methods (TML, MCP, HYBRID) can be selected via Settings UI**
-
- ### Method Selection
- - **Settings UI:** Admin tab "Liveboard Creation Method" dropdown
- - **Environment variable:** `LIVEBOARD_METHOD=TML|MCP|HYBRID`
- - **Legacy:** `USE_MCP_LIVEBOARD=true/false` still works for backwards compatibility
- - **Default:** HYBRID (recommended)
- - **Entry point:** `thoughtspot_deployer.py` deploy_all() function
-
- ### The Three Methods
-
- | Method | Speed | Quality | Control | Best For |
- |--------|-------|---------|---------|----------|
- | **TML** | ~20s | High (with tuning) | Full | Precise control, debugging |
- | **MCP** | ~60s | Basic | None | Quick prototypes |
- | **HYBRID** | ~90s | Best | Via post-processing | Production demos |
-
- ### TML Method (Template-Based)
- - Builds ThoughtSpot Modeling Language (YAML) structures directly
- - Full control over chart types, layout, colors
- - REST API with token auth
- - **Main function:** `create_liveboard_from_model()` in liveboard_creator.py
- - **Class:** `LiveboardCreator`
-
- ### MCP Method (AI-Driven)
- - Uses Model Context Protocol with ThoughtSpot's agent.thoughtspot.app
- - Leverages ThoughtSpot's AI for smart question generation
- - Natural language questions → ThoughtSpot creates visualizations
- - OAuth authentication, requires npx/Node.js
- - **Main function:** `create_liveboard_from_model_mcp()` in liveboard_creator.py
-
- ### HYBRID Method (Recommended)
- - **Step 1:** MCP creates liveboard quickly with AI-driven questions
- - **Step 2:** TML post-processing enhances with:
- - Groups (tabs) for organization
- - KPI sparkline fixes
- - Brand color styling
- - **Main functions:**
- - `create_liveboard_from_model_mcp()` for creation
- - `enhance_mcp_liveboard()` for post-processing
-
- ### enhance_mcp_liveboard() Function
- Located in `liveboard_creator.py`, this function:
 1. Exports the MCP-created liveboard TML
 2. Classifies visualizations by type (KPI, trend, categorical)
 3. Adds Groups (tabs) to organize by type
@@ -253,18 +223,16 @@ Located in `liveboard_creator.py`, this function:
 
 5. Applies brand colors to groups and tiles
 6. Re-imports the enhanced TML
 
- ### KPI Requirements (All methods need these)
 - **For sparklines and percent change comparisons:**
 - Must include time dimension (date column)
 - Must specify granularity (daily, weekly, monthly, quarterly, yearly)
 - Example: `[Total_revenue] [Order_date].monthly`
- - **MCP:** Natural language includes time context
- - **TML:** Search query must have `[measure] [date_column].granularity`
- - **HYBRID:** Post-processing adds sparkline settings automatically
 
 ### Terminology (Important!)
- - **Outliers** = Interesting data points in existing data (works with all methods)
- - **Data Adjuster** = Modifying data values (NOT possible with MCP, needs Snowflake views)
 
 ### Golden Demo Structure
 - **Location:** `dev_notes/liveboard_demogold2/🏬 Global Retail Apparel Sales (New).liveboard.tml`
@@ -273,11 +241,6 @@ Located in `liveboard_creator.py`, this function:
 - Brand colors via style_properties (GBC_A-J for groups, TBC_A-J for tiles)
 - KPI structure: `[sales] [date].weekly [date].'last 8 quarters'`
 
- ### Testing Strategy
- - Test all three methods when changing shared code
- - HYBRID should be the default for most testing
- - Use TML for debugging visualization issues
-
 ---
 
 ## Frustration Points (AVOID)
@@ -301,5 +264,5 @@ User gets frustrated when you:
 
 ---
 
- *Last Updated: January 13, 2026*
 *This is the source of truth - update rules here, not in .cursorrules*
 
 
 ```bash
 # Run the app properly
+ source ./demoprep/bin/activate && python chat_interface.py
 
 # Check git changes
 git diff --stat
 
 
 ---
 
+ ## Liveboard Creation
+
+ Liveboard creation is a single unified process with two phases:
+
+ 1. **MCP Creation** - Uses ThoughtSpot's AI (via Model Context Protocol at `agent.thoughtspot.app`) to generate smart visualizations from natural language questions
+ 2. **TML Post-Processing** - Enhances the AI-created liveboard with groups, KPI sparklines, brand colors, and layout refinement
+
+ These are implemented as separate functions but are **one process** - do NOT treat them as separate "methods" or offer the user a choice between them.
+
+ ### Key Functions (liveboard_creator.py)
+ - **`create_liveboard_from_model_mcp()`** - Main entry point. Handles MCP creation.
+ - **`enhance_mcp_liveboard()`** - Post-processing. Exports TML, enhances, re-imports.
+ - **`LiveboardCreator` class** - TML utilities used during post-processing.
+
+ ### enhance_mcp_liveboard() Details
 1. Exports the MCP-created liveboard TML
 2. Classifies visualizations by type (KPI, trend, categorical)
 3. Adds Groups (tabs) to organize by type
 
 5. Applies brand colors to groups and tiles
 6. Re-imports the enhanced TML
 
+ ### KPI Requirements
 - **For sparklines and percent change comparisons:**
 - Must include time dimension (date column)
 - Must specify granularity (daily, weekly, monthly, quarterly, yearly)
 - Example: `[Total_revenue] [Order_date].monthly`
+ - Post-processing adds sparkline settings automatically
 
 ### Terminology (Important!)
+ - **Outliers** = Interesting data points in existing data
+ - **Data Adjuster** = Modifying data values (needs Snowflake views)
 
 ### Golden Demo Structure
 - **Location:** `dev_notes/liveboard_demogold2/🏬 Global Retail Apparel Sales (New).liveboard.tml`
 
 - Brand colors via style_properties (GBC_A-J for groups, TBC_A-J for tiles)
 - KPI structure: `[sales] [date].weekly [date].'last 8 quarters'`
 
 ---
 
 ## Frustration Points (AVOID)
 
 ---
 
+ *Last Updated: February 4, 2026*
 *This is the source of truth - update rules here, not in .cursorrules*
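The KPI requirement documented above (a measure plus a date keyword with granularity, optionally a time window) can be sketched as a small query-string builder. The function name is hypothetical; only the output format comes from the documented examples:

```python
from typing import Optional

def kpi_search_query(measure: str, date_column: str, granularity: str,
                     window: Optional[str] = None) -> str:
    """Build a ThoughtSpot-style KPI search query string.

    Output follows the documented pattern, e.g.
        [Total_revenue] [Order_date].monthly
        [sales] [date].weekly [date].'last 8 quarters'
    """
    query = f"[{measure}] [{date_column}].{granularity}"
    if window:
        # Time windows are quoted, per the golden demo KPI structure
        query += f" [{date_column}].'{window}'"
    return query

print(kpi_search_query("Total_revenue", "Order_date", "monthly"))
print(kpi_search_query("sales", "date", "weekly", window="last 8 quarters"))
```

Both printed strings reproduce the examples from the KPI Requirements and Golden Demo Structure sections above.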
PROJECT_STATUS.md CHANGED
@@ -40,15 +40,13 @@ An AI-powered demo builder for ThoughtSpot that automatically creates complete d
 
 **Working:**
 - End-to-end demo creation via chat interface
- - Three-method liveboard creation (TML, MCP, HYBRID)
- - HYBRID method: MCP creates + TML post-processing for Groups, KPIs, colors
- - Settings UI for method selection
 - LegitData for realistic data generation
 - Supabase settings persistence
 - ThoughtSpot authentication and deployment
 
 **Needs Work:**
- - Outliers not working well with MCP method
 - Data adjuster has column matching issues
 - Tags not assigning to objects
 
@@ -56,10 +54,9 @@ An AI-powered demo builder for ThoughtSpot that automatically creates complete d
 
 ## Key Technical Decisions
 
- **Liveboard Creation**: Three-method system (configurable via Settings UI)
- - TML: Template-based, full control over visualizations
- - MCP: AI-driven, fast creation, basic quality
- - HYBRID (default): MCP creates + TML post-processing (recommended)
 
 **Data Generation**: LegitData
 - Uses AI + web search for realistic data
@@ -79,10 +76,10 @@ An AI-powered demo builder for ThoughtSpot that automatically creates complete d
 
 ## Sprint History
 
- - **Sprint Jan 2026**: Making it better (current) - see `sprint_2026_01.md` in root
 - *(Previous sprints archived in dev_notes/archive/)*
- - *(Sprint files are gitignored - local working docs)*
 
 ---
 
- *Last Updated: January 12, 2026*
 
 
 **Working:**
 - End-to-end demo creation via chat interface
+ - Liveboard creation: MCP creates visualizations + TML post-processing for Groups, KPIs, colors
 - LegitData for realistic data generation
 - Supabase settings persistence
 - ThoughtSpot authentication and deployment
 
 **Needs Work:**
+ - Outliers need better integration into liveboard creation
 - Data adjuster has column matching issues
 - Tags not assigning to objects
 
 ## Key Technical Decisions
 
+ **Liveboard Creation**: MCP creation + TML post-processing
+ - MCP (via `agent.thoughtspot.app`) generates AI-driven visualizations
+ - TML post-processing adds Groups, KPI sparklines, brand colors, layout refinement
 
 **Data Generation**: LegitData
 - Uses AI + web search for realistic data
 
 ## Sprint History
 
+ - **Sprint Feb 2026**: Current - see `sprint_2026_02.md` in root
+ - **Sprint Jan 2026**: Closed - see `sprint_2026_01.md` in root
 - *(Previous sprints archived in dev_notes/archive/)*
 
 ---
 
+ *Last Updated: February 4, 2026*
chat_interface.py CHANGED
@@ -9,11 +9,21 @@ warnings.filterwarnings('ignore', message='.*tuples.*format.*chatbot.*deprecated
 import gradio as gr
 import os
 import sys
 from dotenv import load_dotenv
 from demo_builder_class import DemoBuilder
 from supabase_client import load_gradio_settings
 from main_research import MultiLLMResearcher, Website
- from demo_personas import build_company_analysis_prompt, build_industry_research_prompt
 from demo_prep import map_llm_display_to_provider
 
 load_dotenv(override=True)
@@ -497,6 +507,13 @@ Watch the AI Feedback tab for real-time progress!"""
 
 # Auto-create DDL
 ddl_response, ddl_code = self.run_ddl_creation()
 chat_history[-1] = (message, f"✅ DDL Created\n\n🚀 **Deploying to Snowflake...**")
 yield chat_history, current_stage, current_model, company, use_case, ""
 
@@ -1402,6 +1419,10 @@ To change settings, use:
 use_case: Use case name
 generic_context: Additional context provided by user for generic use cases
 """
 import time
 import os
 from main_research import ResultsManager
@@ -1454,25 +1475,41 @@ To change settings, use:
 use_case_safe = use_case.lower().replace(' ', '_').replace('/', '_')
 
 # Try new format first (with use case)
 cache_filename = f"{safe_domain}_{use_case_safe}.json"
- cache_filepath = os.path.join("results", cache_filename)
 
- # If new format doesn't exist, try old format (without use case)
 if not os.path.exists(cache_filepath):
- old_cache_filename = f"research_{safe_domain}.json"
- old_cache_filepath = os.path.join("results", old_cache_filename)
- if os.path.exists(old_cache_filepath):
- cache_filename = old_cache_filename
- cache_filepath = old_cache_filepath
 
 cached_results = None
 cache_age_hours = None
 
- # Allow cache for generic use cases during testing (was: skip cache for fresh research)
- # if self.is_generic_use_case:
- # self.log_feedback(f"🔄 Generic use case detected - skipping cache, running fresh research")
- # progress_message += f"🔄 **Generic use case** - running fresh research for custom context...\n"
- # yield progress_message
 if os.path.exists(cache_filepath):
 try:
 # Check cache age (5 day expiry)
@@ -1481,25 +1518,38 @@ To change settings, use:
 cache_age_hours = cache_age / 3600 # Convert to hours
 
 if cache_age_hours <= 120: # Cache valid for 5 days (120 hours)
- self.log_feedback(f"📋 Found cached research (age: {cache_age_hours:.1f} hours)")
- progress_message += f"📋 **Found Cached Research!**\n\n"
- progress_message += f"**Age:** {cache_age_hours:.1f} hours old\n"
- progress_message += f"**Company:** {domain}\n"
- progress_message += f"**Use Case:** {use_case}\n\n"
- progress_message += "**Would you like to use the cached results?**\n"
- progress_message += "- Type 'yes' to use cache (instant)\n"
- progress_message += "- Type 'no' to run fresh research (2-3 minutes)\n"
 
- # Store cache info for later use
- self._cached_research_path = cache_filepath
- self._cache_available = True
 
- # Yield with "yes" pre-filled
 yield progress_message
- return # Wait for user response
 else:
- self.log_feedback(f"📋 Found cached research but it's too old ({cache_age_hours:.1f} hours)")
- progress_message += f"📋 Cache too old ({cache_age_hours:.1f} hours), running fresh research...\n"
 yield progress_message
 except Exception as e:
 self.log_feedback(f"⚠️ Could not load cache: {str(e)}")
@@ -1659,8 +1709,8 @@ To change settings, use:
 'use_case': use_case,
 'generated_at': datetime.now().isoformat(),
 }
- os.makedirs("results", exist_ok=True)
- ResultsManager.save_results(research_results, cache_filename, "results")
 progress_message += "💾 Cached research results for future use!\n\n"
 yield progress_message
 except Exception as e:
@@ -2076,6 +2126,10 @@ Generate complete CREATE TABLE statements with proper Snowflake syntax and depen
 self.log_feedback("Generating DDL...")
 ddl_result = researcher.make_request(messages, temperature=0.2, max_tokens=4000, stream=False)
 
 # Store in demo_builder
 self.demo_builder.schema_generation_results = ddl_result
 self.ddl_code = ddl_result
@@ -2104,6 +2158,9 @@ Generate complete CREATE TABLE statements with proper Snowflake syntax and depen
 import traceback
 error_msg = f"❌ DDL creation failed: {str(e)}\n{traceback.format_exc()}"
 self.log_feedback(error_msg)
 return error_msg, ""
 
 def get_fallback_population_code(self, schema_info, fact_rows=10000, dim_rows=100):
@@ -2475,19 +2532,28 @@ Generate complete CREATE TABLE statements with proper Snowflake syntax and depen
 self.log_feedback("🔢 Starting data population...")
 
 try:
- from demo_personas import get_persona_config
 from schema_utils import parse_ddl_schema, generate_schema_constrained_prompt
 import re
 
- persona_config = get_persona_config(self.demo_builder.use_case)
 
 # Build business context for population
 business_context = f"""
 BUSINESS CONTEXT:
- - Use Case: {self.demo_builder.use_case}
- - Target Persona: {persona_config['target_persona']}
- - Business Problem: {persona_config['business_problem']}
- - Demo Objectives: {persona_config['demo_objectives']}
 
 MANDATORY CONNECTION CODE (MUST BE COMPLETE):
 ```python
@@ -2649,6 +2715,14 @@ LegitData will generate realistic, AI-powered data.
 self.demo_builder.schema_generation_results
 )
 
 if not success:
 log_progress(f"[ERROR] DDL Deployment failed!")
 raise Exception(f"Schema deployment failed: {deploy_message}")
@@ -2706,8 +2780,17 @@ LegitData will generate realistic, AI-powered data.
 
 def run_population():
 try:
 success, message, results = populate_demo_data(
- ddl_content=self.demo_builder.schema_generation_results,
 company_url=self.demo_builder.company_url,
 use_case=self.demo_builder.use_case,
 schema_name=schema_name,
@@ -2865,11 +2948,14 @@ Tables: Created and populated
 ts_secret = os.getenv('THOUGHTSPOT_SECRET_KEY')
 
 liveboard_method = self.settings.get('liveboard_method', 'HYBRID')
- liveboard_name = self.settings.get('liveboard_name', '') or f"{company} - {use_case}"
 
 # Get company data for liveboard
 company_data = {
- 'name': company,
 'url': getattr(self.demo_builder, 'company_url', company),
 'logo_url': getattr(self.demo_builder, 'logo_url', None),
 'primary_color': getattr(self.demo_builder, 'primary_color', '#3498db'),
@@ -3231,7 +3317,9 @@ Ask these questions to showcase ThoughtSpot's AI capabilities:
 try:
 from smart_data_adjuster import SmartDataAdjuster
 
- adjuster = SmartDataAdjuster(database, schema_name, liveboard_guid)
 adjuster.connect()
 
 if adjuster.load_liveboard_context():
@@ -4243,7 +4331,7 @@ if __name__ == "__main__":
 
 app.launch(
 server_name="0.0.0.0",
- server_port=int(os.environ.get('PORT', 7863)), # Reads from .env, defaults to 7863
 share=False,
 inbrowser=True,
 debug=True,
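The research-cache fix in this commit resolves the cache relative to the script directory and adds fuzzy fallbacks. A standalone sketch of that resolution order (exact company+use-case file, then any cache for the same company, then the legacy `research_<domain>` name), with hypothetical file names for the demo:

```python
import glob
import os
import tempfile

def resolve_cache_path(results_dir: str, safe_domain: str, use_case_safe: str) -> str:
    """Mirror the lookup order used by the commit's cache fix:
    1. exact <domain>_<use_case>.json
    2. any <domain>_*.json for the same company (fuzzy match)
    3. legacy research_<domain>.json
    Falls back to the exact path when nothing exists yet."""
    exact = os.path.join(results_dir, f"{safe_domain}_{use_case_safe}.json")
    if os.path.exists(exact):
        return exact
    similar = glob.glob(os.path.join(results_dir, f"{safe_domain}_*.json"))
    if similar:
        return similar[0]
    legacy = os.path.join(results_dir, f"research_{safe_domain}.json")
    if os.path.exists(legacy):
        return legacy
    return exact

# Demo with a temporary directory and hypothetical cache files
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "acme_com_retail_sales.json"), "w").close()
    path = resolve_cache_path(d, "acme_com", "supply_chain")
    print(os.path.basename(path))  # falls back to the similar-company cache
```

Anchoring `results_dir` to `os.path.dirname(os.path.abspath(__file__))`, as the diff does, is what makes the lookup independent of the current working directory.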
 
 import gradio as gr
 import os
 import sys
+ import json
+ import time
+ import glob
 from dotenv import load_dotenv
 from demo_builder_class import DemoBuilder
 from supabase_client import load_gradio_settings
 from main_research import MultiLLMResearcher, Website
+ from demo_personas import (
+ build_company_analysis_prompt,
+ build_industry_research_prompt,
+ VERTICALS,
+ FUNCTIONS,
+ get_use_case_config,
+ parse_use_case
+ )
 from demo_prep import map_llm_display_to_provider
 
 load_dotenv(override=True)
 
 # Auto-create DDL
 ddl_response, ddl_code = self.run_ddl_creation()
+
+ # Check if DDL creation failed
+ if not ddl_code or ddl_code.strip() == "":
+ chat_history[-1] = (message, f"{ddl_response}\n\n❌ **Cannot proceed without valid DDL.** Please fix the error and try again.")
+ yield chat_history, current_stage, current_model, company, use_case, ""
+ return
+
 chat_history[-1] = (message, f"✅ DDL Created\n\n🚀 **Deploying to Snowflake...**")
 yield chat_history, current_stage, current_model, company, use_case, ""
 
 use_case: Use case name
 generic_context: Additional context provided by user for generic use cases
 """
+ print(f"\n\n[CACHE DEBUG] === run_research_streaming called ===")
+ print(f"[CACHE DEBUG] company: {company}")
+ print(f"[CACHE DEBUG] use_case: {use_case}\n\n")
+
 import time
 import os
 from main_research import ResultsManager
 
 use_case_safe = use_case.lower().replace(' ', '_').replace('/', '_')
 
 # Try new format first (with use case)
+ # Use absolute path to ensure we find cache regardless of CWD
+ script_dir = os.path.dirname(os.path.abspath(__file__))
+ results_dir = os.path.join(script_dir, "results")
 cache_filename = f"{safe_domain}_{use_case_safe}.json"
+ cache_filepath = os.path.join(results_dir, cache_filename)
 
+ # If exact match doesn't exist, try fuzzy matching for similar use cases
 if not os.path.exists(cache_filepath):
+ import glob
+ print(f"[CACHE DEBUG] Current working directory: {os.getcwd()}")
+ print(f"[CACHE DEBUG] Script directory: {script_dir}")
+ print(f"[CACHE DEBUG] Results directory: {results_dir}")
+ similar_files = glob.glob(os.path.join(results_dir, f"{safe_domain}_*.json"))
+ print(f"[CACHE DEBUG] Exact file {cache_filepath} not found")
+ print(f"[CACHE DEBUG] Glob pattern: {results_dir}/{safe_domain}_*.json")
+ print(f"[CACHE DEBUG] Similar files found: {similar_files}")
+ if similar_files:
+ # Found similar cache files for this company
+ cache_filepath = similar_files[0] # Use the first one found
+ cache_filename = os.path.basename(cache_filepath)
+ print(f"[CACHE DEBUG] Using similar file: {cache_filename}")
+ self.log_feedback(f"📋 Found similar cache file: {cache_filename}")
+ elif not os.path.exists(cache_filepath):
+ # Try old format (without use case)
+ old_cache_filename = f"research_{safe_domain}.json"
+ old_cache_filepath = os.path.join(results_dir, old_cache_filename)
+ if os.path.exists(old_cache_filepath):
+ cache_filename = old_cache_filename
+ cache_filepath = old_cache_filepath
 
 cached_results = None
 cache_age_hours = None
 
+ # Check for cached research and use automatically if valid
+ print(f"[CACHE DEBUG] Final cache_filepath: {cache_filepath}, exists: {os.path.exists(cache_filepath)}")
 if os.path.exists(cache_filepath):
 try:
 # Check cache age (5 day expiry)
 
 cache_age_hours = cache_age / 3600 # Convert to hours
 
 if cache_age_hours <= 120: # Cache valid for 5 days (120 hours)
+ self.log_feedback(f"📋 Using cached research (age: {cache_age_hours:.1f} hours)")
+ progress_message += f"📋 **Using Cached Research** ({cache_age_hours:.1f} hours old)\n\n"
 
+ # Load cached results automatically
+ with open(cache_filepath, 'r') as f:
+ cached_data = json.load(f)
 
+ self.demo_builder.company_analysis_results = cached_data.get('company_summary', '')
+ self.demo_builder.industry_research_results = cached_data.get('research_paper', '')
+ self.demo_builder.combined_research_results = self.demo_builder.get_research_context()
+ self.demo_builder.company_url = cached_data.get('url', url)
+ self.demo_builder.advance_stage()
+
+ progress_message += "✅ **Research loaded from cache!**\n\n"
+ progress_message += "Proceeding to DDL generation...\n"
+
+ self.log_feedback("✅ Research loaded from cache, generating DDL")
 yield progress_message
+
+ # Automatically trigger DDL generation
+ try:
+ response, ddl_code = self.run_ddl_creation()
+ yield response
+ except Exception as e:
+ import traceback
+ error_msg = f"❌ DDL generation failed: {str(e)}\n{traceback.format_exc()}"
+ self.log_feedback(error_msg)
+ yield error_msg
+ return
 else:
+ self.log_feedback(f"📋 Cache too old ({cache_age_hours:.1f} hours), running fresh research")
+ progress_message += f"📋 Cache expired ({cache_age_hours:.1f} hours old), running fresh research...\n"
 yield progress_message
 except Exception as e:
 self.log_feedback(f"⚠️ Could not load cache: {str(e)}")
 
 'use_case': use_case,
 'generated_at': datetime.now().isoformat(),
 }
+ os.makedirs(results_dir, exist_ok=True)
+ ResultsManager.save_results(research_results, cache_filename, results_dir)
 progress_message += "💾 Cached research results for future use!\n\n"
 yield progress_message
 except Exception as e:
 
 self.log_feedback("Generating DDL...")
 ddl_result = researcher.make_request(messages, temperature=0.2, max_tokens=4000, stream=False)
 
+ # Validate DDL result
+ if not ddl_result or not isinstance(ddl_result, str) or 'CREATE TABLE' not in ddl_result.upper():
+ raise Exception(f"DDL generation failed or produced invalid output. Result: {ddl_result[:200] if ddl_result else 'None'}")
+
 # Store in demo_builder
 self.demo_builder.schema_generation_results = ddl_result
 self.ddl_code = ddl_result
 
 import traceback
 error_msg = f"❌ DDL creation failed: {str(e)}\n{traceback.format_exc()}"
 self.log_feedback(error_msg)
+ # Set schema_generation_results to empty string so it's not None
+ self.demo_builder.schema_generation_results = ""
+ self.ddl_code = ""
 return error_msg, ""
 
 def get_fallback_population_code(self, schema_info, fact_rows=10000, dim_rows=100):
 
 self.log_feedback("🔢 Starting data population...")
 
 try:
 from schema_utils import parse_ddl_schema, generate_schema_constrained_prompt
 import re
 
+ # Parse use case into vertical and function
+ vertical, function = parse_use_case(self.demo_builder.use_case)
+ config = get_use_case_config(vertical or "Generic", function or "Generic")
 
 # Build business context for population
+ # Handle both new config structure and backward compatibility
+ target_persona = config.get('target_persona', 'Business Leader')
+ business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+ demo_objectives = config.get('demo_objectives', 'Show self-service analytics and business insights')
+
+ # For generic cases, use the use_case_name
+ use_case_display = config.get('use_case_name', self.demo_builder.use_case)
+
 business_context = f"""
 BUSINESS CONTEXT:
+ - Use Case: {use_case_display}
+ - Target Persona: {target_persona}
+ - Business Problem: {business_problem}
+ - Demo Objectives: {demo_objectives}
 
 MANDATORY CONNECTION CODE (MUST BE COMPLETE):
 ```python
 
 self.demo_builder.schema_generation_results
 )
 
+ # DEBUG: Log what was passed
+ ddl_passed = self.demo_builder.schema_generation_results
+ log_progress(f"[DEBUG] DDL type passed to deployer: {type(ddl_passed)}")
+ log_progress(f"[DEBUG] DDL is None: {ddl_passed is None}")
+ if ddl_passed:
+ log_progress(f"[DEBUG] DDL length: {len(ddl_passed)}")
+ log_progress(f"[DEBUG] DDL first 100 chars: {ddl_passed[:100]}")
+
 if not success:
 log_progress(f"[ERROR] DDL Deployment failed!")
 raise Exception(f"Schema deployment failed: {deploy_message}")
 
 def run_population():
 try:
+ # Validate DDL before passing to legitdata
+ ddl = self.demo_builder.schema_generation_results
+ if not ddl or not isinstance(ddl, str):
+ raise Exception(f"DDL is invalid (type: {type(ddl)}). Cannot populate data. Please regenerate DDL.")
+
+ # Check if DDL contains the word "None" which would indicate AI generated bad SQL
+ if ddl == "None" or ddl.strip() == "None":
+ raise Exception("DDL generation returned 'None'. Please regenerate DDL with a different prompt or model.")
+
 success, message, results = populate_demo_data(
+ ddl_content=ddl,
 company_url=self.demo_builder.company_url,
 use_case=self.demo_builder.use_case,
 schema_name=schema_name,
 
 ts_secret = os.getenv('THOUGHTSPOT_SECRET_KEY')
 
 liveboard_method = self.settings.get('liveboard_method', 'HYBRID')
+
+ # Clean company name for display (strip .com, .org, etc)
+ clean_company = company.split('.')[0].title() if '.' in company else company
+ liveboard_name = self.settings.get('liveboard_name', '') or f"{clean_company} - {use_case}"
 
 # Get company data for liveboard
 company_data = {
+ 'name': clean_company,
 'url': getattr(self.demo_builder, 'company_url', company),
 'logo_url': getattr(self.demo_builder, 'logo_url', None),
 'primary_color': getattr(self.demo_builder, 'primary_color', '#3498db'),
 
 try:
 from smart_data_adjuster import SmartDataAdjuster
 
+ # Pass the selected LLM model to the adjuster
+ llm_model = self.settings.get('model', 'claude-sonnet-4')
+ adjuster = SmartDataAdjuster(database, schema_name, liveboard_guid, llm_model=llm_model)
 adjuster.connect()
 
 if adjuster.load_liveboard_context():
 
 app.launch(
 server_name="0.0.0.0",
+ server_port=7863, # Different port from main app (7860) and old chat (7861)
 share=False,
 inbrowser=True,
 debug=True,
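The DDL failure guards added in this commit (the empty-string check after `run_ddl_creation()`, the `CREATE TABLE` check after generation, and the `"None"` check before population) boil down to one validity predicate. A standalone sketch of that combined check:

```python
def is_valid_ddl(ddl) -> bool:
    """Validity predicate mirroring the guards added in this commit:
    reject None, non-strings, empty strings, the literal string "None"
    (an LLM failure mode), and output with no CREATE TABLE statement."""
    if not ddl or not isinstance(ddl, str):
        return False
    if ddl.strip() == "None":
        return False
    return "CREATE TABLE" in ddl.upper()

print(is_valid_ddl("create table orders (id int);"))  # True
print(is_valid_ddl(None))                             # False
print(is_valid_ddl("None"))                           # False
print(is_valid_ddl("SELECT 1"))                       # False
```

Centralizing the predicate like this would let the chat flow, DDL generator, and population thread share one definition instead of three slightly different inline checks.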
demo_personas.py CHANGED
@@ -5,6 +5,276 @@ All persona data and prompt templates for use case-driven demo preparation
 
 from schema_utils import extract_key_business_terms
 
 # Use Case Persona Configurations
 USE_CASE_PERSONAS = {
 "Merchandising": {
@@ -613,9 +883,41 @@ def get_persona_config(use_case):
 
 def build_company_analysis_prompt(use_case, website_title, website_url, website_content, css_count, logo_candidates):
 """Build dynamic company analysis prompt based on use case"""
- config = get_persona_config(use_case)
 
- system_prompt = COMPANY_ANALYSIS_TEMPLATE.format(use_case=use_case, **config)
 
 # Extract key business terms instead of raw content dump
 key_terms = extract_key_business_terms(website_content, max_chars=1000)
@@ -629,36 +931,87 @@ VISUAL ASSETS SUMMARY:
 CSS Resources: {css_count} stylesheets detected
 Logo Assets: {len(logo_candidates)} logo variations found
 
- Conduct analysis specifically for {use_case} use case targeting {config['target_persona']} who needs to solve: {config['business_problem']}
 
- Extract specific, quantifiable information wherever possible that relates to {config['key_metrics']} and {config['persona_focus']}."""
 
 return system_prompt, user_prompt
 
 def build_industry_research_prompt(use_case, company_analysis_results):
 """Build dynamic industry research prompt based on use case and company analysis"""
- config = get_persona_config(use_case)
 
- # Format research focus areas as bulleted list
- research_focus_formatted = "\n".join([f"- {focus}" for focus in config['research_focus']])
 
- system_prompt = INDUSTRY_RESEARCH_TEMPLATE.format(
- use_case=use_case,
- research_focus_formatted=research_focus_formatted,
- **config
- )
 
- user_prompt = f"""Conduct comprehensive {use_case} research based on this company analysis:
 
 COMPANY ANALYSIS RESULTS:
 {company_analysis_results}
 
- Focus specifically on creating realistic demo scenarios that showcase how ThoughtSpot's {config['thoughtspot_solution']} solves {config['business_problem']} for {config['target_persona']}.
 
 Provide specific recommendations for:
 1. Database schemas and table structures
 2. Realistic data patterns and volumes
 3. Compelling outlier scenarios
- 4. Success metrics that prove ROI: {config['success_outcomes']}"""
 
 return system_prompt, user_prompt
  from schema_utils import extract_key_business_terms

+ # ============================================================================
+ # VERTICAL × FUNCTION MATRIX SYSTEM (Phase 1 - February 2026)
+ # ============================================================================
+ # New composable system replacing flat USE_CASE_PERSONAS
+ # Keep USE_CASE_PERSONAS below for backward compatibility during transition
+ # ============================================================================
+
+ # VERTICALS: Industry-specific context
+ VERTICALS = {
+     "Retail": {
+         "typical_entities": ["Store", "Product", "Category", "Region", "Customer"],
+         "industry_terms": ["SKU", "basket", "shrink", "markdown", "comp sales", "footfall"],
+         "data_patterns": ["seasonality", "holiday_spikes", "weather_impact", "back_to_school"],
+     },
+     "Banking": {
+         "typical_entities": ["Account", "Customer", "Branch", "Product", "Loan"],
+         "industry_terms": ["AUM", "NIM", "deposits", "charge-off", "delinquency", "APR"],
+         "data_patterns": ["month_end_spikes", "rate_sensitivity", "quarter_close"],
+     },
+     "Software": {
+         "typical_entities": ["Account", "User", "Subscription", "Feature", "License"],
+         "industry_terms": ["ARR", "MRR", "churn", "NRR", "seats", "expansion"],
+         "data_patterns": ["renewal_cycles", "usage_spikes", "trial_conversion"],
+     },
+     "Manufacturing": {
+         "typical_entities": ["Plant", "Line", "Product", "Supplier", "Shift"],
+         "industry_terms": ["OEE", "yield", "scrap", "downtime", "throughput", "WIP"],
+         "data_patterns": ["shift_patterns", "maintenance_cycles", "supply_disruptions"],
+     },
+ }
+
+ # FUNCTIONS: Department-specific KPIs, visualizations, and patterns
+ FUNCTIONS = {
+     "Sales": {
+         "kpis": ["Dollar Sales", "Unit Sales", "ASP"],
+         "kpi_definitions": {
+             "Dollar Sales": "Total revenue ($)",
+             "Unit Sales": "Total units sold",
+             "ASP": "Dollar Sales ÷ Unit Sales (Average Selling Price)",
+         },
+         "viz_types": ["KPI_sparkline", "trend", "by_region", "by_product", "vs_target"],
+         "outlier_categories": ["surge", "decline", "pricing_anomaly", "regional_variance"],
+         "spotter_templates": [
+             "Which {entity} had the highest {kpi} last {period}?",
+             "Show me {kpi} trend by {dimension}",
+             "Why did {kpi} drop last month?",
+             "Compare {kpi} across {dimension}",
+         ],
+     },
+     "Supply Chain": {
+         "kpis": ["Avg Inventory", "OTIF", "Days on Hand", "Stockout Rate"],
+         "kpi_definitions": {
+             "Avg Inventory": "(Beginning Inventory + Ending Inventory) ÷ 2",
+             "OTIF": "On-Time In-Full delivery rate",
+             "Days on Hand": "Inventory ÷ Daily Usage",
+             "Stockout Rate": "% of SKUs with zero inventory",
+         },
+         "viz_types": ["inventory_levels", "stockout_risk", "supplier_perf", "trend"],
+         "outlier_categories": ["stockout", "overstock", "lead_time_spike", "supplier_issue"],
+         "spotter_templates": [
+             "Which {entity} is at risk of stockout?",
+             "Show inventory levels by {dimension}",
+             "Which suppliers have the longest lead times?",
+         ],
+     },
+     "Marketing": {
+         "kpis": ["CTR", "Bounce Rate", "Fill Rate", "Approval Rate"],
+         "kpi_definitions": {
+             "CTR": "Clicks ÷ Impressions (Click-Through Rate)",
+             "Bounce Rate": "% leaving landing page without action",
+             "Fill Rate": "% completing application/form",
+             "Approval Rate": "% of applications approved",
+         },
+         "viz_types": ["funnel", "channel_comparison", "trend", "by_campaign"],
+         "outlier_categories": ["conversion_drop", "channel_spike", "cost_anomaly"],
+         "spotter_templates": [
+             "What is our conversion rate by {channel}?",
+             "Show me the funnel for {campaign}",
+             "Which channel has the highest CTR?",
+         ],
+     },
+ }
+
+ # MATRIX_OVERRIDES: Specific Vertical × Function combinations
+ # Only specify what differs from the base vertical + function merge
+ MATRIX_OVERRIDES = {
+     ("Retail", "Sales"): {
+         "add_kpis": ["Basket Size", "Items per Transaction"],
+         "add_kpi_definitions": {
+             "Basket Size": "Dollar Sales ÷ Transactions",
+             "Items per Transaction": "Unit Sales ÷ Transactions",
+         },
+         "add_viz": ["by_store", "by_category"],
+         "target_persona": "VP Merchandising, Retail Sales Leader",
+         "business_problem": "$1T lost annually to stockouts and overstock",
+     },
+     ("Banking", "Marketing"): {
+         "add_kpis": ["Application Fill Rate", "Cost per Acquisition"],
+         "add_kpi_definitions": {
+             "Application Fill Rate": "% completing loan/account application",
+             "Cost per Acquisition": "Marketing spend ÷ New customers acquired",
+         },
+         "rename_kpis": {"CTR": "Click-through Rate"},
+         "target_persona": "CMO, VP Digital Marketing",
+         "business_problem": "High cost per acquisition, low funnel conversion",
+     },
+     ("Software", "Sales"): {
+         "add_kpis": ["ARR", "Net Revenue Retention", "Pipeline Coverage"],
+         "add_kpi_definitions": {
+             "ARR": "Annual Recurring Revenue",
+             "Net Revenue Retention": "(Starting ARR + Expansion - Churn) ÷ Starting ARR",
+             "Pipeline Coverage": "Pipeline value ÷ Quota",
+         },
+         "add_viz": ["by_segment", "by_rep"],
+         "target_persona": "CRO, VP Sales",
+     },
+ }
+
+
+ def parse_use_case(user_input: str) -> tuple[str | None, str | None]:
+     """
+     Parse user input string like "Retail Sales" into (vertical, function) tuple.
+
+     Checks for known patterns by testing against VERTICALS.keys() and FUNCTIONS.keys().
+     Handles case-insensitive matching.
+
+     Args:
+         user_input: User input string like "Retail Sales", "Banking Marketing", etc.
+
+     Returns:
+         Tuple of (vertical, function) like ("Retail", "Sales")
+         Returns (None, None) for unclear inputs
+     """
+     if not user_input or not user_input.strip():
+         return (None, None)
+
+     user_input_lower = user_input.strip().lower()
+
+     # Try to find both vertical and function in the input
+     found_vertical = None
+     found_function = None
+
+     # Check for known verticals (case-insensitive)
+     for vertical in VERTICALS.keys():
+         if vertical.lower() in user_input_lower:
+             found_vertical = vertical
+             break
+
+     # Check for known functions (case-insensitive)
+     for function in FUNCTIONS.keys():
+         if function.lower() in user_input_lower:
+             found_function = function
+             break
+
+     # If we found both, return them
+     if found_vertical and found_function:
+         return (found_vertical, found_function)
+
+     # If we found only one, return it with None for the other
+     if found_vertical:
+         return (found_vertical, None)
+     if found_function:
+         return (None, found_function)
+
+     # If we found neither, return (None, None)
+     return (None, None)
+
+
+ def get_use_case_config(vertical: str, function: str) -> dict:
+     """
+     Merge vertical + function + overrides into final configuration.
+     Handles known combinations, partial matches, and fully generic cases.
+
+     Args:
+         vertical: Industry vertical (e.g., "Retail")
+         function: Functional department (e.g., "Sales")
+
+     Returns:
+         Complete configuration dict with all fields merged
+     """
+     v = VERTICALS.get(vertical, {})
+     f = FUNCTIONS.get(function, {})
+     override = MATRIX_OVERRIDES.get((vertical, function), {})
+
+     # Determine if this is a known, partial, or generic case
+     is_known_vertical = vertical in VERTICALS
+     is_known_function = function in FUNCTIONS
+
+     # Build base config
+     config = {
+         # Metadata
+         "vertical": vertical,
+         "function": function,
+         "use_case_name": f"{vertical} {function}",
+
+         # From vertical
+         "entities": v.get("typical_entities", []).copy(),
+         "industry_terms": v.get("industry_terms", []).copy(),
+         "data_patterns": v.get("data_patterns", []).copy(),
+
+         # From function (copy to allow modification)
+         "kpis": f.get("kpis", []).copy(),
+         "kpi_definitions": f.get("kpi_definitions", {}).copy(),
+         "viz_types": f.get("viz_types", []).copy(),
+         "outlier_categories": f.get("outlier_categories", []).copy(),
+         "spotter_templates": f.get("spotter_templates", []).copy(),
+
+         # Flags
+         "is_generic": False,
+         "ai_should_determine": [],
+     }
+
+     # Apply overrides
+     if override.get("add_kpis"):
+         config["kpis"].extend(override["add_kpis"])
+     if override.get("add_kpi_definitions"):
+         config["kpi_definitions"].update(override["add_kpi_definitions"])
+     if override.get("add_viz"):
+         config["viz_types"].extend(override["add_viz"])
+     if override.get("rename_kpis"):
+         for old, new in override["rename_kpis"].items():
+             if old in config["kpis"]:
+                 idx = config["kpis"].index(old)
+                 config["kpis"][idx] = new
+     if override.get("target_persona"):
+         config["target_persona"] = override["target_persona"]
+     if override.get("business_problem"):
+         config["business_problem"] = override["business_problem"]
+
+     # Handle generic cases
+     if not is_known_vertical and not is_known_function:
+         # Fully generic
+         config["is_generic"] = True
+         config["ai_should_determine"] = ["entities", "industry_terms", "kpis", "viz_types", "outliers"]
+         config["prompt_user_for"] = ["key_metrics", "target_persona", "business_questions"]
+     elif not is_known_vertical:
+         # Known function, unknown vertical
+         config["is_generic"] = True
+         config["ai_should_determine"] = ["entities", "industry_terms", "data_patterns"]
+     elif not is_known_function:
+         # Known vertical, unknown function
+         config["is_generic"] = True
+         config["ai_should_determine"] = ["kpis", "viz_types", "outliers"]
+
+     # Add legacy fields for backward compatibility with existing prompts
+     if "demo_objectives" not in config:
+         config["demo_objectives"] = f"Demonstrate {function} analytics capabilities with {vertical}-specific insights"
+     if "key_metrics" not in config:
+         config["key_metrics"] = ", ".join(config["kpis"][:5]) if config["kpis"] else "revenue, growth, efficiency"
+     if "research_focus" not in config:
+         config["research_focus"] = config["industry_terms"][:5] if config["industry_terms"] else []
+     if "thoughtspot_solution" not in config:
+         config["thoughtspot_solution"] = f"Self-service analytics for {vertical} {function} teams"
+     if "persona_focus" not in config:
+         config["persona_focus"] = f"{function} optimization and decision-making"
+     if "cost_impact" not in config:
+         config["cost_impact"] = "Significant business impact through data-driven decisions"
+     if "success_outcomes" not in config:
+         config["success_outcomes"] = f"Improved {function.lower()} performance and faster insights"
+
+     return config
+
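The merge order in `get_use_case_config()` is vertical base, then function base, then the `(vertical, function)` override on top. A trimmed, self-contained sketch of that precedence (toy dicts stand in for the full tables above):

```python
# Toy stand-ins for the VERTICALS / FUNCTIONS / MATRIX_OVERRIDES tables above.
VERTICALS = {"Retail": {"industry_terms": ["SKU", "basket", "shrink"]}}
FUNCTIONS = {"Sales": {"kpis": ["Dollar Sales", "Unit Sales", "ASP"]}}
MATRIX_OVERRIDES = {("Retail", "Sales"): {"add_kpis": ["Basket Size"]}}

def merge_config(vertical: str, function: str) -> dict:
    # Same precedence as get_use_case_config(): vertical, then function,
    # then the matrix override; unknown keys fall back to empty values.
    cfg = {
        "industry_terms": list(VERTICALS.get(vertical, {}).get("industry_terms", [])),
        "kpis": list(FUNCTIONS.get(function, {}).get("kpis", [])),
    }
    cfg["kpis"].extend(MATRIX_OVERRIDES.get((vertical, function), {}).get("add_kpis", []))
    return cfg

print(merge_config("Retail", "Sales")["kpis"])
# ['Dollar Sales', 'Unit Sales', 'ASP', 'Basket Size']
```

An unknown pairing like `("Banking", "Sales")` still returns the function's base KPIs, which is what lets the real function degrade gracefully to generic configs.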
+ # ============================================================================
+ # LEGACY USE CASE PERSONAS (Backward Compatibility)
+ # ============================================================================
+ # Keep for backward compatibility during transition
+ # New code should use get_use_case_config() instead
+ # ============================================================================
+
  # Use Case Persona Configurations
  USE_CASE_PERSONAS = {
      "Merchandising": {
 
  def build_company_analysis_prompt(use_case, website_title, website_url, website_content, css_count, logo_candidates):
      """Build dynamic company analysis prompt based on use case"""
+     # Parse use case into vertical and function
+     vertical, function = parse_use_case(use_case)
+
+     # Get config from new system, fallback to legacy if needed
+     if vertical or function:
+         config = get_use_case_config(vertical or "Generic", function or "Generic")
+         # Map new config fields to legacy template fields
+         use_case_display = config.get('use_case_name', use_case)
+         target_persona = config.get('target_persona', 'Business Leader')
+         business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+         # Convert KPIs list to key_metrics string
+         kpis = config.get('kpis', [])
+         key_metrics = ', '.join(kpis) if kpis else 'key operational metrics'
+         # Use function as persona_focus, or derive from vertical
+         persona_focus = function or vertical or 'operational efficiency, data-driven decisions'
+     else:
+         # Fallback to legacy system for unrecognized use cases
+         config = get_persona_config(use_case)
+         use_case_display = use_case
+         target_persona = config.get('target_persona', 'Business Leader')
+         business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+         key_metrics = config.get('key_metrics', 'key operational metrics')
+         persona_focus = config.get('persona_focus', 'operational efficiency, data-driven decisions')
+
+     # Build template dict with mapped fields
+     template_dict = {
+         'use_case': use_case_display,
+         'target_persona': target_persona,
+         'business_problem': business_problem,
+         'key_metrics': key_metrics,
+         'persona_focus': persona_focus,
+         'cost_impact': config.get('cost_impact', 'Lost opportunities from data bottlenecks'),
+     }
+
+     system_prompt = COMPANY_ANALYSIS_TEMPLATE.format(**template_dict)

      # Extract key business terms instead of raw content dump
      key_terms = extract_key_business_terms(website_content, max_chars=1000)

  CSS Resources: {css_count} stylesheets detected
  Logo Assets: {len(logo_candidates)} logo variations found

+ Conduct analysis specifically for {use_case_display} use case targeting {target_persona} who needs to solve: {business_problem}

+ Extract specific, quantifiable information wherever possible that relates to {key_metrics} and {persona_focus}."""

      return system_prompt, user_prompt

  def build_industry_research_prompt(use_case, company_analysis_results):
      """Build dynamic industry research prompt based on use case and company analysis"""
+     # Parse use case into vertical and function
+     vertical, function = parse_use_case(use_case)
+
+     # Get config from new system, fallback to legacy if needed
+     if vertical or function:
+         config = get_use_case_config(vertical or "Generic", function or "Generic")
+         # Map new config fields to legacy template fields
+         use_case_display = config.get('use_case_name', use_case)
+         target_persona = config.get('target_persona', 'Business Leader')
+         business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+         # Convert KPIs list to key_metrics string
+         kpis = config.get('kpis', [])
+         key_metrics = ', '.join(kpis) if kpis else 'key operational metrics'
+         # Use function as persona_focus, or derive from vertical
+         persona_focus = function or vertical or 'operational efficiency, data-driven decisions'
+         # Build research focus from entities, industry_terms, and data_patterns
+         entities = config.get('entities', [])
+         industry_terms = config.get('industry_terms', [])
+         data_patterns = config.get('data_patterns', [])
+         research_focus_list = []
+         if entities:
+             research_focus_list.append(f"Core entities: {', '.join(entities[:5])}")
+         if industry_terms:
+             research_focus_list.append(f"Industry terminology: {', '.join(industry_terms[:5])}")
+         if data_patterns:
+             research_focus_list.append(f"Data patterns: {', '.join(data_patterns[:3])}")
+         if not research_focus_list:
+             research_focus_list = ["core business processes", "key operational metrics", "competitive positioning"]
+         research_focus_formatted = "\n".join([f"- {focus}" for focus in research_focus_list])
+         # Default values for fields not in new system
+         thoughtspot_solution = f"AI-powered analytics for {use_case_display}"
+         success_outcomes = "Faster insights, improved decision making, operational efficiency gains"
+         demo_objectives = f"Show self-service analytics for {use_case_display}"
+     else:
+         # Fallback to legacy system for unrecognized use cases
+         config = get_persona_config(use_case)
+         use_case_display = use_case
+         target_persona = config.get('target_persona', 'Business Leader')
+         business_problem = config.get('business_problem', 'Need for faster, data-driven decisions')
+         key_metrics = config.get('key_metrics', 'key operational metrics')
+         persona_focus = config.get('persona_focus', 'operational efficiency, data-driven decisions')
+         research_focus_formatted = "\n".join([f"- {focus}" for focus in config.get('research_focus', [])])
+         thoughtspot_solution = config.get('thoughtspot_solution', 'Self-service analytics platform')
+         success_outcomes = config.get('success_outcomes', 'Faster insights, improved decision making')
+         demo_objectives = config.get('demo_objectives', 'Show self-service analytics')

+     # Build template dict with mapped fields
+     template_dict = {
+         'use_case': use_case_display,
+         'target_persona': target_persona,
+         'business_problem': business_problem,
+         'key_metrics': key_metrics,
+         'persona_focus': persona_focus,
+         'research_focus_formatted': research_focus_formatted,
+         'thoughtspot_solution': thoughtspot_solution,
+         'success_outcomes': success_outcomes,
+         'demo_objectives': demo_objectives,
+         'cost_impact': config.get('cost_impact', 'Lost opportunities from data bottlenecks'),
+     }

+     system_prompt = INDUSTRY_RESEARCH_TEMPLATE.format(**template_dict)

+     user_prompt = f"""Conduct comprehensive {use_case_display} research based on this company analysis:

  COMPANY ANALYSIS RESULTS:
  {company_analysis_results}

+ Focus specifically on creating realistic demo scenarios that showcase how ThoughtSpot's {thoughtspot_solution} solves {business_problem} for {target_persona}.

  Provide specific recommendations for:
  1. Database schemas and table structures
  2. Realistic data patterns and volumes
  3. Compelling outlier scenarios
+ 4. Success metrics that prove ROI: {success_outcomes}"""

      return system_prompt, user_prompt
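Both builders dispatch on `parse_use_case()`, which is plain case-insensitive substring matching against the known vertical and function keys. A condensed sketch of that behavior (key lists abbreviated):

```python
KNOWN_VERTICALS = ("Retail", "Banking", "Software", "Manufacturing")
KNOWN_FUNCTIONS = ("Sales", "Supply Chain", "Marketing")

def parse_use_case(user_input):
    # Case-insensitive substring match, mirroring the full implementation
    # above; word order in the input does not matter.
    text = (user_input or "").strip().lower()
    if not text:
        return (None, None)
    vertical = next((v for v in KNOWN_VERTICALS if v.lower() in text), None)
    function = next((f for f in KNOWN_FUNCTIONS if f.lower() in text), None)
    return (vertical, function)

print(parse_use_case("retail SALES"))   # ('Retail', 'Sales')
print(parse_use_case("Banking"))        # ('Banking', None)
print(parse_use_case("HR Analytics"))   # (None, None)
```

A partial match (vertical only or function only) still routes into the new system with a "Generic" placeholder for the missing half; only a full miss falls back to `get_persona_config()`.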
demo_prep.py CHANGED
@@ -2548,8 +2548,7 @@ Schema Validation: Will be checked next...
      value="*Database schema will appear here after Create stage*",
      language="sql",
      interactive=False,
-     lines=20,
-     max_lines=30
  )
  with gr.Column(scale=1):
      edit_ddl_btn = gr.Button("🔍 DDL", elem_classes=["edit-btn"])
@@ -2579,8 +2578,7 @@ Schema Validation: Will be checked next...
      value="Generated Python code will appear here after population step",
      language="python",
      interactive=False,
-     lines=10,
-     max_lines=15,
  )

  with gr.Row():
 
      value="*Database schema will appear here after Create stage*",
      language="sql",
      interactive=False,
+     lines=20
  )
  with gr.Column(scale=1):
      edit_ddl_btn = gr.Button("🔍 DDL", elem_classes=["edit-btn"])

      value="Generated Python code will appear here after population step",
      language="python",
      interactive=False,
+     lines=10
  )

  with gr.Row():
liveboard_creator.py CHANGED
@@ -27,6 +27,61 @@ _direct_api_token = None
  _direct_api_session = None

  def _get_direct_api_session():
      """
      Get or create an authenticated session for direct ThoughtSpot API calls.
@@ -2815,49 +2870,21 @@ def create_liveboard_from_model_mcp(
      print(f"[MCP] Starting async MCP liveboard creation...")
      try:
          print(f"[MCP] Importing MCP modules...")
-         from mcp import ClientSession
-         from mcp.client.streamable_http import streamablehttp_client
          print(f"[MCP] MCP modules imported successfully")

-         # ALWAYS use bearer auth with trusted auth token
-         # This ensures MCP uses the same org as our table/model deployment
-         print(f"[MCP] Using bearer auth (same org as trusted auth)")
-
-         # Get auth token from our direct API session (trusted auth)
-         session_obj = _get_direct_api_session()
-         if not session_obj or not _direct_api_token:
-             print(f"[MCP ERROR] Failed to get auth token for bearer auth")
-             return {
-                 'success': False,
-                 'error': 'Failed to authenticate for MCP bearer auth'
-             }
-
-         ts_host = os.getenv('THOUGHTSPOT_URL', '').rstrip('/').replace('https://', '').replace('http://', '')
-         bearer_token = _direct_api_token
-
-         # Bearer auth format: "Bearer {token}@{host}"
-         # This is ThoughtSpot's MCP server format for bearer endpoint
-         auth_header = f"Bearer {bearer_token}@{ts_host}"
-         # Use /bearer/mcp endpoint (Streamable HTTP transport, not SSE)
-         mcp_endpoint = "https://agent.thoughtspot.app/bearer/mcp"
-
-         print(f"[MCP] Bearer endpoint: {mcp_endpoint}")
-         print(f"[MCP] Host: {ts_host}")
-         print(f"[MCP] Token: {bearer_token[:20]}...")
-
-         # Use Streamable HTTP client with bearer auth headers
-         # This bypasses OAuth and uses our trusted auth token directly
-         headers = {"Authorization": auth_header}

-         print(f"[MCP] Starting Streamable HTTP client with bearer auth...")
-         async with streamablehttp_client(mcp_endpoint, headers=headers) as (read, write, _get_session_id):
-             print(f"DEBUG: Streamable HTTP client context established")
-             print(f"DEBUG: Creating ClientSession...")
              async with ClientSession(read, write) as session:
-                 print(f"DEBUG: ClientSession context entered")
-                 print(f"DEBUG: Calling session.initialize()...")
                  await session.initialize()
-                 print(f"DEBUG: session.initialize() completed")

                  # Verify connection with ping
                  print(f"Pinging MCP server...")
@@ -2955,6 +2982,9 @@ def create_liveboard_from_model_mcp(
                  # Use direct ThoughtSpot API (bypasses MCP proxy issues)
                  answer_data = _get_answer_direct(question_text, model_id)
                  if answer_data:
                      print(f"    🔍 DEBUG: Direct API answer keys: {list(answer_data.keys())}")
                      answers.append(answer_data)
                      print(f"    ✅ Answer retrieved (direct API)", flush=True)
@@ -2966,6 +2996,9 @@ def create_liveboard_from_model_mcp(
                      "datasourceId": model_id
                  })
                  answer_data = json.loads(answer_result.content[0].text)
                  answers.append(answer_data)
                  print(f"    ✅ Answer retrieved (MCP fallback)", flush=True)
              else:
@@ -2982,6 +3015,9 @@ def create_liveboard_from_model_mcp(

                  # Parse answer data
                  answer_data = json.loads(answer_result.content[0].text)
                  print(f"    🔍 DEBUG: Answer keys: {list(answer_data.keys())}")
                  answers.append(answer_data)
                  print(f"    ✅ Answer retrieved", flush=True)
@@ -3243,6 +3279,57 @@ def create_liveboard_from_model_mcp(
      })
      print(f"  ✓ Added dark theme style to Viz_1")

      # Re-import fixed TML using authenticated session
      import_response = ts_client.session.post(
          f"{ts_base_url}/api/rest/2.0/metadata/tml/import",
 
  _direct_api_session = None

+ def _clean_viz_title(title: str) -> str:
+     """
+     Clean up visualization titles to be more readable.
+
+     Examples:
+         'shipping_cost by month last 18 months' → 'Shipping Cost by Month'
+         'Top 15 product_name by quantity_shipped' → 'Top 15 Products by Quantity Shipped'
+         'total_revenue weekly' → 'Total Revenue Weekly'
+     """
+     if not title:
+         return title
+
+     # Remove date filter suffixes
+     date_filters = [
+         ' last 18 months', ' last 12 months', ' last 2 years', ' last year',
+         ' last 6 months', ' last 3 months', ' last 30 days', ' last 90 days'
+     ]
+     for filter_str in date_filters:
+         if title.lower().endswith(filter_str):
+             title = title[:-len(filter_str)]
+
+     # Replace underscores with spaces
+     title = title.replace('_', ' ')
+
+     # Clean up common column name patterns
+     replacements = {
+         'product name': 'Products',
+         'supplier name': 'Suppliers',
+         'warehouse name': 'Warehouses',
+         'customer name': 'Customers',
+         'brand name': 'Brands',
+         'store name': 'Stores',
+         'category name': 'Categories',
+         'region name': 'Regions',
+     }
+     title_lower = title.lower()
+     for old, new in replacements.items():
+         if old in title_lower:
+             # Case-insensitive replace
+             import re
+             title = re.sub(re.escape(old), new, title, flags=re.IGNORECASE)
+
+     # Title case the result, but preserve words like 'by', 'vs', 'and'
+     words = title.split()
+     result = []
+     small_words = {'by', 'vs', 'and', 'or', 'the', 'a', 'an', 'of', 'in', 'on', 'to'}
+     for i, word in enumerate(words):
+         if i == 0 or word.lower() not in small_words:
+             result.append(word.capitalize())
+         else:
+             result.append(word.lower())
+
+     return ' '.join(result)
+
+
  def _get_direct_api_session():
      """
      Get or create an authenticated session for direct ThoughtSpot API calls.
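The title cleanup above is three passes: strip a trailing date filter, de-snake, then title-case everything except small words. A condensed re-implementation for illustration (suffix and replacement lists abbreviated, behavior otherwise matching the helper above):

```python
def clean_viz_title(title):
    # Condensed sketch of _clean_viz_title(): strip date-filter suffixes,
    # replace underscores, then title-case except connector words.
    for suffix in (" last 18 months", " last 12 months", " last 30 days"):
        if title.lower().endswith(suffix):
            title = title[: -len(suffix)]
    title = title.replace("_", " ")
    small = {"by", "vs", "and", "or", "the", "of", "in", "on", "to"}
    words = title.split()
    return " ".join(
        w.capitalize() if i == 0 or w.lower() not in small else w.lower()
        for i, w in enumerate(words)
    )

print(clean_viz_title("shipping_cost by month last 18 months"))  # Shipping Cost by Month
print(clean_viz_title("total_revenue weekly"))                   # Total Revenue Weekly
```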
 
      print(f"[MCP] Starting async MCP liveboard creation...")
      try:
          print(f"[MCP] Importing MCP modules...")
+         from mcp import ClientSession, StdioServerParameters
+         from mcp.client.stdio import stdio_client
          print(f"[MCP] MCP modules imported successfully")

+         # Use stdio client with npx mcp-remote proxy
+         # This connects to ThoughtSpot's public MCP endpoint via npx proxy
+         print(f"[MCP] Initializing stdio connection via npx mcp-remote...")
+         server_params = StdioServerParameters(
+             command="npx",
+             args=["mcp-remote@latest", "https://agent.thoughtspot.app/mcp"]
+         )

+         async with stdio_client(server_params) as (read, write):
              async with ClientSession(read, write) as session:
                  await session.initialize()

                  # Verify connection with ping
                  print(f"Pinging MCP server...")

                  # Use direct ThoughtSpot API (bypasses MCP proxy issues)
                  answer_data = _get_answer_direct(question_text, model_id)
                  if answer_data:
+                     # Clean up the viz title
+                     if 'question' in answer_data:
+                         answer_data['question'] = _clean_viz_title(answer_data['question'])
                      print(f"    🔍 DEBUG: Direct API answer keys: {list(answer_data.keys())}")
                      answers.append(answer_data)
                      print(f"    ✅ Answer retrieved (direct API)", flush=True)

                      "datasourceId": model_id
                  })
                  answer_data = json.loads(answer_result.content[0].text)
+                 # Clean up the viz title
+                 if 'question' in answer_data:
+                     answer_data['question'] = _clean_viz_title(answer_data['question'])
                  answers.append(answer_data)
                  print(f"    ✅ Answer retrieved (MCP fallback)", flush=True)
              else:

                  # Parse answer data
                  answer_data = json.loads(answer_result.content[0].text)
+                 # Clean up the viz title
+                 if 'question' in answer_data:
+                     answer_data['question'] = _clean_viz_title(answer_data['question'])
                  print(f"    🔍 DEBUG: Answer keys: {list(answer_data.keys())}")
                  answers.append(answer_data)
                  print(f"    ✅ Answer retrieved", flush=True)
 
      })
      print(f"  ✓ Added dark theme style to Viz_1")

+     # Convert time-series visualizations to KPIs with sparklines
+     print(f"  🔄 Converting time-series charts to KPIs...")
+     kpi_count = 0
+     for viz in visualizations:
+         if viz.get('id') == 'Viz_1':
+             continue  # Skip note tile
+
+         answer = viz.get('answer', {})
+         viz_name = answer.get('name', '').lower()
+         search_query = answer.get('search_query', '').lower()
+
+         # Check if this is a time-series viz (weekly, monthly, daily patterns)
+         time_patterns = ['weekly', 'monthly', 'daily', 'quarterly', 'yearly', '.week', '.month', '.day', '.quarter', '.year']
+         is_time_series = any(p in viz_name or p in search_query for p in time_patterns)
+
+         if is_time_series and 'chart' in answer:
+             # Convert to KPI
+             answer['chart']['type'] = 'KPI'
+
+             # Add KPI-specific settings for sparkline and comparison
+             kpi_settings = {
+                 "showLabel": True,
+                 "showComparison": True,
+                 "showSparkline": True,
+                 "showAnomalies": False,
+                 "showBounds": False,
+                 "customCompare": "PREV_AVAILABLE",
+                 "showOnlyLatestAnomaly": False
+             }
+
+             # Update client_state_v2 with KPI settings
+             import json as json_module
+             client_state = answer['chart'].get('client_state_v2', '{}')
+             try:
+                 cs = json_module.loads(client_state) if client_state else {}
+                 if 'chartProperties' not in cs:
+                     cs['chartProperties'] = {}
+                 if 'chartSpecific' not in cs['chartProperties']:
+                     cs['chartProperties']['chartSpecific'] = {}
+                 cs['chartProperties']['chartSpecific']['customProps'] = json_module.dumps(kpi_settings)
+                 cs['chartProperties']['chartSpecific']['dataFieldArea'] = 'column'
+                 answer['chart']['client_state_v2'] = json_module.dumps(cs)
+             except:
+                 pass  # Keep existing if parsing fails
+
+             kpi_count += 1
+             print(f"  ✓ Converted '{answer.get('name', '?')}' to KPI")
+
+     if kpi_count > 0:
+         print(f"  ✅ Converted {kpi_count} visualizations to KPIs with sparklines")
+
      # Re-import fixed TML using authenticated session
      import_response = ts_client.session.post(
          f"{ts_base_url}/api/rest/2.0/metadata/tml/import",
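The `client_state_v2` edit in the KPI conversion above is JSON-in-JSON: the field is a serialized string, and its `customProps` entry holds a second serialized string. A minimal sketch of that nesting (dict shapes follow the conversion code above):

```python
import json

def to_kpi(chart):
    # Mirrors the conversion above: flip the chart type, then write the
    # KPI settings into client_state_v2 (a JSON string whose customProps
    # field holds another JSON string).
    kpi_settings = {"showSparkline": True, "showComparison": True,
                    "customCompare": "PREV_AVAILABLE"}
    chart["type"] = "KPI"
    cs = json.loads(chart.get("client_state_v2") or "{}")
    spec = cs.setdefault("chartProperties", {}).setdefault("chartSpecific", {})
    spec["customProps"] = json.dumps(kpi_settings)
    spec["dataFieldArea"] = "column"
    chart["client_state_v2"] = json.dumps(cs)
    return chart

chart = to_kpi({"type": "LINE", "client_state_v2": "{}"})
print(chart["type"])  # KPI
```

Reading the settings back therefore needs two `json.loads` calls, which is why the real code guards the round trip in a try/except and keeps the existing state on parse failure.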
outlier_system.py CHANGED
@@ -25,13 +25,205 @@ Usage:
  import re
  import os
  from typing import Dict, List, Optional, Tuple
- from dataclasses import dataclass
  from datetime import datetime

  @dataclass
  class OutlierPattern:
-     """Represents a data pattern to inject."""
      title: str
      description: str
      sql_update: str
@@ -115,7 +307,7 @@ class OutlierGenerator:
          target_table, target_column, conditions, pattern_description
      )

-     return OutlierPattern(
          title=parsed.get('title', pattern_description[:50]),
          description=pattern_description,
          sql_update=sql,
@@ -422,7 +614,7 @@ WHERE product_id IN (

  def apply_outliers(
      snowflake_conn,
-     outliers: List[OutlierPattern],
      schema_name: str,
      dry_run: bool = False
  ) -> List[Dict]:
@@ -483,7 +675,7 @@ def apply_outliers(

  def generate_demo_pack(
-     outliers: List[OutlierPattern],
      company_name: str,
      use_case: str
  ) -> str:
 
  import re
  import os
  from typing import Dict, List, Optional, Tuple
+ from dataclasses import dataclass, field
  from datetime import datetime

+ # ============================================================================
+ # Phase 1: New Structured Outlier System (February 2026 Sprint)
+ # ============================================================================
+
  @dataclass
  class OutlierPattern:
+     """
+     Defines a single outlier pattern that serves three purposes:
+     1. Liveboard visualizations
+     2. Spotter questions
+     3. Demo talking points
+     """
+     # Identity
+     name: str        # "ASP Decline"
+     category: str    # "pricing", "volume", "inventory"
+
+     # For LIVEBOARD (visualization)
+     viz_type: str            # "KPI", "COLUMN", "LINE"
+     viz_question: str        # "ASP weekly"
+     viz_talking_point: str   # "ASP dropped 12% — excessive discounting"
+
+     # For SPOTTER (ad-hoc questions)
+     spotter_questions: List[str] = field(default_factory=list)
+     spotter_followups: List[str] = field(default_factory=list)
+
+     # For DATA INJECTION (SQL generation)
+     sql_template: str = ""   # "UPDATE {fact_table} SET {column} = ..."
+     affected_columns: List[str] = field(default_factory=list)
+     magnitude: str = ""      # "15% below normal"
+     target_filter: str = ""  # "WHERE REGION = 'West'"
+
+     # For DEMO NOTES
+     demo_setup: str = ""     # "Start by showing overall sales are UP"
+     demo_payoff: str = ""    # "Then reveal ASP is DOWN — 'at what cost?'"
+
+
+ @dataclass
+ class OutlierConfig:
+     """
+     Configuration for outliers per use case.
+     Combines required patterns, optional patterns, and AI generation guidance.
+     """
+     required: List[OutlierPattern] = field(default_factory=list)  # Always include
+     optional: List[OutlierPattern] = field(default_factory=list)  # AI picks 1-2
+     allow_ai_generated: bool = True                               # AI can create 1 custom
+     ai_guidance: str = ""                                         # Hint for AI generation
+
+
+ OUTLIER_CONFIGS = {
+     ("Retail", "Sales"): OutlierConfig(
+         required=[
+             OutlierPattern(
+                 name="ASP Decline",
+ category="pricing",
86
+ viz_type="KPI",
87
+ viz_question="ASP weekly",
88
+ viz_talking_point="ASP dropped 12% even though revenue is up — we're discounting too heavily",
89
+ spotter_questions=[
90
+ "Why did ASP drop last month?",
91
+ "Which products have the biggest discount?",
92
+ "Show me ASP by region",
93
+ ],
94
+ spotter_followups=[
95
+ "Compare to same period last year",
96
+ "Which stores are discounting most?",
97
+ ],
98
+ sql_template="UPDATE {fact_table} SET UNIT_PRICE = UNIT_PRICE * 0.85 WHERE REGION = 'West' AND {date_column} > '{recent_date}'",
99
+ affected_columns=["UNIT_PRICE", "DISCOUNT_PCT"],
100
+ magnitude="15% below normal",
101
+ target_filter="WHERE REGION = 'West'",
102
+ demo_setup="Start by showing overall sales are UP — everything looks good",
103
+ demo_payoff="Then reveal ASP is DOWN — 'but at what cost?' moment",
104
+ ),
105
+ OutlierPattern(
106
+ name="Regional Variance",
107
+ category="geographic",
108
+ viz_type="COLUMN",
109
+ viz_question="Dollar Sales by Region",
110
+ viz_talking_point="West region outperforming by 40% — what are they doing differently?",
111
+ spotter_questions=[
112
+ "Which region has the highest sales?",
113
+ "Compare West to East performance",
114
+ ],
115
+ spotter_followups=[
116
+ "What products are driving West?",
117
+ "Show me the trend for West region",
118
+ ],
119
+ sql_template="UPDATE {fact_table} SET QUANTITY = QUANTITY * 1.4 WHERE REGION = 'West'",
120
+ affected_columns=["QUANTITY", "REVENUE"],
121
+ magnitude="40% above other regions",
122
+ target_filter="WHERE REGION = 'West'",
123
+ demo_setup="Show overall sales by region",
124
+ demo_payoff="West is crushing it — drill in to find out why",
125
+ ),
126
+ ],
127
+ optional=[
128
+ OutlierPattern(
129
+ name="Seasonal Spike",
130
+ category="temporal",
131
+ viz_type="LINE",
132
+ viz_question="Dollar Sales trend by month",
133
+ viz_talking_point="Holiday surge 3x normal — were we prepared?",
134
+ spotter_questions=["Show me sales trend for Q4", "When was our peak sales day?"],
135
+ spotter_followups=[],
136
+ sql_template="UPDATE {fact_table} SET QUANTITY = QUANTITY * 3 WHERE MONTH IN (11, 12)",
137
+ affected_columns=["QUANTITY", "REVENUE"],
138
+ magnitude="3x normal",
139
+ target_filter="WHERE MONTH IN (11, 12)",
140
+ demo_setup="",
141
+ demo_payoff="",
142
+ ),
143
+ OutlierPattern(
144
+ name="Category Surge",
145
+ category="product",
146
+ viz_type="COLUMN",
147
+ viz_question="Dollar Sales by Category",
148
+ viz_talking_point="Electronics up 60% YoY while Apparel flat",
149
+ spotter_questions=["Which category grew fastest?", "Compare Electronics to Apparel"],
150
+ spotter_followups=[],
151
+ sql_template="",
152
+ affected_columns=[],
153
+ magnitude="60% YoY",
154
+ target_filter="",
155
+ demo_setup="",
156
+ demo_payoff="",
157
+ ),
158
+ ],
159
+ allow_ai_generated=True,
160
+ ai_guidance="If company has sustainability initiatives, create outlier around eco-friendly product sales",
161
+ ),
162
+
163
+ ("Banking", "Marketing"): OutlierConfig(
164
+ required=[
165
+ OutlierPattern(
166
+ name="Funnel Drop-off",
167
+ category="conversion",
168
+ viz_type="COLUMN",
169
+ viz_question="Conversion rate by funnel stage",
170
+ viz_talking_point="70% drop-off at application page — UX issue?",
171
+ spotter_questions=[
172
+ "Where is our biggest funnel drop-off?",
173
+ "What's our application completion rate?",
174
+ ],
175
+ spotter_followups=[],
176
+ sql_template="",
177
+ affected_columns=[],
178
+ magnitude="70% drop-off",
179
+ target_filter="",
180
+ demo_setup="Show the full funnel from impression to approval",
181
+ demo_payoff="The application page is killing conversions",
182
+ ),
183
+ ],
184
+ optional=[
185
+ OutlierPattern(
186
+ name="Channel Performance",
187
+ category="channel",
188
+ viz_type="COLUMN",
189
+ viz_question="CTR by channel",
190
+ viz_talking_point="Mobile CTR 2x desktop — shift budget?",
191
+ spotter_questions=["Which channel has the best CTR?"],
192
+ spotter_followups=[],
193
+ sql_template="",
194
+ affected_columns=[],
195
+ magnitude="2x desktop",
196
+ target_filter="",
197
+ demo_setup="",
198
+ demo_payoff="",
199
+ ),
200
+ ],
201
+ allow_ai_generated=True,
202
+ ai_guidance="Consider seasonal patterns in loan applications",
203
+ ),
204
+ }
205
+
206
+
207
+ def get_outliers_for_use_case(vertical: str, function: str) -> OutlierConfig:
208
+ """Get outlier configuration for a use case, with fallback to empty config."""
209
+ return OUTLIER_CONFIGS.get(
210
+ (vertical, function),
211
+ OutlierConfig(
212
+ required=[],
213
+ optional=[],
214
+ allow_ai_generated=True,
215
+ ai_guidance=f"Generate outliers appropriate for {vertical} {function}"
216
+ )
217
+ )
218
+
219
+
220
+ # ============================================================================
221
+ # Legacy Outlier System (existing code below)
222
+ # ============================================================================
223
+
224
+ @dataclass
225
+ class LegacyOutlierPattern:
226
+ """Represents a data pattern to inject (legacy structure)."""
227
  title: str
228
  description: str
229
  sql_update: str
 
307
  target_table, target_column, conditions, pattern_description
308
  )
309
 
310
+ return LegacyOutlierPattern(
311
  title=parsed.get('title', pattern_description[:50]),
312
  description=pattern_description,
313
  sql_update=sql,
 
614
 
615
  def apply_outliers(
616
  snowflake_conn,
617
+ outliers: List[LegacyOutlierPattern],
618
  schema_name: str,
619
  dry_run: bool = False
620
  ) -> List[Dict]:
 
675
 
676
 
677
  def generate_demo_pack(
678
+ outliers: List[LegacyOutlierPattern],
679
  company_name: str,
680
  use_case: str
681
  ) -> str:
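The `sql_template` placeholders in `OutlierPattern` are meant to be filled per-schema at injection time. A minimal, self-contained sketch of that fan-out (it re-declares a trimmed `OutlierPattern`; the `render_sql` helper and the table/column/date values are illustrative, not part of this commit):

```python
from dataclasses import dataclass, field
from typing import List

# Trimmed re-declaration of the dataclass added in this commit (illustrative only).
@dataclass
class OutlierPattern:
    name: str
    viz_question: str
    viz_talking_point: str
    spotter_questions: List[str] = field(default_factory=list)
    sql_template: str = ""

def render_sql(pattern: OutlierPattern, fact_table: str, date_column: str, recent_date: str) -> str:
    """Fill the {placeholders} in sql_template for a concrete schema."""
    return pattern.sql_template.format(
        fact_table=fact_table, date_column=date_column, recent_date=recent_date
    )

asp = OutlierPattern(
    name="ASP Decline",
    viz_question="ASP weekly",
    viz_talking_point="ASP dropped 12% even though revenue is up",
    spotter_questions=["Why did ASP drop last month?"],
    sql_template=(
        "UPDATE {fact_table} SET UNIT_PRICE = UNIT_PRICE * 0.85 "
        "WHERE REGION = 'West' AND {date_column} > '{recent_date}'"
    ),
)

# One pattern drives the injected SQL, the liveboard question, and the Spotter prompts.
sql = render_sql(asp, "SALES_FACT", "ORDER_DATE", "2026-01-01")
print(sql)
print(asp.viz_question, "|", asp.spotter_questions[0])
```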
prompts.py CHANGED
```diff
@@ -408,4 +408,159 @@ REQUIREMENTS:
 - Add data validation and error handling
 - Generate complete .env file template

-Generate executable code that creates compelling {use_case} demo data for {company_name}."""
+Generate executable code that creates compelling {use_case} demo data for {company_name}."""
+
+# ============================================================================
+# UNIFIED PROMPT BUILDING SYSTEM (Phase 1 - February 2026)
+# ============================================================================
+# New composable prompt construction system that assembles context sections
+# consistently across all stages (research, DDL, liveboard, demo notes)
+# ============================================================================
+
+def build_prompt(
+    stage: str,
+    vertical: str,
+    function: str,
+    company_context: str,
+    user_overrides: str = None,
+) -> str:
+    """
+    Build a complete prompt by assembling context sections.
+
+    Args:
+        stage: One of "research", "ddl", "liveboard", "demo_notes"
+        vertical: Industry vertical (e.g., "Retail")
+        function: Functional department (e.g., "Sales")
+        company_context: Text from website research
+        user_overrides: Optional user requirements that override defaults
+
+    Returns:
+        Complete prompt string ready for LLM
+    """
+    from demo_personas import get_use_case_config
+    from outlier_system import get_outliers_for_use_case
+
+    # Get merged configuration
+    config = get_use_case_config(vertical, function)
+    outliers = get_outliers_for_use_case(vertical, function)
+
+    # Build sections
+    sections = []
+
+    # Section A: Company Context
+    sections.append(f"""## COMPANY CONTEXT
+{company_context}""")
+
+    # Section B: Use Case Framework
+    persona = config.get("target_persona", "Business Leader")
+    problem = config.get("business_problem", "Need for faster, data-driven decisions")
+    sections.append(f"""## USE CASE
+- **Name:** {vertical} {function}
+- **Target Persona:** {persona}
+- **Business Problem:** {problem}
+- **Industry Terms:** {', '.join(config.get('industry_terms', []))}
+- **Typical Entities:** {', '.join(config.get('entities', []))}""")
+
+    # Section C: Required KPIs and Visualizations
+    kpi_text = "\n".join([f"- {kpi}: {config['kpi_definitions'].get(kpi, '')}" for kpi in config.get('kpis', [])])
+    sections.append(f"""## REQUIRED KPIs
+{kpi_text}
+
+## REQUIRED VISUALIZATIONS
+{', '.join(config.get('viz_types', []))}""")
+
+    # Section D: Outlier Patterns
+    if outliers.required:
+        outlier_text = "\n".join([f"- **{o.name}:** {o.viz_talking_point}" for o in outliers.required])
+        sections.append(f"""## DATA STORIES TO CREATE
+{outlier_text}""")
+
+    # Section E: Spotter Questions
+    spotter_qs = []
+    for o in outliers.required:
+        spotter_qs.extend(o.spotter_questions[:2])  # Top 2 from each required outlier
+    if spotter_qs:
+        sections.append(f"""## SPOTTER QUESTIONS TO ENABLE
+{chr(10).join(['- ' + q for q in spotter_qs[:6]])}""")
+
+    # Section F: User Overrides
+    if user_overrides:
+        sections.append(f"""## USER REQUIREMENTS (override defaults)
+{user_overrides}""")
+
+    # Section G: AI Guidance
+    if config.get("is_generic"):
+        ai_tasks = config.get("ai_should_determine", [])
+        sections.append(f"""## AI TASKS (Generic Use Case)
+This is a generic use case without pre-defined configuration.
+Please determine the following based on company context:
+{chr(10).join(['- ' + task for task in ai_tasks])}""")
+    else:
+        sections.append("""## AI GUIDANCE
+- Include all REQUIRED KPIs and visualizations listed above
+- You may add 2-3 additional items if valuable for this specific company
+- If you add something, briefly explain why""")
+
+    # Assemble final prompt
+    context_block = "\n\n---\n\n".join(sections)
+
+    # Get stage-specific template
+    template = STAGE_TEMPLATES.get(stage, DEFAULT_TEMPLATE)
+
+    return template.format(
+        context=context_block,
+        vertical=vertical,
+        function=function,
+    )
+
+
+# Stage-specific templates
+STAGE_TEMPLATES = {
+    "research": """You are a business intelligence analyst researching a company for demo preparation.
+
+{context}
+
+---
+
+Provide comprehensive research focusing on information that will help create a compelling {vertical} {function} demo.""",
+
+    "ddl": """You are a database architect creating a schema for a {vertical} {function} demo.
+
+{context}
+
+---
+
+Create Snowflake DDL that supports all the KPIs, visualizations, and data stories listed above.
+Follow star schema design with clear fact and dimension tables.""",
+
+    "liveboard": """You are creating a ThoughtSpot liveboard for a {vertical} {function} demo.
+
+{context}
+
+---
+
+Generate visualization questions that will create a compelling liveboard.
+The first two questions MUST be KPIs with sparklines (format: "{{measure}} weekly" or "{{measure}} monthly").
+Include visualizations that enable the data stories and Spotter questions listed above.""",
+
+    "demo_notes": """You are creating demo talking points for a {vertical} {function} demo.
+
+{context}
+
+---
+
+Create a bullet outline demo script with:
+- Opening hook and problem statement
+- Key visualizations to show with talking points
+- The "aha moment" reveal
+- Spotter questions to ask live
+- Closing value proposition""",
+}
+
+DEFAULT_TEMPLATE = """You are helping create a {vertical} {function} demo.
+
+{context}
+
+---
+
+Provide output appropriate for this use case."""
```
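The assembly logic in `build_prompt()` reduces to two steps: join the section strings with `\n\n---\n\n`, then drop the result into a stage template via `str.format`. A self-contained sketch of just that mechanism (the section text and template wording here are abbreviated placeholders, not the full strings from the commit):

```python
# Minimal sketch of build_prompt()'s section-assembly pattern.
sections = [
    "## COMPANY CONTEXT\nAcme Outdoor Co sells camping gear online and in stores.",
    "## USE CASE\n- **Name:** Retail Sales",
    "## DATA STORIES TO CREATE\n- **ASP Decline:** ASP dropped 12% even though revenue is up",
]
context_block = "\n\n---\n\n".join(sections)

# Abbreviated stand-in for STAGE_TEMPLATES["ddl"].
template = """You are a database architect creating a schema for a {vertical} {function} demo.

{context}

---

Create Snowflake DDL that supports the data stories above."""

prompt = template.format(context=context_block, vertical="Retail", function="Sales")
print(prompt)
```

Because the templates take only `{context}`, `{vertical}`, and `{function}`, every stage sees the same context block; only the framing around it changes.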
smart_data_adjuster.py CHANGED
```diff
@@ -7,7 +7,6 @@ Bundles confirmations into one step when confident.

 import os
 from typing import Dict, List, Optional, Tuple
-from openai import OpenAI
 from snowflake_auth import get_snowflake_connection
 from thoughtspot_deployer import ThoughtSpotDeployer
 import json
@@ -16,18 +15,67 @@ import json
 class SmartDataAdjuster:
     """Smart adjuster with liveboard context and conversational flow"""

-    def __init__(self, database: str, schema: str, liveboard_guid: str):
+    def __init__(self, database: str, schema: str, liveboard_guid: str, llm_model: str = None):
         self.database = database
         self.schema = schema
         self.liveboard_guid = liveboard_guid
         self.conn = None
         self.ts_client = None
-        self.openai_client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
+
+        # LLM setup - use provided model or default to Claude
+        self.llm_model = llm_model or os.getenv('DEFAULT_LLM', 'claude-sonnet-4')
+        self._llm_client = None

         # Context about the liveboard
         self.liveboard_name = None
         self.visualizations = []  # List of viz metadata

+    def _call_llm(self, prompt: str) -> str:
+        """Call the configured LLM (Anthropic or OpenAI)"""
+        # Determine provider from model name
+        model_lower = self.llm_model.lower()
+
+        if 'claude' in model_lower or 'anthropic' in model_lower:
+            # Use Anthropic
+            import anthropic
+            client = anthropic.Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY'))
+
+            # Map display names to API model names
+            model_map = {
+                'claude-sonnet-4': 'claude-sonnet-4-20250514',
+                'claude-sonnet-4.5': 'claude-sonnet-4-20250514',
+                'claude-3.5-sonnet': 'claude-3-5-sonnet-20241022',
+                'claude-3-opus': 'claude-3-opus-20240229',
+            }
+            api_model = model_map.get(self.llm_model, 'claude-sonnet-4-20250514')
+
+            response = client.messages.create(
+                model=api_model,
+                max_tokens=2000,
+                messages=[{"role": "user", "content": prompt}]
+            )
+            return response.content[0].text
+        else:
+            # Use OpenAI
+            from openai import OpenAI
+            client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
+
+            # Map display names to API model names
+            model_map = {
+                'gpt-4o': 'gpt-4o',
+                'gpt-4': 'gpt-4',
+                'gpt-4-turbo': 'gpt-4-turbo',
+                'gpt-3.5-turbo': 'gpt-3.5-turbo',
+            }
+            api_model = model_map.get(self.llm_model, 'gpt-4o')
+
+            response = client.chat.completions.create(
+                model=api_model,
+                messages=[{"role": "user", "content": prompt}],
+                temperature=0
+            )
+            return response.choices[0].message.content
+
     def connect(self):
         """Connect to Snowflake and ThoughtSpot"""
         # Snowflake
@@ -300,13 +348,7 @@ CRITICAL: target_value and percentage must be numbers, never strings.
 If unsure about ANY field, set confidence to "low" or "medium".
 """

-        response = self.openai_client.chat.completions.create(
-            model="gpt-4o",
-            messages=[{"role": "user", "content": prompt}],
-            temperature=0
-        )
-
-        content = response.choices[0].message.content
+        content = self._call_llm(prompt)
         if content.startswith('```'):
             lines = content.split('\n')
             content = '\n'.join(lines[1:-1])
```
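Two details of the `_call_llm()` change are worth pinning down: provider routing is a plain substring check on the model name, and the caller strips Markdown code fences with `lines[1:-1]`, which assumes the reply has both an opening and a closing fence. A standalone sketch of both behaviors (same logic as the diff, pulled out as free functions):

```python
def pick_provider(llm_model: str) -> str:
    """Route on substring, exactly as _call_llm does."""
    m = llm_model.lower()
    return "anthropic" if ("claude" in m or "anthropic" in m) else "openai"

def strip_code_fence(content: str) -> str:
    """Mirror the caller's post-processing: drop ```json fences around a reply."""
    if content.startswith('```'):
        lines = content.split('\n')
        # Drops the first line (```json) and the last line (```); a reply with
        # no closing fence would lose its final line, so confidence checks matter.
        content = '\n'.join(lines[1:-1])
    return content

print(pick_provider("claude-sonnet-4"))   # anthropic
print(pick_provider("gpt-4o"))            # openai
print(strip_code_fence('```json\n{"confidence": "high"}\n```'))
```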
sprint_2026_02.md CHANGED
````diff
@@ -68,6 +68,9 @@
 ## Tasks

 ### To Do
+- [ ] **Fix tag assignment to models** - Returns 404 error; works for tables but not models
+- [ ] **CRITICAL: Fix MCP bearer auth for on-prem deployments** - OAuth workaround works for cloud, but bearer auth is needed for on-prem instances that OAuth can't reach (see detailed notes below)
+- [ ] **Fix research cache not loading** - Cache files exist but aren't found due to a relative-path issue (fix ready, needs restart to test)

 #### LegitData Improvements (from REI demo learnings)
 - [ ] **Fix DAYSONHAND generation** - Currently random, needs business logic:
@@ -143,3 +146,145 @@

 ## Notes

+### Feb 3, 2026 - ThoughtSpot Model Validation & MCP Import Fix
+
+**Issue 1: ThoughtSpot Model Validation Failed (Error 13124)**
+- Model TML was missing ID columns (CUSTOMER_ID, STORE_ID, etc.) that were referenced in joins
+- Joins validated, but the columns section didn't include the join keys
+
+**Root Cause:**
+- Code in `thoughtspot_deployer.py` (lines 976-984) was intentionally skipping FK/PK columns to "clean up" the model
+- Logic: "nobody searches for customer 23455, so hide ID columns"
+- But ThoughtSpot requires columns used in joins to be present in the model, even if users don't search them
+
+**Solution:**
+- Commented out the skip logic for FK/PK columns in `_create_model_with_constraints()`
+- ID columns now included in the model's `columns:` section
+- Model deploys successfully with all 54 columns, including IDs
+
+**Issue 2: MCP Import Error**
+- `from mcp.client.streamable_http import streamablehttp_client` failed
+- `ModuleNotFoundError: No module named 'mcp.client.streamable_http'`
+
+**Root Cause:**
+- MCP package upgraded from 0.x to 1.0.0
+- Module structure changed: `streamable_http` → `sse` (Server-Sent Events)
+
+**Solution:**
+- Updated import: `from mcp.client.sse import sse_client`
+- Updated client usage: `sse_client()` instead of `streamablehttp_client()`
+
+---
+
+### Feb 3, 2026 - Supabase Compatibility Fix
+**Issue:** Supabase module import was failing with `ModuleNotFoundError: No module named 'websockets.asyncio'`, causing the app to not load settings and default to OpenAI (which had exceeded quota).
+
+**Root Cause:**
+- Gradio 4.44.0 requires `websockets<13.0`
+- Newer Supabase versions (2.10+) require `websockets>=11` but pull realtime 2.x, which needs `websockets.asyncio` (only in 13+)
+- The version conflict prevented Supabase from loading
+
+**Solution:** Downgraded to a compatible version set:
+- `supabase==1.2.0`
+- `realtime==1.0.6`
+- `websockets==12.0`
+- `httpx==0.24.1` (already had this)
+- `gradio==4.44.0` (unchanged)
+
+**Impact:** Settings now load properly from Supabase; the app uses the correct LLM model from user settings instead of falling back to OpenAI.
+
+---
+
+### Feb 3, 2026 - MCP Bearer Auth vs OAuth Investigation
+
+**Context:** MCP liveboard creation was working previously with on-prem ThoughtSpot instances that can't be reached via OAuth, which means bearer auth was the working solution. However, the current implementation fails with 400 Bad Request.
+
+**Problem Statement:**
+- MCP endpoint `https://agent.thoughtspot.app/bearer/mcp` returns 400 Bad Request when using SSE or streamable_http clients
+- OAuth via stdio works, but only for cloud instances accessible from the internet
+- Bearer auth is needed for on-prem deployments
+
+**Investigation Timeline:**
+
+1. **Initial Error (Feb 3 AM):**
+   - Error: `HTTPStatusError: Client error '400 Bad Request' for url 'https://agent.thoughtspot.app/bearer/mcp'`
+   - Code was using `from mcp.client.sse import sse_client` (MCP 1.0)
+   - Bearer auth header format: `Bearer {token}@{host}`
+
+2. **First Attempted Fix - Downgrade to MCP 0.9.1:**
+   - Reasoning: maybe MCP 1.0's SSE client doesn't work with the bearer endpoint
+   - Result: MCP 0.9.1 doesn't have a `streamable_http` module either - only `sse` and `stdio`
+   - **Learning:** `streamable_http` never existed in any released MCP version we can access
+
+3. **Git History Investigation:**
+   - Commit `f10a9f5` (Jan 27): added `from mcp.client.streamable_http import streamablehttp_client` with bearer auth
+   - requirements.txt at that time: `mcp==1.0.0`
+   - But MCP 1.0.0 doesn't actually have a `streamable_http` module!
+   - **Learning:** that code was committed but never successfully tested/deployed
+
+4. **Found Working Implementation:**
+   - Commit `d26f47e` (earlier): used `stdio_client` with an `npx mcp-remote` proxy
+   - Code:
+     ```python
+     from mcp import ClientSession, StdioServerParameters
+     from mcp.client.stdio import stdio_client

+     server_params = StdioServerParameters(
+         command="npx",
+         args=["mcp-remote@latest", "https://agent.thoughtspot.app/mcp"]
+     )
+     async with stdio_client(server_params) as (read, write):
+         async with ClientSession(read, write) as session:
+             await session.initialize()
+     ```
+   - This approach uses OAuth but works
+
+5. **Current Workaround (OAuth via stdio):**
+   - Reverted to the stdio_client approach from commit d26f47e
+   - Tested successfully: created liveboard b6cc9cad-ff91-4dd4-aec5-091984c2afd2
+   - OAuth flow opens a browser for authorization
+   - Works for cloud instances only
+
+**Technical Details:**
+
+**Bearer Auth Endpoint (Not Working):**
+- URL: `https://agent.thoughtspot.app/bearer/mcp`
+- Auth header: `Bearer {token}@{host}`
+- Transport: unknown (streamable_http doesn't exist, SSE returns 400)
+- Status: 400 Bad Request - endpoint rejects SSE connection attempts
+
+**OAuth Endpoint (Currently Working):**
+- URL: `https://agent.thoughtspot.app/mcp`
+- Proxy: `npx mcp-remote@latest`
+- Transport: stdio → npx → StreamableHTTPClientTransport (handled by mcp-remote)
+- Auth: browser OAuth flow
+- Limitation: requires an internet-accessible ThoughtSpot instance
+
+**The Problem:**
+- User confirmed it was working with on-prem instances before
+- On-prem instances can't complete OAuth (not internet-accessible)
+- Therefore, bearer auth must have been working at some point
+- But there is no evidence in git history of working bearer auth code
+- The `mcp-remote` proxy shows it connects using `StreamableHTTPClientTransport` after OAuth
+- The bearer endpoint might require the same transport, but with bearer auth headers instead of OAuth
+
+**Possible Solutions to Investigate:**
+1. **Use mcp-remote with bearer auth**: see if `npx mcp-remote` supports a bearer token parameter
+2. **Direct StreamableHTTPClientTransport**: find/install the transport library that mcp-remote uses internally
+3. **MCP pre-1.0 version**: search for alpha/beta versions before 0.9.1 that might have streamable_http
+4. **ThoughtSpot-specific MCP package**: check if ThoughtSpot provides their own MCP client library
+5. **Raw HTTP requests**: bypass the MCP library and make direct HTTP calls to the bearer endpoint
+
+**Current State:**
+- OAuth via stdio works for cloud instances
+- Bearer auth is needed for on-prem, but the implementation is unclear
+- Temporary workaround: using the OAuth approach (works for testing/development)
+- **BLOCKER for on-prem deployments**
+
+**Next Steps:**
+- [ ] Contact ThoughtSpot to ask about the bearer auth implementation
+- [ ] Investigate mcp-remote source code to see how it handles StreamableHTTPClientTransport
+- [ ] Test if mcp-remote accepts a bearer token as a parameter
+- [ ] Look for ThoughtSpot-specific documentation on MCP bearer auth
````
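The pinned version set from the Supabase compatibility fix above can be captured directly in `requirements.txt` (versions exactly as listed in the note; the surrounding entries of the real file are not shown):

```text
# Pinned for Gradio 4.44.0 <-> Supabase websockets compatibility (Feb 3, 2026 note)
supabase==1.2.0
realtime==1.0.6
websockets==12.0
httpx==0.24.1
gradio==4.44.0
```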
supabase_client.py CHANGED
```diff
@@ -380,7 +380,6 @@ def load_gradio_settings(email: str) -> Dict[str, Any]:
         "column_naming_style": "snake_case",  # Options: snake_case, camelCase, PascalCase, UPPER_CASE, original

         # Liveboard Creation
-        "liveboard_method": "HYBRID",
         "geo_scope": "USA Only",
         "validation_mode": "Off",

```
thoughtspot_deployer.py CHANGED
```diff
@@ -973,15 +973,19 @@ class ThoughtSpotDeployer:
             col_name = col['name'].upper()
             original_col_name = col.get('original_name', col['name'])  # Use original casing for display

+            # NOTE: We used to skip FK/PK columns, but ThoughtSpot requires them for joins.
+            # Even though users don't search "customer 23455", the join columns must be present
+            # in the model's columns section for the joins to work properly.
+            #
             # SKIP foreign key columns - they're join keys, not analytics columns
-            if self._is_foreign_key_column(col_name, table_name_upper, foreign_keys):
-                print(f"   ⏭️ Skipping FK column: {table_name_upper}.{col_name}")
-                continue
-
+            # if self._is_foreign_key_column(col_name, table_name_upper, foreign_keys):
+            #     print(f"   ⏭️ Skipping FK column: {table_name_upper}.{col_name}")
+            #     continue
+            #
             # SKIP surrogate primary keys (numeric IDs) - nobody searches "customer 23455"
-            if self._is_surrogate_primary_key(col, col_name):
-                print(f"   ⏭️ Skipping surrogate PK: {table_name_upper}.{col_name}")
-                continue
+            # if self._is_surrogate_primary_key(col, col_name):
+            #     print(f"   ⏭️ Skipping surrogate PK: {table_name_upper}.{col_name}")
+            #     continue

             # Start with basic conflict resolution
             display_name = self._resolve_column_name_conflict(
@@ -1646,6 +1650,9 @@ class ThoughtSpotDeployer:
                 return True
             else:
                 print(f"[ThoughtSpot] ⚠️ Tag assignment failed: {assign_response.status_code}", flush=True)
+                print(f"[ThoughtSpot] DEBUG: Response text: {assign_response.text[:500]}", flush=True)
+                print(f"[ThoughtSpot] DEBUG: Object GUIDs: {object_guids}", flush=True)
+                print(f"[ThoughtSpot] DEBUG: Object type: {object_type}", flush=True)
                 return False

         except Exception as e:
@@ -2126,8 +2133,10 @@ class ThoughtSpotDeployer:

         try:
             # Build company data from parameters
+            # Clean company name for display (strip .com, .org, etc)
+            clean_company = company_name.split('.')[0].title() if company_name and '.' in company_name else (company_name or 'Demo Company')
             company_data = {
-                'name': company_name or 'Demo Company',
+                'name': clean_company,
                 'use_case': use_case or 'General Analytics'
             }
```
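The company-name cleanup added in the last hunk is a one-line conditional expression; pulling it out as a function makes its edge cases easy to check (same logic as the diff, renamed here purely for illustration):

```python
# Re-statement of the clean_company expression from deploy_all(), as a function.
def clean_company_name(company_name) -> str:
    if company_name and '.' in company_name:
        # "rei.com" -> "Rei": keep everything before the first dot, title-case it
        return company_name.split('.')[0].title()
    return company_name or 'Demo Company'

print(clean_company_name("rei.com"))      # Rei
print(clean_company_name(None))           # Demo Company
print(clean_company_name("ThoughtSpot"))  # ThoughtSpot (no dot: passed through as-is)
```

Note the pass-through branch preserves the original casing, while the domain branch title-cases; a name containing a literal dot (e.g. "Acme Inc.") would also be truncated at the first dot.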