Sprint: March 2026

Started: March 16, 2026
Planning doc: dev_notes/plan_march_2026.md


Context

App is live. Small group launch early this week, broader rollout (4–5 people) within a couple of weeks. This sprint covers hardening, settings, and new capabilities before the broader rollout.


Sprint Objectives

Before Small Group Launch (This Week)

  • TS Environment Dropdown ✅ - ENV-based dropdown on front page; URL→key map hard-coded in app
    • TS_ENV_N_LABEL/URL pattern in .env; get_ts_environments() reads all to build dropdown
    • Dropdown in right panel alongside AI Model + Liveboard Name
    • update_ts_env() resolves URL + auth key (via os.getenv(key_name)) into controller settings
    • Bug fixed: was storing ENV var name instead of actual secret value
    • Controller created on first message also receives current dropdown env selection
  • Front Page Redesign ✅ - Right panel (where Stage + AI Model currently live):
    • Add TS Environment dropdown here alongside AI Model
    • Add Liveboard Name field here
    • Remove the Stage textbox - replace with proper progress indicator (see below)
    • Company/use case stays chat-driven only
    • Remove default_company_url setting (replaced by chat-driven flow)
  • Progress Meter Fix ✅ - current stage textbox is not great UX
    • Add Init as the first stage in the progress sequence
    • Replace the stage textbox with a visual progress indicator (step 1–N style)
    • Stages: Init → Research → DDL → Data → Model → Liveboard (→ Data Adjuster when in that phase)
  • Chat Flow UX Improvements ✅
    • In-chat help text at session start: brief instructions message
    • Clearer prompting back to user when use case is ambiguous
    • Consider ? tooltip near chat input
  • Error Handling Review ✅
    • Liveboard partial success ✅: Snowflake + model OK but liveboard fails → ⚠️ message with Spotter Viz Story tab pointer + 'retry liveboard' prompt
    • TML import errors: parse TS error_list, show which viz failed specifically
    • MCP failures: show which step failed, whether partial work should be kept
    • Top-level wrapper ✅: process_chat_message wrapped in try/except → yield friendly error + log full traceback
    • Data Adjuster errors: SQL execution failures shown clearly with context, not swallowed
    • Snowflake connection errors: distinguish auth failure vs. query failure vs. timeout
  • Supabase Session Logging ✅ - session_logger.py
    • SessionLogger class: writes to Supabase session_logs table
    • Initialized at first message in process_chat_message; stored per-controller on self._session_logger
    • init_session_logger() module-level helper
    • Logs: user message received, research started, deploy started
    • ⚠️ Was incomplete: log_end() existed but was never called - completions/failures/durations were never logged
    • Fixed March 27: log_end now called on research, deploy, thoughtspot completion AND failure with duration_ms + error
  • Admin Log Viewer ✅ - added to Admin Settings tab; email filter + row limit + Refresh button; queries session_logs via Supabase
  • Data Adjuster Cleanup & Controller Integration ✅
    • Existing files: data_adjuster.py, smart_data_adjuster.py, conversational_data_adjuster.py, chat_data_adjuster.py
    • Currently wired post-liveboard but state lives on self._adjuster / self._pending_adjustment (instance vars - wrong)
    • Controller owns the adjuster phase: adjuster state moves into the chat controller phase flow, not instance vars
    • Multi-turn: controller stays in the adjuster phase across multiple messages; user can ask multiple questions and make multiple adjustments in one session
    • Smart adjustments: LLM understands the request, maps to the right table/column, proposes the SQL, confirms with user, executes
    • Consolidate: decide which of the 4 files survives (likely smart_data_adjuster.py as the engine, rest retired or merged)
  • State Isolation Audit ✅ - audit complete; two HIGH issues identified
    • HIGH: session_logger.py module-level _current_logger singleton - concurrent sessions overwrite each other's logger
    • HIGH: prompt_logger.py module-level _prompt_logger singleton - all users' LLM prompts mix in same in-memory list
    • MEDIUM: inject_admin_settings_to_env() writes to process-global os.environ - concurrent deployments could use wrong Snowflake account
    • MEDIUM: Admin settings cache has no TTL - external Supabase edits invisible until restart
    • Main ChatDemoInterface state IS isolated via gr.State - the pipeline itself is safe
    • Fix (loggers + os.environ) tracked under "Before Broader Rollout → State Isolation Fix" below
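The TS_ENV_N_LABEL/URL pattern from the TS Environment Dropdown item above could be read roughly as follows. This is a minimal sketch, not the app's actual code: `URL_TO_KEY_VAR`, `resolve_ts_env`, the example URL, the returned dict shapes, and the "stop at the first missing N" rule are all assumptions for illustration. The environment is passed in as a mapping so the logic can be exercised without touching `os.environ`.

```python
import os
from typing import Mapping, Optional

# Hypothetical URL -> secret env var NAME map (the notes say this map is hard-coded in the app).
URL_TO_KEY_VAR = {
    "https://example-a.thoughtspot.cloud": "TS_KEY_A",
}


def get_ts_environments(env: Optional[Mapping[str, str]] = None) -> list:
    """Collect TS_ENV_<N>_LABEL / TS_ENV_<N>_URL pairs, starting at N=1.

    Assumption: numbering is contiguous, so scanning stops at the first gap.
    """
    env = os.environ if env is None else env
    found = []
    n = 1
    while True:
        label = env.get(f"TS_ENV_{n}_LABEL")
        url = env.get(f"TS_ENV_{n}_URL")
        if not label or not url:
            break
        found.append({"label": label, "url": url})
        n += 1
    return found


def resolve_ts_env(label: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve a dropdown label to its URL and the actual secret value."""
    env = os.environ if env is None else env
    for item in get_ts_environments(env):
        if item["label"] == label:
            key_var = URL_TO_KEY_VAR.get(item["url"])
            # Store the secret VALUE, not the env var name (the bug noted above).
            secret = env.get(key_var) if key_var else None
            return {"url": item["url"], "auth_key": secret}
    raise KeyError(f"unknown TS environment: {label}")
```

The dropdown choices would then be the `label` values, and the resolved dict is what gets copied into the controller settings.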

Before Broader Rollout

  • Settings Audit & Cleanup ✅
    • ts_instance_url removed from SETTINGS_SCHEMA + hidden in Settings UI (replaced by env dropdown)
    • default_company_url removed from SETTINGS_SCHEMA + hidden in Settings UI (chat-driven now)
    • AI Model selection already on front page ✅
  • demo_prep.py Refresh - scope too large for this sprint, moved to Phase 2
    • Audit done: ~2-3 day job (Spotter Viz tab, outlier integration, logged_completion, class refactor)
  • Session Persistence Verification ✅ - verified working
    • ts_username in SETTINGS_SCHEMA → pre-fills on load via load_settings_on_startup
    • liveboard_name, default_use_case, default_llm all pre-fill on startup
    • company no longer pre-filled (chat-driven) - working as intended
  • State Isolation Fix ✅ - both loggers now fully per-session
    • Session logger: stored on self._session_logger (per controller instance) ✅
    • Prompt logger: global singleton removed; PromptLogger instantiated directly per controller ✅
      • logged_completion() and log_researcher_call() accept logger= param; if None, skips log
      • ThoughtSpotDeployer.prompt_logger set from controller after construction
      • SmartDataAdjuster accepts prompt_logger= constructor param
      • Prompt log tab timer reads from controller._prompt_logger via gr.State
    • inject_admin_settings_to_env() still in use (deferred - requires cdw_connector refactor)
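The per-session prompt logger arrangement in the State Isolation Fix item can be illustrated with a small sketch. The class bodies, the `call_llm` callable, and the `Controller` name are stand-ins assumed for the example; only the `logger=` parameter and the "if None, skip the log" behavior come from the notes.

```python
from typing import Callable, Optional


class PromptLogger:
    """Hypothetical in-memory prompt log; one instance per controller, no module-level singleton."""

    def __init__(self) -> None:
        self.entries: list = []

    def log(self, prompt: str, response: str) -> None:
        self.entries.append({"prompt": prompt, "response": response})


def logged_completion(call_llm: Callable[[str], str], prompt: str,
                      logger: Optional[PromptLogger] = None) -> str:
    """Run the LLM call and record it only when a logger is supplied."""
    response = call_llm(prompt)
    if logger is not None:  # logger=None means "skip logging", never "use a global"
        logger.log(prompt, response)
    return response


class Controller:
    """Stand-in for the chat controller: owns its own logger."""

    def __init__(self) -> None:
        self._prompt_logger = PromptLogger()

    def ask(self, call_llm: Callable[[str], str], prompt: str) -> str:
        return logged_completion(call_llm, prompt, logger=self._prompt_logger)
```

Because each controller builds its own `PromptLogger`, two concurrent sessions never append to the same list.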

Phase 2 (Next Sprint or Later)

demo_prep.py Refresh (from March Sprint)

  • demo_prep.py Refresh - sync with chat_interface.py improvements (~2-3 days)
    • Add Spotter Viz Story tab + _generate_spotter_viz_story() (5h)
    • Add Demo Pack tab with outlier-driven talking points + Spotter questions (4h)
    • Replace all researcher.make_request() calls with logged_completion() wrapper (5h)
    • Refactor to class-based pattern (like ChatDemoInterface) for state isolation (7h)
    • Full outlier system integration (2.5h)
    • Per-user session logging throughout (2h)

Carry-forward from Sprint 2

  • Unified Outlier System - core done, not satisfied with output quality; needs refinement
  • Demo Pack Generation - very unsatisfied, needs significant improvement
  • Chart Titles - not happy with viz titles/naming; needs better approach
  • Existing Model Selection + Self-Join Skip - may be done; needs confirmation test + verify self-join skip is working correctly
  • Universal Context Prompt - double-test this feature end-to-end
  • Chat Adjustment Using Outlier System - never got to this
  • Matrix Editor Tab ✅ - 🧩 Matrix tab added; first draft view+edit
    • Vertical + Function dropdowns; coverage badge (Override / Base merge)
    • Target persona + business problem shown for overrides
    • KPIs accordion with definitions
    • Liveboard Questions editable dataframe (add/delete rows)
    • Story Controls accordion (read-only for now)
    • Persistence via Supabase deferred to next sprint
    • Reference doc: dev_notes/matrix_reference.md
  • Remember Me on Login ✅ - injected into Gradio venv index.html template
    • div.form (not <form>) found via polling; no shadow DOM
    • Saves username to localStorage on login; pre-fills on return visit
  • Interface Mode Refactor (DemoWorkflowEngine shared class concept)
  • Wizard Tab UI ✅ - Defined/Custom sub-tabs in Chat; Defined tab first; GO button wires into pipeline
    • Vertical dropdown → Function dropdown (stacked, marks overrides with ✓)
    • Company URL + "Use URL" checkbox + Additional context + → GO
    • Custom tab = original chat flow unchanged
    • Welcome message redesigned: title + How to start + collapsible example table
    • Chatbot box removed (transparent) to reduce visual clutter
  • Tag Assignment to Models - returns 404 (works for tables, not models); needs investigation
  • Spotter Viz Story Verification - run end-to-end and verify story generation + blank viz (ASP, Total Sales Weekly) and brand colors rendering
  • Fix Research Cache Not Loading - relative path issue; fix was ready, needs test
  • Fix DAYSONHAND Generation - currently random; needs business logic (realistic 15–120 day distribution)
  • Verify KPIs in Liveboard - requires live deployment test
  • Auto-injection step - revisit what this was supposed to be
  • Dead code cleanup: model TML generators - thoughtspot_deployer.py has 3 model TML functions; only _create_model_with_constraints is called by deploy_all; remove create_actual_model_tml and create_model_tml

From March Plan

  • Data Adjuster - Liveboard-First Entry Point ✅
    • Paste any TS liveboard URL in the init stage → jumps straight to adjuster (skips build pipeline)
    • load_context_from_liveboard() in smart_data_adjuster.py: liveboard TML (export_fqn) → model GUID → model TML → db/schema
    • Detection in chat_interface.py init stage: regex on pinboard/<guid> pattern → auth TS client → load context → init SmartDataAdjuster → outlier_adjustment stage
  • Sharing ✅ - model + liveboard shared (can_edit / MODIFY) after every build
    • share_objects() method in thoughtspot_deployer.py: POST /api/rest/2.0/security/metadata/share
    • Detects @ in value → USER type, otherwise → USER_GROUP
    • share_with in regular Settings (per-user); SHARE_WITH in Admin Settings (system-wide default)
    • Per-user setting takes priority; falls back to admin setting if empty
    • Model shared after creation, liveboard shared after creation
  • Sage Indexing Retry ✅ - _get_answer_direct now retries once with 20s wait on 10004 "No answer found"; flag is module-level so the wait happens only once per build run, not per question
  • Fallback TML: Skip Invalid Column Refs ✅ - after convert_natural_to_search, validates [Column] tokens against model columns; skips viz (instead of failing the whole liveboard) if any token is missing
  • MCP 500 Retry Logic - broader retry for other 5xx errors
  • Model Generator: Chasm Trap Fix - when two fact tables share a dimension, model generator must:
    • Include ALL FK joins from each fact table to shared dimensions (e.g. PRIOR_AUTHORIZATIONS.DRUG_NDC → DRUGS)
    • Set is_attribution_dimension: false on shared dimension tables so TS doesn't fold fact tables together
    • Without this: queries fan out through a shared date dimension → every group gets the same average
    • Fixed manually for Abarca: added PRIOR_AUTHORIZATIONS → DRUGS join + DRUGS.is_attribution_dimension=false
  • Data Narrative Layer for Population - LLM generates random/flat data because it doesn't know what story the KPIs should tell
    • Root cause: population script gets DDL + company context but NOT the KPI formulas or desired metric distributions
    • Fix: before data generation, build a "data narrative" spec from the vertical×function matrix: explicit per-column constraints ("IS_GENERIC: 93% for Medicaid rows, 80% for Commercial"), outlier targets, trend directions
    • Pass this narrative spec as a required section in the population prompt
    • Domain rules baked in: specialty/biologic drugs get low PA approval, GLP-1s face high scrutiny, Medicaid has highest GDR, etc.
    • Goal: generated data tells the story on first run - no manual Snowflake fixups required
  • Quality test: incorporate Snowflake row counts into data grading - currently row counts per table are captured in JSON (snowflake_check) and printed after every test run, but not factored into the score. Future: add a "Completeness" sub-score to the data quality grade based on actual row counts (e.g. tables with 0 rows = penalty, minimum thresholds per table type)
  • Fix Domain-Specific NAME Column Generation ✅ - DRUG_NAME was falling through to fake.name() (person name) because 'NAME' in col_name_upper matched first, before the drug-specific check; fixed by adding DRUG/MEDICATION check at the top of the NAME block in chat_interface.py
  • Abarca Demo Data - KPI Variation ✅ - GDR and PA Approval Rate KPIs had flat sparklines
    • Root cause: IS_GENERIC set by plan type only (uniform across months); PA rate set by therapeutic class only
    • Fix: scratch/fix_abarca_kpi_variation.py β€” full per-month reset using plan-type + monthly adjustment
    • GDR visible range: 84–93% with clear oscillations; PA visible range: 74–86%
    • ⚠️ Re-run fix_abarca_kpi_variation.py if other data changes clobber monthly variation
    • PA by therapeutic class fix: scratch/fix_abarca_pa_therapeutic_class.py + scratch/fix_abarca_pa_ts_table.py
    • Root cause of flat PA-by-class chart: no PRIOR_AUTHORIZATIONS→DRUGS join in TS model; added THERAPEUTIC_CLASS column directly to PRIOR_AUTHORIZATIONS table instead
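The "Fallback TML: Skip Invalid Column Refs" item above amounts to a token check before a viz is kept. A minimal sketch of that idea follows; the function names, the `search_query` dict key, and the case-insensitive comparison are assumptions made for illustration, not the deployer's real code.

```python
import re

# Matches [Column Name] tokens in a search-style query string.
TOKEN_RE = re.compile(r"\[([^\[\]]+)\]")


def missing_column_tokens(search_query: str, model_columns) -> list:
    """Return the [Column] tokens that do not exist in the model.

    Assumption: column matching is case-insensitive and ignores outer whitespace.
    """
    known = {c.strip().lower() for c in model_columns}
    return [t for t in TOKEN_RE.findall(search_query) if t.strip().lower() not in known]


def filter_valid_vizzes(vizzes, model_columns):
    """Split vizzes into (kept, skipped) instead of failing the whole liveboard."""
    kept, skipped = [], []
    for viz in vizzes:
        if missing_column_tokens(viz["search_query"], model_columns):
            skipped.append(viz)
        else:
            kept.append(viz)
    return kept, skipped
```

For example, with model columns `{"Revenue", "Region"}`, a viz whose query is `[Revenue] by [Store]` would land in `skipped` while `[Revenue] by [Region]` is kept.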

April 7 Session - Where We Left Off

Completed This Session

  • Spotter Viz Story fixes ✅ - wrapped prompts in messages array (was passing raw string); added prompt logging to both AI + matrix generators
  • SmartDataAdjuster username fix ✅ - was failing with "requires username" error; now passes logged-in user email through
  • Spotter Viz story format ✅ - removed Step N headers + Expected Result labels; output is now clean numbered prompts only; includes full model URL so Spotter doesn't ask "which data source?"
  • Categorical COMMENT fix (LegitData) ✅ - expanded suffix list in DDL prompt to include _stage, _cycle, _motion, _role, _band, _region, _mode, _method, _source, _reason; fixed numeric column guard in legitdata_bridge.py to prevent choice: strategies on INT/NUMERIC columns
  • Shopify data fixes ✅ - fixed PIPELINE_STAGE, RENEWAL_CYCLE, FORECAST_CATEGORY, SALES_MOTION, REP_ROLE, TEAM_NAME, 11 orphan rep IDs in scratch/fix_shopify_data.py
  • 3-level cascade dropdown ✅ - replaced vertical+function with vertical→line→function; VERTICAL_LINES + DEMO_FUNCTIONS added to demo_personas.py
  • Matrix fallback chain ✅ - get_use_case_config(line, function, vertical_fallback) tries line first, falls back to vertical, then generic
  • Matrix renamed ✅ - Retail→"Retail & Consumer Goods", Banking→"Financial Services", Software→"Technology" in VERTICALS + MATRIX_OVERRIDES
  • Technology lines updated ✅ - "Cloud Computing" → "Software as a Service"
  • App tab / Chat tab separation ✅ - chatbot moved inside Chat tab; App tab is now clean form only; GO still feeds Chat tab in background
  • Matrix reference doc updated ✅ - dev_notes/matrix_reference.md rewritten with full 15×6 coverage grid, all lines, current overrides, priority build list
  • CLAUDE.md updated ✅ - added "never push without explicit instruction" rule + db/schema derivation pattern
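The matrix fallback chain listed above (line first, then vertical, then generic) can be sketched like this. The override entries, the `GENERIC_CONFIG` name, and the config contents are invented placeholders; only the function signature and the lookup order come from the notes.

```python
from typing import Optional

# Hypothetical, tiny stand-in for MATRIX_OVERRIDES, keyed by (line or vertical, function).
MATRIX_OVERRIDES = {
    ("Software as a Service", "Sales"): {"persona": "placeholder line-level persona"},
    ("Technology", "Sales"): {"persona": "placeholder vertical-level persona"},
}

# Hypothetical generic config used when nothing in the matrix matches.
GENERIC_CONFIG = {"persona": "placeholder generic persona"}


def get_use_case_config(line: Optional[str], function: str,
                        vertical_fallback: Optional[str] = None) -> dict:
    """Try the line-level entry, then the vertical-level entry, then the generic config."""
    for scope in (line, vertical_fallback):
        if scope and (scope, function) in MATRIX_OVERRIDES:
            return MATRIX_OVERRIDES[(scope, function)]
    return GENERIC_CONFIG
```

A call such as `get_use_case_config("Hardware", "Sales", "Technology")` misses at the line level and returns the vertical-level entry.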

April 13–14 Session

  • AI Feedback + Live Progress tabs merged ✅ - single "🤖 AI Feedback" tab, two sub-sections
  • Pipeline progress → Complete ✅ - current_stage = 'complete' + final yield added to all 4 TS deployment exit paths
  • Unified liveboard question prompt ✅ - _generate_smart_questions_with_ai() rewritten:
    • Output: {"kpis": [...], "visualizations": [...]} - typed so KPIs and vizzes are explicit
    • Matrix case: persona, business_problem, kpi_definitions, story questions all injected as primary guidance
    • Custom case: additional_context (from Wizard tab) drives the story; no matrix section
    • data_outliers param ready for when LegitData outlier metadata is connected
    • Prompt explicitly asks for N KPIs + M vizzes, with format examples for each
  • additional_context wired end-to-end ✅ - generic_use_case_context from controller now flows through deploy_all → company_data → create_liveboard_from_model_mcp → prompt
  • outlier_dicts hack removed ✅ - deployer now passes raw matrix_config (the full uc_config) directly; no more reshaping matrix questions into fake outlier format
  • AI Feedback verbosity - not a blocker, low priority backlog
  • viz_type enforcement - matrix has viz_type per question; enhance_mcp_liveboard() post-processor could use it to override TS chart type choices; deferred
  • Actual data outliers → liveboard - LegitData injects outliers into data but that metadata never reaches the liveboard prompt; data_outliers param is ready, just needs the connection from LegitData output

April 27 Session

  • Run history - use case truncation + detail panel ✅ - use case column truncated to 60 chars in grid; clicking any row shows full use case in a text area below; full_use_cases_state + gr.Dataframe.select() handler
  • Run history - Interface column ✅ - every run now logs interface (app_defined, app_custom, chat), vertical, line, function, is_custom, additional_context; Interface column added to Run History grid
  • awaiting_use_case stage ✅ - fixed bug where pasting long custom text as use case response re-triggered company extraction (e.g. "NJ.Products" from "Princeton, NJ. Products: LONSURF"); chat tab now sets stage to awaiting_use_case when asking "what use case?" so next message is treated as use case only
  • Comprehensive stage + sub-stage logging ✅ - added to all gaps in the pipeline:
    • ddl stage: started / completed (with table count) / failed - was completely absent before
    • deploy sub-stages: snowflake connected, ddl pushed / failed (verbose)
    • thoughtspot sub-stages: connection created, model tagged, model shared, semantics applied/empty/failed, spotter enabled/failed, liveboard enhance started/completed/failed, liveboard created, liveboard tagged, liveboard shared, liveboard failed
    • Verbose sub-stages use log_verbose() - only written when LOG_LEVEL=verbose in admin settings; stage-level events always written
  • Status report + Slack post updated ✅ - full sprint delivery items added; status report live at https://boone-ts-demoprep-doc.static.hf.space/status_report.html
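The split between always-written stage events and verbose-only sub-stage events described above could look like the following. This is a sketch under assumptions: the class is an in-memory stand-in (the real logger writes to Supabase), and the constructor argument, row shape, and `log` method name are hypothetical. Only `log_verbose()` and the `LOG_LEVEL=verbose` condition come from the notes.

```python
class SessionLogger:
    """In-memory stand-in for the session logger, for illustration only."""

    def __init__(self, log_level: str = "normal") -> None:
        # Assumption: the admin LOG_LEVEL setting is passed in at construction time.
        self.log_level = log_level
        self.rows: list = []

    def log(self, stage: str, event: str, **meta) -> None:
        """Stage-level events: always written."""
        self.rows.append({"stage": stage, "event": event, **meta})

    def log_verbose(self, stage: str, event: str, **meta) -> None:
        """Sub-stage events: written only when LOG_LEVEL is 'verbose'."""
        if self.log_level == "verbose":
            self.log(stage, event, **meta)
```

With the default level, a call like `log_verbose("deploy", "ddl pushed")` is dropped while `log("ddl", "completed", table_count=6)` is still recorded.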

Diagnosed

  • Manuel Marco's run (Apr 23) - used App tab custom path (stage=awaiting_context confirmed in logs); research completed but DDL stage had no logging so failure point was invisible; new DDL logging will catch this going forward
  • Paul Gilman Texas Children's fail (Apr 17) - DML error on QUALITY_CLAIMS.EVENT_COUNT, string inserted into numeric column; ran successfully on the second attempt with a different use case

Planned: Settings Contract Test (tests/settings_test.py)

A dedicated settings regression test, separate from e2e_quality.py. The quality test only measures pipeline output quality; this test verifies that each user setting actually affects the pipeline correctly.

Pattern per setting:

  1. Read current value as baseline
  2. Set a specific test value
  3. Run a pipeline
  4. Verify the output reflects the change
  5. Reset to original value
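The five-step pattern above maps naturally onto a context manager, so the reset in step 5 happens even when the pipeline run or the verification fails. The sketch below assumes settings are a plain dict and that `temporary_setting` is a new helper; neither is confirmed by the notes.

```python
from contextlib import contextmanager


@contextmanager
def temporary_setting(settings: dict, name: str, test_value):
    """Set one setting for the duration of a test, then restore the baseline."""
    had_key = name in settings
    baseline = settings.get(name)        # 1. read current value as baseline
    settings[name] = test_value          # 2. set a specific test value
    try:
        yield baseline                   # 3-4. caller runs the pipeline and verifies
    finally:
        if had_key:
            settings[name] = baseline    # 5. reset to original value
        else:
            settings.pop(name, None)     # setting did not exist before the test
```

A test body would then wrap the pipeline run, for example `with temporary_setting(settings, "fact_table_size", 5000): ...`, and place its assertions inside the block.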

Settings to cover:

| Setting | UI location | How to verify |
|---|---|---|
| fact_table_size | Settings accordion → Data Size | COUNT(*) on fact table in Snowflake |
| dim_table_size | Settings accordion → Data Size | COUNT(*) on each dimension table |
| geo_scope | Settings accordion → Geographic Scope | DISTINCT COUNTRY values in Snowflake |
| tag_name | Settings accordion → Tag Name | Tag on model/liveboard via TS API |
| object_naming_prefix | Settings accordion → Object Naming Prefix | Schema name starts with prefix |
| column_naming_style | Settings accordion → Column Naming Style | Column names in model TML |
| liveboard_name | Settings accordion → Liveboard Name | Liveboard name in TS |
| default_llm | AI Model dropdown (above accordion) | session_logs: which LLM provider/model was used |
| ts_environment | TS Environment dropdown (top of panel) | Model/liveboard created on correct TS instance URL |
| validation_mode | Settings accordion | DDL validation runs or skips |
| use_existing_model | Settings accordion | Data gen stage is skipped |

Note: Remove verify_group1_settings() from e2e_quality.py when this is built; settings verification doesn't belong in the quality run.

LLM temperature cleanup needed: main_research.py now handles gpt-5 (reasoning model, uses reasoning_effort) vs gpt-5.5+ (supports temperature) via regex in two places. This logic should move to llm_config.py as a proper helper (e.g. build_llm_extra_kwargs(model, temperature)) so it's one place, not scattered across request methods. Also: the Settings UI should surface reasoning_effort as a control when a reasoning model is selected, instead of showing a temperature slider that does nothing.
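One possible shape for the proposed build_llm_extra_kwargs helper is sketched below. The version-parsing regex, the `reasoning_effort` default, and the cutoff (anything from gpt-5 up to but not including gpt-5.5 treated as a reasoning model) are assumptions drawn from the note above, not checked against provider documentation.

```python
import re
from typing import Optional

# Assumption: model names look like "gpt-5", "gpt-5-mini", "gpt-5.5", "gpt-4o".
_GPT_VERSION_RE = re.compile(r"^gpt-(\d+)(?:\.(\d+))?")


def build_llm_extra_kwargs(model: str, temperature: Optional[float] = None,
                           reasoning_effort: str = "medium") -> dict:
    """Return the extra request kwargs appropriate for the given model name."""
    match = _GPT_VERSION_RE.match(model.lower())
    if match:
        version = (int(match.group(1)), int(match.group(2) or 0))
        # Per the note: gpt-5 takes reasoning_effort; gpt-5.5+ accepts temperature.
        if (5, 0) <= version < (5, 5):
            return {"reasoning_effort": reasoning_effort}
    # Every other model: pass temperature through only when one was given.
    return {} if temperature is None else {"temperature": temperature}
```

Request methods could then call `client_kwargs.update(build_llm_extra_kwargs(model, temperature))` in one place, and the Settings UI could use the same helper to decide which control to show.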


Mini Sprint: Test Stabilization (Apr 28–29) ✅ SHIPPED TO PROD

Goal: get one clean test run from start to finish before doing anything else.

  • [P0] Verify TS trusted auth ✅ - confirmed working
  • [P0] Remove testrunner → mike.boone proxy ✅ - testrunner is a real TS user in secloud
  • [P0] Verify Playwright TS env selection ✅ - confirmed env_label populates on GO
  • [P1] Fix DATES table date range ✅ - DATES now always spans 2 years regardless of dim_table_size
  • [P1] Increase fact table default ✅ - bumped from 1k → 5k rows (10k being tested)
  • Empty table detection ✅ - pipeline now retries/fails loudly if any non-date table is empty after first attempt; fixed Wells Fargo + Best Buy silent 0-row success
  • Schema name logged to session_logs ✅ - meta["schema"] now set so diagnostics can find the Snowflake schema
  • Test suite robustness ✅ - selector fallbacks for Liveboard Name + Custom Context; dim_table_size check by row count not name; tag_name JSON guard
  • [P2] HF container log testing - deferred
  • [P3] Clean up stale user settings - deferred
  • [P3] User onboarding via Slack ⚑ - see In Progress below

In Progress / Next Up

  • User onboarding via Slack ⚑ASAP - see Mini Sprint above
  • Invite flow - proper invite-link system (token + email); bigger lift, backlog after Slack onboarding
  • HTML user guide - expand dev_notes/quick_start_guide.md into a full HTML page hosted on HF doc space
  • Technology → Software as a Service + Sales run - end-to-end test still pending
  • App tab UX polish - GO button flow should update pipeline status on right
  • Enable/disable dropdown items based on matrix coverage
  • viz_type enforcement - matrix chart type hints → override TS defaults in post-processing
  • Actual data outliers → liveboard - data_outliers param ready, needs connection from LegitData output

Ideas to Think About (Not Implementing Yet)

Spotter Viz Story - Matrix-Grounded Single Story

Current state: _generate_spotter_viz_story() generates two sections (persona-driven from matrix + AI-generated LLM story) combined into one Markdown block.

Proposed approach:

  • Single cohesive story (no split tabs) - the matrix IS the foundation, AI adds glue
  • Take liveboard_questions from the config → generate high-level NL prompts (not granular)
    • e.g. "Show revenue for the last 3 months as a KPI" / "Revenue by region as a bar chart"
  • AI's role: light narrative that connects the dots, names the story, adds 1-2 sentences per step
  • Keep it simple - the goal is a ready-to-use demo script, not documentation
  • Both sections currently generated sequentially after ThoughtSpot deploy; could stay that way

Async Pipeline - Parallel ThoughtSpot + Data Population

Current pipeline is fully sequential: Research → DDL → Create Tables → Populate → TS Model → Liveboard

Dependency analysis:

Research ──► DDL ──► Create Tables ─┬──► Populate Data (LegitData)  ──┐
                                    │                                 ├──► MCP Liveboard
                                    └──► TS Model + Connections ──────┘
                                         (semantic layer, column naming,
                                          sharing, tagging, Sage index)
  • TS model creation needs schema (tables exist) but NOT data rows
  • Data population needs tables but not the TS model
  • MCP liveboard (Spotter Viz) needs both - reads actual data via the model
  • Everything from "Create Tables" onward can be parallelized except the final liveboard step

What runs in parallel after tables are created:

  • Thread A: LegitData population
  • Thread B: TS Model creation → column naming → semantic update → sharing → tagging → Sage indexing

Estimated savings: Model creation + semantic update is ~30-60s; LegitData is ~60-120s. Running in parallel saves roughly the model-creation time off the total wall clock.

Risks/considerations:

  • Gradio streaming yields from a generator - parallel work needs to run in threads and feed a shared queue that the generator drains
  • Error handling: if one branch fails, need to cancel/report both and decide whether to proceed to liveboard
  • Progress reporting needs to interleave updates from both branches
  • Could implement with concurrent.futures.ThreadPoolExecutor + a queue.Queue for progress messages
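The ThreadPoolExecutor + queue.Queue idea above might be shaped like the sketch below: each branch gets a `report` callback that pushes progress onto a shared queue, and the generator drains the queue until every branch has signalled completion. The function name, the branch callables, and the message formats are hypothetical; cancellation of the surviving branch is not handled here.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_parallel_branches(branches):
    """Run named branches in threads and yield their progress messages as they arrive.

    branches: non-empty dict of name -> callable(report), where report(msg) queues a message.
    The final yielded string says whether it is safe to proceed to the liveboard step.
    """
    progress = queue.Queue()
    done = object()  # sentinel: one per finished branch

    def run(name, fn):
        try:
            return fn(lambda msg: progress.put(f"[{name}] {msg}"))
        finally:
            progress.put(done)  # always signal, even if the branch raised

    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {name: pool.submit(run, name, fn) for name, fn in branches.items()}
        remaining = len(futures)
        while remaining:
            item = progress.get()  # blocks until a branch reports or finishes
            if item is done:
                remaining -= 1
            else:
                yield item
        errors = {n: f.exception() for n, f in futures.items() if f.exception()}

    if errors:
        yield f"failed: {sorted(errors)}"
    else:
        yield "both branches done; ready for liveboard"
```

Because a branch queues its sentinel only after its last message, the drain loop cannot exit while messages from that branch are still pending.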

Phase 3 (Future)

  • OAuth/SSO Login - swap Gradio auth for proper OAuth flow
  • Batch Runner Gradio Tab - after CLI proves out, add Gradio tab for batch testing
  • Batch Runner: Full Pipeline Stages - add population, deploy_snowflake, deploy_thoughtspot, liveboard stages
  • Request New Environment Form - if/when needed
  • Liveboard Question Column Mapping - liveboard_questions[].viz_question uses natural language that may not match actual DDL column names; after model is built, map questions to real column names before sending to MCP. Currently worked around with generic NL ("average selling price by week" vs "ASP weekly") but proper runtime column substitution would be more reliable.

Cancelled / Resolved

  • MCP Bearer Auth investigation - resolved; bearer auth working, no further action needed

Done

Session: March 27, 2026 - ts_user Fix + Semantics Pipeline + UX Fixes

  • ts_user fix ✅ - ThoughtSpot objects now created under logged-in user, not admin
    • All 3 ts_user locations in chat_interface.py → self._get_effective_user_email()
    • ThoughtSpotDeployer raises ValueError if username or secret_key not passed
    • _get_direct_api_session / _get_answer_direct in liveboard_creator.py: accept secret_key param, no global singleton cache
  • THOUGHTSPOT_ADMIN_USER removed ✅ - from all non-test Python files
    • supabase_client.py: removed from ADMIN_SETTINGS_KEYS and inject_admin_settings_to_env
    • chat_interface.py: removed startup check + admin hidden field
  • THOUGHTSPOT_TRUSTED_AUTH_KEY removed from Supabase settings ✅
    • All reads changed to self.settings.get('thoughtspot_trusted_auth_key') - set per-env via dropdown
    • Removed admin_ts_auth_key hidden textbox from Settings UI
  • Model semantic enrichment wired into pipeline ✅ - thoughtspot_deployer.py deploy_all
    • Model description, per-column description + synonyms + ai_context generated via LLM in one call
    • Applied to model TML and reimported in same Spotter-enable cycle
    • model_semantic_updater.py rewritten: generate_model_description, generate_column_semantics, apply_to_model_tml, enrich_model + backwards-compat alias update_model_semantic_layer
  • Welcome page examples updated ✅ - 5 hardcoded examples (Retail/Banking/Software/Manufacturing verticals)
    • Persona column removed; examples guaranteed to work
  • Boolean type mismatch fixed ✅ - legitdata_bridge.py convert_value()
    • isinstance(value, bool): return int(value) added before int check (bool is subclass of int)
    • Fixes SUPPLY_CHAIN_EVENTS population failure (NUMBER(38,0) vs BOOLEAN)
  • Company parsing fix ✅ - "Caterpillar.com Manufacturing Supply Chain" now parses in one shot
    • Added space-only separator pattern r'[a-zA-Z0-9-]+\.[a-zA-Z]{2,}\s+(.+)' to extract_use_case_from_message
    • Company state preserved when only company detected (not overwritten with old value)
  • Manufacturing in use case list ✅ - removed [:3] slice so all 4 verticals show
  • Progress bar stage fix ✅ - auto-run now shows correct stage at each step
    • current_stage = 'research' at start of research loop (was 'deploy')
    • current_stage = 'create_ddl' after research, before DDL creation
    • current_stage = 'deploy' after DDL, before Snowflake deployment
  • Remove default company from input bar ✅ - input now starts blank
    • All "Amazon.com" fallbacks replaced with "" in load_session_state_on_startup, get_session_defaults, and default_settings
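The boolean fix above depends on check order: in Python, `bool` is a subclass of `int`, so an `isinstance(value, int)` branch would accept `True` and pass it through unchanged. The sketch below shows that ordering in isolation; the `sql_type` parameter and the NUMBER-only handling are simplifying assumptions, not the real convert_value() signature.

```python
def convert_value(value, sql_type: str):
    """Coerce a generated value for a Snowflake column (illustrative subset only)."""
    if sql_type.upper().startswith("NUMBER"):
        # The bool check must come BEFORE the int check:
        # isinstance(True, int) is True because bool subclasses int.
        if isinstance(value, bool):
            return int(value)
        if isinstance(value, int):
            return value
        return int(float(value))
    return value
```

With the checks in this order, `convert_value(True, "NUMBER(38,0)")` yields the integer `1` rather than a boolean that Snowflake rejects for a NUMBER column.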

Session: March 27, 2026 - Concurrency Fix + Logging Fix

  • Concurrent session UI interference fixed ✅ - chat_interface.py
    • liveboard_name_input.change → .blur; outputs=[liveboard_name_input] → outputs=[]
    • Per-keystroke change events were queuing behind long-running deploy generators; Gradio showed the queue wait timer ("197.1/196.8s") directly on the liveboard name textbox
    • Blur fires only on focus-leave; no output round-trip means Gradio never shows a spinner on the field
  • Session logging completions/failures added ✅ - chat_interface.py
    • log_end("research", _t) / log_end("research", _t, error=...) added to all exits of run_research_streaming
    • _deploy_error tracker + finally: log_end("deploy", ...) added to run_deployment_streaming
    • log_start("thoughtspot") + _ts_error tracker + finally: log_end("thoughtspot", ...) added to _run_thoughtspot_deployment
    • Now logs: stage completed/failed with duration_ms and error string for every pipeline stage
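The error-tracker plus `finally: log_end(...)` arrangement described above can be sketched as follows. The logger here is an in-memory stand-in and the `steps` argument is hypothetical; only the `log_start` / `log_end` names, the `_deploy_error` tracker, and the `duration_ms` + `error` fields come from the notes.

```python
import time


class SessionLogger:
    """In-memory stand-in for the real Supabase-backed logger."""

    def __init__(self) -> None:
        self.rows: list = []

    def log_start(self, stage: str) -> float:
        return time.monotonic()

    def log_end(self, stage: str, started: float, error=None) -> None:
        duration_ms = int((time.monotonic() - started) * 1000)
        self.rows.append({"stage": stage, "duration_ms": duration_ms, "error": error})


def run_deployment_streaming(logger: SessionLogger, steps):
    """Yield progress from each step; always log completion or failure exactly once."""
    _t = logger.log_start("deploy")
    _deploy_error = None
    try:
        for step in steps:
            yield step()
    except Exception as exc:
        _deploy_error = str(exc)
        yield f"Deployment failed: {exc}"
    finally:
        # Runs on success, on failure, and if the consumer closes the generator early.
        logger.log_end("deploy", _t, error=_deploy_error)
```

Putting `log_end` in `finally` covers every exit path with one call, instead of one call per return statement.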

Session: March 26, 2026 - HF Deployment Fixes + Auth + Settings

  • HF Blank Page After Login - Root Cause Found ✅
    • Cause: users were accessing via huggingface.co/spaces/thoughtspot-dp/demoprep (HF wrapper)
    • The wrapper embeds the app in an iframe; modern browsers block third-party iframe cookies → auth cookie not sent back → stays on login page
    • Fix: use the direct URL https://thoughtspot-dp-demoprep.hf.space - first-party cookies work fine
    • Also fixed during investigation: pinned starlette==0.50.0 (Starlette 1.0 broke TemplateResponse) and fastapi==0.128.0 (prevents surprise upgrades on HF rebuild)
  • Logout button added ✅ - "Sign Out →" link in header, wired to Gradio's /logout route
  • Change Password ✅ - accordion in Settings tab; requires current password to confirm identity

Session: March 26, 2026 - Liveboard Name Fix + Settings Reorganization

  • Liveboard Name Bug Fixed ✅ - UI field value now takes priority over DB-loaded default
    • send_message and quick_action accept liveboard_name_ui param
    • liveboard_name_input added to _send_inputs and _action_inputs
    • Applied to controller.settings['liveboard_name'] on every message - always uses current UI value
  • Settings UI Reorganized ✅ - Split into "Default Settings" and "App Settings"
    • Default Settings: AI Model, Default Use Case, Default Liveboard Name (3-up row)
    • App Settings: Tag Name, Fact Table Size, Dim Table Size, Object Naming Prefix, Column Naming Style

Session: March 26, 2026 β€” New Vision Merge + Pipeline Investigation

  • Spotter enable fix verified ✅ - spotter_config placement confirmed correct (nested inside model.properties, not sibling). Tested on model f40ff5bd via scratch/test_spotter_enable.py; Spotter answered (HTTP 200).

  • Liveboard pipeline bugs documented ✅ - Full trace written in dev_notes/liveboard_flow_amazon_retail.md

    • 6 KPI root cause: AI-generated fill questions (slots 5–8) are all time+metric → MCP creates them all as KPIs. Fix: cap AI questions at a maximum of 2 single-metric questions.
    • "Show me" title bug: _convert_outlier_to_mcp_question prepends "Show me"; _clean_viz_title strips "Show " leaving "me...". Fix: add (r'^Show me ', '') before (r'^Show ', '').
    • Spotter Viz Story mismatch: _generate_spotter_viz_story never sees the actual viz names and generates its own independently. Fix: pass actual viz names post-build.
    • OutlierPattern fields sql_template, magnitude, affected_columns, target_filter, demo_setup, demo_payoff are all dead: never read anywhere.
  • DemoPrep_new_vision2 merge completed ✅ - New data generation engine with real outlier injection merged into current codebase:

    • demo_personas.py - replaced with new version: DEFAULT_STORY_CONTROLS, story_controls on every vertical/function, Finance/SaaS overrides, ROUTED_USE_CASES, merged get_use_case_config()
    • legitdata_project/legitdata/generator.py - replaced with 1,600-line version: _refresh_story_spec(), _apply_storyspec_time_series() (actual outlier injection with deterministic seed + trend/seasonal signals), _generate_saas_finance_gold()
    • legitdata_project/legitdata/storyspec.py - new file: StorySpec, TrendProfile, OutlierBudget, ValueGuardrails dataclasses
    • legitdata_project/legitdata/domain/ - new package: SemanticType enum + domain value libraries
    • legitdata_project/legitdata/quality/ - new package: quality rules, validator, repair
    • 5 updated source files: column_classifier.py, ai_generator.py, generic.py, fk_manager.py, parser.py
    • legitdata_project/legitdata/__init__.py - updated with StorySpec exports
    • Verified end-to-end: seed=1780963166 (deterministic from amazon.com+Retail Sales), 1 outlier injected Sept 9 2024 at 2.4x multiplier in SALES_TRANSACTIONS
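
The "Show me" title bug listed above comes down to rule ordering: the longer prefix has to be tried before the shorter one. A minimal sketch, assuming the cleaner applies an ordered list of (pattern, replacement) pairs; `TITLE_PREFIX_RULES` and this `clean_viz_title` are illustrative stand-ins, not the real `_clean_viz_title`.

```python
import re

# Ordered rules: the more specific "Show me " prefix must come first.
TITLE_PREFIX_RULES = [
    (r'^Show me ', ''),
    (r'^Show ', ''),
]

def clean_viz_title(title, rules=TITLE_PREFIX_RULES):
    """Apply each prefix rule in order and return the cleaned title."""
    for pattern, repl in rules:
        title = re.sub(pattern, repl, title)
    return title

# With only the generic rule, the leftover "me" reproduces the bug.
BUGGY_RULES = [(r'^Show ', '')]
```

With the fixed ordering, "Show me sales by region" cleans to "sales by region"; with `BUGGY_RULES` it becomes "me sales by region".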

Carry-Forward / Backlog (Not Scheduled)

  • Unified Outlier System - core done, not satisfied with output quality; needs refinement
  • Demo Pack Generation - very unsatisfied, needs significant improvement
  • Chart Titles - not happy with viz titles/naming; needs better approach
  • Actual data outliers → liveboard - data_outliers param ready, needs connection from LegitData output
  • viz_type enforcement - matrix chart type hints → override TS defaults in post-processing (deferred)
  • Research cache not loading - relative path issue; fix was ready, needs test
  • Fix DAYSONHAND generation - currently random; needs business logic (realistic 15–120 day distribution)
  • HTML user guide - expand dev_notes/quick_start_guide.md into full HTML page hosted on HF doc space

Known Issues / Tech Debt

  • testrunner@thoughtspot.com - now a real TS user in secloud ✅. Proxy removed Apr 28. Still needs to be added to sebe environments.
  • HF container logs - pipeline runs in a background thread; unknown whether stdout reaches the HF SSE log stream. Need to test with curl -N -H "Authorization: Bearer $HF_TOKEN" "https://huggingface.co/api/spaces/thoughtspot-dp/test-demoprep/logs/run". Until confirmed, Supabase session_logs is the real pipeline log.
  • THOUGHTSPOT_URL admin fallback removed - all three TS deploy paths now require a TS environment to be selected from the dropdown; no silent fallback. If no env selected, user gets "select a TS environment from the dropdown" error.
  • testrunner not in sebe - quality tests targeting sebe environments will fail for testrunner until added.
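
The no-fallback rule above can be sketched like this. It assumes the `TS_ENV_N_LABEL` / `TS_ENV_N_URL` pairs are numbered contiguously from 1 and passes the environment in as a mapping for testability; the real `get_ts_environments()` may differ, and `require_ts_url` is a hypothetical helper.

```python
def get_ts_environments(env):
    """Collect TS_ENV_<N>_LABEL / TS_ENV_<N>_URL pairs into {label: url}."""
    envs = {}
    n = 1  # assumption: numbering starts at 1 with no gaps
    while f"TS_ENV_{n}_LABEL" in env and f"TS_ENV_{n}_URL" in env:
        envs[env[f"TS_ENV_{n}_LABEL"]] = env[f"TS_ENV_{n}_URL"]
        n += 1
    return envs

def require_ts_url(selected_label, env):
    """Return the URL for the selected environment; never fall back to THOUGHTSPOT_URL."""
    envs = get_ts_environments(env)
    if not selected_label or selected_label not in envs:
        raise ValueError("select a TS environment from the dropdown")
    return envs[selected_label]
```

In the app, `env` would be `os.environ`; a stray `THOUGHTSPOT_URL` entry is ignored by design.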

Notes

Vertical × Function Matrix System

The matrix determines what gets built: KPIs, visualizations, outliers, target persona. See dev_notes/plan_march_2026.md appendix for full documentation.

Current coverage:

| Vertical | Sales | Supply Chain | Marketing |
|---|---|---|---|
| Retail | ✅ Override | Base merge | Base merge |
| Banking | Base merge | Base merge | ✅ Override |
| Software | ✅ Override | Base merge | Base merge |
| Manufacturing | Base merge | Base merge | Base merge |
| other | Generic | Generic | Generic |

  • Override = enriched with persona, extra KPIs, specific viz
  • Base merge = Vertical + Function combined, no special override
  • Generic = AI adapts from closest function match
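
A minimal sketch of how the coverage table could be expressed as a lookup. The `OVERRIDES` set and `coverage` function are illustrative names only; the function argument is assumed to be one of the three columns, and the real matrix lives in the app's persona config.

```python
# Cells marked Override in the coverage table.
OVERRIDES = {
    ("Retail", "Sales"),
    ("Banking", "Marketing"),
    ("Software", "Sales"),
}
KNOWN_VERTICALS = {"Retail", "Banking", "Software", "Manufacturing"}

def coverage(vertical, function):
    """Classify a vertical/function cell as Override, Base merge, or Generic."""
    if vertical not in KNOWN_VERTICALS:
        return "Generic"     # the "other" row: AI adapts from closest function match
    if (vertical, function) in OVERRIDES:
        return "Override"    # enriched persona, extra KPIs, specific viz
    return "Base merge"      # vertical + function combined, no special override
```

For example, `coverage("Retail", "Sales")` gives "Override", while a vertical outside the four listed rows falls through to "Generic".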