
Sprint: March 2026

Started: March 16, 2026
Planning doc: dev_notes/plan_march_2026.md

Context

App is live. A small-group launch is planned for early this week, with a broader rollout (4–5 people) within a couple of weeks. This sprint covers hardening, settings, and new capabilities ahead of that rollout.

Sprint Objectives

Before Small Group Launch (This Week)

TS Environment Dropdown ✅ — ENV-based dropdown on front page; URL→key map hard-coded in app
- TS_ENV_N_LABEL/URL pattern in .env; get_ts_environments() reads all to build dropdown
- Dropdown in right panel alongside AI Model + Liveboard Name
- update_ts_env() resolves URL + auth key (via os.getenv(key_name)) into controller settings
- Bug fixed: was storing ENV var name instead of actual secret value
- Controller created on first message also receives current dropdown env selection
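
A minimal sketch of the env-driven dropdown and key resolution described above. The URL-to-key map contents, the `TS_KEY_EXAMPLE_A` variable name, and the `resolve_auth_key` helper are hypothetical; only the `TS_ENV_N_LABEL`/`TS_ENV_N_URL` pattern and the `os.getenv(key_name)` lookup come from the notes.

```python
import os
import re
from typing import Mapping, Optional

# Hypothetical URL -> env-var-name map; the real map is hard-coded in the app.
URL_TO_KEY_NAME = {
    "https://example-a.thoughtspot.cloud": "TS_KEY_EXAMPLE_A",
}

_LABEL_RE = re.compile(r"^TS_ENV_(\d+)_LABEL$")


def get_ts_environments(environ: Mapping[str, str] = os.environ) -> list:
    """Return (label, url) pairs for every TS_ENV_N_LABEL / TS_ENV_N_URL pair, ordered by N."""
    found = []
    for name, label in environ.items():
        match = _LABEL_RE.match(name)
        if not match:
            continue
        url = environ.get(f"TS_ENV_{match.group(1)}_URL")
        if url:  # skip labels that have no matching URL
            found.append((int(match.group(1)), label, url))
    return [(label, url) for _, label, url in sorted(found)]


def resolve_auth_key(url: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the secret value for a URL, not the name of the variable that holds it."""
    key_name = URL_TO_KEY_NAME.get(url)
    if key_name is None:
        return None
    return environ.get(key_name)  # the earlier bug stored key_name itself
```

Passing `environ` as a parameter keeps the lookup testable without touching the real process environment.
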
Front Page Redesign ✅ — Right panel (where Stage + AI Model currently live):
- Add TS Environment dropdown here alongside AI Model
- Add Liveboard Name field here
- Remove the Stage textbox — replace with proper progress indicator (see below)
- Company/use case stays chat-driven only
- Remove default_company_url setting (replaced by chat-driven flow)
Progress Meter Fix ✅ — current stage textbox is not great UX
- Add Init as the first stage in the progress sequence
- Replace the stage textbox with a visual progress indicator (step 1–N style)
- Stages: Init → Research → DDL → Data → Model → Liveboard (→ Data Adjuster when in that phase)
Chat Flow UX Improvements ✅
- In-chat help text at session start: brief instructions message
- Clearer prompting back to user when use case is ambiguous
- Consider ? tooltip near chat input
Error Handling Review ✅
- Liveboard partial success ✅: Snowflake + model OK but liveboard fails → ⚠️ message with Spotter Viz Story tab pointer + 'retry liveboard' prompt
- TML import errors: parse TS error_list, show which viz failed specifically
- MCP failures: show which step failed, whether partial work should be kept
- Top-level wrapper ✅: process_chat_message wrapped in try/except → yield friendly error + log full traceback
- Data Adjuster errors: SQL execution failures shown clearly with context, not swallowed
- Snowflake connection errors: distinguish auth failure vs. query failure vs. timeout
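
The top-level wrapper item above could look roughly like this. It is a sketch under assumptions: `safe_stream` and the logger name are hypothetical, and the real `process_chat_message` yields richer values than plain strings.

```python
import logging
import traceback
from typing import Callable, Iterator

logger = logging.getLogger("demoprep")  # hypothetical logger name


def safe_stream(handler: Callable[[str], Iterator[str]], message: str) -> Iterator[str]:
    """Yield from handler(message); on any exception, log the traceback and yield a friendly error."""
    try:
        yield from handler(message)
    except Exception as exc:
        # Full detail goes to the log; the user only sees a short message.
        logger.error("Unhandled error in chat handler:\n%s", traceback.format_exc())
        yield f"⚠️ Something went wrong ({type(exc).__name__}). The error was logged; please try again."
```

Output already yielded before the failure stays on screen, which matters for the partial-success cases listed above.
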
Supabase Session Logging ✅ — session_logger.py
- SessionLogger class: writes to Supabase session_logs table
- Initialized at first message in process_chat_message; stored per-controller on self._session_logger
- init_session_logger() module-level helper
- Logs: user message received, research started, deploy started
- ⚠️ Was incomplete: log_end() existed but was never called — completions/failures/durations were never logged
- Fixed March 27: log_end now called on research, deploy, thoughtspot completion AND failure with duration_ms + error
Admin Log Viewer ✅ — added to Admin Settings tab; email filter + row limit + Refresh button; queries session_logs via Supabase
Data Adjuster Cleanup & Controller Integration ✅
- Existing files: data_adjuster.py, smart_data_adjuster.py, conversational_data_adjuster.py, chat_data_adjuster.py
- Currently wired post-liveboard but state lives on self._adjuster / self._pending_adjustment (instance vars — wrong)
- Controller owns the adjuster phase: adjuster state moves into the chat controller phase flow, not instance vars
- Multi-turn: controller stays in the adjuster phase across multiple messages; user can ask multiple questions and make multiple adjustments in one session
- Smart adjustments: LLM understands the request, maps to the right table/column, proposes the SQL, confirms with user, executes
- Consolidate: decide which of the 4 files survives (likely smart_data_adjuster.py as the engine, rest retired or merged)
State Isolation Audit ✅ — audit complete; two HIGH issues identified
- HIGH: session_logger.py module-level _current_logger singleton — concurrent sessions overwrite each other's logger
- HIGH: prompt_logger.py module-level _prompt_logger singleton — all users' LLM prompts mix in same in-memory list
- MEDIUM: inject_admin_settings_to_env() writes to process-global os.environ — concurrent deployments could use wrong Snowflake account
- MEDIUM: Admin settings cache has no TTL — external Supabase edits invisible until restart
- Main ChatDemoInterface state IS isolated via gr.State — the pipeline itself is safe
- Fix (loggers + os.environ) tracked under "Before Broader Rollout → State Isolation Fix" below

Before Broader Rollout

Settings Audit & Cleanup ✅
- ts_instance_url removed from SETTINGS_SCHEMA + hidden in Settings UI (replaced by env dropdown)
- default_company_url removed from SETTINGS_SCHEMA + hidden in Settings UI (chat-driven now)
- AI Model selection already on front page ✅
demo_prep.py Refresh — scope too large for this sprint, moved to Phase 2
- Audit done: ~2-3 day job (Spotter Viz tab, outlier integration, logged_completion, class refactor)
Session Persistence Verification ✅ — verified working
- ts_username in SETTINGS_SCHEMA → pre-fills on load via load_settings_on_startup
- liveboard_name, default_use_case, default_llm all pre-fill on startup
- company no longer pre-filled (chat-driven) — working as intended
State Isolation Fix ✅ — both loggers now fully per-session
- Session logger: stored on self._session_logger (per controller instance) ✅
- Prompt logger: global singleton removed; PromptLogger instantiated directly per controller ✅
  - logged_completion() and log_researcher_call() accept logger= param; if None, skips log
  - ThoughtSpotDeployer.prompt_logger set from controller after construction
  - SmartDataAdjuster accepts prompt_logger= constructor param
  - Prompt log tab timer reads from controller._prompt_logger via gr.State
- inject_admin_settings_to_env() still in use (deferred — requires cdw_connector refactor)
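
A minimal sketch of the per-session prompt logging pattern listed above, assuming a simplified `PromptLogger` and a `call_llm` callable (both stand-ins, not the project's actual classes). The point is that the logger is passed in explicitly and a missing logger means the call is simply not recorded.

```python
from typing import Callable, Optional


class PromptLogger:
    """Stand-in logger: one instance per controller, holding only that session's prompts."""

    def __init__(self) -> None:
        self.entries = []

    def log(self, prompt: str, response: str) -> None:
        self.entries.append({"prompt": prompt, "response": response})


def logged_completion(call_llm: Callable[[str], str], prompt: str,
                      logger: Optional[PromptLogger] = None) -> str:
    """Run the LLM call; record it only when a logger is supplied."""
    response = call_llm(prompt)
    if logger is not None:
        logger.log(prompt, response)
    return response
```

With no module-level instance to fall back on, two controllers cannot end up writing into the same list.
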

Phase 2 (Next Sprint or Later)

demo_prep.py Refresh (from March Sprint)

demo_prep.py Refresh — sync with chat_interface.py improvements (~2-3 days)
- Add Spotter Viz Story tab + _generate_spotter_viz_story() (5h)
- Add Demo Pack tab with outlier-driven talking points + Spotter questions (4h)
- Replace all researcher.make_request() calls with logged_completion() wrapper (5h)
- Refactor to class-based pattern (like ChatDemoInterface) for state isolation (7h)
- Full outlier system integration (2.5h)
- Per-user session logging throughout (2h)

Carry-forward from Sprint 2

Unified Outlier System — core done, not satisfied with output quality; needs refinement
Demo Pack Generation — very unsatisfied, needs significant improvement
Chart Titles — not happy with viz titles/naming; needs better approach
Existing Model Selection + Self-Join Skip — may be done; needs confirmation test + verify self-join skip is working correctly
Universal Context Prompt — double-test this feature end-to-end
Chat Adjustment Using Outlier System — never got to this
Matrix Editor Tab ✅ — 🧩 Matrix tab added; first draft view+edit
- Vertical + Function dropdowns; coverage badge (Override / Base merge)
- Target persona + business problem shown for overrides
- KPIs accordion with definitions
- Liveboard Questions editable dataframe (add/delete rows)
- Story Controls accordion (read-only for now)
- Persistence via Supabase deferred to next sprint
- Reference doc: dev_notes/matrix_reference.md
Remember Me on Login ✅ — injected into Gradio venv index.html template
- div.form (not <form>) found via polling; no shadow DOM
- Saves username to localStorage on login; pre-fills on return visit
Interface Mode Refactor (DemoWorkflowEngine shared class concept)
Wizard Tab UI ✅ — Defined/Custom sub-tabs in Chat; Defined tab first; GO button wires into pipeline
- Vertical dropdown → Function dropdown (stacked, marks overrides with ✓)
- Company URL + "Use URL" checkbox + Additional context + → GO
- Custom tab = original chat flow unchanged
- Welcome message redesigned: title + How to start + collapsible example table
- Chatbot box removed (transparent) to reduce visual clutter
Tag Assignment to Models — returns 404 (works for tables, not models); needs investigation
Spotter Viz Story Verification — run end-to-end and verify story generation + blank viz (ASP, Total Sales Weekly) and brand colors rendering
Fix Research Cache Not Loading — relative path issue; fix was ready, needs test
Fix DAYSONHAND Generation — currently random; needs business logic (realistic 15–120 day distribution)
Verify KPIs in Liveboard — requires live deployment test
Auto-injection step — revisit what this was supposed to be
Dead code cleanup: model TML generators — thoughtspot_deployer.py has 3 model TML functions; only _create_model_with_constraints is called by deploy_all; remove create_actual_model_tml and create_model_tml

From March Plan

Data Adjuster — Liveboard-First Entry Point ✅
- Paste any TS liveboard URL in the init stage → jumps straight to adjuster (skips build pipeline)
- load_context_from_liveboard() in smart_data_adjuster.py: liveboard TML (export_fqn) → model GUID → model TML → db/schema
- Detection in chat_interface.py init stage: regex on pinboard/<guid> pattern → auth TS client → load context → init SmartDataAdjuster → outlier_adjustment stage
Sharing ✅ — model + liveboard shared (can_edit / MODIFY) after every build
- share_objects() method in thoughtspot_deployer.py: POST /api/rest/2.0/security/metadata/share
- Detects @ in value → USER type, otherwise → USER_GROUP
- share_with in regular Settings (per-user); SHARE_WITH in Admin Settings (system-wide default)
- Per-user setting takes priority; falls back to admin setting if empty
- Model shared after creation, liveboard shared after creation
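
The principal-type detection and the per-user/admin fallback above can be sketched as two small helpers. The function names are hypothetical, and the request body sent to the share endpoint is not shown because it is not described in these notes.

```python
from typing import Optional


def principal_type(value: str) -> str:
    """An '@' marks an email address, i.e. a user; anything else is treated as a group name."""
    return "USER" if "@" in value else "USER_GROUP"


def effective_share_with(user_setting: Optional[str], admin_setting: Optional[str]) -> Optional[str]:
    """Per-user share_with wins; an empty value falls back to the admin SHARE_WITH default."""
    for candidate in (user_setting, admin_setting):
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```
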
Sage Indexing Retry ✅ — _get_answer_direct now retries once with 20s wait on 10004 "No answer found"; flag is module-level so the wait happens only once per build run, not per question
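
A sketch of the retry-once behavior described above, assuming the answer call returns a dict with a `code` field (an assumption; the real `_get_answer_direct` response shape is not shown here). `fetch`, `reset_indexing_wait`, and the injectable `sleep` are hypothetical names used for illustration.

```python
import time
from typing import Callable

NO_ANSWER_CODE = 10004
_indexing_wait_done = False  # module-level: the wait happens once per build run


def reset_indexing_wait() -> None:
    """Call at the start of a build run so the next run may wait again."""
    global _indexing_wait_done
    _indexing_wait_done = False


def get_answer_with_retry(fetch: Callable[[str], dict], question: str,
                          wait_seconds: float = 20.0,
                          sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch(question); on the first 10004 of the run, wait once and retry once."""
    global _indexing_wait_done
    result = fetch(question)
    if result.get("code") == NO_ANSWER_CODE and not _indexing_wait_done:
        _indexing_wait_done = True
        sleep(wait_seconds)
        result = fetch(question)
    return result
```
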
Fallback TML: Skip Invalid Column Refs ✅ — after convert_natural_to_search, validates [Column] tokens against model columns; skips viz (instead of failing the whole liveboard) if any token is missing
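
The column-reference check above could be sketched like this. The helper names are hypothetical, and the case-insensitive comparison is an assumption rather than confirmed behavior.

```python
import re

_TOKEN_RE = re.compile(r"\[([^\[\]]+)\]")


def missing_column_refs(search_query: str, model_columns: set) -> list:
    """Return the [Column] tokens in a search query that the model does not define."""
    known = {name.lower() for name in model_columns}  # assumption: names compare case-insensitively
    return [tok for tok in _TOKEN_RE.findall(search_query) if tok.strip().lower() not in known]


def keep_valid_vizzes(queries: list, model_columns: set) -> list:
    """Drop only the queries with unknown columns instead of failing the whole liveboard."""
    return [q for q in queries if not missing_column_refs(q, model_columns)]
```
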
MCP 500 Retry Logic — broader retry for other 5xx errors
Model Generator: Chasm Trap Fix — when two fact tables share a dimension, model generator must:
- Include ALL FK joins from each fact table to shared dimensions (e.g. PRIOR_AUTHORIZATIONS.DRUG_NDC → DRUGS)
- Set is_attribution_dimension: false on shared dimension tables so TS doesn't fold fact tables together
- Without this: queries fan out through a shared date dimension → every group gets the same average
- Fixed manually for Abarca: added PRIOR_AUTHORIZATIONS → DRUGS join + DRUGS.is_attribution_dimension=false
Data Narrative Layer for Population — LLM generates random/flat data because it doesn't know what story the KPIs should tell
- Root cause: population script gets DDL + company context but NOT the KPI formulas or desired metric distributions
- Fix: before data generation, build a "data narrative" spec from the vertical×function matrix: explicit per-column constraints ("IS_GENERIC: 93% for Medicaid rows, 80% for Commercial"), outlier targets, trend directions
- Pass this narrative spec as a required section in the population prompt
- Domain rules baked in: specialty/biologic drugs get low PA approval, GLP-1s face high scrutiny, Medicaid has highest GDR, etc.
- Goal: generated data tells the story on first run — no manual Snowflake fixups required
Quality test: incorporate Snowflake row counts into data grading — currently row counts per table are captured in JSON (snowflake_check) and printed after every test run, but not factored into the score. Future: add a "Completeness" sub-score to the data quality grade based on actual row counts (e.g. tables with 0 rows = penalty, minimum thresholds per table type)
Fix Domain-Specific NAME Column Generation ✅ — DRUG_NAME was falling through to fake.name() (person name) because 'NAME' in col_name_upper matched first, before the drug-specific check; fixed by adding DRUG/MEDICATION check at the top of the NAME block in chat_interface.py
Abarca Demo Data — KPI Variation ✅ — GDR and PA Approval Rate KPIs had flat sparklines
- Root cause: IS_GENERIC set by plan type only (uniform across months); PA rate set by therapeutic class only
- Fix: scratch/fix_abarca_kpi_variation.py — full per-month reset using plan-type + monthly adjustment
- GDR visible range: 84–93% with clear oscillations; PA visible range: 74–86%
- ⚠️ Re-run fix_abarca_kpi_variation.py if other data changes clobber monthly variation
- PA by therapeutic class fix: scratch/fix_abarca_pa_therapeutic_class.py + scratch/fix_abarca_pa_ts_table.py
- Root cause of flat PA-by-class chart: no PRIOR_AUTHORIZATIONS→DRUGS join in TS model; added THERAPEUTIC_CLASS column directly to PRIOR_AUTHORIZATIONS table instead

April 7 Session — Where We Left Off

Completed This Session

Spotter Viz Story fixes ✅ — wrapped prompts in messages array (was passing raw string); added prompt logging to both AI + matrix generators
SmartDataAdjuster username fix ✅ — was failing with "requires username" error; now passes logged-in user email through
Spotter Viz story format ✅ — removed Step N headers + Expected Result labels; output is now clean numbered prompts only; includes full model URL so Spotter doesn't ask "which data source?"
Categorical COMMENT fix (LegitData) ✅ — expanded suffix list in DDL prompt to include _stage, _cycle, _motion, _role, _band, _region, _mode, _method, _source, _reason; fixed numeric column guard in legitdata_bridge.py to prevent choice: strategies on INT/NUMERIC columns
Shopify data fixes ✅ — fixed PIPELINE_STAGE, RENEWAL_CYCLE, FORECAST_CATEGORY, SALES_MOTION, REP_ROLE, TEAM_NAME, 11 orphan rep IDs in scratch/fix_shopify_data.py
3-level cascade dropdown ✅ — replaced vertical+function with vertical→line→function; VERTICAL_LINES + DEMO_FUNCTIONS added to demo_personas.py
Matrix fallback chain ✅ — get_use_case_config(line, function, vertical_fallback) tries line first, falls back to vertical, then generic
Matrix renamed ✅ — Retail→"Retail & Consumer Goods", Banking→"Financial Services", Software→"Technology" in VERTICALS + MATRIX_OVERRIDES
Technology lines updated ✅ — "Cloud Computing" → "Software as a Service"
App tab / Chat tab separation ✅ — chatbot moved inside Chat tab; App tab is now clean form only; GO still feeds Chat tab in background
Matrix reference doc updated ✅ — dev_notes/matrix_reference.md rewritten with full 15×6 coverage grid, all lines, current overrides, priority build list
CLAUDE.md updated ✅ — added "never push without explicit instruction" rule + db/schema derivation pattern
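
The matrix fallback chain noted in this session (line first, then vertical, then generic) can be sketched as below. The `MATRIX` and `GENERIC` contents are made up for illustration; the real data lives in demo_personas.py and the real function may differ in detail.

```python
from typing import Optional

# Hypothetical configs keyed by (line_or_vertical, function).
MATRIX = {
    ("Software as a Service", "Sales"): {"source": "line"},
    ("Technology", "Marketing"): {"source": "vertical"},
}
GENERIC = {"source": "generic"}


def get_use_case_config(line: Optional[str], function: str,
                        vertical_fallback: Optional[str] = None) -> dict:
    """Try the line first, then the vertical, then fall back to a generic config."""
    for scope in (line, vertical_fallback):
        if scope and (scope, function) in MATRIX:
            return MATRIX[(scope, function)]
    return GENERIC
```
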

April 13–14 Session

AI Feedback + Live Progress tabs merged ✅ — single "🤖 AI Feedback" tab, two sub-sections
Pipeline progress → Complete ✅ — current_stage = 'complete' + final yield added to all 4 TS deployment exit paths
Unified liveboard question prompt ✅ — _generate_smart_questions_with_ai() rewritten:
- Output: {"kpis": [...], "visualizations": [...]} — typed so KPIs and vizzes are explicit
- Matrix case: persona, business_problem, kpi_definitions, story questions all injected as primary guidance
- Custom case: additional_context (from Wizard tab) drives the story; no matrix section
- data_outliers param ready for when LegitData outlier metadata is connected
- Prompt explicitly asks for N KPIs + M vizzes, with format examples for each
additional_context wired end-to-end ✅ — generic_use_case_context from controller now flows through deploy_all → company_data → create_liveboard_from_model_mcp → prompt
outlier_dicts hack removed ✅ — deployer now passes raw matrix_config (the full uc_config) directly; no more reshaping matrix questions into fake outlier format
AI Feedback verbosity — not a blocker, low priority backlog
viz_type enforcement — matrix has viz_type per question; enhance_mcp_liveboard() post-processor could use it to override TS chart type choices; deferred
Actual data outliers → liveboard — LegitData injects outliers into data but that metadata never reaches the liveboard prompt; data_outliers param is ready, just needs the connection from LegitData output

April 27 Session

Run history — use case truncation + detail panel ✅ — use case column truncated to 60 chars in grid; clicking any row shows full use case in a text area below; full_use_cases_state + gr.Dataframe.select() handler
Run history — Interface column ✅ — every run now logs interface (app_defined, app_custom, chat), vertical, line, function, is_custom, additional_context; Interface column added to Run History grid
awaiting_use_case stage ✅ — fixed bug where pasting long custom text as use case response re-triggered company extraction (e.g. "NJ.Products" from "Princeton, NJ. Products: LONSURF"); chat tab now sets stage to awaiting_use_case when asking "what use case?" so next message is treated as use case only
Comprehensive stage + sub-stage logging ✅ — added to all gaps in the pipeline:
- ddl stage: started / completed (with table count) / failed — was completely absent before
- deploy sub-stages: snowflake connected, ddl pushed / failed (verbose)
- thoughtspot sub-stages: connection created, model tagged, model shared, semantics applied/empty/failed, spotter enabled/failed, liveboard enhance started/completed/failed, liveboard created, liveboard tagged, liveboard shared, liveboard failed
- Verbose sub-stages use log_verbose() — only written when LOG_LEVEL=verbose in admin settings; stage-level events always written
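
A sketch of the two logging tiers described above. `StageLogger` is a hypothetical stand-in whose in-memory `rows` list takes the place of writes to session_logs; only `log_verbose()` and the `LOG_LEVEL=verbose` switch come from the notes.

```python
class StageLogger:
    """Writes stage events always; writes sub-stage events only in verbose mode."""

    def __init__(self, log_level: str = "normal") -> None:
        self.verbose = log_level.strip().lower() == "verbose"
        self.rows = []  # stands in for rows written to session_logs

    def log(self, stage: str, event: str, **meta) -> None:
        """Stage-level event: always recorded."""
        self.rows.append({"stage": stage, "event": event, **meta})

    def log_verbose(self, stage: str, event: str, **meta) -> None:
        """Sub-stage event: recorded only when LOG_LEVEL is verbose."""
        if self.verbose:
            self.log(stage, event, **meta)
```
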
Status report + Slack post updated ✅ — full sprint delivery items added; status report live at https://boone-ts-demoprep-doc.static.hf.space/status_report.html

Diagnosed

Manuel Marco's run (Apr 23) — used App tab custom path (stage=awaiting_context confirmed in logs); research completed but DDL stage had no logging so failure point was invisible; new DDL logging will catch this going forward
Paul Gilman Texas Children's fail (Apr 17) — DML error on QUALITY_CLAIMS.EVENT_COUNT: a string was inserted into a numeric column; the run succeeded on the second attempt with a different use case

Planned: Settings Contract Test (`tests/settings_test.py`)

A dedicated settings regression test — separate from e2e_quality.py. The quality test only measures pipeline output quality; this test verifies that each user setting actually affects the pipeline correctly.

Pattern per setting:

Read current value as baseline
Set a specific test value
Run a pipeline
Verify the output reflects the change
Reset to original value
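
The five steps above map naturally onto a context manager, so the reset happens even when verification fails. This is a sketch: `temporary_setting` is a hypothetical helper and the settings store is modeled as a plain dict.

```python
from contextlib import contextmanager
from typing import Any, Iterator, MutableMapping


@contextmanager
def temporary_setting(settings: MutableMapping, key: str, value: Any) -> Iterator[None]:
    """Set settings[key] for the duration of the block, then restore the baseline."""
    missing = object()
    baseline = settings.get(key, missing)   # 1. read current value as baseline
    settings[key] = value                   # 2. set the test value
    try:
        yield                               # 3-4. caller runs the pipeline and verifies
    finally:
        if baseline is missing:             # 5. reset, even if verification raised
            settings.pop(key, None)
        else:
            settings[key] = baseline
```

A test would wrap the pipeline run and its assertions in `with temporary_setting(settings, "fact_table_size", 100): ...`.
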

Settings to cover:

| Setting | UI location | How to verify |
| --- | --- | --- |
| `fact_table_size` | Settings accordion → Data Size | COUNT(*) on fact table in Snowflake |
| `dim_table_size` | Settings accordion → Data Size | COUNT(*) on each dimension table |
| `geo_scope` | Settings accordion → Geographic Scope | DISTINCT COUNTRY values in Snowflake |
| `tag_name` | Settings accordion → Tag Name | Tag on model/liveboard via TS API |
| `object_naming_prefix` | Settings accordion → Object Naming Prefix | Schema name starts with prefix |
| `column_naming_style` | Settings accordion → Column Naming Style | Column names in model TML |
| `liveboard_name` | Settings accordion → Liveboard Name | Liveboard name in TS |
| `default_llm` | AI Model dropdown (above accordion) | session_logs: which LLM provider/model was used |
| `ts_environment` | TS Environment dropdown (top of panel) | Model/liveboard created on correct TS instance URL |
| `validation_mode` | Settings accordion | DDL validation runs or skips |
| `use_existing_model` | Settings accordion | Data gen stage is skipped |

Note: Remove verify_group1_settings() from e2e_quality.py when this is built — settings verification doesn't belong in the quality run.

LLM temperature cleanup needed: main_research.py now handles gpt-5 (reasoning model, uses reasoning_effort) vs gpt-5.5+ (supports temperature) via regex in two places. This logic should move to llm_config.py as a proper helper (e.g. build_llm_extra_kwargs(model, temperature)) so it's one place, not scattered across request methods. Also: the Settings UI should surface reasoning_effort as a control when a reasoning model is selected, instead of showing a temperature slider that does nothing.
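
One possible shape for the proposed helper, as a sketch only. The model-name rule is taken from the note above and is not checked against provider documentation; how suffixed or intermediate versions (for example `gpt-5-mini` or `gpt-5.1`) should be classified is an assumption, as are the regex and the `reasoning_effort` default.

```python
import re
from typing import Any, Optional

# Assumption: "gpt-5" with no minor version (or a minor below 5) is a reasoning
# model; "gpt-5.5" and later accept temperature. Other models accept temperature.
_GPT5_RE = re.compile(r"^gpt-5(?:\.(\d+))?(?:-|$)")


def build_llm_extra_kwargs(model: str, temperature: Optional[float],
                           reasoning_effort: str = "medium") -> dict:
    """Return the extra request kwargs appropriate for the model family."""
    match = _GPT5_RE.match(model.lower())
    if match and int(match.group(1) or 0) < 5:
        return {"reasoning_effort": reasoning_effort}
    return {} if temperature is None else {"temperature": temperature}
```

The Settings UI could call the same helper to decide whether to show a temperature slider or a reasoning_effort control.
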

Mini Sprint: Test Stabilization (Apr 28–29) ✅ SHIPPED TO PROD

Goal: get one clean test run from start to finish before doing anything else.

[P0] Verify TS trusted auth ✅ — confirmed working
[P0] Remove testrunner → mike.boone proxy ✅ — testrunner is a real TS user in secloud
[P0] Verify Playwright TS env selection ✅ — confirmed env_label populates on GO
[P1] Fix DATES table date range ✅ — DATES now always spans 2 years regardless of dim_table_size
[P1] Increase fact table default ✅ — bumped from 1k → 5k rows (10k being tested)
Empty table detection ✅ — pipeline now retries/fails loudly if any non-date table is empty after first attempt; fixed Wells Fargo + Best Buy silent 0-row success
Schema name logged to session_logs ✅ — meta["schema"] now set so diagnostics can find the Snowflake schema
Test suite robustness ✅ — selector fallbacks for Liveboard Name + Custom Context; dim_table_size check by row count not name; tag_name JSON guard
[P2] HF container log testing — deferred
[P3] Clean up stale user settings — deferred
[P3] User onboarding via Slack ⚡ — see In Progress below

In Progress / Next Up

User onboarding via Slack ⚡ASAP — see Mini Sprint above
Invite flow — proper invite-link system (token + email) — bigger lift, backlog after Slack onboarding
HTML user guide — expand dev_notes/quick_start_guide.md into a full HTML page hosted on HF doc space
Technology → Software as a Service + Sales run — end-to-end test still pending
App tab UX polish — GO button flow should update pipeline status on right
Enable/disable dropdown items based on matrix coverage
viz_type enforcement — matrix chart type hints → override TS defaults in post-processing
Actual data outliers → liveboard — data_outliers param ready, needs connection from LegitData output

Ideas to Think About (Not Implementing Yet)

Spotter Viz Story — Matrix-Grounded Single Story

Current state: _generate_spotter_viz_story() generates two sections (persona-driven from matrix + AI-generated LLM story) combined into one Markdown block.

Proposed approach:

Single cohesive story (no split tabs) — the matrix IS the foundation, AI adds glue
Take liveboard_questions from the config → generate high-level NL prompts (not granular)
- e.g. "Show revenue for the last 3 months as a KPI" / "Revenue by region as a bar chart"
AI's role: light narrative that connects the dots, names the story, adds 1-2 sentences per step
Keep it simple — the goal is a ready-to-use demo script, not documentation
Both sections currently generated sequentially after ThoughtSpot deploy; could stay that way

Async Pipeline — Parallel ThoughtSpot + Data Population

Current pipeline is fully sequential: Research → DDL → Create Tables → Populate → TS Model → Liveboard

Dependency analysis:

Research ──► DDL ──► Create Tables ─┬──► Populate Data (LegitData)  ──┐
                                    │                                   ├──► MCP Liveboard
                                    └──► TS Model + Connections ────────┘
                                         (semantic layer, column naming,
                                          sharing, tagging, Sage index)

TS model creation needs schema (tables exist) but NOT data rows
Data population needs tables but not the TS model
MCP liveboard (Spotter Viz) needs both — reads actual data via the model
Everything from "Create Tables" onward can be parallelized except the final liveboard step

What runs in parallel after tables are created:

Thread A: LegitData population
Thread B: TS Model creation → column naming → semantic update → sharing → tagging → Sage indexing

Estimated savings: Model creation + semantic update is ~30-60s; LegitData is ~60-120s. In parallel, the slower branch (LegitData) sets the wall clock, so the saving is roughly the model-creation time, ~30-60s per run.

Risks/considerations:

Gradio streaming yields from a generator — parallel work needs to run in threads and feed a shared queue that the generator drains
Error handling: if one branch fails, need to cancel/report both and decide whether to proceed to liveboard
Progress reporting needs to interleave updates from both branches
Could implement with concurrent.futures.ThreadPoolExecutor + a queue.Queue for progress messages
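
A minimal sketch of the thread-plus-queue idea, assuming each branch is a callable that receives a `report` function for progress messages. `run_parallel` and the message format are hypothetical, and cancellation of the other branch on failure is not covered.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator

Progress = Callable[[str], None]


def run_parallel(branches: Dict[str, Callable[[Progress], None]]) -> Iterator[str]:
    """Run each branch in its own thread and yield progress messages as they arrive."""
    messages: queue.Queue = queue.Queue()
    done = object()  # sentinel: one is queued per finished branch

    def run(name: str, work: Callable[[Progress], None]) -> None:
        try:
            work(lambda text: messages.put(f"[{name}] {text}"))
        except Exception as exc:
            messages.put(f"[{name}] FAILED: {exc}")
        finally:
            messages.put(done)

    with ThreadPoolExecutor(max_workers=max(1, len(branches))) as pool:
        for name, work in branches.items():
            pool.submit(run, name, work)
        remaining = len(branches)
        while remaining:  # the generator drains the shared queue
            item = messages.get()
            if item is done:
                remaining -= 1
            else:
                yield item
```

The Gradio handler would iterate over `run_parallel({...})`, yield each message to the UI, and start the liveboard step only after the loop ends without a FAILED message.
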

Phase 3 (Future)

OAuth/SSO Login — swap Gradio auth for proper OAuth flow
Batch Runner Gradio Tab — after CLI proves out, add Gradio tab for batch testing
Batch Runner: Full Pipeline Stages — add population, deploy_snowflake, deploy_thoughtspot, liveboard stages
Request New Environment Form — if/when needed
Liveboard Question Column Mapping — liveboard_questions[].viz_question uses natural language that may not match actual DDL column names; after model is built, map questions to real column names before sending to MCP. Currently worked around with generic NL ("average selling price by week" vs "ASP weekly") but proper runtime column substitution would be more reliable.

Cancelled / Resolved

~~MCP Bearer Auth investigation~~ — resolved; bearer auth working, no further action needed

Done

Session: March 27, 2026 — ts_user Fix + Semantics Pipeline + UX Fixes

ts_user fix ✅ — ThoughtSpot objects now created under logged-in user, not admin
- All 3 ts_user locations in chat_interface.py → self._get_effective_user_email()
- ThoughtSpotDeployer raises ValueError if username or secret_key not passed
- _get_direct_api_session / _get_answer_direct in liveboard_creator.py: accept secret_key param, no global singleton cache
THOUGHTSPOT_ADMIN_USER removed ✅ — from all non-test Python files
- supabase_client.py: removed from ADMIN_SETTINGS_KEYS and inject_admin_settings_to_env
- chat_interface.py: removed startup check + admin hidden field
THOUGHTSPOT_TRUSTED_AUTH_KEY removed from Supabase settings ✅
- All reads changed to self.settings.get('thoughtspot_trusted_auth_key') — set per-env via dropdown
- Removed admin_ts_auth_key hidden textbox from Settings UI
Model semantic enrichment wired into pipeline ✅ — thoughtspot_deployer.py deploy_all
- Model description, per-column description + synonyms + ai_context generated via LLM in one call
- Applied to model TML and reimported in same Spotter-enable cycle
- model_semantic_updater.py rewritten: generate_model_description, generate_column_semantics, apply_to_model_tml, enrich_model + backwards-compat alias update_model_semantic_layer
Welcome page examples updated ✅ — 5 hardcoded examples (Retail/Banking/Software/Manufacturing verticals)
- Persona column removed; examples guaranteed to work
Boolean type mismatch fixed ✅ — legitdata_bridge.py convert_value()
- isinstance(value, bool): return int(value) added before int check (bool is subclass of int)
- Fixes SUPPLY_CHAIN_EVENTS population failure (NUMBER(38,0) vs BOOLEAN)
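
Why the order of the checks matters, as a small sketch. `to_number` is a hypothetical reduction of `convert_value()` to the numeric-column case only.

```python
def to_number(value):
    """Coerce a value for a numeric column; bool must be tested before int."""
    if isinstance(value, bool):           # True/False also pass isinstance(value, int)
        return int(value)                 # so convert them to 1/0 first
    if isinstance(value, (int, float)):
        return value
    raise TypeError(f"not numeric: {value!r}")
```

Without the first branch, `True` would pass the int check unchanged and reach the insert as a boolean.
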
Company parsing fix ✅ — "Caterpillar.com Manufacturing Supply Chain" now parses in one shot
- Added space-only separator pattern r'[a-zA-Z0-9-]+\.[a-zA-Z]{2,}\s+(.+)' to extract_use_case_from_message
- Company state preserved when only company detected (not overwritten with old value)
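
The space-only separator pattern above, applied in a small sketch. The pattern is copied from the note; `split_company_and_use_case` and the way the domain is recovered from the match are hypothetical.

```python
import re
from typing import Optional, Tuple

# Pattern from the note: a domain, whitespace, then the use case as group 1.
SPACE_SEPARATED = re.compile(r'[a-zA-Z0-9-]+\.[a-zA-Z]{2,}\s+(.+)')


def split_company_and_use_case(message: str) -> Optional[Tuple[str, str]]:
    """Return (company_domain, use_case), or None when no use case follows the domain."""
    match = SPACE_SEPARATED.search(message)
    if not match:
        return None
    domain = message[match.start():match.start(1)].strip()
    return domain, match.group(1).strip()
```

For the example in the note, `split_company_and_use_case("Caterpillar.com Manufacturing Supply Chain")` gives `("Caterpillar.com", "Manufacturing Supply Chain")`.
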
Manufacturing in use case list ✅ — removed [:3] slice so all 4 verticals show
Progress bar stage fix ✅ — auto-run now shows correct stage at each step
- current_stage = 'research' at start of research loop (was 'deploy')
- current_stage = 'create_ddl' after research, before DDL creation
- current_stage = 'deploy' after DDL, before Snowflake deployment
Remove default company from input bar ✅ — input now starts blank
- All "Amazon.com" fallbacks replaced with "" in load_session_state_on_startup, get_session_defaults, and default_settings

Session: March 27, 2026 — Concurrency Fix + Logging Fix

Concurrent session UI interference fixed ✅ — chat_interface.py
- liveboard_name_input.change → .blur; outputs=[liveboard_name_input] → outputs=[]
- Per-keystroke change events were queuing behind long-running deploy generators; Gradio showed the queue wait timer ("197.1/196.8s") directly on the liveboard name textbox
- Blur fires only on focus-leave; no output round-trip means Gradio never shows a spinner on the field
Session logging completions/failures added ✅ — chat_interface.py
- log_end("research", _t) / log_end("research", _t, error=...) added to all exits of run_research_streaming
- _deploy_error tracker + finally: log_end("deploy", ...) added to run_deployment_streaming
- log_start("thoughtspot") + _ts_error tracker + finally: log_end("thoughtspot", ...) added to _run_thoughtspot_deployment
- Now logs: stage completed/failed with duration_ms and error string for every pipeline stage
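
The error-tracker plus `finally` pattern above, sketched generically. `run_stage` is hypothetical, and the `log_end` signature used here (keyword `duration_ms` and `error`) is a simplification of the real one, which takes a start timestamp.

```python
import time
from typing import Callable, Iterator, Optional


def run_stage(stage: str, work: Callable[[], Iterator[str]],
              log_end: Callable[..., None]) -> Iterator[str]:
    """Stream a stage's output and always record its duration and outcome."""
    started = time.monotonic()
    error: Optional[str] = None
    try:
        yield from work()
    except Exception as exc:
        error = str(exc)   # remember the failure for the log entry
        raise
    finally:
        duration_ms = int((time.monotonic() - started) * 1000)
        log_end(stage, duration_ms=duration_ms, error=error)
```

Because the call sits in `finally`, every exit path logs exactly once, which is the gap the March 27 fix closed.
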

Session: March 26, 2026 — HF Deployment Fixes + Auth + Settings

HF Blank Page After Login — Root Cause Found ✅
- Cause: users were accessing via huggingface.co/spaces/thoughtspot-dp/demoprep (HF wrapper)
- The wrapper embeds the app in an iframe; modern browsers block third-party iframe cookies → auth cookie not sent back → stays on login page
- Fix: use the direct URL https://thoughtspot-dp-demoprep.hf.space — first-party cookies work fine
- Also fixed during investigation: pinned starlette==0.50.0 (Starlette 1.0 broke TemplateResponse) and fastapi==0.128.0 (prevents surprise upgrades on HF rebuild)
Logout button added ✅ — "Sign Out →" link in header, wired to Gradio's /logout route
Change Password ✅ — accordion in Settings tab; requires current password to confirm identity

Session: March 26, 2026 — Liveboard Name Fix + Settings Reorganization

Liveboard Name Bug Fixed ✅ — UI field value now takes priority over DB-loaded default
- send_message and quick_action accept liveboard_name_ui param
- liveboard_name_input added to _send_inputs and _action_inputs
- Applied to controller.settings['liveboard_name'] on every message — always uses current UI value
Settings UI Reorganized ✅ — Split into "Default Settings" and "App Settings"
- Default Settings: AI Model, Default Use Case, Default Liveboard Name (3-up row)
- App Settings: Tag Name, Fact Table Size, Dim Table Size, Object Naming Prefix, Column Naming Style

Session: March 26, 2026 — New Vision Merge + Pipeline Investigation

Spotter enable fix verified ✅ — spotter_config placement confirmed correct (nested inside model.properties, not sibling). Tested on model f40ff5bd via scratch/test_spotter_enable.py — Spotter answered (HTTP 200).
Liveboard pipeline bugs documented ✅ — Full trace written in dev_notes/liveboard_flow_amazon_retail.md
- 6 KPI root cause: AI-generated fill questions (slots 5–8) are all time+metric → MCP creates them all as KPI. Fix: cap AI questions to max 2 single-metric.
- "Show me" title bug: _convert_outlier_to_mcp_question prepends "Show me"; _clean_viz_title strips "Show " leaving "me...". Fix: add (r'^Show me ', '') before (r'^Show ', '').
- Spotter Viz Story mismatch: _generate_spotter_viz_story never sees actual viz names — generates independently. Fix: pass actual viz names post-build.
- OutlierPattern fields sql_template, magnitude, affected_columns, target_filter, demo_setup, demo_payoff are all dead — never read anywhere.
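
The "Show me" title bug comes down to rule order, which this sketch reproduces. `clean_viz_title` here is a simplified stand-in for `_clean_viz_title`, with only the two rules from the note.

```python
import re

BUGGY_RULES = [(r'^Show ', '')]                       # strips "Show " and leaves "me ..."
FIXED_RULES = [(r'^Show me ', ''), (r'^Show ', '')]   # longer prefix is tried first


def clean_viz_title(title: str, rules=FIXED_RULES) -> str:
    """Apply each (pattern, replacement) rule in order."""
    for pattern, replacement in rules:
        title = re.sub(pattern, replacement, title)
    return title
```

With `BUGGY_RULES`, "Show me revenue by region" becomes "me revenue by region"; with `FIXED_RULES` it becomes "revenue by region".
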
DemoPrep_new_vision2 merge completed ✅ — New data generation engine with real outlier injection merged into current codebase:
- demo_personas.py — replaced with new version: DEFAULT_STORY_CONTROLS, story_controls on every vertical/function, Finance/SaaS overrides, ROUTED_USE_CASES, merged get_use_case_config()
- legitdata_project/legitdata/generator.py — replaced with 1,600-line version: _refresh_story_spec(), _apply_storyspec_time_series() (actual outlier injection with deterministic seed + trend/seasonal signals), _generate_saas_finance_gold()
- legitdata_project/legitdata/storyspec.py — new file: StorySpec, TrendProfile, OutlierBudget, ValueGuardrails dataclasses
- legitdata_project/legitdata/domain/ — new package: SemanticType enum + domain value libraries
- legitdata_project/legitdata/quality/ — new package: quality rules, validator, repair
- 5 updated source files: column_classifier.py, ai_generator.py, generic.py, fk_manager.py, parser.py
- legitdata_project/legitdata/__init__.py — updated with StorySpec exports
- Verified end-to-end: seed=1780963166 (deterministic from amazon.com+Retail Sales), 1 outlier injected Sept 9 2024 at 2.4x multiplier in SALES_TRANSACTIONS

Carry-Forward / Backlog (Not Scheduled)

Unified Outlier System — core done, not satisfied with output quality; needs refinement
Demo Pack Generation — very unsatisfied, needs significant improvement
Chart Titles — not happy with viz titles/naming; needs better approach
Actual data outliers → liveboard — data_outliers param ready, needs connection from LegitData output
viz_type enforcement — matrix chart type hints → override TS defaults in post-processing (deferred)
Research cache not loading — relative path issue; fix was ready, needs test
Fix DAYSONHAND generation — currently random; needs business logic (realistic 15–120 day distribution)
HTML user guide — expand dev_notes/quick_start_guide.md into full HTML page hosted on HF doc space

Known Issues / Tech Debt

testrunner@thoughtspot.com — now a real TS user in secloud ✅. Proxy removed Apr 28. Still needs to be added to sebe environments.
HF container logs — pipeline runs in a background thread; unknown whether stdout reaches the HF SSE log stream. Need to test with curl -N -H "Authorization: Bearer $HF_TOKEN" "https://huggingface.co/api/spaces/thoughtspot-dp/test-demoprep/logs/run". Until confirmed, Supabase session_logs is the real pipeline log.
THOUGHTSPOT_URL admin fallback removed — all three TS deploy paths now require a TS environment to be selected from the dropdown; no silent fallback. If no env selected, user gets "select a TS environment from the dropdown" error.
testrunner not in sebe — quality tests targeting sebe environments will fail for testrunner until added.

Notes

Vertical × Function Matrix System

The matrix determines what gets built — KPIs, visualizations, outliers, target persona. See dev_notes/plan_march_2026.md appendix for full documentation.

Current coverage:

| Vertical | Sales | Supply Chain | Marketing |
| --- | --- | --- | --- |
| Retail | ✅ Override | Base merge | Base merge |
| Banking | Base merge | Base merge | ✅ Override |
| Software | ✅ Override | Base merge | Base merge |
| Manufacturing | Base merge | Base merge | Base merge |
| other | Generic | Generic | Generic |

- Override = enriched with persona, extra KPIs, specific viz
- Base merge = Vertical + Function combined, no special override
- Generic = AI adapts from closest function match