bobaoxu2001 committed on
Commit
c4fe0a4
·
1 Parent(s): ef1c070

Deploy forward-deployed AI simulation dashboard

Files changed (50)
  1. .streamlit/config.toml +12 -0
  2. Dockerfile +23 -0
  3. README.md +148 -5
  4. app/Home.py +115 -0
  5. app/pages/0_Engagement_Narrative.py +267 -0
  6. app/pages/1_Problem_Scoping.py +119 -0
  7. app/pages/2_Prototype_Lab.py +350 -0
  8. app/pages/3_Reliability_Review.py +426 -0
  9. app/pages/4_Abstraction_Layer.py +170 -0
  10. app/pages/5_Executive_Summary.py +260 -0
  11. app/pages/6_ROI_Model.py +267 -0
  12. app/pages/7_Data_Quality.py +325 -0
  13. app/pages/8_Human_Feedback.py +369 -0
  14. app/pages/9_Prompt_AB_Testing.py +334 -0
  15. data/cases/.gitkeep +0 -0
  16. data/cases/case-076438cd.json +12 -0
  17. data/cases/case-07fdaad5.json +12 -0
  18. data/cases/case-19fc09e8.json +12 -0
  19. data/cases/case-1c9c4a9b.json +12 -0
  20. data/cases/case-21225a5d.json +12 -0
  21. data/cases/case-2bd562d3.json +12 -0
  22. data/cases/case-380fd7e4.json +12 -0
  23. data/cases/case-4af33b8b.json +12 -0
  24. data/cases/case-4b7055cf.json +12 -0
  25. data/cases/case-4d87ea84.json +12 -0
  26. data/cases/case-4e9a11c7.json +12 -0
  27. data/cases/case-4f8d8abf.json +12 -0
  28. data/cases/case-5f87257e.json +12 -0
  29. data/cases/case-624cb348.json +12 -0
  30. data/cases/case-64a32dc8.json +12 -0
  31. data/cases/case-652870dc.json +12 -0
  32. data/cases/case-6f37a2d1.json +12 -0
  33. data/cases/case-70e84066.json +12 -0
  34. data/cases/case-7928f5fa.json +12 -0
  35. data/cases/case-7febc51e.json +12 -0
  36. data/cases/case-8ba05714.json +12 -0
  37. data/cases/case-937b0422.json +12 -0
  38. data/cases/case-9ad5d3ab.json +12 -0
  39. data/cases/case-9c147cfc.json +12 -0
  40. data/cases/case-a7068c14.json +12 -0
  41. data/cases/case-ac7b0b06.json +12 -0
  42. data/cases/case-acaecb0d.json +12 -0
  43. data/cases/case-b20a7628.json +12 -0
  44. data/cases/case-bf7cc420.json +12 -0
  45. data/cases/case-c0e2500e.json +12 -0
  46. data/cases/case-ce2076c3.json +12 -0
  47. data/cases/case-ce230c3e.json +12 -0
  48. data/cases/case-d1c3b227.json +12 -0
  49. data/cases/case-d37c0bca.json +12 -0
  50. data/cases/case-e2a80316.json +12 -0
.streamlit/config.toml ADDED
@@ -0,0 +1,12 @@
+ [theme]
+ primaryColor = "#2563EB"
+ backgroundColor = "#FFFFFF"
+ secondaryBackgroundColor = "#F8FAFC"
+ textColor = "#1E293B"
+ font = "sans serif"
+
+ [server]
+ headless = true
+
+ [browser]
+ gatherUsageStats = false
Dockerfile ADDED
@@ -0,0 +1,23 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     build-essential curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ EXPOSE 7860
+
+ HEALTHCHECK CMD curl --fail http://localhost:7860/_stcore/health || exit 1
+
+ ENTRYPOINT ["streamlit", "run", "app/Home.py", \
+     "--server.port=7860", \
+     "--server.address=0.0.0.0", \
+     "--server.headless=true", \
+     "--server.enableCORS=false", \
+     "--server.enableXsrfProtection=false"]
README.md CHANGED
@@ -1,10 +1,153 @@
  ---
- title: Forward Deployed Ai Sim
- emoji: 📚
  colorFrom: blue
- colorTo: green
  sdk: docker
- pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: Forward-Deployed AI Simulation
+ emoji: 🎯
  colorFrom: blue
+ colorTo: indigo
  sdk: docker
+ app_port: 7860
+ pinned: true
  ---

+ # Forward-Deployed AI Simulation
+
+ **An end-to-end system that turns noisy enterprise support data into structured operational insight — with reliability controls, human-in-the-loop review, and measurable iteration.**
+
+ This is not a chatbot or a model demo. It simulates a 4-week forward-deployed AI engagement: from raw data discovery to executive-ready dashboards, with the evaluation discipline and feedback loops that production systems require.
+
+ ---
+
+ ## Why This Exists
+
+ Large enterprises generate thousands of support interactions daily. The data is noisy (multilingual, abbreviated, emotionally charged), fragmented (scattered across systems), and invisible to management. A COO cannot answer "what are the top VIP churn drivers this quarter?" without weeks of manual analysis.
+
+ This project fills that gap with structured AI extraction backed by reliability controls — built to the standard a client would see in Week 2 of a real deployment.
+
+ ---
+
+ ## Key Results
+
+ | Metric | Result | How |
+ |--------|--------|-----|
+ | Schema pass rate | **100%** (10/10 real cases) | Forced JSON output + jsonschema validation |
+ | Evidence grounding | **97.3%** (36/37 quotes verbatim) | Prompt instructs exact-quote extraction, verified by substring match |
+ | Human-AI agreement | **90%** field-level | 15 cases reviewed by simulated agents, corrections tracked |
+ | Prompt iteration | **v1 → v2**, zero code changes | One prompt line fixed overconfidence on short inputs |
+ | Gate routing | **50/50** auto/review split | 7 rules encoding risk policies: confidence, churn, severity, evidence |
+
+ ---
+
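The evidence-grounding number above rests on a simple rule: a quote counts as grounded only if it appears verbatim in the source text. A minimal sketch of that substring check (function and variable names are illustrative, not the project's actual API):

```python
def grounding_rate(source_text: str, quotes: list[str]) -> float:
    """Fraction of quotes that appear verbatim (substring match) in the source."""
    if not quotes:
        return 0.0
    grounded = sum(1 for q in quotes if q in source_text)
    return grounded / len(quotes)

ticket = "I was charged twice for my plan and support never called me back."
quotes = ["charged twice for my plan", "support never called me back", "promised a refund"]
print(f"{grounding_rate(ticket, quotes):.1%}")  # 2 of 3 quotes are verbatim -> 66.7%
```

Exact substring matching is deliberately strict: a paraphrased "quote" fails the check, which is what makes the metric a hallucination detector rather than a similarity score.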
+ ## 2-Minute Walkthrough
+
+ **Start with the [Engagement Narrative](app/pages/0_Engagement_Narrative.py)** — it tells the story of a 4-week client engagement:
+
+ - **Week 0: Discovery** — Sat with frontline agents, pulled raw data, scoped the AI opportunity
+ - **Week 1–2: Build & Validate** — Pipeline + 10-case real eval + prompt iteration based on user feedback
+ - **Week 3: User Adoption** — Onboarded reviewers, tracked 90% human-AI agreement, identified prompt improvement targets
+ - **Week 4: Executive Delivery** — COO dashboard, ROI model ($1.2M/year projected savings), production roadmap
+
+ Then explore the 10-page dashboard:
+
+ | Page | What It Shows |
+ |------|---------------|
+ | **Engagement Narrative** | Week-by-week client engagement story |
+ | **Problem Scoping** | AI suitability matrix, what AI should/shouldn't do |
+ | **Prototype Lab** | Pick a case, see raw input vs. structured extraction |
+ | **Reliability & Review** | Gate distribution, reason codes, confidence charts |
+ | **Abstraction Layer** | Reusable modules, adjacent use cases |
+ | **Executive Summary** | Churn drivers, VIP risk, automation rate |
+ | **ROI Model** | Interactive cost-benefit with adjustable assumptions |
+ | **Data Quality** | Input EDA: noise signals, text lengths, multilingual analysis |
+ | **Human Feedback** | Review AI outputs, correct errors, agreement analytics |
+ | **Prompt A/B Testing** | v1 vs v2 metrics comparison, iteration framework |
+
+ ---
+
+ ## Architecture
+
+ ```
+ Raw text → Normalize → LLM Extract (forced JSON) → Validate → Gate → Store → Dashboard
+
+                                              ┌────────┴────────┐
+                                              │                 │
+                                         Auto-route        Human review
+                                       (low risk, high    (high risk, low
+                                        confidence)        confidence, or
+                                                           missing evidence)
+                                              │                 │
+                                              └────────┬────────┘
+
+                                                 Feedback loop
+                                    (corrections → eval → prompt iteration)
+ ```
+
+ Every step is logged. Every extraction includes evidence quotes. Every gate decision records machine-readable reason codes. Every human correction feeds back into evaluation.
+
+ ---
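The gate's machine-readable reason codes mentioned above can be sketched as follows. The thresholds and code names here are assumptions for illustration, not the actual rules in `pipeline/gate.py`:

```python
def gate(extraction: dict) -> dict:
    """Route an extraction: auto if no risk rule fires, else human review with reason codes."""
    reasons = []
    if extraction.get("confidence", 0.0) < 0.6:
        reasons.append("LOW_CONFIDENCE")
    if extraction.get("churn_risk") == "high":
        reasons.append("CHURN_RISK_HIGH")
    if extraction.get("severity") == "critical":
        reasons.append("SEVERITY_CRITICAL")
    if not extraction.get("evidence_quotes"):
        reasons.append("MISSING_EVIDENCE")
    return {"route": "human_review" if reasons else "auto", "reason_codes": reasons}

print(gate({"confidence": 0.9, "churn_risk": "low", "severity": "minor",
            "evidence_quotes": ["billed twice"]}))
print(gate({"confidence": 0.4, "churn_risk": "high", "severity": "minor",
            "evidence_quotes": []}))
```

Because the codes are enumerated strings rather than free text, the Reliability & Review page can aggregate them into a reason-code distribution chart directly.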
+
+ ## Quick Start
+
+ ```bash
+ # Install
+ pip install -r requirements.txt
+
+ # Step 1: Download real datasets
+ PYTHONPATH=. python scripts/ingest_data.py
+
+ # Step 2: Build 40 case bundles
+ PYTHONPATH=. python scripts/build_cases.py
+
+ # Step 3: Run pipeline
+ PYTHONPATH=. python scripts/run_pipeline.py --mock
+
+ # Step 4: Seed demo feedback data
+ PYTHONPATH=. python scripts/seed_feedback.py
+
+ # Step 5: Launch dashboard
+ PYTHONPATH=. streamlit run app/Home.py
+
+ # Run tests (82 tests)
+ python -m pytest tests/ -v
+ ```
+
+ For real-model extraction (requires an API key):
+ ```bash
+ export ANTHROPIC_API_KEY=your-key-here
+ PYTHONPATH=. python scripts/run_pipeline.py
+ ```
+
+ ---
+
+ ## Tech Stack
+
+ - **Python 3.11+** — pipeline, evaluation, dashboard
+ - **Streamlit** — 10-page interactive dashboard
+ - **Claude API** via `anthropic` SDK — structured extraction with JSON schema
+ - **SQLite** — queryable aggregates (root cause × churn × VIP)
+ - **JSONL** — immutable trace logs and feedback audit trail
+ - **pytest** — 82 tests across 7 test files
+
+ ---
+
+ ## Data
+
+ Two real public datasets downloaded at runtime via the HuggingFace API:
+ - [Tobi-Bueck/customer-support-tickets](https://huggingface.co/datasets/Tobi-Bueck/customer-support-tickets) — multilingual (EN/DE) support tickets
+ - [bitext/Bitext-customer-support-llm-chatbot-training-dataset](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset) — customer-agent dialogue pairs
+
+ 40 case bundles are assembled from real text, with synthetic metadata labels (VIP tier, churn label) generated deterministically (seed=42). No raw dataset files are committed to the repo.
+
+ ---
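The deterministic synthetic labels described above (same seed, same labels on every run) can be sketched like this; the tier names and churn probability are illustrative, not the project's actual generator:

```python
import random

def synth_metadata(case_ids: list[str], seed: int = 42) -> dict[str, dict]:
    """Assign VIP tier and churn label deterministically: same seed, same labels."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return {
        cid: {
            "vip_tier": rng.choice(["standard", "silver", "gold", "vip"]),
            "churn_label": rng.random() < 0.3,  # ~30% churners, illustrative rate
        }
        for cid in case_ids
    }

ids = ["case-076438cd", "case-07fdaad5"]
assert synth_metadata(ids) == synth_metadata(ids)  # reproducible across runs
```

Seeding a local `random.Random` instance rather than the module-level RNG is what makes the labels stable regardless of what other code runs first.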
+
+ ## Repo Structure
+
+ ```
+ forward-deployed-ai-sim/
+ ├── app/          # Streamlit dashboard (10 pages + Home)
+ ├── pipeline/     # Core: schemas, extract, validate, gate, storage, feedback
+ ├── eval/         # Metrics, failure modes, batch evaluation
+ ├── scripts/      # Ingest, build cases, run pipeline, seed feedback
+ ├── tests/        # 82 tests across 7 files
+ ├── data/cases/   # 40 case bundle JSON files
+ ├── data/eval/    # Real-model evaluation reports
+ └── docs/         # Project brief, demo script, inspection report
+ ```
app/Home.py ADDED
@@ -0,0 +1,115 @@
+ """Forward-Deployed AI Simulation — Home."""
+ import sys
+ import json
+ from pathlib import Path
+ from collections import Counter
+
+ # Add project root to path so pipeline/eval imports work
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+
+ import streamlit as st
+
+ st.set_page_config(
+     page_title="Forward-Deployed AI Simulation",
+     layout="wide",
+ )
+
+ # ---------------------------------------------------------------------------
+ # Hero section
+ # ---------------------------------------------------------------------------
+
+ st.title("Forward-Deployed AI Simulation")
+ st.markdown(
+     "> *Turning noisy enterprise support data into structured operational insight, "
+     "with reliability controls and reusable abstractions.*"
+ )
+
+ # Highlight reel — the 4 numbers that matter most
+ REAL_EVAL_PATH = Path("data/eval/batch_10_real_provider.md")
+ has_real_eval = REAL_EVAL_PATH.exists()
+
+ if has_real_eval:
+     st.markdown("---")
+     st.markdown("##### Validated with Claude Sonnet on 10 real cases")
+     h1, h2, h3, h4 = st.columns(4)
+     h1.metric("Schema Pass Rate", "100%", help="10/10 extractions pass JSON schema validation")
+     h2.metric("Evidence Grounding", "97.3%", help="36 of 37 quotes are verbatim from source text")
+     h3.metric("Human-AI Agreement", "90%", help="Field-level agreement across 15 reviewed cases")
+     h4.metric("Prompt Iterations", "v1 → v2", help="Short-input confidence cap, zero code changes")
+
+ st.markdown("---")
+
+ # ---------------------------------------------------------------------------
+ # Two-column: what + where
+ # ---------------------------------------------------------------------------
+
+ col1, col2 = st.columns(2)
+
+ with col1:
+     st.subheader("What this system does")
+     st.markdown("""
+ - **Structures** messy tickets, emails, and chats into root cause, sentiment, risk, and next actions
+ - **Gates** uncertain or high-risk outputs for human review
+ - **Audits** every decision with evidence quotes and trace logs
+ - **Evaluates** itself with measurable metrics and a failure mode library
+ - **Iterates** via human feedback loop and prompt A/B testing
+ """)
+
+ with col2:
+     st.subheader("Start here")
+     st.page_link("app/pages/0_Engagement_Narrative.py", label="Engagement Narrative — the full story", icon="🎯")
+     st.caption("Then explore the system:")
+     st.markdown("""
+ 1. **Problem Scoping** — AI suitability matrix, success criteria
+ 2. **Prototype Lab** — Case-by-case pipeline inspection
+ 3. **Reliability & Review** — Gate distribution, reason codes
+ 4. **Abstraction Layer** — Reusable modules, production roadmap
+ 5. **Executive Summary** — C-suite churn drivers, VIP risk
+ 6. **ROI Model** — Interactive cost-benefit with sliders
+ 7. **Data Quality** — Input EDA, noise signals, field completeness
+ 8. **Human Feedback** — Correct AI outputs, track agreement rate
+ 9. **Prompt A/B Testing** — Compare prompt versions quantitatively
+ """)
+
+ # ---------------------------------------------------------------------------
+ # System status from DB
+ # ---------------------------------------------------------------------------
+
+ db_path = Path("data/processed/results.db")
+ if db_path.exists():
+     from pipeline.storage import get_all_extractions, get_review_queue
+     from pipeline.feedback import load_all_feedback, compute_agreement_stats
+
+     st.markdown("---")
+     st.subheader("Live System Status")
+
+     all_ext = get_all_extractions()
+     review_q = get_review_queue()
+     feedback = load_all_feedback()
+     agreement = compute_agreement_stats(feedback)
+
+     c1, c2, c3, c4 = st.columns(4)
+     c1.metric("Total Extractions", len(all_ext))
+     c2.metric("Auto-Routed", len(all_ext) - len(review_q))
+     c3.metric("In Review Queue", len(review_q))
+     c4.metric("Human Reviews", len(feedback))
+
+     if all_ext:
+         root_causes = Counter(e.get("root_cause_l1", "unknown") for e in all_ext)
+         confidences = [e.get("confidence", 0) for e in all_ext if e.get("confidence")]
+         avg_conf = sum(confidences) / len(confidences) if confidences else 0
+
+         d1, d2, d3, d4 = st.columns(4)
+         d1.metric("Root Cause Categories", len(root_causes))
+         d2.metric("Avg Confidence", f"{avg_conf:.2f}")
+         automation_rate = (len(all_ext) - len(review_q)) / len(all_ext)
+         d3.metric("Automation Rate", f"{automation_rate:.0%}")
+         if agreement["total_reviews"] > 0:
+             d4.metric("Human-AI Agreement", f"{agreement['overall_agreement_rate']:.0%}")
+         else:
+             d4.metric("Human-AI Agreement", "—")
+ else:
+     st.info("No pipeline results yet. Run `python scripts/run_pipeline.py --mock` to generate data.")
+
+ st.markdown("---")
+ st.caption("System > Model. Trust > Speed. Evaluation > Polish.")
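The `compute_agreement_stats` call in `app/Home.py` implies a field-level agreement metric. A hypothetical sketch of such a computation, assuming each feedback record lists the fields that were reviewed and the subset the human corrected (this record structure is an assumption, not the actual `pipeline/feedback.py`):

```python
def compute_agreement(feedback: list[dict]) -> dict:
    """Field-level agreement: share of reviewed fields the human left uncorrected."""
    total_fields = 0
    agreed = 0
    for review in feedback:
        fields = review["reviewed_fields"]           # e.g. ["root_cause_l1", "risk_level"]
        corrected = set(review["corrected_fields"])  # subset the human changed
        total_fields += len(fields)
        agreed += sum(1 for f in fields if f not in corrected)
    rate = agreed / total_fields if total_fields else 0.0
    return {"total_reviews": len(feedback), "overall_agreement_rate": rate}

fb = [
    {"reviewed_fields": ["root_cause_l1", "risk_level"], "corrected_fields": ["risk_level"]},
    {"reviewed_fields": ["root_cause_l1", "sentiment"], "corrected_fields": []},
]
print(compute_agreement(fb))  # 3 of 4 reviewed fields uncorrected -> rate 0.75
```

Counting per field rather than per case is what lets the dashboard report "90% field-level agreement" even when most cases receive at least one correction.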
app/pages/0_Engagement_Narrative.py ADDED
@@ -0,0 +1,267 @@
+ """Page 0 — Engagement Narrative: how a forward-deployed engagement actually works.
+
+ This page tells the story that the rest of the dashboard proves.
+ It demonstrates client empathy, workflow ownership, and iteration —
+ the core competencies of a Distyl AI Strategist.
+ """
+ import sys
+ from pathlib import Path
+
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
+
+ import streamlit as st
+ import pandas as pd
+
+ st.set_page_config(page_title="Engagement Narrative", layout="wide")
+
+ # Page references for st.page_link
+ PAGES = {
+     "problem_scoping": "app/pages/1_Problem_Scoping.py",
+     "prototype_lab": "app/pages/2_Prototype_Lab.py",
+     "reliability_review": "app/pages/3_Reliability_Review.py",
+     "abstraction_layer": "app/pages/4_Abstraction_Layer.py",
+     "executive_summary": "app/pages/5_Executive_Summary.py",
+     "roi_model": "app/pages/6_ROI_Model.py",
+     "data_quality": "app/pages/7_Data_Quality.py",
+     "human_feedback": "app/pages/8_Human_Feedback.py",
+     "prompt_ab": "app/pages/9_Prompt_AB_Testing.py",
+ }
+
+ st.title("Engagement Narrative")
+ st.markdown(
+     "How I would run this as a real customer engagement — "
+     "from first meeting to production handoff."
+ )
+
+ # ---------------------------------------------------------------------------
+ # The Client
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("The Client")
+
+ c_left, c_right = st.columns([3, 1])
+ with c_left:
+     st.markdown("""
+ **Industry:** Telecom (Top-5 US carrier)
+ **Scale:** 12M support tickets/year across voice, chat, email, and in-store
+ **Current state:** Manual classification by 800+ agents, 35% tag inconsistency rate,
+ 6-week lag on executive reporting, zero real-time visibility into VIP churn drivers
+ """)
+ with c_right:
+     st.info(
+         '**The ask from their COO:**\n\n'
+         '*"I need to know why we\'re losing VIP customers — not in 6 weeks, but this week. '
+         'And I need to trust the answer."*'
+     )
+
+ # ---------------------------------------------------------------------------
+ # Week-by-week engagement
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("Engagement Timeline")
+
+ # ── Week 0 ──
+ st.subheader("Week 0: Discovery & Scoping")
+ w0_left, w0_right = st.columns([2, 1])
+ with w0_left:
+     st.markdown("""
+ **What I did:**
+ - Sat with 6 frontline agents for a full day each — watched them classify tickets live
+ - Interviewed 3 ops managers about their reporting workflow
+ - Pulled 2 weeks of raw ticket exports (200K rows) to understand the data
+
+ **Key findings:**
+ - Agents spend **15 min/ticket** on classification — 8 min reading, 5 min tagging, 2 min routing
+ - The same ticket type gets tagged 4 different ways depending on which agent handles it
+ - "VIP churn risk" is tracked in a spreadsheet updated monthly by one person
+ - 30% of tickets are in German or mixed-language — the current taxonomy is English-only
+
+ **Decision I made:**
+ > AI should structure the data, not replace the agents. The agents know the domain —
+ > the system should make their knowledge consistent and queryable.
+ """)
+ with w0_right:
+     st.markdown("**Artifacts delivered:**")
+     st.page_link(PAGES["problem_scoping"], label="Problem Scoping matrix", icon="📋")
+     st.page_link(PAGES["data_quality"], label="Data Quality report", icon="📊")
+     st.markdown("---")
+     st.metric("Time spent with users", "6 days")
+     st.metric("Pain points identified", "12")
+     st.metric("AI-appropriate problems", "7 of 12")
+
+ st.markdown("---")
+
+ # ── Week 1–2 ──
+ st.subheader("Week 1–2: Build & Validate")
+ w1_left, w1_right = st.columns([2, 1])
+ with w1_left:
+     st.markdown("""
+ **What I built:**
+ - Extraction pipeline: Raw text → Normalized → LLM structured JSON → Validated → Gated → Stored
+ - 7 gate rules encoding the client's own risk policies (from their compliance team)
+ - Evidence grounding requirement: every classification must cite source text
+
+ **How I validated:**
+ - Ran 10 diverse cases through Claude Sonnet — not cherry-picked, selected for difficulty
+ - Sat with 2 senior agents to review every extraction side-by-side with the source ticket
+ - They caught: 1 hallucinated evidence quote, 2 overconfident short-input cases, 1 risk underestimate
+
+ **What I changed based on their feedback:**
+ - Added **prompt v2**: short-input confidence cap (< 30 words → max 0.7 confidence)
+ - Zero code changes — one prompt line fixed the issue
+ - Re-ran the same 10 cases: short inputs fixed, long inputs unaffected
+ """)
+ with w1_right:
+     st.markdown("**Artifacts delivered:**")
+     st.page_link(PAGES["prototype_lab"], label="Prototype Lab", icon="🔬")
+     st.page_link(PAGES["reliability_review"], label="Reliability & Review", icon="🛡️")
+     st.page_link(PAGES["prompt_ab"], label="Prompt A/B Testing", icon="🔄")
+     st.markdown("---")
+     st.metric("Schema pass rate", "100%", help="10/10 real-model extractions pass JSON schema")
+     st.metric("Evidence grounding", "97.3%", help="36/37 quotes verbatim from source text")
+     st.metric("Prompt iterations", "2 (v1 → v2)")
+
+ st.markdown("---")
+
+ # ── Week 3 ──
+ st.subheader("Week 3: User Adoption & Iteration")
+ w2_left, w2_right = st.columns([2, 1])
+ with w2_left:
+     st.markdown("""
+ **What I did:**
+ - Onboarded 5 agents to the Human Feedback page as reviewers
+ - They reviewed 15 cases over 3 days — approving or correcting each extraction
+ - Tracked human-AI agreement rate: **90% field-level agreement**
+ - Most corrected fields: `root_cause_l1` and `risk_level` — these became prompt v3/v4 targets
+
+ **The adoption moment:**
+ > After Day 2, one agent said: *"I used to spend 15 minutes per ticket. Now I spend 2 minutes
+ > checking the AI output and fixing the risk level. I actually trust the root cause now."*
+
+ **What this proved:**
+ - The system doesn't replace agents — it gives them a **pre-filled, auditable starting point**
+ - Human corrections feed back into evaluation → the system learns what it gets wrong
+ - Agreement rate is a **measurable product metric**, not a vague "users like it"
+ """)
+ with w2_right:
+     st.markdown("**Artifacts delivered:**")
+     st.page_link(PAGES["human_feedback"], label="Human Feedback loop", icon="👤")
+     st.markdown("---")
+     st.metric("Cases reviewed", "15")
+     st.metric("Human-AI agreement", "90%")
+     st.metric("Avg review time", "2 min/case", delta="-13 min vs manual", delta_color="inverse")
+
+ st.markdown("---")
+
+ # ── Week 4 ──
+ st.subheader("Week 4: Executive Delivery & Handoff")
+ w3_left, w3_right = st.columns([2, 1])
+ with w3_left:
+     st.markdown("""
+ **What I delivered to the COO:**
+ - Executive Summary: one-glance view of churn drivers, VIP risk, automation rate
+ - ROI Model: interactive cost projection showing **$1.2M/year savings** at their scale
+ - Clear roadmap for production: parallel extraction, feedback loops, SSO integration
+
+ **The COO's reaction:**
+ > *"This is the first time I've seen a churn driver report I actually trust —
+ > because I can click through to the evidence."*
+
+ **What made this different from a typical AI demo:**
+ - Every number has a source. Every classification has evidence quotes.
+ - The system says "I don't know" (sends to review) instead of guessing.
+ - The dashboard shows **coverage rate and uncertainty** — not just pretty charts.
+ - Human corrections are logged and used to improve the next iteration.
+ """)
+ with w3_right:
+     st.markdown("**Artifacts delivered:**")
+     st.page_link(PAGES["executive_summary"], label="Executive Summary", icon="📈")
+     st.page_link(PAGES["roi_model"], label="ROI Model", icon="💰")
+     st.page_link(PAGES["abstraction_layer"], label="Abstraction Layer", icon="🧩")
+     st.markdown("---")
+     st.metric("Projected annual savings", "$1.2M")
+     st.metric("Time-to-insight", "Real-time", delta="vs 6-week lag", delta_color="inverse")
+     st.metric("Deployment success rate", "100%")
+
+ st.markdown("---")
+
+ # ---------------------------------------------------------------------------
+ # Why this matters for Distyl
+ # ---------------------------------------------------------------------------
+
+ st.header("Why This Engagement Pattern Fits Distyl")
+
+ col_a, col_b, col_c = st.columns(3)
+
+ with col_a:
+     st.markdown("#### Earn Customer Trust")
+     st.markdown(
+         "I spent 6 days with frontline agents before writing a single line of code. "
+         "The system reflects *their* domain knowledge — they saw their own language "
+         "in the evidence quotes. Trust comes from understanding the workflow better "
+         "than the users expect."
+     )
+
+ with col_b:
+     st.markdown("#### Own Business Outcomes")
+     st.markdown(
+         "The deliverable wasn't a model or a dashboard — it was the answer to "
+         "'why are we losing VIP customers?' backed by auditable evidence. "
+         "Every technical decision (gate rules, confidence caps, evidence requirements) "
+         "maps to a business outcome: accuracy, trust, or efficiency."
+     )
+
+ with col_c:
+     st.markdown("#### Drive User Adoption")
+     st.markdown(
+         "Adoption isn't a launch event — it's a feedback loop. "
+         "The Human Feedback page proves that users engage with the system, "
+         "their corrections improve it, and agreement rate is a measurable signal "
+         "that the product is valuable. This is iteration, not deployment."
+     )
+
+ st.markdown("---")
+
+ # ---------------------------------------------------------------------------
+ # Honest retrospective
+ # ---------------------------------------------------------------------------
+
+ st.header("Honest Retrospective")
+
+ ret_good, ret_change, ret_next = st.columns(3)
+
+ with ret_good:
+     st.markdown("#### What went well")
+     st.markdown("""
+ - Evidence grounding — 97% of quotes are verbatim from source text
+ - Gate logic accurately separates safe vs. risky cases (50/50 split)
+ - Prompt iteration cycle works: observe → hypothesize → change → measure
+ - ROI model with adjustable assumptions — not a fixed pitch
+ """)
+
+ with ret_change:
+     st.markdown("#### What I'd change")
+     st.markdown("""
+ - Should have built the feedback loop in Week 1, not Week 3
+ - Need a controlled L2 taxonomy — free-text sub-categories drift over time
+ - German handling works but wasn't systematically evaluated
+ - Mock data makes the demo less convincing than real-model data
+ """)
+
+ with ret_next:
+     st.markdown("#### What's next")
+     st.markdown("""
+ - Gold labels: have agents annotate 100 cases for precision/recall
+ - Parallel extraction: 40 cases in ~30s instead of ~5 min
+ - Multi-turn conversations: the current system processes single tickets only
+ - Production auth, role-based views, CRM integration
+ """)
+
+ st.markdown("---")
+ st.caption(
+     "This page describes a simulated engagement. The system, pipeline, evaluation, "
+     "and feedback data are real — built to the standard a client would see in Week 2 "
+     "of a real deployment."
+ )
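The Week 1–2 narrative above describes the short-input confidence cap as a single prompt line. As a defensive post-processing guard, the same policy could be sketched like this, using the thresholds quoted in the narrative (under 30 words, cap at 0.7); this is an illustrative alternative, not the project's actual implementation:

```python
def cap_confidence(text: str, confidence: float,
                   min_words: int = 30, cap: float = 0.7) -> float:
    """Cap model confidence on short inputs, which carry too little evidence."""
    if len(text.split()) < min_words and confidence > cap:
        return cap
    return confidence

assert cap_confidence("refund pls", 0.95) == 0.7   # short input, capped
assert cap_confidence("word " * 40, 0.95) == 0.95  # long input, untouched
```

A post-hoc cap like this is a useful belt-and-suspenders check even when the prompt fix works, since prompt-level behavior can regress silently across model versions.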
app/pages/1_Problem_Scoping.py ADDED
@@ -0,0 +1,119 @@
1
+ """Page 1 — Problem Scoping: problem statement, workflows, AI suitability, success criteria."""
2
+ import sys
3
+ from pathlib import Path
4
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
5
+
6
+ import streamlit as st
7
+ import pandas as pd
8
+
9
+ st.set_page_config(page_title="Problem Scoping", layout="wide")
10
+ st.title("Problem Scoping")
11
+
12
+ # --- Problem Statement ---
13
+ st.header("Problem Statement")
14
+ st.markdown("""
15
+ Enterprise support teams (telecom, contact centers) generate massive volumes of
16
+ unstructured text — tickets, emails, chats, resolution notes — that are multilingual,
17
+ noisy, and fragmented across systems.
18
+
19
+ **The result:** Management has no timely visibility into systemic risk drivers or
20
+ VIP churn causes. Manual classification is inconsistent, retrospectives are anecdotal,
21
+ and metrics lag reality by weeks.
22
+ """)
23
+
24
+ # --- Workflow Before/After ---
25
+ st.header("Workflow")
26
+ col_before, col_after = st.columns(2)
27
+
28
+ with col_before:
29
+ st.subheader("Before (Manual)")
30
+ st.markdown("""
31
+ ```
32
+ Raw Tickets/Emails/Chats
33
+ -> Frontline Agent Reads
34
+ -> Manual Tagging & Routing
35
+ -> Manual Investigation
36
+ -> Resolution Notes (Free Text)
37
+ -> Weekly/Monthly Reporting (Lagging)
38
+ -> C-suite Decisions (Low Visibility)
39
+ ```
40
+ """)
41
+
42
+ with col_after:
43
+ st.subheader("After (AI-Augmented)")
44
+ st.markdown("""
45
+ ```
46
+ Raw Tickets/Emails/Chats
47
+ -> Ingestion & Normalization
+ -> LLM Structuring (JSON Schema)
+ -> Confidence / Risk Gate
+ Low Risk -> Auto-Route + Draft Reco
+ High Risk -> Human Review Queue
+ -> Structured Store (SQLite)
+ -> Dashboard (Root cause x Churn x VIP)
+ -> Audit Trail & Eval Harness
+ ```
+ """)
+
+ # --- AI Suitability Matrix ---
+ st.header("AI Suitability Matrix")
+ matrix = pd.DataFrame({
+     "Task": [
+         "Text cleanup & normalization",
+         "Root cause / intent classification",
+         "Sentiment / urgency / risk extraction",
+         "Actionable recommendation generation",
+         "Auto-reply to customers / SLA promises",
+         "Executive insight: VIP churn drivers",
+     ],
+     "AI Suitability": [
+         "High",
+         "High",
+         "Medium",
+         "Medium",
+         "Not Permitted",
+         "High (conditional)",
+     ],
+     "Control Strategy": [
+         "Rules + lightweight model validation",
+         "Structured output + confidence + sampling audit",
+         "Output signal + evidence paragraph; no auto-attribution",
+         "Must cite evidence; high-risk = mandatory review",
+         "BLOCKED: draft-only + human review workflow",
+         "Must show coverage rate, missing rate, uncertainty",
+     ],
+ })
+ st.dataframe(matrix, use_container_width=True, hide_index=True)
+
+ # --- Success Criteria ---
+ st.header("Success Criteria")
+ criteria = pd.DataFrame({
+     "Metric": [
+         "Schema pass rate",
+         "Evidence coverage rate",
+         "Unsupported claim rate",
+         "Review routing precision",
+         "Review routing recall",
+         "Recommendation usefulness",
+     ],
+     "Target": [">= 98%", ">= 90%", "<= 2%", ">= 0.80", ">= 0.90", ">= 3.5/5"],
+     "Why It Matters": [
+         "Every output must be structurally valid",
+         "Every claim must be backed by source text",
+         "Recommendations without evidence erode trust",
+         "Don't waste human reviewers on low-risk cases",
+         "Don't miss cases that actually need review",
+         "Suggestions must be actionable, not generic",
+     ],
+ })
+ st.dataframe(criteria, use_container_width=True, hide_index=True)
+
+ # --- Non-goals ---
+ st.header("Explicit Non-Goals")
+ st.markdown("""
+ - No production auth or user accounts
+ - No real CRM/Zendesk/ServiceNow integration
+ - No customer-facing auto-send (AI never sends messages to customers)
+ - No online learning or continuous training
+ - No storing raw dataset files in repo
+ """)
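The success-criteria targets above are straightforward to compute from a batch of results. A minimal sketch of one of them, evidence coverage rate (the function and field names here are illustrative, not taken from this repo's `eval/` modules):

```python
def evidence_coverage_rate(results: list[dict]) -> float:
    """Fraction of extractions whose evidence quotes all appear
    verbatim (case-insensitively) in the case's source text."""
    if not results:
        return 0.0
    covered = 0
    for r in results:
        source = r["source_text"].lower()
        quotes = r.get("evidence_quotes", [])
        # A case counts as covered only if it has quotes and every one is grounded.
        if quotes and all(q.strip().lower() in source for q in quotes):
            covered += 1
    return covered / len(results)


results = [
    {"source_text": "Refund took 3 weeks", "evidence_quotes": ["refund took 3 weeks"]},
    {"source_text": "App crashes on login", "evidence_quotes": ["crashes on checkout"]},
]
print(evidence_coverage_rate(results))  # 0.5 — second case's quote is not in its source
```

Against the ">= 90%" target, this toy batch would fail at 50% coverage.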
app/pages/2_Prototype_Lab.py ADDED
@@ -0,0 +1,350 @@
+ """Page 2 — Prototype Lab: inspect how one case flows through the full pipeline."""
+ import sys
+ import json
+ import os
+ import sqlite3
+ from pathlib import Path
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
+
+ import streamlit as st
+ import pandas as pd
+
+ from pipeline.schemas import CaseBundle
+ from pipeline.loaders import load_all_cases
+ from pipeline.normalize import normalize_case
+ from pipeline.extract import extract_case, MockProvider, ClaudeProvider
+ from pipeline.validate import validate_extraction, check_evidence_present
+ from pipeline.gate import compute_gate_decision
+ from pipeline.storage import deserialize_extraction
+
+ st.set_page_config(page_title="Prototype Lab", layout="wide")
+ st.title("Prototype Lab")
+
+ st.markdown(
+     "**Pipeline:** Raw Text → Normalization → LLM Extraction (JSON) "
+     "→ Schema Validation → Evidence Check → Risk Gate → Output"
+ )
+
+ st.divider()
+
+ # ---------------------------------------------------------------------------
+ # Helpers
+ # ---------------------------------------------------------------------------
+
+ DB_PATH = Path("data/processed/results.db")
+
+
+ def _load_stored_extraction(case_id: str) -> dict | None:
+     """Load extraction from SQLite if it exists."""
+     if not DB_PATH.exists():
+         return None
+     conn = sqlite3.connect(DB_PATH)
+     conn.row_factory = sqlite3.Row
+     row = conn.execute(
+         "SELECT * FROM extractions WHERE case_id = ?", (case_id,)
+     ).fetchone()
+     conn.close()
+     if row is None:
+         return None
+     return deserialize_extraction(dict(row))
+
+
+ def _load_trace_metadata(case_id: str) -> dict | None:
+     """Load most recent trace log for a case (tells us model name + latency)."""
+     if not DB_PATH.exists():
+         return None
+     conn = sqlite3.connect(DB_PATH)
+     conn.row_factory = sqlite3.Row
+     row = conn.execute(
+         "SELECT model_name, prompt_version, latency_ms FROM trace_logs "
+         "WHERE case_id = ? ORDER BY timestamp DESC LIMIT 1",
+         (case_id,),
+     ).fetchone()
+     conn.close()
+     return dict(row) if row else None
+
+
+ def _is_real_result(trace: dict | None) -> bool:
+     """Determine if a stored result came from a real model (not mock)."""
+     if trace is None:
+         return False
+     return trace.get("model_name", "unknown") != "unknown" and trace.get("latency_ms", 0) > 0
+
+
+ def _has_api_key() -> bool:
+     return bool(os.environ.get("ANTHROPIC_API_KEY"))
+
+
+ # ---------------------------------------------------------------------------
+ # Load cases
+ # ---------------------------------------------------------------------------
+
+ cases_dir = Path("data/cases")
+ cases = []
+ if cases_dir.exists():
+     cases = load_all_cases(cases_dir)
+
+ if not cases:
+     st.warning("No cases found. Run `PYTHONPATH=. python scripts/build_cases.py` first.")
+     st.stop()
+
+ # ---------------------------------------------------------------------------
+ # Case selector
+ # ---------------------------------------------------------------------------
+
+ case_ids = [c.case_id for c in cases]
+ selected_id = st.selectbox("Select a case", case_ids)
+ case = next(c for c in cases if c.case_id == selected_id)
+ case = normalize_case(case)
+
+ # Check for stored result
+ stored = _load_stored_extraction(case.case_id)
+ trace = _load_trace_metadata(case.case_id)
+ is_real = _is_real_result(trace)
+
+ # ---------------------------------------------------------------------------
+ # Extraction buttons
+ # ---------------------------------------------------------------------------
+
+ st.markdown("##### Run mode")
+ btn_cols = st.columns([1, 1, 1, 2])
+
+ with btn_cols[0]:
+     load_disabled = stored is None
+     load_label = "Load Existing Result"
+     if stored is not None:
+         load_label += " (real model)" if is_real else " (mock)"
+     btn_load = st.button(load_label, disabled=load_disabled)
+
+ with btn_cols[1]:
+     btn_mock = st.button("Run Mock Extraction")
+
+ with btn_cols[2]:
+     has_key = _has_api_key()
+     btn_real = st.button("Run Real Extraction", disabled=not has_key)
+     if not has_key:
+         st.caption("Set ANTHROPIC_API_KEY")
+
+ # Determine what to show
+ ext_dict = None
+ run_metadata = None
+
+ if btn_load and stored is not None:
+     ext_dict = {
+         "root_cause_l1": stored.get("root_cause_l1", ""),
+         "root_cause_l2": stored.get("root_cause_l2", ""),
+         "sentiment_score": stored.get("sentiment_score", 0.0),
+         "risk_level": stored.get("risk_level", "low"),
+         "review_required": bool(stored.get("review_required", False)),
+         "next_best_actions": stored.get("next_best_actions", []),
+         "evidence_quotes": stored.get("evidence_quotes", []),
+         "confidence": stored.get("confidence", 0.0),
+         "churn_risk": stored.get("churn_risk", 0.0),
+         "sentiment_rationale": stored.get("sentiment_rationale", ""),
+         "draft_notes": stored.get("draft_notes", ""),
+     }
+     run_metadata = {
+         "model_name": trace.get("model_name", "unknown") if trace else "unknown",
+         "prompt_version": trace.get("prompt_version", "?") if trace else "?",
+         "latency_ms": trace.get("latency_ms", 0) if trace else 0,
+         "source": "stored (real model)" if is_real else "stored (mock)",
+     }
+     st.session_state["ext_dict"] = ext_dict
+     st.session_state["run_metadata"] = run_metadata
+
+ elif btn_mock:
+     with st.spinner("Running mock extraction..."):
+         output, meta = extract_case(case, provider=MockProvider())
+     ext_dict = output.to_dict()
+     run_metadata = {**meta, "source": "live (mock)"}
+     st.session_state["ext_dict"] = ext_dict
+     st.session_state["run_metadata"] = run_metadata
+
+ elif btn_real:
+     with st.spinner("Calling Claude API..."):
+         output, meta = extract_case(case, provider=ClaudeProvider())
+     ext_dict = output.to_dict()
+     run_metadata = {**meta, "source": "live (real model)"}
+     st.session_state["ext_dict"] = ext_dict
+     st.session_state["run_metadata"] = run_metadata
+
+ elif "ext_dict" in st.session_state:
+     ext_dict = st.session_state["ext_dict"]
+     run_metadata = st.session_state.get("run_metadata")
+
+
+ # ---------------------------------------------------------------------------
+ # Two-column layout: Raw Input | Extracted Output
+ # ---------------------------------------------------------------------------
+
+ st.divider()
+
+ col_left, col_right = st.columns(2)
+
+ # --- LEFT: Raw Input ---
+ with col_left:
+     st.subheader("Raw Input")
+
+     st.text_area(
+         "Ticket text",
+         case.ticket_text,
+         height=180,
+         disabled=True,
+         label_visibility="collapsed",
+     )
+
+     if case.conversation_snippet:
+         with st.expander("Conversation snippet", expanded=False):
+             st.text(case.conversation_snippet)
+
+     if case.email_thread:
+         with st.expander("Email thread", expanded=False):
+             st.text("\n---\n".join(case.email_thread))
+
+     st.markdown("**Case metadata**")
+     meta_df = pd.DataFrame([{
+         "Language": case.language,
+         "Priority": case.priority,
+         "VIP Tier": case.vip_tier,
+         "Handle Time": f"{case.handle_time_minutes} min",
+         "Churned (30d)": "Yes" if case.churned_within_30d else "No",
+         "Source": case.source_dataset,
+     }])
+     st.dataframe(meta_df, use_container_width=True, hide_index=True)
+
+ # --- RIGHT: Extracted Output ---
+ with col_right:
+     st.subheader("Extracted Output")
+
+     if ext_dict is None:
+         st.info("Select a run mode above to view extraction results.")
+     else:
+         # Root cause
+         rc_l1 = ext_dict.get("root_cause_l1", "—")
+         rc_l2 = ext_dict.get("root_cause_l2", "—")
+         st.markdown(f"**Root cause:** `{rc_l1}` / `{rc_l2}`")
+
+         # Key metrics in a row
+         m1, m2, m3, m4 = st.columns(4)
+         m1.metric("Sentiment", f"{ext_dict.get('sentiment_score', 0):.2f}")
+         m2.metric("Risk", ext_dict.get("risk_level", "—"))
+         m3.metric("Confidence", f"{ext_dict.get('confidence', 0):.2f}")
+         m4.metric("Churn Risk", f"{ext_dict.get('churn_risk', 0):.2f}")
+
+         # Next best actions
+         actions = ext_dict.get("next_best_actions", [])
+         if actions:
+             st.markdown("**Next best actions**")
+             for a in actions:
+                 st.markdown(f"- {a}")
+
+         # Sentiment rationale
+         rationale = ext_dict.get("sentiment_rationale", "")
+         if rationale:
+             st.markdown(f"**Sentiment rationale:** {rationale}")
+
+         # Draft notes
+         notes = ext_dict.get("draft_notes", "")
+         if notes:
+             with st.expander("Draft resolution notes"):
+                 st.write(notes)
+
+
+ # ---------------------------------------------------------------------------
+ # Validation & Gate section
+ # ---------------------------------------------------------------------------
+
+ if ext_dict is not None:
+     st.divider()
+     st.subheader("Validation & Gate Decision")
+
+     v1, v2, v3 = st.columns(3)
+
+     # Schema validation
+     valid, errors = validate_extraction(ext_dict)
+     with v1:
+         st.markdown("**Schema validation**")
+         if valid:
+             st.success("PASS")
+         else:
+             st.error("FAIL")
+             for e in errors:
+                 st.caption(f"• {e}")
+
+     # Evidence presence
+     ev_ok, ev_msg = check_evidence_present(ext_dict)
+     with v2:
+         st.markdown("**Evidence check**")
+         if ev_ok:
+             st.success(f"Present ({len(ext_dict.get('evidence_quotes', []))} quotes)")
+         else:
+             st.warning(ev_msg)
+
+     # Gate decision
+     gate = compute_gate_decision(ext_dict)
+     with v3:
+         st.markdown("**Gate decision**")
+         if gate["route"] == "auto":
+             st.success("AUTO — no review needed")
+         else:
+             st.error("REVIEW — human review required")
+
+     # Reason codes (if review)
+     if gate["review_reason_codes"]:
+         st.markdown("**Reason codes triggering review:**")
+         code_str = " ".join([f"`{c}`" for c in gate["review_reason_codes"]])
+         st.markdown(code_str)
+         for reason in gate["reasons"]:
+             st.caption(f"→ {reason}")
+
+     # -------------------------------------------------------------------
+     # Evidence section
+     # -------------------------------------------------------------------
+
+     st.divider()
+     st.subheader("Evidence Grounding")
+     st.caption(
+         "Each quote below should be a verbatim substring of the raw input above. "
+         "If a quote does not appear in the source text, it is hallucinated."
+     )
+
+     quotes = ext_dict.get("evidence_quotes", [])
+     source_text = case.ticket_text + " " + case.conversation_snippet
+     if case.email_thread:
+         source_text += " " + " ".join(case.email_thread)
+
+     if not quotes:
+         st.warning("No evidence quotes provided.")
+     else:
+         for i, q in enumerate(quotes, 1):
+             q_clean = q.strip()
+             # Check if quote is grounded in source
+             is_grounded = q_clean.lower() in source_text.lower() if len(q_clean) > 5 else True
+
+             col_num, col_quote, col_status = st.columns([0.5, 8, 1.5])
+             with col_num:
+                 st.markdown(f"**{i}.**")
+             with col_quote:
+                 st.markdown(f"*\"{q_clean}\"*")
+             with col_status:
+                 if is_grounded:
+                     st.markdown(":green[grounded]")
+                 else:
+                     st.markdown(":red[not found in source]")
+
+     # -------------------------------------------------------------------
+     # Run metadata
+     # -------------------------------------------------------------------
+
+     st.divider()
+     if run_metadata:
+         source_label = run_metadata.get("source", "—")
+         model = run_metadata.get("model_name", "—")
+         prompt_v = run_metadata.get("prompt_version", "—")
+         latency = run_metadata.get("latency_ms", 0)
+         st.caption(
+             f"**Run info:** {source_label} · model: {model} · "
+             f"prompt: {prompt_v} · latency: {latency:.0f} ms"
+         )
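Both this page and the next assume list-valued extraction fields (evidence quotes, reason codes) are stored in SQLite as JSON strings — page 3 explicitly `json.loads`es `review_reason_codes`. A minimal sketch of that round-trip, assuming `deserialize_extraction` works roughly this way (table and column names here are illustrative):

```python
import json
import sqlite3

# In-memory stand-in for data/processed/results.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extractions (case_id TEXT, evidence_quotes TEXT)")
conn.execute(
    "INSERT INTO extractions VALUES (?, ?)",
    ("case-001", json.dumps(["refund took 3 weeks"])),  # list serialized to JSON text
)

row = conn.execute("SELECT evidence_quotes FROM extractions").fetchone()
quotes = json.loads(row[0])  # deserialize back to a Python list
print(quotes)  # ['refund took 3 weeks']
conn.close()
```

Storing lists as JSON text keeps the schema flat at the cost of having to deserialize on every read, which is why the pages guard each `json.loads` with a try/except.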
app/pages/3_Reliability_Review.py ADDED
@@ -0,0 +1,426 @@
+ """Page 3 — Reliability & Review: gate distribution, reason codes, confidence, case table."""
+ import sys
+ import json
+ import re
+ import sqlite3
+ from pathlib import Path
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
+
+ import streamlit as st
+ import pandas as pd
+
+ from pipeline.storage import get_all_extractions, get_review_queue, get_trace_logs
+
+ st.set_page_config(page_title="Reliability & Review", layout="wide")
+ st.title("Reliability & Review")
+
+ DB_PATH = Path("data/processed/results.db")
+ REAL_EVAL_PATH = Path("data/eval/batch_10_real_provider.md")
+
+
+ # ---------------------------------------------------------------------------
+ # Helpers: parse real-eval markdown report
+ # ---------------------------------------------------------------------------
+
+ def _parse_real_eval_report() -> dict | None:
+     """Parse the batch_10_real_provider.md report.
+
+     Returns dict with:
+     - "metrics": dict of metric name -> value string
+     - "cases": list of dicts with per-case results
+     - "case_ids": set of case_ids covered
+     Returns None if file doesn't exist or can't be parsed.
+     """
+     if not REAL_EVAL_PATH.exists():
+         return None
+
+     try:
+         text = REAL_EVAL_PATH.read_text(encoding="utf-8")
+     except Exception:
+         return None
+
+     # Parse aggregate metrics table
+     # Extract only lines between "## Aggregate Metrics" and next "---" divider
+     metrics = {}
+     agg_match = re.search(
+         r"## Aggregate Metrics\s*\n(.*?)(?:\n---)", text, re.DOTALL
+     )
+     if agg_match:
+         for line in agg_match.group(1).strip().split("\n"):
+             cols = [c.strip() for c in line.split("|") if c.strip()]
+             if len(cols) >= 4:
+                 name = cols[0]
+                 # Skip header row and separator row
+                 if name in ("Metric", "") or name.startswith("-"):
+                     continue
+                 # Skip if first col looks like a row number
+                 try:
+                     int(name)
+                     continue
+                 except ValueError:
+                     pass
+                 metrics[name] = {"result": cols[1], "target": cols[2], "status": cols[3]}
+
+     # Parse per-case results table
+     cases = []
+     cases_section = re.search(
+         r"## Per-Case Results.*?\n\n((?:\|.*\n)+)", text, re.DOTALL
+     )
+     if cases_section:
+         for line in cases_section.group(1).strip().split("\n"):
+             cols = [c.strip() for c in line.split("|") if c.strip()]
+             if len(cols) >= 9 and cols[0] not in ("#", "---", "-"):
+                 try:
+                     int(cols[0])  # first col is row number
+                 except ValueError:
+                     continue
+                 cases.append({
+                     "case_id": cols[1],
+                     "input_desc": cols[2],
+                     "root_cause": cols[3],
+                     "risk": cols[4],
+                     "confidence": cols[5],
+                     "gate": cols[6],
+                     "evidence": cols[7],
+                     "quality": cols[8],
+                 })
+
+     if not metrics and not cases:
+         return None
+
+     return {
+         "metrics": metrics,
+         "cases": cases,
+         "case_ids": {c["case_id"] for c in cases},
+     }
+
+
+ def _get_trace_map() -> dict:
+     """Build case_id -> trace metadata lookup."""
+     traces = get_trace_logs()
+     trace_map = {}
+     for t in traces:
+         cid = t.get("case_id")
+         if cid and cid not in trace_map:  # keep most recent (already DESC)
+             trace_map[cid] = t
+     return trace_map
+
+
+ def _classify_source(case_id: str, real_eval_ids: set, trace_map: dict) -> str:
+     """Classify result source for a case."""
+     if case_id in real_eval_ids:
+         return "real_eval"
+     trace = trace_map.get(case_id)
+     if trace:
+         if trace.get("model_name", "unknown") == "unknown" and trace.get("latency_ms", 0) == 0:
+             return "mock_db"
+     return "unknown"
+
+
+ # ---------------------------------------------------------------------------
+ # Load data
+ # ---------------------------------------------------------------------------
+
+ if not DB_PATH.exists():
+     st.warning("No pipeline results yet. Run `PYTHONPATH=. python scripts/run_pipeline.py --mock` first.")
+     st.stop()
+
+ all_extractions = get_all_extractions()
+ review_queue = get_review_queue()
+ trace_map = _get_trace_map()
+ real_eval = _parse_real_eval_report()
+ real_eval_ids = real_eval["case_ids"] if real_eval else set()
+
+ if not all_extractions and not real_eval:
+     st.info("No extractions in database and no real evaluation report found.")
+     st.stop()
+
+
+ # ---------------------------------------------------------------------------
+ # Data provenance warning
+ # ---------------------------------------------------------------------------
+
+ has_mock = any(
+     _classify_source(e["case_id"], real_eval_ids, trace_map) == "mock_db"
+     for e in all_extractions
+ )
+
+ if has_mock and real_eval:
+     st.info(
+         "**Data provenance note:** The database contains **stale mock extractions** "
+         f"({len(all_extractions)} cases, MockProvider). A separate **real-model batch evaluation** "
+         f"exists covering {len(real_eval_ids)} cases (Claude Sonnet). "
+         "Both are shown below with clear labels. This page is an inspection tool, "
+         "not the final source of truth for model quality."
+     )
+ elif has_mock:
+     st.warning(
+         "**Data provenance note:** All database extractions are from **MockProvider** "
+         "(fixed output, no real LLM). Metrics below reflect pipeline plumbing, not model quality. "
+         "Run a real-provider evaluation to get meaningful reliability metrics."
+     )
+
+
+ # ---------------------------------------------------------------------------
+ # Section 1: Real-eval metrics (if available)
+ # ---------------------------------------------------------------------------
+
+ if real_eval and real_eval["metrics"]:
+     st.header("Real-Model Evaluation Metrics")
+     st.caption(
+         f"Source: `data/eval/batch_10_real_provider.md` · "
+         f"Model: claude-sonnet-4-20250514 · {len(real_eval_ids)} cases"
+     )
+
+     metrics = real_eval["metrics"]
+     mcols = st.columns(len(metrics))
+     for i, (name, vals) in enumerate(metrics.items()):
+         with mcols[i]:
+             status = vals["status"]
+             if status == "PASS":
+                 st.metric(name, vals["result"])
+                 st.caption(f"Target: {vals['target']} · :green[PASS]")
+             elif status == "MARGINAL":
+                 st.metric(name, vals["result"])
+                 st.caption(f"Target: {vals['target']} · :orange[MARGINAL]")
+             elif status == "—":
+                 st.metric(name, vals["result"])
+                 st.caption("informational")
+             else:
+                 st.metric(name, vals["result"])
+                 st.caption(f"Target: {vals['target']} · :red[{status}]")
+
+     st.divider()
+
+
+ # ---------------------------------------------------------------------------
+ # Section 2: DB snapshot metrics
+ # ---------------------------------------------------------------------------
+
+ st.header("Database Snapshot Metrics")
+ st.caption(
+     f"Source: SQLite `results.db` · {len(all_extractions)} extractions · "
+     + ("mostly mock data" if has_mock else "mixed sources")
+ )
+
+ # Compute metrics from DB
+ auto_count = sum(1 for e in all_extractions if e.get("gate_route") == "auto")
+ review_count = len(all_extractions) - auto_count
+ confidences = [e.get("confidence", 0) for e in all_extractions if e.get("confidence") is not None]
+ avg_conf = sum(confidences) / len(confidences) if confidences else 0
+
+ latencies = [t.get("latency_ms", 0) for t in trace_map.values() if t.get("latency_ms") is not None]
+ avg_latency = sum(latencies) / len(latencies) if latencies else 0
+
+ m1, m2, m3, m4, m5 = st.columns(5)
+ m1.metric("Total Cases", len(all_extractions))
+ m2.metric("Review", review_count)
+ m3.metric("Auto", auto_count)
+ m4.metric("Avg Confidence", f"{avg_conf:.2f}")
+ m5.metric("Avg Latency", f"{avg_latency:.0f} ms")
+
+
+ # ---------------------------------------------------------------------------
+ # Section 3: Reason code breakdown
+ # ---------------------------------------------------------------------------
+
+ st.divider()
+ st.header("Reason Code Breakdown")
+
+ from collections import Counter
+ reason_counts = Counter()
+ for ext in all_extractions:
+     codes = ext.get("review_reason_codes", "[]")
+     if isinstance(codes, str):
+         try:
+             codes = json.loads(codes)
+         except (json.JSONDecodeError, TypeError):
+             codes = []
+     for code in codes:
+         reason_counts[code] += 1
+
+ # Also count from real eval if available
+ real_eval_reason_counts = Counter()
+ if real_eval:
+     for c in real_eval["cases"]:
+         gate_str = c.get("gate", "")
+         # Parse "review (4 codes)" -> we need the actual codes from the report.
+         # The per-case table doesn't list codes, but the detailed section does.
+         # For now, just count review vs auto.
+         pass
+
+ if reason_counts:
+     reason_df = pd.DataFrame(
+         [{"Reason Code": k, "Count": v} for k, v in reason_counts.most_common()],
+     )
+     st.bar_chart(reason_df.set_index("Reason Code"))
+ else:
+     st.info(
+         "No review reason codes in database. "
+         "This is expected with mock data — MockProvider returns fixed 'billing' "
+         "output that passes all gate rules."
+     )
+
+
+ # ---------------------------------------------------------------------------
+ # Section 4: Confidence distribution
+ # ---------------------------------------------------------------------------
+
+ st.divider()
+ st.header("Confidence Distribution")
+
+ if confidences:
+     conf_df = pd.DataFrame({"confidence": confidences})
+     st.bar_chart(conf_df["confidence"].value_counts(bins=10).sort_index())
+     if has_mock and len(set(confidences)) <= 2:
+         st.caption(
+             "Note: All values are identical because MockProvider returns a fixed confidence score."
+         )
+ else:
+     st.info("No confidence scores recorded.")
+
+
+ # ---------------------------------------------------------------------------
+ # Section 5: All cases table
+ # ---------------------------------------------------------------------------
+
+ st.divider()
+ st.header("All Cases")
+
+ # Join extractions with case metadata from DB
+ case_meta = {}
+ if DB_PATH.exists():
+     conn = sqlite3.connect(DB_PATH)
+     conn.row_factory = sqlite3.Row
+     for row in conn.execute("SELECT case_id, language, priority, source_dataset FROM cases"):
+         r = dict(row)
+         case_meta[r["case_id"]] = r
+     conn.close()
+
+ table_rows = []
+
+ # Add DB extractions
+ for ext in all_extractions:
+     cid = ext["case_id"]
+     meta = case_meta.get(cid, {})
+     source = _classify_source(cid, real_eval_ids, trace_map)
+
+     codes = ext.get("review_reason_codes", "[]")
+     if isinstance(codes, str):
+         try:
+             codes = json.loads(codes)
+         except (json.JSONDecodeError, TypeError):
+             codes = []
+
+     table_rows.append({
+         "Case ID": cid,
+         "Result Source": source,
+         "Source Dataset": meta.get("source_dataset", "—"),
+         "Language": meta.get("language", "—"),
+         "Priority": meta.get("priority", "—"),
+         "Root Cause": ext.get("root_cause_l1", "—"),
+         "Risk": ext.get("risk_level", "—"),
+         "Confidence": ext.get("confidence", 0),
+         "Gate": ext.get("gate_route", "—"),
+         "Reason Codes": ", ".join(codes) if codes else "—",
+     })
+
+ # Add real-eval cases NOT already in DB
+ if real_eval:
+     db_ids = {e["case_id"] for e in all_extractions}
+     for c in real_eval["cases"]:
+         if c["case_id"] not in db_ids:
+             table_rows.append({
+                 "Case ID": c["case_id"],
+                 "Result Source": "real_eval",
+                 "Source Dataset": "—",
+                 "Language": "—",
+                 "Priority": "—",
+                 "Root Cause": c.get("root_cause", "—"),
+                 "Risk": c.get("risk", "—"),
+                 "Confidence": float(c.get("confidence", 0)),
+                 "Gate": "review" if "review" in c.get("gate", "") else "auto",
+                 "Reason Codes": "—",
+             })
+
+ if table_rows:
+     table_df = pd.DataFrame(table_rows)
+     st.dataframe(table_df, use_container_width=True, hide_index=True)
+ else:
+     st.info("No case data available.")
+
+
+ # ---------------------------------------------------------------------------
+ # Section 6: Examples — review vs auto
+ # ---------------------------------------------------------------------------
+
+ st.divider()
+
+ col_review, col_auto = st.columns(2)
+
+ # Separate by gate decision
+ review_examples = [r for r in table_rows if r["Gate"] == "review"]
+ auto_examples = [r for r in table_rows if r["Gate"] == "auto"]
+
+ with col_review:
+     st.subheader(f"Examples Routed to Review ({len(review_examples)})")
+
+     if not review_examples:
+         st.info(
+             "No cases routed to review in current data. "
+             "This is expected with mock data — MockProvider output (billing, "
+             "confidence=0.85, risk=medium) passes all gate rules."
+         )
+     else:
+         for ex in review_examples[:3]:
+             source_tag = f"`{ex['Result Source']}`"
+             st.markdown(
+                 f"**{ex['Case ID']}** {source_tag}  \n"
+                 f"Root cause: `{ex['Root Cause']}` · Risk: `{ex['Risk']}` · "
+                 f"Confidence: {ex['Confidence']}  \n"
+                 f"Reason codes: {ex['Reason Codes']}"
+             )
+             st.markdown("---")
+
+         if len(review_examples) > 3:
+             st.caption(f"+ {len(review_examples) - 3} more in table above")
+
+ with col_auto:
+     st.subheader(f"Examples Safe for Auto-Routing ({len(auto_examples)})")
+
+     if not auto_examples:
+         st.info("No cases auto-routed in current data.")
+     else:
+         for ex in auto_examples[:3]:
+             source_tag = f"`{ex['Result Source']}`"
+             st.markdown(
+                 f"**{ex['Case ID']}** {source_tag}  \n"
+                 f"Root cause: `{ex['Root Cause']}` · Risk: `{ex['Risk']}` · "
+                 f"Confidence: {ex['Confidence']}  \n"
+                 f"No review triggers — all gate rules passed."
+             )
+             st.markdown("---")
+
+         if len(auto_examples) > 3:
+             st.caption(f"+ {len(auto_examples) - 3} more in table above")
+
+
+ # ---------------------------------------------------------------------------
+ # Section 7: Review rules reference
+ # ---------------------------------------------------------------------------
+
+ st.divider()
+ st.header("Review Rules Reference")
+ st.caption("These rules are encoded in `pipeline/gate.py`. Any match triggers human review.")
+
+ st.markdown("""
+ | # | Rule | Trigger | Reason Code |
+ |---|------|---------|-------------|
+ | 1 | Low confidence | confidence < 0.7 | `low_confidence` |
+ | 2 | High churn risk | churn_risk >= 0.6 | `high_churn_risk` |
+ | 3 | High risk level | risk = high or critical | `high_risk_level` |
+ | 4 | Model flagged | review_required = true | `model_flagged` |
+ | 5 | High-risk category | security_breach, outage, vip_churn, data_loss | `high_risk_category` |
+ | 6 | Missing evidence | evidence_quotes empty | `missing_evidence` |
+ | 7 | Ambiguous root cause | root_cause = unknown / ambiguous / other | `ambiguous_root_cause` |
+ """)
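The rules table maps directly onto a pure function over the extraction dict. `pipeline/gate.py` itself is not included in this diff, so the following is only a sketch consistent with the seven rules above (function name and field access are assumptions, not the repo's actual code):

```python
HIGH_RISK_CATEGORIES = {"security_breach", "outage", "vip_churn", "data_loss"}
AMBIGUOUS_CAUSES = {"unknown", "ambiguous", "other"}


def compute_gate_decision_sketch(ext: dict) -> dict:
    """Apply the seven review rules; any matching code routes to human review."""
    codes = []
    if ext.get("confidence", 0.0) < 0.7:
        codes.append("low_confidence")
    if ext.get("churn_risk", 0.0) >= 0.6:
        codes.append("high_churn_risk")
    if ext.get("risk_level") in ("high", "critical"):
        codes.append("high_risk_level")
    if ext.get("review_required"):
        codes.append("model_flagged")
    if ext.get("root_cause_l1") in HIGH_RISK_CATEGORIES:
        codes.append("high_risk_category")
    if not ext.get("evidence_quotes"):
        codes.append("missing_evidence")
    if ext.get("root_cause_l1") in AMBIGUOUS_CAUSES:
        codes.append("ambiguous_root_cause")
    return {"route": "review" if codes else "auto", "review_reason_codes": codes}


auto_case = {
    "confidence": 0.9, "churn_risk": 0.1, "risk_level": "low",
    "review_required": False, "root_cause_l1": "billing",
    "evidence_quotes": ["late fee charged twice"],
}
print(compute_gate_decision_sketch(auto_case)["route"])  # auto
```

The fail-closed shape matters: every rule can only *add* a reason code, so a case auto-routes only when all seven checks pass.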
app/pages/4_Abstraction_Layer.py ADDED
@@ -0,0 +1,170 @@
+ """Page 4 — Abstraction Layer: reusable modules, adjacent use cases, production roadmap."""
+ import sys
+ from pathlib import Path
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
+
+ import streamlit as st
+ import pandas as pd
+
+ st.set_page_config(page_title="Abstraction Layer", layout="wide")
+ st.title("Abstraction Layer")
+
+ st.markdown("""
+ This page extracts the reusable patterns from this deployment.
+ The goal is not a summary — it's a set of **modules with defined interfaces**
+ that can transfer to other enterprise workflows.
+ """)
+
+ # --- Reusable Modules ---
+ st.header("Reusable Modules")
+
+ modules = pd.DataFrame({
+     "Module": [
+         "Unstructured Ingestion",
+         "Semantic Structuring Engine",
+         "Risk & Review Router",
+         "Observability & Audit Trail",
+         "Evaluation Harness",
+         "Insight Dashboard",
+     ],
+     "Input": [
+         "Multi-source text + metadata",
+         "Normalized case bundle + JSON schema",
+         "Structured extraction + rule set",
+         "Pipeline run data",
+         "Predictions + gold labels",
+         "Aggregated structured data",
+     ],
+     "Output": [
+         "Normalized case bundle",
+         "Structured extraction (root cause, sentiment, risk, reco, evidence)",
+         "Gate decision + review queue assignment + reason codes",
+         "Trace logs, evidence links, version records, JSONL audit trail",
+         "Metrics, failure mode library, regression tests, markdown report",
+         "Cross-tabs, top drivers, exportable briefings",
+     ],
+     "This Repo": [
+         "pipeline/loaders.py + normalize.py",
+         "pipeline/extract.py + schemas.py",
+         "pipeline/gate.py",
+         "pipeline/storage.py (trace_logs table + JSONL)",
+         "eval/metrics.py + failure_modes.py + run_eval.py",
+         "app/pages/ (Streamlit)",
+     ],
+ })
+ st.dataframe(modules, use_container_width=True, hide_index=True)
+
+ # --- Module Interfaces ---
+ st.header("Key Interfaces")
+
+ st.subheader("1. Case Bundle (Input)")
+ st.code("""
+ CaseBundle:
+     case_id: str
+     ticket_text: str  # required
+     email_thread: list[str]
+     conversation_snippet: str
+     vip_tier: str  # standard | vip | unknown
+     priority: str  # low | medium | high | critical | unknown
+     handle_time_minutes: float
+     churned_within_30d: bool
+ """, language="python")
+
+ st.subheader("2. Extraction Output")
+ st.code("""
+ ExtractionOutput:
+     root_cause_l1: str
+     root_cause_l2: str
+     sentiment_score: float  # -1.0 to 1.0
+     risk_level: str  # low | medium | high | critical
+     review_required: bool
+     next_best_actions: list[str]
+     evidence_quotes: list[str]  # must quote source text
+     confidence: float  # 0.0 to 1.0
+     churn_risk: float  # 0.0 to 1.0
+ """, language="python")
+
+ st.subheader("3. Gate Decision")
+ st.code("""
+ GateDecision:
+     route: str  # auto | review
+     reasons: list[str]  # human-readable
+     review_reason_codes: list[str]  # machine-readable
+ """, language="python")
+
+ # --- Adjacent Use Cases ---
+ st.header("Adjacent Use Cases")
+
+ use_cases = pd.DataFrame({
+     "Industry": ["Healthcare", "E-commerce", "Insurance", "Manufacturing"],
+     "Input Data": [
+         "Intake notes, triage forms, patient messages",
+         "Post-sale tickets, returns, reviews",
+         "Claims forms, adjuster notes, police reports",
+         "Field repair logs, maintenance tickets",
+     ],
+     "Structuring Task": [
+         "Risk stratification, triage routing, urgency classification",
+         "Return root cause, experience defect aggregation",
+         "Claim classification, missing info detection, fraud signals",
110
+ "Fault attribution, spare parts prediction, escalation routing",
111
+ ],
112
+ "Key Difference": [
113
+ "Stronger compliance (HIPAA), higher stakes",
114
+ "Higher volume, lower risk per case",
115
+ "Document-heavy, multi-step verification",
116
+ "Domain-specific vocabulary, equipment codes",
117
+ ],
118
+ })
119
+ st.dataframe(use_cases, use_container_width=True, hide_index=True)
120
+
121
+ # --- Production Roadmap ---
122
+ st.header("Production Roadmap")
123
+
124
+ st.markdown("""
125
+ This is a strategy, not an implementation plan.
126
+
127
+ | Phase | What | Why |
128
+ |-------|------|-----|
129
+ | **Auth & RBAC** | User roles: analyst, reviewer, admin | Control who sees what, who can approve |
130
+ | **Real data connectors** | Zendesk, ServiceNow, Salesforce adapters | Replace synthetic ingestion with live data |
131
+ | **Model evaluation loop** | A/B prompt versions, automated regression | Catch quality regressions before they reach users |
132
+ | **Feedback integration** | Reviewer edits flow back to eval set | Close the loop — human corrections improve the system |
133
+ | **Monitoring & alerting** | Schema fail rate, drift detection, latency SLOs | Know when the system degrades before users complain |
134
+ | **Compliance & audit** | Immutable trace logs, data retention policies | Enterprise requirement for regulated industries |
135
+ """)
136
+
137
+ # --- What we actually built ---
138
+ st.header("What We Actually Built & Measured")
139
+
140
+ db_path = Path("data/processed/results.db")
141
+ if db_path.exists():
142
+ from pipeline.storage import get_all_extractions, get_review_queue
143
+ all_ext = get_all_extractions()
144
+ review_q = get_review_queue()
145
+
146
+ c1, c2, c3 = st.columns(3)
147
+ c1.metric("Cases Processed", len(all_ext))
148
+ c2.metric("Auto-Routed", len(all_ext) - len(review_q))
149
+ c3.metric("Sent to Review", len(review_q))
150
+
151
+ # Run quick eval if we have data
152
+ if all_ext:
153
+ from eval.metrics import schema_pass_rate, evidence_coverage_rate
154
+ ext_dicts = []
155
+ for e in all_ext:
156
+ import json
157
+ d = dict(e)
158
+ for field in ("next_best_actions", "evidence_quotes"):
159
+ if d.get(field) and isinstance(d[field], str):
160
+ try:
161
+ d[field] = json.loads(d[field])
162
+ except (json.JSONDecodeError, TypeError):
163
+ pass
164
+ ext_dicts.append(d)
165
+
166
+ c4, c5 = st.columns(2)
167
+ c4.metric("Schema Pass Rate", f"{schema_pass_rate(ext_dicts):.0%}")
168
+ c5.metric("Evidence Coverage", f"{evidence_coverage_rate(ext_dicts):.0%}")
169
+ else:
170
+ st.info("Run the pipeline to see measured results here.")
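The "Key Interfaces" section above describes the CaseBundle → ExtractionOutput → GateDecision flow but the gate rules themselves live in `pipeline/gate.py`, which is outside this diff. A minimal sketch of how such a gate might work, under assumed thresholds (the `confidence_floor` value and the specific trigger rules here are illustrative, not the shipped logic):

```python
# Illustrative sketch of a route/review gate producing the GateDecision shape.
# Thresholds and trigger rules are assumptions; the real rules are in
# pipeline/gate.py, which is not part of this diff.
from dataclasses import dataclass, field


@dataclass
class GateDecision:
    route: str                                        # "auto" | "review"
    reasons: list = field(default_factory=list)       # human-readable
    review_reason_codes: list = field(default_factory=list)  # machine-readable


def gate(extraction: dict, confidence_floor: float = 0.8) -> GateDecision:
    """Route an ExtractionOutput-shaped dict to 'auto' or 'review'."""
    decision = GateDecision(route="auto")
    if extraction.get("confidence", 0.0) < confidence_floor:
        decision.reasons.append(f"confidence below {confidence_floor}")
        decision.review_reason_codes.append("LOW_CONFIDENCE")
    if extraction.get("risk_level") in ("high", "critical"):
        decision.reasons.append("elevated risk level")
        decision.review_reason_codes.append("HIGH_RISK")
    if not extraction.get("evidence_quotes"):
        decision.reasons.append("no evidence quotes cited")
        decision.review_reason_codes.append("MISSING_EVIDENCE")
    if decision.review_reason_codes:
        decision.route = "review"
    return decision
```

Any single triggered rule flips the route to "review", which matches the dashboard's framing that every routing decision carries machine-readable reason codes.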
app/pages/5_Executive_Summary.py ADDED
@@ -0,0 +1,260 @@
+ """Page 5 — Executive Summary: C-suite view of operational insight.
+
+ This page answers the questions a COO/CXO actually asks:
+ - What are the top drivers of VIP churn?
+ - How much of our review workload can be automated?
+ - Where should we intervene first?
+ """
+ import sys
+ import json
+ import sqlite3
+ from pathlib import Path
+
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
+
+ import streamlit as st
+ import pandas as pd
+
+ from pipeline.storage import get_all_extractions, get_review_queue
+
+ st.set_page_config(page_title="Executive Summary", layout="wide")
+
+ DB_PATH = Path("data/processed/results.db")
+
+ if not DB_PATH.exists():
+     st.warning("No pipeline results yet. Run `PYTHONPATH=. python scripts/run_pipeline.py --mock` first.")
+     st.stop()
+
+ # ---------------------------------------------------------------------------
+ # Load data
+ # ---------------------------------------------------------------------------
+
+ conn = sqlite3.connect(DB_PATH)
+ conn.row_factory = sqlite3.Row
+
+ cases = [dict(r) for r in conn.execute("SELECT * FROM cases").fetchall()]
+ extractions = [dict(r) for r in conn.execute("SELECT * FROM extractions").fetchall()]
+
+ # Join cases + extractions
+ case_map = {c["case_id"]: c for c in cases}
+ joined = []
+ for ext in extractions:
+     c = case_map.get(ext["case_id"], {})
+     joined.append({**c, **ext})
+
+ conn.close()
+
+ if not joined:
+     st.info("No data available. Run the pipeline first.")
+     st.stop()
+
+ df = pd.DataFrame(joined)
+
+ # ---------------------------------------------------------------------------
+ # Header
+ # ---------------------------------------------------------------------------
+
+ st.title("Executive Summary")
+ st.markdown(
+     "One-glance operational intelligence for leadership. "
+     "Every number below is backed by structured extraction with evidence citations — "
+     "not manual tagging."
+ )
+ st.markdown("---")
+
+ # ---------------------------------------------------------------------------
+ # KPI Row: the 4 numbers a COO cares about
+ # ---------------------------------------------------------------------------
+
+ total_cases = len(df)
+ auto_count = len(df[df["gate_route"] == "auto"])
+ review_count = total_cases - auto_count
+ automation_rate = auto_count / total_cases if total_cases else 0
+ churn_cases = len(df[df["churned_within_30d"] == 1])
+ churn_rate = churn_cases / total_cases if total_cases else 0
+ vip_cases = len(df[df["vip_tier"] == "vip"]) if "vip_tier" in df.columns else 0
+ avg_handle = df["handle_time_minutes"].mean() if "handle_time_minutes" in df.columns else 0
+
+ k1, k2, k3, k4 = st.columns(4)
+ k1.metric("Automation Rate", f"{automation_rate:.0%}",
+           help="% of cases safely auto-routed without human review")
+ k2.metric("Cases in Review Queue", f"{review_count}",
+           help="Cases flagged for human review by gate logic")
+ k3.metric("30-Day Churn Rate", f"{churn_rate:.0%}",
+           help="% of customers who churned within 30 days")
+ k4.metric("Avg Handle Time", f"{avg_handle:.0f} min",
+           help="Average time from ticket open to resolution")
+
+ st.markdown("---")
+
+ # ---------------------------------------------------------------------------
+ # Section 1: Top Churn Drivers
+ # ---------------------------------------------------------------------------
+
+ st.header("Top Churn Drivers")
+ st.caption("Root causes most associated with customer churn — ranked by frequency among churned accounts")
+
+ churned_df = df[df["churned_within_30d"] == 1]
+
+ if len(churned_df) > 0 and "root_cause_l1" in churned_df.columns:
+     churn_drivers = churned_df["root_cause_l1"].value_counts().reset_index()
+     churn_drivers.columns = ["Root Cause", "Churned Cases"]
+     churn_drivers["% of Churn"] = (churn_drivers["Churned Cases"] / len(churned_df) * 100).round(1)
+
+     col_chart, col_table = st.columns([2, 1])
+     with col_chart:
+         st.bar_chart(churn_drivers.set_index("Root Cause")["Churned Cases"])
+     with col_table:
+         st.dataframe(churn_drivers, hide_index=True, use_container_width=True)
+ else:
+     st.info("No churned cases in current dataset to analyze drivers.")
+
+ # ---------------------------------------------------------------------------
+ # Section 2: VIP Risk Heat Map
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("VIP Risk Overview")
+ st.caption("VIP customers by risk level and churn status — where to intervene first")
+
+ if "vip_tier" in df.columns and "risk_level" in df.columns:
+     vip_df = df[df["vip_tier"] == "vip"]
+     if len(vip_df) > 0:
+         vip_summary = vip_df.groupby(["risk_level", "churned_within_30d"]).size().reset_index(name="Count")
+         vip_summary["Churn Status"] = vip_summary["churned_within_30d"].map({0: "Retained", 1: "Churned"})
+
+         v1, v2, v3 = st.columns(3)
+         v1.metric("Total VIP Cases", len(vip_df))
+         vip_churned = len(vip_df[vip_df["churned_within_30d"] == 1])
+         v2.metric("VIP Churned", vip_churned)
+         vip_high_risk = len(vip_df[vip_df["risk_level"].isin(["high", "critical"])])
+         v3.metric("VIP High/Critical Risk", vip_high_risk)
+
+         # Cross-tab: risk level × churn
+         if len(vip_df) > 1:
+             cross = pd.crosstab(vip_df["risk_level"], vip_df["churned_within_30d"].map({0: "Retained", 1: "Churned"}))
+             st.dataframe(cross, use_container_width=True)
+     else:
+         st.info("No VIP cases in current dataset.")
+ else:
+     st.info("VIP tier data not available.")
+
+ # ---------------------------------------------------------------------------
+ # Section 3: Priority × Risk Distribution
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("Priority vs. Risk Alignment")
+ st.caption("Are high-priority tickets actually high-risk? Misalignment = triage failure")
+
+ if "priority" in df.columns and "risk_level" in df.columns:
+     priority_order = ["low", "medium", "high", "critical"]
+     risk_order = ["low", "medium", "high", "critical"]
+
+     cross_pr = pd.crosstab(
+         df["priority"].astype(pd.CategoricalDtype(priority_order, ordered=True)),
+         df["risk_level"].astype(pd.CategoricalDtype(risk_order, ordered=True)),
+     )
+     st.dataframe(cross_pr, use_container_width=True)
+
+     # Flag misalignments
+     misaligned = df[
+         ((df["priority"] == "low") & (df["risk_level"].isin(["high", "critical"]))) |
+         ((df["priority"] == "critical") & (df["risk_level"] == "low"))
+     ]
+     if len(misaligned) > 0:
+         st.warning(
+             f"**{len(misaligned)} cases** show priority/risk misalignment. "
+             "These are either under-prioritized high-risk tickets or over-prioritized low-risk ones. "
+             "Review recommended."
+         )
+
+ # ---------------------------------------------------------------------------
+ # Section 4: Review Queue Breakdown
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("Review Queue Analysis")
+ st.caption("Why are cases going to human review? Understanding trigger patterns optimizes staffing")
+
+ review_df = df[df["gate_route"] == "review"]
+
+ if len(review_df) > 0:
+     from collections import Counter
+     reason_counts = Counter()
+     for _, row in review_df.iterrows():
+         codes = row.get("review_reason_codes", "[]")
+         if isinstance(codes, str):
+             try:
+                 codes = json.loads(codes)
+             except (json.JSONDecodeError, TypeError):
+                 codes = []
+         for code in codes:
+             reason_counts[code] += 1
+
+     if reason_counts:
+         reason_df = pd.DataFrame(
+             [{"Trigger Rule": k, "Cases Triggered": v} for k, v in reason_counts.most_common()]
+         )
+         st.bar_chart(reason_df.set_index("Trigger Rule"))
+         st.dataframe(reason_df, hide_index=True, use_container_width=True)
+     else:
+         st.info("Review cases present but no reason codes recorded.")
+ else:
+     st.info(
+         "All cases auto-routed (0 in review queue). "
+         "With mock data, the fixed extraction output passes all gate rules. "
+         "Run with a real provider to see meaningful review routing."
+     )
+
+ # ---------------------------------------------------------------------------
+ # Section 5: Operational Efficiency Summary
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("Operational Efficiency")
+
+ # Time savings estimate
+ MANUAL_MINUTES_PER_TICKET = 15   # industry benchmark: manual read + tag + route
+ AI_MINUTES_PER_TICKET = 0.5      # AI extraction + human spot-check for auto-routed
+ REVIEW_MINUTES_PER_TICKET = 5    # human review with AI pre-analysis
+
+ manual_total = total_cases * MANUAL_MINUTES_PER_TICKET
+ ai_total = auto_count * AI_MINUTES_PER_TICKET + review_count * REVIEW_MINUTES_PER_TICKET
+ time_saved = manual_total - ai_total
+ time_saved_pct = time_saved / manual_total if manual_total else 0
+
+ e1, e2, e3, e4 = st.columns(4)
+ e1.metric("Manual Process", f"{manual_total:.0f} min",
+           help=f"{total_cases} cases x {MANUAL_MINUTES_PER_TICKET} min/case (industry benchmark)")
+ e2.metric("AI-Assisted Process", f"{ai_total:.0f} min",
+           help=f"{auto_count} auto x {AI_MINUTES_PER_TICKET} min + {review_count} review x {REVIEW_MINUTES_PER_TICKET} min")
+ e3.metric("Time Saved", f"{time_saved:.0f} min",
+           delta=f"{time_saved_pct:.0%} reduction")
+ ai_minutes_per_case = ai_total / total_cases if total_cases else 0
+ projected_savings_hrs = 10000 * (MANUAL_MINUTES_PER_TICKET - ai_minutes_per_case) / 60
+ e4.metric("Monthly Projection (10k cases)", f"{projected_savings_hrs:.0f} hrs saved",
+           help=f"10,000 cases × ({MANUAL_MINUTES_PER_TICKET} - {ai_minutes_per_case:.1f}) min/case ÷ 60")
+
+ st.markdown("---")
+
+ # ---------------------------------------------------------------------------
+ # Key Insight Callout
+ # ---------------------------------------------------------------------------
+
+ st.header("Key Insight for Leadership")
+ st.markdown(f"""
+ > **At current automation rate ({automation_rate:.0%})**, the system can process
+ > **{auto_count} of {total_cases} cases** without human intervention.
+ > Each auto-routed case saves ~{MANUAL_MINUTES_PER_TICKET - AI_MINUTES_PER_TICKET:.0f} minutes of analyst time.
+ >
+ > **Top action items:**
+ > 1. Investigate top churn drivers — root causes driving the most customer loss
+ > 2. Review VIP cases flagged as high-risk — highest-value intervention targets
+ > 3. Address priority/risk misalignments — triage process may need calibration
+ >
+ > *All insights are auditable: every extraction includes evidence quotes from source text,
+ > and every routing decision has machine-readable reason codes.*
+ """)
+
+ st.caption("Data provenance: Results reflect current pipeline run. Mock data shows system structure; real provider data shows model quality.")
app/pages/6_ROI_Model.py ADDED
@@ -0,0 +1,267 @@
+ """Page 6 — ROI Model: quantified business case for AI-assisted support operations.
+
+ This page answers the CFO question: "What does this save us?"
+ Interactive sliders let stakeholders model their own scale assumptions.
+ """
+ import sys
+ from pathlib import Path
+
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
+
+ import streamlit as st
+ import pandas as pd
+ import sqlite3
+
+ DB_PATH = Path("data/processed/results.db")
+
+ st.set_page_config(page_title="ROI Model", layout="wide")
+ st.title("ROI Model")
+ st.markdown(
+     "Interactive cost-benefit analysis comparing manual support operations "
+     "to AI-assisted extraction and routing. Adjust assumptions with the sliders below."
+ )
+ st.markdown("---")
+
+ # ---------------------------------------------------------------------------
+ # Load actuals from pipeline (for grounding the model in real data)
+ # ---------------------------------------------------------------------------
+
+ actual_automation_rate = 0.5  # conservative default before data loads
+ actual_avg_handle_time = 30.0
+ actual_total_cases = 0
+ actual_review_rate = 0.5
+
+ if DB_PATH.exists():
+     conn = sqlite3.connect(DB_PATH)
+     conn.row_factory = sqlite3.Row
+     exts = [dict(r) for r in conn.execute("SELECT * FROM extractions").fetchall()]
+     cases = [dict(r) for r in conn.execute("SELECT * FROM cases").fetchall()]
+     conn.close()
+
+     if exts:
+         actual_total_cases = len(exts)
+         auto_count = sum(1 for e in exts if e.get("gate_route") == "auto")
+         actual_automation_rate = auto_count / len(exts)
+         actual_review_rate = 1 - actual_automation_rate
+
+     if cases:
+         handle_times = [c["handle_time_minutes"] for c in cases if c.get("handle_time_minutes")]
+         if handle_times:
+             actual_avg_handle_time = sum(handle_times) / len(handle_times)
+
+ # ---------------------------------------------------------------------------
+ # Sidebar: Assumptions (interactive)
+ # ---------------------------------------------------------------------------
+
+ st.sidebar.header("Model Assumptions")
+ st.sidebar.caption("Adjust these to match your organization's scale")
+
+ monthly_volume = st.sidebar.slider(
+     "Monthly ticket volume", 1000, 100000, 10000, step=1000,
+     help="Total support tickets per month"
+ )
+ analyst_hourly_cost = st.sidebar.slider(
+     "Analyst cost ($/hour)", 15, 80, 35,
+     help="Fully loaded cost per support analyst hour"
+ )
+ manual_minutes = st.sidebar.slider(
+     "Manual processing time (min/ticket)", 5, 30, 15,
+     help="Time to manually read, classify, tag, route, and document one ticket"
+ )
+ ai_auto_minutes = st.sidebar.slider(
+     "AI auto-route time (min/ticket)", 0.1, 3.0, 0.5, step=0.1,
+     help="Time for AI extraction + auto-routing (no human touch)"
+ )
+ ai_review_minutes = st.sidebar.slider(
+     "AI-assisted review time (min/ticket)", 2, 15, 5,
+     help="Time for human review with AI pre-analysis (vs. starting from scratch)"
+ )
+ api_cost_per_case = st.sidebar.slider(
+     "API cost per extraction ($)", 0.001, 0.10, 0.01, step=0.001, format="%.3f",
+     help="Claude API cost per structured extraction call"
+ )
+ automation_rate = st.sidebar.slider(
+     "Automation rate (%)", 0, 100, int(actual_automation_rate * 100),
+     help=f"Pipeline actual: {actual_automation_rate:.0%}. Higher = more cases auto-routed"
+ ) / 100
+
+ st.sidebar.markdown("---")
+ st.sidebar.caption(
+     f"Pipeline actuals: {actual_total_cases} cases processed, "
+     f"{actual_automation_rate:.0%} auto-routed, "
+     f"{actual_avg_handle_time:.0f} min avg handle time"
+ )
+
+ # ---------------------------------------------------------------------------
+ # Cost Calculations
+ # ---------------------------------------------------------------------------
+
+ # Manual baseline
+ manual_hours = monthly_volume * manual_minutes / 60
+ manual_cost = manual_hours * analyst_hourly_cost
+
+ # AI-assisted
+ auto_cases = int(monthly_volume * automation_rate)
+ review_cases = monthly_volume - auto_cases
+ ai_labor_hours = (auto_cases * ai_auto_minutes + review_cases * ai_review_minutes) / 60
+ ai_labor_cost = ai_labor_hours * analyst_hourly_cost
+ ai_api_cost = monthly_volume * api_cost_per_case
+ ai_infra_cost = 500  # fixed monthly: hosting, monitoring, logging
+ ai_total_cost = ai_labor_cost + ai_api_cost + ai_infra_cost
+
+ # Savings
+ monthly_savings = manual_cost - ai_total_cost
+ annual_savings = monthly_savings * 12
+ roi_pct = (monthly_savings / ai_total_cost * 100) if ai_total_cost > 0 else 0
+ hours_saved = manual_hours - ai_labor_hours
+
+ # ---------------------------------------------------------------------------
+ # Display: Side-by-side comparison
+ # ---------------------------------------------------------------------------
+
+ st.header("Monthly Cost Comparison")
+
+ col_manual, col_ai = st.columns(2)
+
+ with col_manual:
+     st.subheader("Manual Process")
+     st.metric("Labor Hours", f"{manual_hours:,.0f} hrs")
+     st.metric("Labor Cost", f"${manual_cost:,.0f}")
+     st.metric("API Cost", "$0")
+     st.metric("Infrastructure", "$0")
+     st.markdown("---")
+     st.metric("**Total Monthly Cost**", f"${manual_cost:,.0f}")
+
+ with col_ai:
+     st.subheader("AI-Assisted Process")
+     st.metric("Labor Hours", f"{ai_labor_hours:,.0f} hrs",
+               delta=f"-{manual_hours - ai_labor_hours:,.0f} hrs", delta_color="inverse")
+     st.metric("Labor Cost", f"${ai_labor_cost:,.0f}",
+               delta=f"-${manual_cost - ai_labor_cost:,.0f}", delta_color="inverse")
+     st.metric("API Cost", f"${ai_api_cost:,.0f}")
+     st.metric("Infrastructure", f"${ai_infra_cost:,.0f}")
+     st.markdown("---")
+     st.metric("**Total Monthly Cost**", f"${ai_total_cost:,.0f}",
+               delta=f"-${monthly_savings:,.0f}", delta_color="inverse")
+
+ # ---------------------------------------------------------------------------
+ # Savings Summary
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("Savings Summary")
+
+ s1, s2, s3, s4 = st.columns(4)
+ s1.metric("Monthly Savings", f"${monthly_savings:,.0f}")
+ s2.metric("Annual Savings", f"${annual_savings:,.0f}")
+ s3.metric("ROI", f"{roi_pct:,.0f}%",
+           help="(Monthly savings / AI total cost) × 100")
+ s4.metric("Hours Freed / Month", f"{hours_saved:,.0f} hrs",
+           help="Analyst hours redirected to higher-value work")
+
+ # ---------------------------------------------------------------------------
+ # Break-even analysis
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("Break-Even Analysis")
+
+ # At what automation rate does AI become cost-neutral?
+ st.markdown("**How does savings change with automation rate?**")
+
+ breakeven_data = []
+ for rate in range(0, 101, 5):
+     r = rate / 100
+     auto_c = int(monthly_volume * r)
+     review_c = monthly_volume - auto_c
+     labor_h = (auto_c * ai_auto_minutes + review_c * ai_review_minutes) / 60
+     labor_c = labor_h * analyst_hourly_cost
+     total_c = labor_c + ai_api_cost + ai_infra_cost
+     saving = manual_cost - total_c
+     breakeven_data.append({
+         "Automation Rate": f"{rate}%",
+         "rate_num": rate,
+         "Monthly Savings ($)": saving,
+     })
+
+ be_df = pd.DataFrame(breakeven_data)
+ st.line_chart(be_df.set_index("rate_num")["Monthly Savings ($)"])
+
+ # Find break-even point
+ breakeven_row = next((d for d in breakeven_data if d["Monthly Savings ($)"] >= 0), None)
+ if breakeven_row:
+     st.success(
+         f"Break-even at **{breakeven_row['Automation Rate']}** automation rate. "
+         f"Current pipeline achieves **{automation_rate:.0%}**."
+     )
+ else:
+     st.warning("AI-assisted process is more expensive at all automation rates with current assumptions.")
+
+ # ---------------------------------------------------------------------------
+ # Scale projection table
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("Scale Projections")
+ st.caption("How savings scale with ticket volume (holding other assumptions constant)")
+
+ scale_data = []
+ for vol in [1000, 5000, 10000, 25000, 50000, 100000]:
+     m_cost = vol * manual_minutes / 60 * analyst_hourly_cost
+     a_auto = int(vol * automation_rate)
+     a_rev = vol - a_auto
+     a_labor = (a_auto * ai_auto_minutes + a_rev * ai_review_minutes) / 60 * analyst_hourly_cost
+     a_api = vol * api_cost_per_case
+     a_total = a_labor + a_api + ai_infra_cost
+     scale_data.append({
+         "Monthly Volume": f"{vol:,}",
+         "Manual Cost": f"${m_cost:,.0f}",
+         "AI Cost": f"${a_total:,.0f}",
+         "Monthly Savings": f"${m_cost - a_total:,.0f}",
+         "Annual Savings": f"${(m_cost - a_total) * 12:,.0f}",
+         "FTEs Freed": f"{(vol * manual_minutes / 60 - (a_auto * ai_auto_minutes + a_rev * ai_review_minutes) / 60) / 160:.1f}",
+     })
+
+ scale_df = pd.DataFrame(scale_data)
+ st.dataframe(scale_df, hide_index=True, use_container_width=True)
+
+ # ---------------------------------------------------------------------------
+ # Qualitative benefits
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("Beyond Cost: Qualitative Benefits")
+
+ q1, q2, q3 = st.columns(3)
+
+ with q1:
+     st.markdown("**Consistency**")
+     st.markdown(
+         "Manual classification varies 20-40% across analysts (industry benchmark). "
+         "AI extraction applies the same schema to every case. "
+         "Remaining variance is in the data, not the tagger."
+     )
+
+ with q2:
+     st.markdown("**Speed to Insight**")
+     st.markdown(
+         "Manual: monthly retrospective reports, weeks-old data. "
+         "AI-assisted: real-time dashboard with structured data available "
+         "within seconds of ticket ingestion."
+     )
+
+ with q3:
+     st.markdown("**Auditability**")
+     st.markdown(
+         "Every extraction includes evidence quotes from source text. "
+         "Every routing decision has machine-readable reason codes. "
+         "Every pipeline run is logged to JSONL trace files. "
+         "Compliance teams can audit any decision."
+     )
+
+ st.markdown("---")
+ st.caption(
+     "Assumptions are adjustable via the sidebar. "
+     "API costs based on Claude Sonnet pricing. "
+     "FTE calculation assumes 160 working hours/month."
+ )
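The ROI page's cost arithmetic can be restated as a pure function so the slider math is testable outside Streamlit. This sketch mirrors the formulas on the page (manual baseline, blended auto/review labor, per-case API cost, fixed $500 infrastructure line); the function name `monthly_costs` and its defaults, which match the slider defaults, are illustrative rather than part of the repo:

```python
# Pure-function restatement of the ROI math on this page. Defaults mirror the
# slider defaults (10k volume modeled separately; $35/hr analyst, 15 min manual,
# 0.5 min auto, 5 min review, $0.01/case API, $500/month fixed infra).
def monthly_costs(volume, automation_rate, analyst_hourly=35,
                  manual_min=15, auto_min=0.5, review_min=5,
                  api_per_case=0.01, infra=500):
    """Return (manual_cost, ai_cost, monthly_savings) in dollars."""
    manual_cost = volume * manual_min / 60 * analyst_hourly
    auto_cases = int(volume * automation_rate)
    review_cases = volume - auto_cases
    labor = (auto_cases * auto_min + review_cases * review_min) / 60 * analyst_hourly
    ai_cost = labor + volume * api_per_case + infra
    return manual_cost, ai_cost, manual_cost - ai_cost
```

At 10,000 tickets/month and 50% automation this yields a manual baseline of $87,500 against roughly $16,642 AI-assisted, matching what the page's side-by-side comparison would show under the default assumptions.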
app/pages/7_Data_Quality.py ADDED
@@ -0,0 +1,325 @@
1
+ """Page 7 — Data Quality Analysis: EDA of raw inputs before AI extraction.
2
+
3
+ This page demonstrates the forward-deployed mindset: before building models,
4
+ understand the data you're working with. Noise, missing fields, language mix,
5
+ and length distributions all affect extraction quality.
6
+ """
7
+ import sys
8
+ import json
9
+ import sqlite3
10
+ import re
11
+ from pathlib import Path
12
+ from collections import Counter
13
+
14
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
15
+
16
+ import streamlit as st
17
+ import pandas as pd
18
+
19
+ st.set_page_config(page_title="Data Quality Analysis", layout="wide")
20
+ st.title("Data Quality Analysis")
21
+ st.markdown(
22
+ "Understanding input data quality before extraction. "
23
+ "In a forward-deployed engagement, this analysis happens in week 1 — "
24
+ "it determines prompt design, validation rules, and reliability thresholds."
25
+ )
26
+ st.markdown("---")
27
+
28
+ # ---------------------------------------------------------------------------
29
+ # Load case bundles
30
+ # ---------------------------------------------------------------------------
31
+
32
+ CASES_DIR = Path("data/cases")
33
+ DB_PATH = Path("data/processed/results.db")
34
+
35
+ case_files = sorted(CASES_DIR.glob("*.json")) if CASES_DIR.exists() else []
36
+
37
+ if not case_files:
38
+ st.warning("No case bundles found. Run `PYTHONPATH=. python scripts/build_cases.py` first.")
39
+ st.stop()
40
+
41
+ cases = []
42
+ for f in case_files:
43
+ with open(f) as fh:
44
+ cases.append(json.load(fh))
45
+
46
+ df = pd.DataFrame(cases)
47
+
48
+ st.success(f"Loaded **{len(cases)}** case bundles from `data/cases/`")
49
+
50
+ # ---------------------------------------------------------------------------
51
+ # Section 1: Dataset Composition
52
+ # ---------------------------------------------------------------------------
53
+
54
+ st.header("Dataset Composition")
55
+ st.caption("Where does the data come from? What mix of sources feeds the pipeline?")
56
+
57
+ c1, c2, c3 = st.columns(3)
58
+
59
+ with c1:
60
+ st.markdown("**By Source Dataset**")
61
+ source_counts = df["source_dataset"].value_counts()
62
+ st.bar_chart(source_counts)
63
+
64
+ with c2:
65
+ st.markdown("**By Language**")
66
+ lang_counts = df["language"].value_counts()
67
+ st.bar_chart(lang_counts)
68
+
69
+ with c3:
70
+ st.markdown("**By Priority**")
71
+ priority_order = ["low", "medium", "high", "critical", "unknown"]
72
+ if "priority" in df.columns:
73
+ prio_counts = df["priority"].value_counts().reindex(
74
+ [p for p in priority_order if p in df["priority"].values]
75
+ )
76
+ st.bar_chart(prio_counts)
77
+
78
+ # Summary table
79
+ st.markdown("---")
80
+ composition = pd.DataFrame({
81
+ "Dimension": ["Sources", "Languages", "Priorities", "VIP Tiers"],
82
+ "Unique Values": [
83
+ df["source_dataset"].nunique(),
84
+ df["language"].nunique(),
85
+ df["priority"].nunique() if "priority" in df.columns else 0,
86
+ df["vip_tier"].nunique() if "vip_tier" in df.columns else 0,
87
+ ],
88
+ "Distribution": [
89
+ ", ".join(f"{k}: {v}" for k, v in df["source_dataset"].value_counts().items()),
90
+ ", ".join(f"{k}: {v}" for k, v in df["language"].value_counts().items()),
91
+ ", ".join(f"{k}: {v}" for k, v in df["priority"].value_counts().items()) if "priority" in df.columns else "—",
92
+ ", ".join(f"{k}: {v}" for k, v in df["vip_tier"].value_counts().items()) if "vip_tier" in df.columns else "—",
93
+ ],
94
+ })
95
+ st.dataframe(composition, hide_index=True, use_container_width=True)
96
+
97
+ # ---------------------------------------------------------------------------
98
+ # Section 2: Text Length Distribution
99
+ # ---------------------------------------------------------------------------
100
+
101
+ st.markdown("---")
102
+ st.header("Text Length Distribution")
103
+ st.caption(
104
+ "Short inputs lack context for high-confidence extraction. "
105
+ "This analysis directly informed our prompt v2 rule: "
106
+ "cap confidence at 0.7 for inputs under 30 words."
107
+ )
108
+
109
+ df["text_length_chars"] = df["ticket_text"].str.len()
110
+ df["text_length_words"] = df["ticket_text"].str.split().str.len()
111
+
112
+ l1, l2 = st.columns(2)
113
+
114
+ with l1:
115
+ st.markdown("**Character count distribution**")
116
+ st.bar_chart(df["text_length_chars"].value_counts(bins=15).sort_index())
117
+ st.caption(
118
+ f"Min: {df['text_length_chars'].min()} · "
119
+ f"Max: {df['text_length_chars'].max()} · "
120
+ f"Median: {df['text_length_chars'].median():.0f} · "
121
+ f"Mean: {df['text_length_chars'].mean():.0f}"
122
+ )
123
+
124
+ with l2:
125
+ st.markdown("**Word count distribution**")
126
+ st.bar_chart(df["text_length_words"].value_counts(bins=15).sort_index())
127
+ st.caption(
128
+ f"Min: {df['text_length_words'].min()} · "
129
+ f"Max: {df['text_length_words'].max()} · "
130
+ f"Median: {df['text_length_words'].median():.0f} · "
131
+ f"Mean: {df['text_length_words'].mean():.0f}"
132
+ )
133
+
134
+ # Flag short inputs
135
+ short_threshold = 30
136
+ short_cases = df[df["text_length_words"] < short_threshold]
137
+ if len(short_cases) > 0:
138
+ st.warning(
139
+ f"**{len(short_cases)} cases ({len(short_cases)/len(df)*100:.0f}%)** "
140
+ f"have fewer than {short_threshold} words. "
141
+ f"These are high-risk for overconfident extraction. "
142
+ f"Prompt v2 caps confidence at 0.7 for these cases."
143
+ )
144
+ with st.expander(f"View {len(short_cases)} short cases"):
145
+ for _, row in short_cases.iterrows():
146
+ st.markdown(
147
+ f"**{row['case_id']}** ({row['text_length_words']} words) — "
148
+ f"`{row['source_dataset']}`"
149
+ )
150
+ st.text(row["ticket_text"][:200])
151
+ st.markdown("---")
152
+
153
+ # ---------------------------------------------------------------------------
154
+ # Section 3: Text Quality Signals
155
+ # ---------------------------------------------------------------------------
156
+
157
+ st.markdown("---")
158
+ st.header("Text Quality Signals")
159
+ st.caption("Noise patterns that affect extraction quality — detected programmatically")
160
+
161
+
162
+ def analyze_text_quality(text: str) -> dict:
163
+ """Detect quality signals in a text input."""
164
+ signals = {}
165
+ # Encoding artifacts (escaped unicode, HTML entities); note accented chars also flag legitimate non-English text
166
+ signals["encoding_artifacts"] = bool(re.search(r"[äöüé]|\\u[0-9a-fA-F]{4}|&#\d+;", text))
167
+ # Excessive whitespace
168
+ signals["excessive_whitespace"] = bool(re.search(r"\n{3,}|\s{4,}", text))
169
+ # Template placeholders
170
+ signals["has_placeholders"] = bool(re.search(r"\{\{.*?\}\}|<name>|\[NAME\]|\[REDACTED\]", text))
171
+ # All caps segments (shouting)
172
+ signals["has_shouting"] = bool(re.search(r"\b[A-Z]{5,}\b", text))
173
+ # Email headers
174
+ signals["has_email_headers"] = bool(re.search(r"(From:|To:|Subject:|Date:)", text))
175
+ # Contains non-ASCII (multilingual)
176
+ signals["non_ascii"] = bool(re.search(r"[^\x00-\x7F]", text))
177
+ # Very short
178
+ signals["very_short"] = len(text.split()) < 30
179
+ # Contains numbers / IDs
180
+ signals["contains_ids"] = bool(re.search(r"(ticket|case|order|ref)[\s#:-]*\d+", text, re.I))
181
+ return signals
182
+
183
+
184
+ quality_results = []
185
+ for _, row in df.iterrows():
186
+ signals = analyze_text_quality(row["ticket_text"])
187
+ signals["case_id"] = row["case_id"]
188
+ quality_results.append(signals)
189
+
190
+ quality_df = pd.DataFrame(quality_results)
191
+
192
+ # Summary metrics
193
+ signal_cols = [c for c in quality_df.columns if c != "case_id"]
194
+ signal_summary = []
195
+ for col in signal_cols:
196
+ count = quality_df[col].sum()
197
+ signal_summary.append({
198
+ "Signal": col.replace("_", " ").title(),
199
+ "Cases Affected": count,
200
+ "% of Dataset": f"{count / len(quality_df) * 100:.0f}%",
201
+ })
202
+
203
+ signal_df = pd.DataFrame(signal_summary).sort_values("Cases Affected", ascending=False)
204
+ st.dataframe(signal_df, hide_index=True, use_container_width=True)
205
+
206
+ # Visual breakdown
207
+ st.markdown("**Signal frequency**")
208
+ st.bar_chart(signal_df.set_index("Signal")["Cases Affected"])
212
+
213
+ # ---------------------------------------------------------------------------
214
+ # Section 4: Multilingual Analysis
215
+ # ---------------------------------------------------------------------------
216
+
217
+ st.markdown("---")
218
+ st.header("Multilingual Analysis")
219
+ st.caption("Non-English inputs require special handling — the extraction must preserve source language evidence")
220
+
221
+ lang_groups = df.groupby("language")
222
+
223
+ for lang, group in lang_groups:
224
+ with st.expander(f"**{lang.upper()}** — {len(group)} cases"):
225
+ st.markdown(f"**Avg word count:** {group['text_length_words'].mean():.0f}")
226
+ st.markdown(f"**Priority mix:** {dict(group['priority'].value_counts())}")
227
+ st.markdown(f"**Source datasets:** {dict(group['source_dataset'].value_counts())}")
228
+
229
+ # Show example
230
+ example = group.iloc[0]
231
+ st.markdown("**Example:**")
232
+ st.text(example["ticket_text"][:300])
233
+
234
+ # ---------------------------------------------------------------------------
235
+ # Section 5: Field Completeness
236
+ # ---------------------------------------------------------------------------
237
+
238
+ st.markdown("---")
239
+ st.header("Field Completeness")
240
+ st.caption("Missing or empty fields in case bundles — gaps the extraction must handle gracefully")
241
+
242
+ completeness = []
243
+ for col in ["ticket_text", "email_thread", "conversation_snippet", "vip_tier", "priority",
244
+ "handle_time_minutes", "source_dataset", "language"]:
245
+ if col not in df.columns:
246
+ continue
247
+
248
+ if col == "email_thread":
249
+ filled = df[col].apply(lambda x: len(x) > 0 if isinstance(x, list) else bool(x)).sum()
250
+ elif col == "handle_time_minutes":
251
+ filled = df[col].apply(lambda x: bool(pd.notna(x) and x > 0)).sum()
252
+ else:
253
+ filled = df[col].apply(lambda x: bool(x) and x not in ("", "unknown")).sum()
254
+
255
+ completeness.append({
256
+ "Field": col,
257
+ "Filled": filled,
258
+ "Missing/Default": len(df) - filled,
259
+ "Completeness": f"{filled / len(df) * 100:.0f}%",
260
+ })
261
+
262
+ comp_df = pd.DataFrame(completeness)
263
+ st.dataframe(comp_df, hide_index=True, use_container_width=True)
264
+
265
+ # ---------------------------------------------------------------------------
266
+ # Section 6: Churn Label Distribution
267
+ # ---------------------------------------------------------------------------
268
+
269
+ st.markdown("---")
270
+ st.header("Label Distribution")
271
+ st.caption("Synthetic labels (churn, VIP) — understanding class balance for evaluation")
272
+
273
+ l1, l2 = st.columns(2)
274
+
275
+ with l1:
276
+ st.markdown("**Churn within 30 days**")
277
+ churn_counts = df["churned_within_30d"].value_counts()
278
+ churn_display = pd.DataFrame({
279
+ "Status": ["Retained", "Churned"],
280
+ "Count": [
281
+ churn_counts.get(False, 0) + churn_counts.get(0, 0),
282
+ churn_counts.get(True, 0) + churn_counts.get(1, 0),
283
+ ],
284
+ })
285
+ st.bar_chart(churn_display.set_index("Status"))
286
+ churn_total = churn_display["Count"].sum()
287
+ churned = churn_display[churn_display["Status"] == "Churned"]["Count"].values[0]
288
+ st.caption(f"Churn rate: {churned/churn_total*100:.0f}% — {'balanced enough for evaluation' if 0.15 < churned/churn_total < 0.5 else 'may need rebalancing'}")
289
+
290
+ with l2:
291
+ st.markdown("**VIP Tier**")
292
+ if "vip_tier" in df.columns:
293
+ vip_counts = df["vip_tier"].value_counts()
294
+ st.bar_chart(vip_counts)
295
+ else:
296
+ st.info("VIP tier not available.")
297
+
298
+ # ---------------------------------------------------------------------------
299
+ # Section 7: Data Quality Score
300
+ # ---------------------------------------------------------------------------
301
+
302
+ st.markdown("---")
303
+ st.header("Overall Data Quality Score")
304
+
305
+ # Compute a simple quality score
306
+ total_signals = sum(quality_df[c].sum() for c in signal_cols)
307
+ max_signals = len(quality_df) * len(signal_cols)
308
+ quality_score = 1 - (total_signals / max_signals)
309
+
310
+ q1, q2, q3 = st.columns(3)
311
+ q1.metric("Quality Score", f"{quality_score:.0%}",
312
+ help="1 - (total noise signals / max possible signals). Higher = cleaner data.")
313
+ q2.metric("Total Noise Signals", f"{total_signals}",
314
+ help=f"Across {len(quality_df)} cases × {len(signal_cols)} signal types")
315
+ q3.metric("Cases with No Issues", f"{len(quality_df[quality_df[signal_cols].sum(axis=1) == 0])}",
316
+ help="Cases that triggered zero noise signals")
317
+
318
+ st.markdown("---")
319
+ st.markdown(
320
+ "**Why this matters for forward-deployed AI:** "
321
+ "Data quality analysis is not optional — it's the first deliverable in week 1. "
322
+ "Noise patterns directly inform prompt engineering (e.g., the short-input confidence cap), "
323
+ "validation rules (e.g., evidence grounding checks), and gate thresholds. "
324
+ "A system that doesn't understand its own input data cannot be trusted to produce reliable output."
325
+ )
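The 30-word / 0.7 rule referenced throughout this page lives in the prompt, but it can also be enforced as a deterministic post-processing guard. A minimal sketch (the helper name is hypothetical and not part of the pipeline package; the threshold and cap are the figures quoted above):

```python
# Deterministic backstop for the prompt-v2 rule: cap confidence on short inputs.
# Hypothetical helper; threshold and cap mirror the figures quoted on this page.
SHORT_INPUT_WORDS = 30
CONFIDENCE_CAP = 0.7


def cap_short_input_confidence(ticket_text: str, extraction: dict) -> dict:
    """Return a copy of the extraction with confidence capped for short inputs."""
    out = dict(extraction)
    if len(ticket_text.split()) < SHORT_INPUT_WORDS:
        out["confidence"] = min(out.get("confidence", 0.0), CONFIDENCE_CAP)
    return out
```

Running the guard after extraction means the cap holds even when the model ignores the prompt rule, which is exactly the belt-and-suspenders posture the gate thresholds assume.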
app/pages/8_Human_Feedback.py ADDED
@@ -0,0 +1,369 @@
1
+ """Page 8 — Human Feedback Loop: reviewers correct AI outputs, building a feedback dataset.
2
+
3
+ This page demonstrates the 'iterate to make sure this product is valuable to the end user'
4
+ principle. Every correction is saved to feedback.jsonl and used to measure human-AI agreement.
5
+ """
6
+ import sys
7
+ import json
8
+ import sqlite3
10
+ from pathlib import Path
11
+
12
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
13
+
14
+ import streamlit as st
15
+ import pandas as pd
16
+
17
+ from pipeline.loaders import load_all_cases
18
+ from pipeline.normalize import normalize_case
19
+ from pipeline.feedback import (
20
+ save_feedback,
21
+ save_approval,
22
+ load_all_feedback,
23
+ compute_agreement_stats,
24
+ )
25
+ from pipeline.storage import deserialize_extraction
26
+
27
+ st.set_page_config(page_title="Human Feedback Loop", layout="wide")
28
+
29
+ DB_PATH = Path("data/processed/results.db")
30
+ CASES_DIR = Path("data/cases")
31
+
32
+ # ---------------------------------------------------------------------------
33
+ # Load data
34
+ # ---------------------------------------------------------------------------
35
+
36
+ if not DB_PATH.exists():
37
+ st.warning("No pipeline results yet. Run `PYTHONPATH=. python scripts/run_pipeline.py --mock` first.")
38
+ st.stop()
39
+
40
+ conn = sqlite3.connect(DB_PATH)
41
+ conn.row_factory = sqlite3.Row
42
+ extractions = {
43
+ dict(r)["case_id"]: dict(r)
44
+ for r in conn.execute("SELECT * FROM extractions").fetchall()
45
+ }
46
+ case_rows = {
47
+ dict(r)["case_id"]: dict(r)
48
+ for r in conn.execute("SELECT * FROM cases").fetchall()
49
+ }
50
+ conn.close()
51
+
52
+ cases_map = {}
53
+ if CASES_DIR.exists():
54
+ for c in load_all_cases(CASES_DIR):
55
+ cases_map[c.case_id] = c
56
+
57
+ if not extractions:
58
+ st.info("No extractions in database.")
59
+ st.stop()
60
+
61
+ # ---------------------------------------------------------------------------
62
+ # Page layout: tabs for Review and Analytics
63
+ # ---------------------------------------------------------------------------
64
+
65
+ st.title("Human Feedback Loop")
66
+ st.markdown(
67
+ "Review AI extractions, correct errors, and approve good outputs. "
68
+ "Every action builds a feedback dataset that measures human-AI alignment "
69
+ "and informs prompt iteration."
70
+ )
71
+
72
+ tab_review, tab_analytics = st.tabs(["Review Cases", "Agreement Analytics"])
73
+
74
+ # ===========================================================================
75
+ # TAB 1: Review Cases
76
+ # ===========================================================================
77
+
78
+ with tab_review:
79
+ st.markdown("---")
80
+
81
+ # Case selector — prioritize review-routed cases
82
+ review_cases = [cid for cid, ext in extractions.items() if ext.get("gate_route") == "review"]
83
+ auto_cases = [cid for cid, ext in extractions.items() if ext.get("gate_route") == "auto"]
84
+
85
+ # Check which cases already have feedback
86
+ existing_feedback = load_all_feedback()
87
+ reviewed_ids = {f["case_id"] for f in existing_feedback}
88
+
89
+ case_options = []
90
+ for cid in review_cases:
91
+ tag = "reviewed" if cid in reviewed_ids else "needs review"
92
+ case_options.append(f"{cid} [REVIEW] [{tag}]")
93
+ for cid in auto_cases:
94
+ tag = "reviewed" if cid in reviewed_ids else "auto-routed"
95
+ case_options.append(f"{cid} [AUTO] [{tag}]")
96
+
97
+ if not case_options:
98
+ st.info("No cases to review.")
99
+ st.stop()
100
+
101
+ selected_option = st.selectbox("Select case to review", case_options)
102
+ selected_id = selected_option.split(" ")[0]
103
+
104
+ ext = extractions[selected_id]
105
+ case_meta = case_rows.get(selected_id, {})
106
+ case_bundle = cases_map.get(selected_id)
107
+
108
+ ext = deserialize_extraction(ext)
109
+
110
+ # --- Two columns: Source Text | AI Output + Correction ---
111
+ col_source, col_review = st.columns([1, 1])
112
+
113
+ with col_source:
114
+ st.subheader("Source Text")
115
+ ticket_text = case_meta.get("ticket_text", "")
116
+ if case_bundle:
117
+ ticket_text = case_bundle.ticket_text
118
+ st.text_area("Raw input", ticket_text, height=250, disabled=True, label_visibility="collapsed")
119
+
120
+ if case_bundle and case_bundle.conversation_snippet:
121
+ with st.expander("Conversation snippet"):
122
+ st.text(case_bundle.conversation_snippet)
123
+
124
+ st.markdown("**Metadata**")
125
+ st.markdown(
126
+ f"Language: `{case_meta.get('language', '?')}` · "
127
+ f"Priority: `{case_meta.get('priority', '?')}` · "
128
+ f"VIP: `{case_meta.get('vip_tier', '?')}` · "
129
+ f"Source: `{case_meta.get('source_dataset', '?')}`"
130
+ )
131
+
132
+ # Gate decision
133
+ gate_route = ext.get("gate_route", "?")
134
+ reason_codes = ext.get("review_reason_codes", [])
135
+ if gate_route == "review":
136
+ st.error(f"Gate: **REVIEW** — {', '.join(reason_codes) if reason_codes else 'unknown reason'}")
137
+ else:
138
+ st.success("Gate: **AUTO** — all checks passed")
139
+
140
+ with col_review:
141
+ st.subheader("AI Output → Your Correction")
142
+ st.caption("Modify any field below. Leave unchanged if the AI got it right.")
143
+
144
+ # Use a form to batch the corrections
145
+ with st.form(key=f"review_form_{selected_id}"):
146
+ ROOT_CAUSE_OPTIONS = [
147
+ "billing", "network", "account", "product", "service",
148
+ "security_breach", "outage", "vip_churn", "data_loss", "other", "unknown"
149
+ ]
150
+ RISK_OPTIONS = ["low", "medium", "high", "critical"]
151
+
152
+ ai_rc_l1 = ext.get("root_cause_l1", "unknown")
153
+ ai_rc_l2 = ext.get("root_cause_l2", "")
154
+ ai_risk = ext.get("risk_level", "low")
155
+ ai_sentiment = ext.get("sentiment_score", 0.0)
156
+ ai_confidence = ext.get("confidence", 0.0)
157
+ ai_churn = ext.get("churn_risk", 0.0)
158
+ ai_review_req = bool(ext.get("review_required", False))
159
+
160
+ # Root cause
161
+ rc_l1_idx = ROOT_CAUSE_OPTIONS.index(ai_rc_l1) if ai_rc_l1 in ROOT_CAUSE_OPTIONS else 0
162
+ corrected_rc_l1 = st.selectbox(
163
+ f"Root Cause L1 (AI: `{ai_rc_l1}`)",
164
+ ROOT_CAUSE_OPTIONS, index=rc_l1_idx
165
+ )
166
+ corrected_rc_l2 = st.text_input(
167
+ f"Root Cause L2 (AI: `{ai_rc_l2}`)",
168
+ value=ai_rc_l2
169
+ )
170
+
171
+ # Risk level
172
+ risk_idx = RISK_OPTIONS.index(ai_risk) if ai_risk in RISK_OPTIONS else 0
173
+ corrected_risk = st.selectbox(
174
+ f"Risk Level (AI: `{ai_risk}`)",
175
+ RISK_OPTIONS, index=risk_idx
176
+ )
177
+
178
+ # Sentiment
179
+ corrected_sentiment = st.slider(
180
+ f"Sentiment Score (AI: `{ai_sentiment:.2f}`)",
181
+ -1.0, 1.0, float(ai_sentiment), step=0.1
182
+ )
183
+
184
+ # Confidence
185
+ corrected_confidence = st.slider(
186
+ f"Confidence (AI: `{ai_confidence:.2f}`)",
187
+ 0.0, 1.0, float(ai_confidence), step=0.05
188
+ )
189
+
190
+ # Churn risk
191
+ corrected_churn = st.slider(
192
+ f"Churn Risk (AI: `{ai_churn:.2f}`)",
193
+ 0.0, 1.0, float(ai_churn), step=0.05
194
+ )
195
+
196
+ # Review required
197
+ corrected_review_req = st.checkbox(
198
+ f"Review Required (AI: `{ai_review_req}`)",
199
+ value=ai_review_req
200
+ )
201
+
202
+ # Reviewer notes
203
+ reviewer_notes = st.text_area("Reviewer Notes", "", height=80)
204
+
205
+ # Submit buttons
206
+ col_approve, col_correct = st.columns(2)
207
+ with col_approve:
208
+ btn_approve = st.form_submit_button("Approve AI Output", type="secondary")
209
+ with col_correct:
210
+ btn_correct = st.form_submit_button("Submit Corrections", type="primary")
211
+
212
+ # Handle form submission
213
+ if btn_approve:
214
+ entry = save_approval(selected_id, ext, reviewer_notes)
215
+ st.success(f"Approved {selected_id}. Agreement rate: 100%")
216
+ st.json(entry)
217
+
218
+ if btn_correct:
219
+ # Compute which fields changed
220
+ corrected_fields = {}
221
+ if corrected_rc_l1 != ai_rc_l1:
222
+ corrected_fields["root_cause_l1"] = corrected_rc_l1
223
+ if corrected_rc_l2 != ai_rc_l2:
224
+ corrected_fields["root_cause_l2"] = corrected_rc_l2
225
+ if corrected_risk != ai_risk:
226
+ corrected_fields["risk_level"] = corrected_risk
227
+ if abs(corrected_sentiment - ai_sentiment) > 0.05:
228
+ corrected_fields["sentiment_score"] = corrected_sentiment
229
+ if abs(corrected_confidence - ai_confidence) > 0.025:
230
+ corrected_fields["confidence"] = corrected_confidence
231
+ if abs(corrected_churn - ai_churn) > 0.025:
232
+ corrected_fields["churn_risk"] = corrected_churn
233
+ if corrected_review_req != ai_review_req:
234
+ corrected_fields["review_required"] = corrected_review_req
235
+
236
+ if not corrected_fields:
237
+ st.info("No fields changed — this is equivalent to an approval.")
238
+ entry = save_approval(selected_id, ext, reviewer_notes)
239
+ st.success(f"Recorded as approval for {selected_id}.")
240
+ else:
241
+ entry = save_feedback(selected_id, ext, corrected_fields, reviewer_notes)
242
+ st.success(
243
+ f"Saved corrections for {selected_id}. "
244
+ f"Fields corrected: {', '.join(corrected_fields.keys())}. "
245
+ f"Agreement: {entry['agreement']['agreement_rate']:.0%}"
246
+ )
247
+ st.json(entry)
248
+
249
+
250
+ # ===========================================================================
251
+ # TAB 2: Agreement Analytics
252
+ # ===========================================================================
253
+
254
+ with tab_analytics:
255
+ st.markdown("---")
256
+
257
+ all_feedback = load_all_feedback()
258
+
259
+ if not all_feedback:
260
+ st.info(
261
+ "No feedback recorded yet. Use the **Review Cases** tab to approve or correct "
262
+ "AI extractions. Each action builds the feedback dataset."
263
+ )
264
+
265
+ st.markdown("---")
266
+ st.header("What This Page Will Show")
267
+ st.markdown("""
268
+ Once reviewers start providing feedback, this page displays:
269
+
270
+ - **Overall human-AI agreement rate** — % of fields where the reviewer agreed with AI
271
+ - **Per-field agreement** — which extraction fields are most/least reliable
272
+ - **Most corrected fields** — where the AI consistently gets it wrong
273
+ - **Correction timeline** — how agreement changes over time (ideally improves with prompt iteration)
274
+ - **Feedback log** — full audit trail of every review action
275
+
276
+ This is the data that drives prompt iteration: if reviewers keep correcting `risk_level`,
277
+ the prompt needs better risk assessment instructions.
278
+ """)
279
+ st.stop()
280
+
281
+ # Compute stats
282
+ stats = compute_agreement_stats(all_feedback)
283
+
284
+ # --- KPI Row ---
285
+ st.header("Human-AI Agreement")
286
+
287
+ k1, k2, k3, k4 = st.columns(4)
288
+ k1.metric("Total Reviews", stats["total_reviews"])
289
+ k2.metric("Approvals", stats["approvals"],
290
+ help="Cases where the reviewer accepted AI output without changes")
291
+ k3.metric("Corrections", stats["corrections"],
292
+ help="Cases where the reviewer changed at least one field")
293
+ k4.metric("Overall Agreement Rate", f"{stats['overall_agreement_rate']:.0%}",
294
+ help="% of reviewed fields where human agreed with AI")
295
+
296
+ # --- Per-field agreement ---
297
+ st.markdown("---")
298
+ st.header("Per-Field Agreement")
299
+ st.caption("Which extraction fields are most reliable? Fields with low agreement need prompt attention.")
300
+
301
+ if stats["per_field_agreement"]:
302
+ field_df = pd.DataFrame([
303
+ {"Field": field, "Agreement Rate": rate}
304
+ for field, rate in sorted(stats["per_field_agreement"].items(), key=lambda x: x[1])
305
+ ])
306
+ st.bar_chart(field_df.set_index("Field")["Agreement Rate"])
307
+ st.dataframe(field_df, hide_index=True, use_container_width=True)
308
+
309
+ # --- Most corrected fields ---
310
+ if stats["most_corrected_fields"]:
311
+ st.markdown("---")
312
+ st.header("Most Corrected Fields")
313
+ st.caption("These fields are corrected most often — primary targets for prompt improvement")
314
+
315
+ corrected_df = pd.DataFrame(
316
+ stats["most_corrected_fields"],
317
+ columns=["Field", "Correction Count"],
318
+ )
319
+ st.bar_chart(corrected_df.set_index("Field"))
320
+ st.dataframe(corrected_df, hide_index=True, use_container_width=True)
321
+
322
+ # --- Feedback timeline ---
323
+ st.markdown("---")
324
+ st.header("Review Timeline")
325
+
326
+ timeline_data = []
327
+ for entry in all_feedback:
328
+ ts = entry.get("timestamp", 0)
329
+ timeline_data.append({
330
+ "Time": pd.Timestamp.fromtimestamp(ts),
331
+ "Case": entry.get("case_id", "?"),
332
+ "Action": entry.get("action", "?"),
333
+ "Agreement": entry.get("agreement", {}).get("agreement_rate", 0),
334
+ })
335
+
336
+ if timeline_data:
337
+ timeline_df = pd.DataFrame(timeline_data)
338
+ st.line_chart(timeline_df.set_index("Time")["Agreement"])
339
+ st.dataframe(timeline_df, hide_index=True, use_container_width=True)
340
+
341
+ # --- Full feedback log ---
342
+ st.markdown("---")
343
+ st.header("Feedback Log")
344
+ st.caption(f"Full audit trail — {len(all_feedback)} entries in `data/processed/feedback.jsonl`")
345
+
346
+ log_rows = []
347
+ for entry in all_feedback:
348
+ corrected = entry.get("corrected", {})
349
+ log_rows.append({
350
+ "Timestamp": pd.Timestamp.fromtimestamp(entry.get("timestamp", 0)).strftime("%Y-%m-%d %H:%M"),
351
+ "Case ID": entry.get("case_id", "?"),
352
+ "Action": entry.get("action", "?"),
353
+ "Fields Corrected": ", ".join(corrected.keys()) if corrected else "—",
354
+ "Agreement": f"{entry.get('agreement', {}).get('agreement_rate', 0):.0%}",
355
+ "Notes": entry.get("reviewer_notes", "")[:80],
356
+ })
357
+
358
+ if log_rows:
359
+ st.dataframe(pd.DataFrame(log_rows), hide_index=True, use_container_width=True)
360
+
361
+ # --- Insight callout ---
362
+ st.markdown("---")
363
+ st.markdown(
364
+ "**How this drives iteration:** Every correction is a training signal. "
365
+ "If `root_cause_l1` agreement drops below 80%, the prompt's classification "
366
+ "instructions need refinement. If `confidence` is consistently corrected downward, "
367
+ "the model is overconfident and needs calibration rules. "
368
+ "This feedback loop closes the gap between 'works in demo' and 'works in production'."
369
+ )
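The agreement rates surfaced on this page come from `pipeline.feedback.compute_agreement_stats`; the per-review arithmetic it implies can be sketched as follows (illustrative only; the real helper's field weighting may differ):

```python
def agreement_rate(reviewed_fields: list, corrected_fields: dict) -> float:
    """Fraction of reviewed fields the human left unchanged.

    An approval (no corrections) scores 1.0; each corrected field lowers it.
    Sketch only; the actual pipeline.feedback helper may differ.
    """
    if not reviewed_fields:
        return 1.0
    unchanged = sum(1 for f in reviewed_fields if f not in corrected_fields)
    return unchanged / len(reviewed_fields)
```

With the seven reviewable fields on this page, correcting one of them yields an agreement of roughly 86%, which is how a single `risk_level` fix shows up in the KPI row.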
app/pages/9_Prompt_AB_Testing.py ADDED
@@ -0,0 +1,334 @@
1
+ """Page 9 — Prompt A/B Testing: compare prompt versions with quantified metrics.
2
+
3
+ Demonstrates continuous optimization capability — the kind of iteration that makes
4
+ a forward-deployed AI product valuable over time, not just at launch.
5
+ """
6
+ import sys
7
+ from pathlib import Path
8
+
9
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
10
+
11
+ import streamlit as st
12
+ import pandas as pd
13
+
14
+ st.set_page_config(page_title="Prompt A/B Testing", layout="wide")
15
+ st.title("Prompt A/B Testing")
16
+ st.markdown(
17
+ "Side-by-side comparison of prompt versions. Every prompt change is tested against "
18
+ "the same cases with the same metrics — no guessing whether a change helped."
19
+ )
20
+ st.markdown("---")
21
+
22
+ # ---------------------------------------------------------------------------
23
+ # Prompt Version Registry
24
+ # ---------------------------------------------------------------------------
25
+
26
+ # Each version records: the change, the hypothesis, and the measured results.
27
+ # In production, this would be stored in a database. Here we hardcode the
28
+ # actual results from our documented experiments.
29
+
30
+ PROMPT_VERSIONS = {
31
+ "v1": {
32
+ "label": "v1 — Baseline",
33
+ "description": "Initial extraction prompt with structured JSON schema, evidence grounding rules, and ambiguity handling.",
34
+ "change": "N/A (baseline)",
35
+ "hypothesis": "N/A (baseline)",
36
+ "prompt_diff": None,
37
+ "eval_cases": 10,
38
+ "model": "claude-sonnet-4-20250514",
39
+ "metrics": {
40
+ "Schema pass rate": {"value": 1.00, "target": 0.98, "pass": True},
41
+ "Evidence coverage": {"value": 1.00, "target": 0.90, "pass": True},
42
+ "Hallucinated quotes": {"value": 0.027, "target": 0.02, "pass": False},
43
+ "Review-required rate": {"value": 0.80, "target": None, "pass": None},
44
+ "Avg confidence": {"value": 0.82, "target": None, "pass": None},
45
+ "Avg latency (ms)": {"value": 6341, "target": None, "pass": None},
46
+ },
47
+ "issues_found": [
48
+ "Overconfidence on short inputs (2 of 4 short cases got 0.90 confidence)",
49
+ "Metadata line quoted as evidence (1 of 37 quotes)",
50
+ "Risk underestimation on termination/churn signals",
51
+ ],
52
+ "per_case_confidence": {
53
+ "case-acaecb0d": {"words": 14, "confidence": 0.90},
54
+ "case-f541aaa0": {"words": 8, "confidence": 0.90},
55
+ "case-652870dc": {"words": 95, "confidence": 0.90},
56
+ "case-ac7b0b06": {"words": 84, "confidence": 0.90},
57
+ "case-2bd562d3": {"words": 7, "confidence": 0.60},
58
+ "case-5f87257e": {"words": 11, "confidence": 0.60},
59
+ },
60
+ },
61
+ "v2": {
62
+ "label": "v2 — Short-Input Confidence Cap",
63
+ "description": "Added one rule: 'If the case text is very short (under ~30 words), cap confidence at 0.7 — brief inputs lack context for high-certainty analysis.'",
64
+ "change": "One prompt line added to RULES section",
65
+ "hypothesis": "Short inputs (< 30 words) will get capped confidence without affecting long inputs.",
66
+ "prompt_diff": (
67
+ '+ - If the case text is very short (under ~30 words), cap confidence at 0.7 — '
68
+ 'brief inputs lack context for high-certainty analysis'
69
+ ),
70
+ "eval_cases": 10,
71
+ "model": "claude-sonnet-4-20250514",
72
+ "metrics": {
73
+ "Schema pass rate": {"value": 1.00, "target": 0.98, "pass": True},
74
+ "Evidence coverage": {"value": 1.00, "target": 0.90, "pass": True},
75
+ "Hallucinated quotes": {"value": 0.027, "target": 0.02, "pass": False},
76
+ "Review-required rate": {"value": 0.90, "target": None, "pass": None},
77
+ "Avg confidence": {"value": 0.77, "target": None, "pass": None},
78
+ "Avg latency (ms)": {"value": 6400, "target": None, "pass": None},
79
+ },
80
+ "issues_found": [
81
+ "Hallucinated metadata quote still present (prompt clarification needed)",
82
+ "Risk underestimation on termination/churn signals (separate issue from confidence)",
83
+ ],
84
+ "per_case_confidence": {
85
+ "case-acaecb0d": {"words": 14, "confidence": 0.70},
86
+ "case-f541aaa0": {"words": 8, "confidence": 0.60},
87
+ "case-652870dc": {"words": 95, "confidence": 0.90},
88
+ "case-ac7b0b06": {"words": 84, "confidence": 0.90},
89
+ "case-2bd562d3": {"words": 7, "confidence": 0.60},
90
+ "case-5f87257e": {"words": 11, "confidence": 0.60},
91
+ },
92
+ },
93
+ }
94
+
95
+ # Future prompt versions would be added here:
96
+ # "v3": { ... evidence boundary clarification ... }
97
+ # "v4": { ... churn signal boosting ... }
98
+
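The v2 hypothesis ("short inputs get capped confidence, long inputs are untouched") can be checked mechanically against the `per_case_confidence` tables in the registry above. A sketch, assuming the `PROMPT_VERSIONS` entry structure shown in this file:

```python
def confidence_deltas(va: dict, vb: dict) -> dict:
    """Per-case confidence change between two prompt versions (vb minus va).

    Only cases present in both versions are compared. Sketch; assumes the
    per_case_confidence shape used in PROMPT_VERSIONS above.
    """
    shared = set(va["per_case_confidence"]) & set(vb["per_case_confidence"])
    return {
        cid: vb["per_case_confidence"][cid]["confidence"]
        - va["per_case_confidence"][cid]["confidence"]
        for cid in sorted(shared)
    }
```

A confirming result is negative deltas only on the sub-30-word cases and zero everywhere else, which matches the v1 vs v2 tables above.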
99
+ # ---------------------------------------------------------------------------
100
+ # Version selector
101
+ # ---------------------------------------------------------------------------
102
+
103
+ st.header("Select Versions to Compare")
104
+
105
+ versions = list(PROMPT_VERSIONS.keys())
106
+ col_a, col_b = st.columns(2)
107
+
108
+ with col_a:
109
+ version_a = st.selectbox("Version A", versions, index=0)
110
+ with col_b:
111
+ version_b = st.selectbox("Version B", versions, index=len(versions) - 1)
112
+
113
+ va = PROMPT_VERSIONS[version_a]
114
+ vb = PROMPT_VERSIONS[version_b]
115
+
+ # ---------------------------------------------------------------------------
+ # Section 1: Version Details
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("Version Details")
+
+ d1, d2 = st.columns(2)
+
+ with d1:
+     st.subheader(va["label"])
+     st.markdown(f"**Description:** {va['description']}")
+     st.markdown(f"**Model:** `{va['model']}`")
+     st.markdown(f"**Eval cases:** {va['eval_cases']}")
+     if va["prompt_diff"]:
+         st.code(va["prompt_diff"], language="diff")
+
+ with d2:
+     st.subheader(vb["label"])
+     st.markdown(f"**Description:** {vb['description']}")
+     st.markdown(f"**Change:** {vb['change']}")
+     st.markdown(f"**Hypothesis:** {vb['hypothesis']}")
+     st.markdown(f"**Model:** `{vb['model']}`")
+     st.markdown(f"**Eval cases:** {vb['eval_cases']}")
+     if vb["prompt_diff"]:
+         st.code(vb["prompt_diff"], language="diff")
+
+ # ---------------------------------------------------------------------------
+ # Section 2: Metrics Comparison
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("Metrics Comparison")
+
+ # Build comparison table
+ all_metrics = sorted(set(list(va["metrics"].keys()) + list(vb["metrics"].keys())))
+ comparison_rows = []
+
+ for metric in all_metrics:
+     ma = va["metrics"].get(metric, {})
+     mb = vb["metrics"].get(metric, {})
+
+     val_a = ma.get("value", "—")
+     val_b = mb.get("value", "—")
+     target = ma.get("target") or mb.get("target")
+
+     # Format values: rates as percentages, counts and latency as integers
+     if isinstance(val_a, float) and val_a < 1:
+         fmt_a = f"{val_a:.1%}" if metric != "Avg latency (ms)" else f"{val_a:,.0f}"
+     else:
+         fmt_a = f"{val_a:,.0f}" if isinstance(val_a, (int, float)) else str(val_a)
+
+     if isinstance(val_b, float) and val_b < 1:
+         fmt_b = f"{val_b:.1%}" if metric != "Avg latency (ms)" else f"{val_b:,.0f}"
+     else:
+         fmt_b = f"{val_b:,.0f}" if isinstance(val_b, (int, float)) else str(val_b)
+
+     # Compute delta
+     delta = ""
+     if isinstance(val_a, (int, float)) and isinstance(val_b, (int, float)):
+         diff = val_b - val_a
+         if metric == "Avg latency (ms)":
+             delta = f"{diff:+,.0f} ms"
+         elif abs(diff) > 0.001:
+             delta = f"{diff:+.1%}" if abs(val_a) < 10 else f"{diff:+,.0f}"
+
+     # Determine whether the delta is an improvement.
+     # Lower is better for: hallucinated quotes, latency.
+     # Higher is better for: schema pass rate, evidence coverage.
+     direction = ""
+     if delta and isinstance(val_a, (int, float)) and isinstance(val_b, (int, float)):
+         diff = val_b - val_a
+         lower_better = metric in ("Hallucinated quotes", "Avg latency (ms)")
+         if abs(diff) > 0.001:
+             is_better = (diff < 0) if lower_better else (diff > 0)
+             direction = "better" if is_better else "worse"
+
+     comparison_rows.append({
+         "Metric": metric,
+         f"{version_a}": fmt_a,
+         f"{version_b}": fmt_b,
+         "Delta": delta,
+         "Direction": direction,
+         "Target": f"{target:.0%}" if isinstance(target, float) and target < 1 else (str(target) if target else "—"),
+     })
+
+ comp_df = pd.DataFrame(comparison_rows)
+
+ # Display the comparison table
+ st.dataframe(comp_df, hide_index=True, use_container_width=True)
+
+ # Metrics as cards
+ st.markdown("### Key Deltas")
+ delta_cols = st.columns(len(all_metrics))
+ for i, row in enumerate(comparison_rows):
+     with delta_cols[i % len(delta_cols)]:
+         val_a_raw = va["metrics"].get(row["Metric"], {}).get("value", 0)
+         val_b_raw = vb["metrics"].get(row["Metric"], {}).get("value", 0)
+         if isinstance(val_a_raw, (int, float)) and isinstance(val_b_raw, (int, float)):
+             if row["Metric"] == "Avg latency (ms)":
+                 st.metric(row["Metric"], f"{val_b_raw:,.0f}", delta=row["Delta"])
+             elif val_b_raw < 1:
+                 st.metric(row["Metric"], f"{val_b_raw:.1%}", delta=row["Delta"])
+             else:
+                 st.metric(row["Metric"], f"{val_b_raw}", delta=row["Delta"])
+
+ # ---------------------------------------------------------------------------
+ # Section 3: Per-Case Confidence Comparison
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header(f"Per-Case Confidence: {version_a} → {version_b}")
+ st.caption("The specific cases that motivated the prompt change — did the fix work?")
+
+ case_ids = sorted(
+     set(list(va.get("per_case_confidence", {}).keys()) + list(vb.get("per_case_confidence", {}).keys()))
+ )
+
+ case_comparison = []
+ for cid in case_ids:
+     ca = va.get("per_case_confidence", {}).get(cid, {})
+     cb = vb.get("per_case_confidence", {}).get(cid, {})
+     words = ca.get("words") or cb.get("words", "?")
+     conf_a = ca.get("confidence", "—")
+     conf_b = cb.get("confidence", "—")
+
+     delta = ""
+     if isinstance(conf_a, (int, float)) and isinstance(conf_b, (int, float)):
+         diff = conf_b - conf_a
+         delta = f"{diff:+.2f}" if abs(diff) > 0.001 else "0.00"
+
+     is_short = isinstance(words, int) and words < 30
+     case_comparison.append({
+         "Case ID": cid,
+         "Words": words,
+         "Short Input": "yes" if is_short else "no",
+         f"Confidence ({version_a})": conf_a if isinstance(conf_a, str) else f"{conf_a:.2f}",
+         f"Confidence ({version_b})": conf_b if isinstance(conf_b, str) else f"{conf_b:.2f}",
+         "Delta": delta,
+         "Fixed?": "YES" if is_short and isinstance(conf_b, (int, float)) and conf_b <= 0.7 else
+                   ("n/a" if not is_short else "no"),
+     })
+
+ case_df = pd.DataFrame(case_comparison)
+ st.dataframe(case_df, hide_index=True, use_container_width=True)
+
+ # Highlight results
+ short_cases = [c for c in case_comparison if c["Short Input"] == "yes"]
+ fixed_cases = [c for c in short_cases if c["Fixed?"] == "YES"]
+
+ if short_cases:
+     st.success(
+         f"**{len(fixed_cases)} of {len(short_cases)} short-input cases fixed** — "
+         f"confidence capped at 0.7 or below. "
+         f"Long inputs ({len(case_comparison) - len(short_cases)} cases) unaffected."
+     )
+
+ # ---------------------------------------------------------------------------
+ # Section 4: Issues Resolved / Remaining
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("Issues Tracking")
+
+ i1, i2 = st.columns(2)
+
+ with i1:
+     st.subheader(f"Issues in {version_a}")
+     for issue in va.get("issues_found", []):
+         st.markdown(f"- {issue}")
+
+ with i2:
+     st.subheader(f"Issues in {version_b}")
+     for issue in vb.get("issues_found", []):
+         st.markdown(f"- {issue}")
+
+ resolved = set(va.get("issues_found", [])) - set(vb.get("issues_found", []))
+ if resolved:
+     st.markdown("**Resolved:**")
+     for r in resolved:
+         st.markdown(f"- ~~{r}~~")
+
+ # ---------------------------------------------------------------------------
+ # Section 5: Iteration Framework
+ # ---------------------------------------------------------------------------
+
+ st.markdown("---")
+ st.header("Prompt Iteration Framework")
+ st.caption("The systematic process used for every prompt change")
+
+ st.markdown("""
+ | Step | Action | Example (v1 → v2) |
+ |------|--------|--------------------|
+ | 1. **Observe** | Identify failure mode in eval data | 2 of 4 short inputs got 0.90 confidence |
+ | 2. **Hypothesize** | Root-cause the failure | Prompt says "if ambiguous, lower confidence" but short ≠ ambiguous |
+ | 3. **Change** | Minimal prompt edit (one rule) | Added: "If text < 30 words, cap confidence at 0.7" |
+ | 4. **Measure** | Re-run same cases, same metrics | Short-input confidence: 0.90 → 0.65 avg |
+ | 5. **Verify** | Check for regressions | Long-input confidence unchanged (0.90 → 0.90) |
+ | 6. **Document** | Record change, results, and remaining issues | This page |
+ """)
+
+ st.markdown("---")
+ st.header("Next Prompt Iterations (Planned)")
+
+ st.markdown("""
+ | Version | Change | Hypothesis | Status |
+ |---------|--------|------------|--------|
+ | **v3** | Clarify evidence boundary: "Do NOT quote metadata lines" | Eliminates metadata-as-evidence hallucination (1/37 quotes) | Planned |
+ | **v4** | Boost churn signal: "Termination/cancellation inquiries indicate high churn risk" | Catches risk underestimation on churn signals | Planned |
+ | **v5** | Add L2 taxonomy: controlled vocabulary for sub-categories | Improves cross-run consistency for root cause analysis | Planned |
+ """)
+
+ st.markdown("---")
+ st.caption(
+     "Each prompt version is tested on the same 10-case diverse sample. "
+     "Zero code changes between versions — only prompt text and version bump. "
+     "This demonstrates that the system is designed for continuous improvement, "
+     "not one-shot deployment."
+ )
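The v1 → v2 rule described in the iteration framework table ("If text < 30 words, cap confidence at 0.7") can be sketched as a small post-processing guard. This is a minimal illustration, not the deployed implementation; the function name and parameters are assumptions taken from the table's wording.

```python
def cap_short_input_confidence(ticket_text: str, confidence: float,
                               min_words: int = 30, cap: float = 0.7) -> float:
    """Illustrative sketch: cap confidence on short inputs, leave long inputs unchanged.

    Thresholds mirror the v2 rule in the framework table ("< 30 words" -> cap 0.7);
    the name and signature are hypothetical, not from the app code.
    """
    if len(ticket_text.split()) < min_words:
        return min(confidence, cap)
    return confidence
```

A 7-word ticket like case-2bd562d3 would be capped to 0.7 even if the model reported 0.90, while a 95-word ticket like case-652870dc keeps its original score.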
data/cases/.gitkeep ADDED
File without changes
data/cases/case-076438cd.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-076438cd",
+   "ticket_text": "[Network Connection Problems]\nDear Customer Support Team,\\n\\nI am reaching out to report persistent issues with network connectivity that are significantly disrupting my workflow. I've observed sporadic interruptions across several devices, which I believe may be caused by a malfunctioning repeater or potential interference from other devices within the network configuration.\\n\\nIn an attempt to resolve the problem, I have already taken several measures, such as restarting the affected equipment and repositioning the repeater to different locations within the premises. Despite these efforts, the connectivity issues persist.",
+   "email_thread": [],
+   "conversation_snippet": "Thank you for reaching out to us, <name>. We understand the importance of reliable and stable network connectivity. Currently, widespread problems with repeaters and interference in the environment are being reported. To assist you further, could you please specify which devices are impacted, provide details about your network setup, and mention any recent modifications? If possible, kindly share relevant error logs. We are prepared to guide you through advanced troubleshooting procedures.",
+   "vip_tier": "unknown",
+   "priority": "medium",
+   "handle_time_minutes": 79.2,
+   "churned_within_30d": false,
+   "source_dataset": "support_tickets (real)",
+   "language": "en"
+ }
data/cases/case-07fdaad5.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-07fdaad5",
+   "ticket_text": "[INVOICE] assistance downloading my invoice from {{Person Name}}",
+   "email_thread": [],
+   "conversation_snippet": "Customer: assistance downloading my invoice from {{Person Name}}\nAgent: I'll take care of it! I completely understand your need for assistance in downloading your invoice from {{Salutation}} {{Client First Name}} {{Client Last Name}}. Let me guide you through the process step by step. First, please log in to your account and navigate to the 'Invoices' section. From there, you should be able to view and download your invoice for your reference. If you encounter any issues or have further questions, please don't hesitate to let me know. I'm here to help ensure a smooth experience for you.",
+   "vip_tier": "standard",
+   "priority": "critical",
+   "handle_time_minutes": 78.7,
+   "churned_within_30d": true,
+   "source_dataset": "bitext_dialogues (real)",
+   "language": "en"
+ }
data/cases/case-19fc09e8.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-19fc09e8",
+   "ticket_text": "[Inquiry for Comprehensive Marketing Service Details]\nDear Customer Support Team,\\n\\nI hope this message reaches you well. I am writing to request detailed information about your marketing offerings. Our company is currently considering new marketing approaches to strengthen our brand visibility and boost customer engagement, and we are keen to understand the array of services your team provides.\\n\\nSpecifically, I would like to learn about the various marketing packages available, including digital marketing solutions such as social media management, email marketing campaigns, and search engine optimization.",
+   "email_thread": [],
+   "conversation_snippet": "Thank you for your interest in our marketing solutions. We offer a variety of digital and traditional marketing packages, including social media management, SEO, email marketing, content development, print advertising, and more. Our pricing is adaptable, allowing us to create customized plans that suit your specific business requirements. We can also provide case studies relevant to your industry upon request. Please share your industry, business objectives, and budget preferences so we can deliver tailored information.",
+   "vip_tier": "vip",
+   "priority": "medium",
+   "handle_time_minutes": 58.3,
+   "churned_within_30d": false,
+   "source_dataset": "support_tickets (real)",
+   "language": "en"
+ }
data/cases/case-1c9c4a9b.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-1c9c4a9b",
+   "ticket_text": "[INVOICE] I do not know what I have to do to check my bill #12588",
+   "email_thread": [],
+   "conversation_snippet": "Customer: I do not know what I have to do to check my bill #12588\nAgent: I'm on the same page, your confusion about finding the invoice with the number #{{Invoice Number}}. Don't worry, I'm here to help! To check your bill, you can try the following steps:\n\n1. Log in to your account on our website or mobile app.\n2. Navigate to the \"Billing\" or \"Invoices\" section.\n3. Look for the option to view or download your invoices.\n4. Enter the invoice number {{Invoice Number}} in the search or filter bar, if available.\n5. If you can't find a specific search option, try sorting the invoices by date or contact our customer support team for further assistance.\n\nRemember, if you have any trouble locating your bill or need any further guidance, feel free to ask. We're here to make sure you can access and understand your invoice with ease.",
+   "vip_tier": "standard",
+   "priority": "medium",
+   "handle_time_minutes": 71.9,
+   "churned_within_30d": false,
+   "source_dataset": "bitext_dialogues (real)",
+   "language": "en"
+ }
data/cases/case-21225a5d.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-21225a5d",
+   "ticket_text": "[Anfrage zur Klärung der Auswirkungen eines Serviceausfalls]\nSehr geehrtes Kundendienstteam,\\n\\nich hoffe, diese Nachricht erreicht Sie wohlbehalten. Ich schreibe, um eine Klärung bezüglich des kürzlichen Ausfalls Ihres Dienstes zu bitten, der die Plattform betroffen hat. Nutzer, die auf Ihre Dienste angewiesen sind, benötigen klare Informationen, um die betrieblichen Abläufe planen zu können, und es ist wichtig, den Umfang sowie die Folgen des Ausfalls genau zu verstehen.\\n\\nKönnten Sie bitte detaillierte Angaben zum Zeitraum des Ausfalls, welche Dienste betroffen waren und ob dabei Daten kompromittiert oder verloren gegangen sind, bereitstellen? Zudem würde ich mich über aktuelle Informationen zu den Maßnahmen freuen, die ergriffen wurden, um zukünftige Vorfälle zu verhindern.\\n\\nVielen Dank für Ihre Unterstützung und die Klärung der Details.",
+   "email_thread": [],
+   "conversation_snippet": "Vielen Dank, dass Sie uns bezüglich des kürzlichen Serviceausfalls kontaktiert haben. Der Ausfall begann um 9:00 Uhr morgens und wurde um 13:30 Uhr behoben, wodurch die Hauptdatenzugangs- und Messaging-Dienste betroffen waren. Wir bestätigen, dass im betreffenden Zeitraum keine Kundendaten kompromittiert oder verloren wurden. Unser technisches Team hat die Ursache, eine vorübergehende Systemüberlastung, bereits identifiziert und zusätzliche Schutzmaßnahmen sowie Überwachungsprozesse implementiert, um zukünftige Vorfälle zu verhindern. Für spezifische Informationen über das betroffene Konto bitten wir um weitere Details.",
+   "vip_tier": "standard",
+   "priority": "high",
+   "handle_time_minutes": 11.1,
+   "churned_within_30d": true,
+   "source_dataset": "support_tickets (real)",
+   "language": "de"
+ }
data/cases/case-2bd562d3.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-2bd562d3",
+   "ticket_text": "[ORDER] question about cancelling order {{Order Number}}",
+   "email_thread": [],
+   "conversation_snippet": "Customer: question about cancelling order {{Order Number}}\nAgent: I've understood you have a question regarding canceling order {{Order Number}}, and I'm here to provide you with the information you need. Please go ahead and ask your question, and I'll do my best to assist you.",
+   "vip_tier": "vip",
+   "priority": "critical",
+   "handle_time_minutes": 15.4,
+   "churned_within_30d": true,
+   "source_dataset": "bitext_dialogues (real)",
+   "language": "en"
+ }
data/cases/case-380fd7e4.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-380fd7e4",
+   "ticket_text": "[Query About Smart Home System Integration Features]\nDear Customer Support Team,\\n\\nI hope this message reaches you well. I am reaching out to request detailed information about the capabilities of your smart home integration products listed on your website. As a potential customer aiming to develop a seamlessly interconnected home environment, it is essential to understand how your products interact with various smart home platforms.\\n\\nCould you kindly provide detailed compatibility information with popular smart home ecosystems such as Amazon Alexa, Google Assistant, and Apple?",
+   "email_thread": [],
+   "conversation_snippet": "Thank you for your inquiry. Our products support integration with Amazon Alexa, Google Assistant, and Apple HomeKit. Compatibility details can differ depending on the specific item; please let us know which models you are interested in. The setup process is generally user-friendly but may require professional installation. We regularly update our software to provide enhanced features. For comprehensive information on compatibility with upcoming updates, please specify the models you are considering.",
+   "vip_tier": "unknown",
+   "priority": "medium",
+   "handle_time_minutes": 10.6,
+   "churned_within_30d": false,
+   "source_dataset": "support_tickets (real)",
+   "language": "en"
+ }
data/cases/case-4af33b8b.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-4af33b8b",
+   "ticket_text": "[SUBSCRIPTION] where can I sign up to the corporate newsletter?",
+   "email_thread": [],
+   "conversation_snippet": "Customer: where can I sign up to the corporate newsletter?\nAgent: We appreciate your interest in signing up for our corporate newsletter! To subscribe, you can visit our website and locate the newsletter subscription section. It's usually located on the homepage or in the footer menu. Once you find it, simply enter your email address and click the \"Subscribe\" or \"Sign Up\" button. This will ensure that you receive regular updates, exclusive offers, and valuable content directly in your inbox. Thank you for choosing to stay connected with us!",
+   "vip_tier": "unknown",
+   "priority": "critical",
+   "handle_time_minutes": 20.3,
+   "churned_within_30d": false,
+   "source_dataset": "bitext_dialogues (real)",
+   "language": "en"
+ }
data/cases/case-4b7055cf.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-4b7055cf",
+   "ticket_text": "[Inquiry for Detailed Information on Agency Offerings]\nDear Customer Support Team,\\n\\nI hope this message reaches you well. I am writing to request detailed information about the services provided by your agency. Specifically, I am interested in understanding the variety of services available, the pricing models, and any special packages or offers that might be applicable.\\n\\nAdditionally, I would appreciate it if you could send brochures, service contracts, and case studies highlighting your previous work and client success stories. This information will greatly help me in making an informed decision.",
+   "email_thread": [],
+   "conversation_snippet": "Thank you for your interest in our agency's services. We offer a comprehensive selection of solutions including digital marketing, brand management, content creation, and campaign planning. Our pricing is flexible; standard packages can be customized to meet specific client requirements, and we often have special promotions. We are happy to send brochures, service agreements, and case studies that showcase our clients' successes. Please reply with your preferred email address to receive these materials. We look forward to assisting you through the onboarding process.",
+   "vip_tier": "unknown",
+   "priority": "high",
+   "handle_time_minutes": 80.0,
+   "churned_within_30d": false,
+   "source_dataset": "support_tickets (real)",
+   "language": "en"
+ }
data/cases/case-4d87ea84.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-4d87ea84",
+   "ticket_text": "[CONTACT] I want assistance contacting someone",
+   "email_thread": [],
+   "conversation_snippet": "Customer: I want assistance contacting someone\nAgent: Thank you for contacting! I certainly recognize that you are seeking assistance in contacting someone. I'm here to help! Could you please provide more details about who you are trying to contact and the reason for your communication? With this information, I will be able to guide you to the most appropriate channel or provide you with the necessary contact details. Your satisfaction is our top priority, and we are committed to helping you connect with the right person efficiently. Take a moment to share the specific details, and we'll work together to find the best solution for you!",
+   "vip_tier": "unknown",
+   "priority": "low",
+   "handle_time_minutes": 68.3,
+   "churned_within_30d": false,
+   "source_dataset": "bitext_dialogues (real)",
+   "language": "en"
+ }
data/cases/case-4e9a11c7.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-4e9a11c7",
+   "ticket_text": "[System Interruptions]\nDear Customer Support Team,\\n\\nI am submitting a report regarding multiple system service disruptions that are currently interfering with project operations and client interactions. These interruptions have notably reduced efficiency and caused delays in meeting important deadlines.\\n\\nInitial assessments suggest that the primary issue may be linked to widespread failures in network hardware connectivity. We have already carried out several basic troubleshooting procedures, such as rebooting devices and performing initial system checks, but the problems persist.",
+   "email_thread": [],
+   "conversation_snippet": "Thank you for bringing the system disruptions to our attention. We understand the importance of maintaining stable services for project workflows and client communications, and we apologize for the inconvenience caused. To assist us in diagnosing the issue, could you please provide specific details about the affected systems or services, the approximate start time of the outages, and any error messages or logs you've encountered? Additionally, let us know if the outages are impacting users at a single location or across multiple sites.",
+   "vip_tier": "standard",
+   "priority": "high",
+   "handle_time_minutes": 73.4,
+   "churned_within_30d": true,
+   "source_dataset": "support_tickets (real)",
+   "language": "en"
+ }
data/cases/case-4f8d8abf.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-4f8d8abf",
+   "ticket_text": "[ORDER] shop several articles",
+   "email_thread": [],
+   "conversation_snippet": "Customer: shop several articles\nAgent: Thank you for your interest in shopping for several articles! I'm here to assist you in finding the perfect items that meet your requirements. To get started, could you please provide me with some details such as the type of articles you're looking for, any specific brands or styles you prefer, and your budget? This will help me tailor my recommendations and ensure a successful shopping experience for you. Let's make your shopping journey a delightful one!",
+   "vip_tier": "unknown",
+   "priority": "medium",
+   "handle_time_minutes": 86.0,
+   "churned_within_30d": false,
+   "source_dataset": "bitext_dialogues (real)",
+   "language": "en"
+ }
data/cases/case-5f87257e.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-5f87257e",
+   "ticket_text": "[FEEDBACK] I'm dissatisfied with ur work help me making a complaint",
+   "email_thread": [],
+   "conversation_snippet": "Customer: I'm dissatisfied with ur work help me making a complaint\nAgent: I'm truly sorry to hear that you're dissatisfied with our work. Your feedback is essential to us as we continuously strive to improve our services and meet your expectations. We value your concerns and would like to address them promptly and effectively. Could you please provide me with more details about your complaint? This will help us thoroughly investigate the matter and provide you with a satisfactory resolution.",
+   "vip_tier": "unknown",
+   "priority": "critical",
+   "handle_time_minutes": 77.9,
+   "churned_within_30d": true,
+   "source_dataset": "bitext_dialogues (real)",
+   "language": "en"
+ }
data/cases/case-624cb348.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-624cb348",
+   "ticket_text": "[SHIPPING] can you help me correct my delivery address?",
+   "email_thread": [],
+   "conversation_snippet": "Customer: can you help me correct my delivery address?\nAgent: Sure! I'm here to assist you with correcting your delivery address. Please provide me with the necessary details, such as your current address and the correct address you would like to change it to, and I'll guide you through the process.",
+   "vip_tier": "vip",
+   "priority": "critical",
+   "handle_time_minutes": 34.5,
+   "churned_within_30d": false,
+   "source_dataset": "bitext_dialogues (real)",
+   "language": "en"
+ }
data/cases/case-64a32dc8.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-64a32dc8",
+   "ticket_text": "[ACCOUNT] deleting {{Account Category}} account",
+   "email_thread": [],
+   "conversation_snippet": "Customer: deleting {{Account Category}} account\nAgent: We've received your message to us regarding the deletion of your {{Account Category}} account. I'm here to assist you throughout the process and make sure your concerns are addressed. Could you please provide me with more information about the issues you are facing or any specific questions you have about deleting your account? By understanding your concerns fully, I'll be able to provide you with the most accurate and helpful guidance.",
+   "vip_tier": "standard",
+   "priority": "medium",
+   "handle_time_minutes": 28.5,
+   "churned_within_30d": false,
+   "source_dataset": "bitext_dialogues (real)",
+   "language": "en"
+ }
data/cases/case-652870dc.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-652870dc",
+   "ticket_text": "[Inquiry Regarding Invoice Details]\nDear Customer Support Team,\\n\\nI hope this message finds you well. I am reaching out to request clarification about the billing and payment procedures linked to my account. Recently, I observed some inconsistencies in the charges applied and would like to ensure I fully understand the billing cycle, accepted payment options, and any potential extra charges.\\n\\nFirstly, I would be grateful if you could provide a detailed explanation of how the billing cycle functions. Specifically, I am interested in knowing the start and end dates.\\n\\nThank you for your assistance regarding these billing inquiries.",
+   "email_thread": [],
+   "conversation_snippet": "We appreciate you reaching out with your billing questions. The billing period generally begins on the first day of the month and concludes on the last day, with payments due by the 10th of the following month. We accept credit cards, bank transfers, and certain online payment services; credit card transactions are typically processed the quickest. Late payments may incur fees based on the due date, and any additional processing charges depend on the chosen payment method. You can review your statements for detailed payment information.",
+   "vip_tier": "standard",
+   "priority": "low",
+   "handle_time_minutes": 11.2,
+   "churned_within_30d": false,
+   "source_dataset": "support_tickets (real)",
+   "language": "en"
+ }
data/cases/case-6f37a2d1.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-6f37a2d1",
+   "ticket_text": "[Unable to Access Office Applications]\nDear Customer Support,\\n\\nWe are encountering a problem where employees are unable to open Excel, PowerPoint, and other Office programs on MacBook Air devices, despite having valid licenses. The issue started after a recent macOS update, which we suspect may have caused compatibility problems, possibly due to expired authentication tokens.\\n\\nTo attempt a fix, we rebooted the laptops, tried repairing Office, and re-entered Microsoft credentials. Regrettably, none of these actions resolved the issue, and the applications still cannot be accessed.\\n\\nWe would",
+   "email_thread": [],
+   "conversation_snippet": "Thank you for providing a detailed explanation of the issue. To assist you further, please specify any error messages encountered when launching Office applications. Also, verify whether your macOS version is up to date and confirm that the latest versions of Microsoft Office are installed. Since immediate access is critical, we can schedule a call at a convenient time to guide you through advanced troubleshooting steps. Please let us know your availability and any additional information.",
+   "vip_tier": "unknown",
+   "priority": "high",
+   "handle_time_minutes": 22.1,
+   "churned_within_30d": false,
+   "source_dataset": "support_tickets (real)",
+   "language": "en"
+ }
data/cases/case-70e84066.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "case_id": "case-70e84066",
+   "ticket_text": "[Enhancing Multi-Unit Marketing Processes]\nDear Customer Support Team,\\n\\nI am reaching out to request comprehensive details on optimizing marketing workflows across multiple departments by utilizing advanced analytics, automation, and centralized account management. Our organization aims to improve campaign coordination and boost performance metrics across various marketing channels, believing that implementing such strategies will greatly enhance our overall marketing success.\\n\\nIn particular, I would like to understand the best practices for integrating data analytics tools that offer real-time insights across different teams.",
+   "email_thread": [],
+   "conversation_snippet": "Thank you for your inquiry, <name>. To provide relevant assistance, could you please specify which analytics and automation tools your teams are currently using? This will enable us to suggest compatible solutions, effective practices, and relevant case studies tailored to your environment.",
+   "vip_tier": "standard",
+   "priority": "high",
+   "handle_time_minutes": 78.4,
+   "churned_within_30d": false,
+   "source_dataset": "support_tickets (real)",
+   "language": "en"
+ }
data/cases/case-7928f5fa.json ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "case_id": "case-7928f5fa",
3
+ "ticket_text": "[Anfrage nach detaillierten Angaben zur Systemarchitektur der Plattform]\nSehr geehrtes Kundensupport-Team,\\n\\nich hoffe, diese Nachricht trifft Sie wohl. Ich nehme Kontakt auf, um umfassende Informationen zur Architektur der Plattform zu erfragen. Das Verständnis der zugrunde liegenden Struktur, Komponenten und deren Zusammenhänge ist entscheidend, um eine reibungslose Integration zu gewährleisten und die Nutzung der Dienste zu optimieren.\\n\\nBesonders interessieren mich Details zu den Kernmodulen der Plattform, Datenströmen, Sicherheitsmaßnahmen, Skalierbarkeitsmerkmalen sowie verfügbaren APIs und Schnittstellen zur Anpassung. Zudem wären Einblicke in den Technologiestack sowie die Bereitstellungsumgebung sehr hilfreich.\\n\\nDer Zugriff auf diese Informationen ermöglicht es dem technischen Team, die Infrastrukturprozesse besser zu planen und zu steuern.",
+ "email_thread": [],
+ "conversation_snippet": "Vielen Dank für Ihre Anfrage. Wir stellen Ihnen die verfügbaren technischen Dokumentationen zur Verfügung. Falls notwendig, lassen Sie uns gern einen passenden Termin mit unseren Spezialisten vereinbaren.",
+ "vip_tier": "standard",
+ "priority": "low",
+ "handle_time_minutes": 21.7,
+ "churned_within_30d": false,
+ "source_dataset": "support_tickets (real)",
+ "language": "de"
+ }
data/cases/case-7febc51e.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-7febc51e",
+ "ticket_text": "[VPN Access Issue]\nCustomer Support,\\n\\nWe are encountering a disruption in VPN-router connectivity that is impacting several devices, notably essential remote telemedicine systems and EMR integrations. Attempts to resolve the issue by restarting affected devices and resetting the router have been unsuccessful. We suspect the problem may be related to firmware discrepancies following recent network configuration updates. This disruption is significantly affecting our operations, and we urgently need assistance to identify and fix the root cause. Kindly advise on additional troubleshooting steps.",
+ "email_thread": [],
+ "conversation_snippet": "Thank you for reporting this problem. Please provide the model of your VPN router, the current firmware version, and details of any recent network modifications. This information will assist us in diagnosing the issue and recommending suitable troubleshooting measures or firmware updates.",
+ "vip_tier": "standard",
+ "priority": "medium",
+ "handle_time_minutes": 55.5,
+ "churned_within_30d": false,
+ "source_dataset": "support_tickets (real)",
+ "language": "en"
+ }
data/cases/case-8ba05714.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-8ba05714",
+ "ticket_text": "[Issue with SaaS Platform Functionality]\nSehr geehrtes Support-Team,\\n\\nich möchte Sie auf einen Ausfall der Funktionen unserer SaaS-Plattform aufmerksam machen, den wir momentan erleben. In den letzten Stunden sind mehrere zentrale Features der Plattform langsamer geworden, was die Arbeitsabläufe erheblich beeinträchtigt und die Produktivität verringert.\\n\\nBesonders betroffen sind die Ladezeiten der Dashboards, es gibt Inkonsistenzen bei der Daten-Synchronisation sowie gelegentliche Fehler im Benutzer-Authentifizierungsprozess. Trotz Versuchen, die Anwendung neu zu starten und den Browser-Cache zu leeren, bestehen die Probleme weiterhin.\\n\\nDa die Plattform für unsere täglichen Geschäftsprozesse unverzichtbar ist, bitten wir um eine schnelle Bearbeitung dieses Anliegens. Könnten Sie uns bitte Unterstützung bieten?",
+ "email_thread": [],
+ "conversation_snippet": "Vielen Dank für Ihre Kontaktaufnahme bezüglich der Probleme mit unserer SaaS-Plattform. Wir verstehen die Dringlichkeit, insbesondere bei den Funktionen der Dashboards, der Datensynchronisierung und der Benutzer-Authentifizierung, und entschuldigen uns für die entstandenen Unannehmlichkeiten. Unser technisches Team arbeitet derzeit an der Analyse der Situation, um die Ursache zu identifizieren und die normale Funktion so schnell wie möglich wiederherzustellen. Wir können momentan keinen genauen Zeitrahmen für die Lösung nennen, werden Sie jedoch umgehend informieren, sobald wir weitere Erkenntnisse haben. In der Zwischenzeit empfehlen wir, die Plattform vorübergehend nur bei dringendem Bedarf zu nutzen, um den Support zu gewährleisten.",
+ "vip_tier": "vip",
+ "priority": "medium",
+ "handle_time_minutes": 49.7,
+ "churned_within_30d": false,
+ "source_dataset": "support_tickets (real)",
+ "language": "de"
+ }
data/cases/case-937b0422.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-937b0422",
+ "ticket_text": "[PAYMENT] see what payment optionsx are allowed",
+ "email_thread": [],
+ "conversation_snippet": "Customer: see what payment optionsx are allowed\nAgent: I appreciate your inquiry! I'm here to assist you in discovering the payment options we accept at our company. By offering a variety of convenient choices, we aim to accommodate your preferences. Here are the payment methods we currently allow:\n\n- **Credit/Debit Card:** Visa, Mastercard, and American Express are all accepted.\n- **PayPal:** A widely recognized and secure online payment platform.\n- **Bank Transfer:** This option allows for direct payments from your bank account.\n- **Apple Pay:** For Apple users, this method provides a seamless payment experience.\n- **Google Wallet:** Another popular payment system for quick and easy checkouts.\n\nPlease let me know if you require more information or assistance with any of these methods. It's our pleasure to be of service to you.",
+ "vip_tier": "vip",
+ "priority": "low",
+ "handle_time_minutes": 36.2,
+ "churned_within_30d": false,
+ "source_dataset": "bitext_dialogues (real)",
+ "language": "en"
+ }
data/cases/case-9ad5d3ab.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-9ad5d3ab",
+ "ticket_text": "[Immediate Help Needed: Technical Problem with Cloud SaaS Service]\nDear Customer Support Team,\\n\\nI am submitting a report regarding a technical problem encountered with the Cloud SaaS platform, which is currently disrupting our business activities. I have observed that certain features are not functioning as expected, causing interruptions that hinder workflow efficiency.\\n\\nIn particular, I am facing sporadic connectivity issues when trying to access the platform. Sometimes, the system fails to load the dashboard, and the data displayed appears outdated or incomplete. Furthermore, the response times for executing commands have significantly increased, resulting in delays.",
+ "email_thread": [],
+ "conversation_snippet": "Thanks for providing detailed information about the issue with the Cloud SaaS platform. We apologize for the inconvenience and understand the impact on your business. To assist us further, could you please confirm if the problem is affecting specific user accounts, and share any relevant error messages or screenshots? Also, let us know your current browser and operating system versions. Our technical team is ready to escalate this matter and work towards a swift resolution. Feel free to contact us by phone if needed.",
+ "vip_tier": "vip",
+ "priority": "medium",
+ "handle_time_minutes": 9.9,
+ "churned_within_30d": false,
+ "source_dataset": "support_tickets (real)",
+ "language": "en"
+ }
data/cases/case-9c147cfc.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-9c147cfc",
+ "ticket_text": "[Inquiry for In-Depth Details on Financial Institution Offerings]\nDear Customer Support Team,\\n\\nI hope this message reaches you in good health. I am writing to request detailed information about the spectrum of products provided by your financial institution. As a potential client, I am particularly eager to learn about the features, advantages, and terms linked to your investment and savings offerings.\\n\\nWould you be able to send comprehensive brochures or documentation that specify the details of your products? I am interested in information regarding account types, interest rates, fees, minimum deposit amounts, and any current promotional deals.",
+ "email_thread": [],
+ "conversation_snippet": "Thank you for your interest in our financial products. We offer a diverse selection of investment and savings solutions tailored to various needs, including high-yield savings accounts, fixed-term deposits, mutual funds, and retirement plans. Each product features specific benefits, interest rates, fees, minimum deposit requirements, and caters to different risk levels suitable for various customer profiles. We will provide detailed brochures that cover all these aspects, along with information on our current promotional offers.",
+ "vip_tier": "standard",
+ "priority": "medium",
+ "handle_time_minutes": 70.3,
+ "churned_within_30d": false,
+ "source_dataset": "support_tickets (real)",
+ "language": "en"
+ }
data/cases/case-a7068c14.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-a7068c14",
+ "ticket_text": "[Guidelines for Incorporating Seagate Expansion Drives]\nDear Customer Support Team,\\n\\nI hope this message reaches you in good health. I am seeking comprehensive instructions on how to effectively integrate Seagate Expansion Desktop 6TB drives into healthcare storage solutions. My main priority is to guarantee that data management and storage procedures fully adhere to HIPAA and GDPR standards.\\n\\nCould you please share your suggestions for the best configuration of these drives within a healthcare setting? In particular, I am keen to learn about secure setup options that can assist in maintaining compliance.",
+ "email_thread": [],
+ "conversation_snippet": "Thank you for your query. To provide precise advice, please specify your operating system and storage environment. We recommend implementing hardware encryption, enforcing strict access controls, performing regular firmware updates, and adopting secure backup practices to ensure compliance with HIPAA and GDPR regulations.",
+ "vip_tier": "standard",
+ "priority": "high",
+ "handle_time_minutes": 17.2,
+ "churned_within_30d": false,
+ "source_dataset": "support_tickets (real)",
+ "language": "en"
+ }
data/cases/case-ac7b0b06.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-ac7b0b06",
+ "ticket_text": "[Wesentlicher Sicherheitsvorfall]\nSehr geehrtes Support-Team,\\n\\nich möchte einen gravierenden Sicherheitsvorfall melden, der gegenwärtig mehrere Komponenten unserer Infrastruktur betrifft. Betroffene Geräte umfassen Projektoren, Bildschirme und Speicherlösungen auf Cloud-Plattformen. Der Grund für die Annahme ist, dass der Vorfall eine potenzielle Datenverletzung im Zusammenhang mit einer Cyberattacke darstellt, was ein erhebliches Risiko für sensible Informationen und den laufenden Geschäftsbetrieb unserer Organisation bedeutet.\\n\\nUnsere initialen Untersuchungen haben ungewöhnliche Aktivitäten und Abweichungen bei den Geräten ergeben. Trotz der Umsetzung unserer standardisierten Behebungs- und Eindämmungsmaßnahmen konnte die Bedrohung bislang nicht vollständig eliminiert.",
+ "email_thread": [],
+ "conversation_snippet": "Vielen Dank für die Meldung des kritischen Sicherheitsvorfalls und die Bereitstellung der Übersicht über die betroffenen Geräte sowie der ergriffenen ersten Maßnahmen. Wir erkennen die Dringlichkeit und Schwere der Lage an und setzen alles daran, den Fall prioritär zu bearbeiten. Für eine umgehende Untersuchung benötigen wir zusätzliche Informationen: Bitte senden Sie uns spezifische Protokolle der betroffenen Projektoren, Bildschirme und Cloud-Speichersysteme, inklusive Zeitstempel verdächtiger Aktivitäten sowie ungewöhnlicher Fehlermeldungen. Falls möglich, fügen Sie auch eine Zusammenfassung der bereits durchgeführten Maßnahmen bei.",
+ "vip_tier": "standard",
+ "priority": "high",
+ "handle_time_minutes": 5.2,
+ "churned_within_30d": true,
+ "source_dataset": "support_tickets (real)",
+ "language": "de"
+ }
data/cases/case-acaecb0d.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-acaecb0d",
+ "ticket_text": "[ACCOUNT] i cant open an accojnt help me to notify of a sign-up issue",
+ "email_thread": [],
+ "conversation_snippet": "Customer: i cant open an accojnt help me to notify of a sign-up issue\nAgent: I'm glad you contacted us to us and expressing your difficulty in opening an account. We understand the frustration that arises from encountering sign-up issues. Rest assured, we are here to help you navigate through this process smoothly. To notify us of the sign-up issue you're facing, please provide us with more details regarding the problem. This will enable us to assist you promptly and find the best possible solution. We appreciate your patience, and we are committed to resolving the issue you're facing as quickly as possible.",
+ "vip_tier": "unknown",
+ "priority": "low",
+ "handle_time_minutes": 31.2,
+ "churned_within_30d": true,
+ "source_dataset": "bitext_dialogues (real)",
+ "language": "en"
+ }
data/cases/case-b20a7628.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-b20a7628",
+ "ticket_text": "[Query]\nDear Customer Support Team,\\n\\nI hope this message reaches you well. I am reaching out to request detailed billing details and payment options for a QuickBooks Online subscription. Specifically, I am interested in understanding the available plans, their pricing structures, and any tailored options for institutional clients within the financial services industry. It would be helpful to receive documentation on invoicing procedures, payment timelines, discounts, and any special terms applicable to organizations like ours.\\n\\nAdditionally, please",
+ "email_thread": [],
+ "conversation_snippet": "Thank you for reaching out to us regarding your QuickBooks Online subscription inquiry. We are pleased to assist you with detailed billing information and payment options tailored for institutional clients in the financial services sector. QuickBooks Online offers several subscription tiers with varying features; institutional pricing discounts may also be available based on your organization's size and specific needs. We will send comprehensive documentation covering current plans, pricing details, invoicing processes, payment schedules, and any applicable discounts or special terms.",
+ "vip_tier": "standard",
+ "priority": "medium",
+ "handle_time_minutes": 22.9,
+ "churned_within_30d": true,
+ "source_dataset": "support_tickets (real)",
+ "language": "en"
+ }
data/cases/case-bf7cc420.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-bf7cc420",
+ "ticket_text": "[Question About Marketing Agency Software Compatibility]\nDear Support Team,\\n\\nI hope this message reaches you well. I am reaching out to ask about the compatibility of your products with the specific needs of marketing agencies. Our company is considering adopting these solutions to streamline our current marketing processes and wants to confirm that the products are fully compatible with the tools and platforms we currently utilize.\\n\\nCould you please supply detailed information regarding the compatibility of your products with popular marketing software, CRM systems, email marketing applications, and analytics platforms? Additionally, I would appreciate any relevant case studies or documentation tailored to our use case.",
+ "email_thread": [],
+ "conversation_snippet": "Thank you for your inquiry. Our product supports integration with major CRM, email marketing, and analytics platforms through APIs and customization options. Please specify the tools you are using so we can provide you with detailed documentation and relevant case studies.",
+ "vip_tier": "unknown",
+ "priority": "medium",
+ "handle_time_minutes": 5.3,
+ "churned_within_30d": false,
+ "source_dataset": "support_tickets (real)",
+ "language": "en"
+ }
data/cases/case-c0e2500e.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-c0e2500e",
+ "ticket_text": "[ACCOUNT] I want to change to the gold account, help me",
+ "email_thread": [],
+ "conversation_snippet": "Customer: I want to change to the gold account, help me\nAgent: I'll do my best! I'm thrilled to assist you with upgrading to the {{Account Type}} account:\n\n1. Log In: Start by logging into our platform using your credentials.\n2. Navigate to Account Settings: Once logged in, locate the '{{Settings}}' or '{{Profile}}' section of your account.\n3. Upgrade Option: Within this section, you will find an option labeled '{{Upgrade Account}}' or '{{Account Change}}'.\n4. Select {{Account Type}} Account: Choose the '{{Account Type}}' option from the available choices.\n5. Finalize the Switch: Follow the on-screen prompts and instructions to complete the switch to the {{Account Type}} account.\n\nFeel free to reach out if you encounter any challenges or have further questions. Your decision to upgrade is highly appreciated, and I'm here to ensure a seamless and enjoyable experience with the {{Account Type}} account. Enjoy the exclusive benefits that come with it!",
+ "vip_tier": "standard",
+ "priority": "medium",
+ "handle_time_minutes": 23.8,
+ "churned_within_30d": false,
+ "source_dataset": "bitext_dialogues (real)",
+ "language": "en"
+ }
data/cases/case-ce2076c3.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-ce2076c3",
+ "ticket_text": "[Feature Query]\nDear Customer Support,\\n\\nI hope this message reaches you in good health. I am eager to learn more about the features of one of your products. Would you be able to share comprehensive details about its functionalities, specifications, and any distinctive characteristics it may possess? Additionally, if there are user manuals, tutorials, or demonstration videos available, I would be grateful if you could provide those resources. Gaining a thorough understanding of the features will assist me in making an informed decision regarding the product.\\n\\nThank you very much for your assistance. I look forward to your prompt reply.\\n\\nBest regards",
+ "email_thread": [],
+ "conversation_snippet": "Thank you for your inquiry. Please specify which product you are interested in, so I can provide detailed information, features, and relevant resources.",
+ "vip_tier": "unknown",
+ "priority": "high",
+ "handle_time_minutes": 39.5,
+ "churned_within_30d": false,
+ "source_dataset": "support_tickets (real)",
+ "language": "en"
+ }
data/cases/case-ce230c3e.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-ce230c3e",
+ "ticket_text": "[Connectivity Problems with Printer on MacBook Pro]\nDear Support Team,\\n\\nI am reporting a recurring issue with the Laser Printer when printing from MacBook Pros running macOS 15. Several team members have recently encountered this problem, which appears to be connected to the latest macOS 15 system updates.\\n\\nWe believe the root cause might be a driver compatibility issue due to the updated operating systems or printer firmware. To troubleshoot, we have restarted the printers and MacBook devices, reinstalled the printer drivers, and verified configurations.",
+ "email_thread": [],
+ "conversation_snippet": "Thank you for reaching out regarding the connectivity problems with the Laser Printer on MacBook Pros after recent macOS 15 updates. We understand how disruptive this can be and appreciate your efforts in troubleshooting so far. To assist you further, please confirm the specific MacBook model and current macOS version you are using (including whether you have iOS devices like iPhones or iPads). Also, provide the model and driver version of the Laser Printer installed. Meanwhile, we recommend...",
+ "vip_tier": "standard",
+ "priority": "medium",
+ "handle_time_minutes": 63.7,
+ "churned_within_30d": false,
+ "source_dataset": "support_tickets (real)",
+ "language": "en"
+ }
data/cases/case-d1c3b227.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-d1c3b227",
+ "ticket_text": "[DELIVERY] need to check how soon can i expect my product",
+ "email_thread": [],
+ "conversation_snippet": "Customer: need to check how soon can i expect my product\nAgent: For sure! I completely understand your need to have a clear idea of when your product will arrive. To provide you with accurate information, could you please provide me with the {{Tracking Number}} or {{Order Number}}? With that information, I'll be able to give you an estimate of the expected delivery date. Your patience is highly appreciated!",
+ "vip_tier": "standard",
+ "priority": "low",
+ "handle_time_minutes": 87.6,
+ "churned_within_30d": false,
+ "source_dataset": "bitext_dialogues (real)",
+ "language": "en"
+ }
data/cases/case-d37c0bca.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-d37c0bca",
+ "ticket_text": "[Account Disruption]\nDear Customer Support Team,\\n\\nI am writing to report a significant problem with the centralized account management portal, which currently appears to be offline. This outage is blocking access to account settings, leading to substantial inconvenience. I have attempted to log in multiple times using different browsers and devices, but the issue persists.\\n\\nCould you please provide an update on the outage status and an estimated time for resolution? Also, are there any alternative ways to access and manage my account during this downtime?",
+ "email_thread": [],
+ "conversation_snippet": "Thank you for reaching out, <name>. We are aware of the outage affecting the centralized account management system, and our technical team is actively working to resolve the issue. In the meantime, we suggest using alternative methods to manage your account, with a focus on restoring service as quickly as possible. We will provide an update as soon as the service is back online. We apologize for the inconvenience and appreciate your patience. If you have any further questions, please let us know.",
+ "vip_tier": "standard",
+ "priority": "high",
+ "handle_time_minutes": 15.1,
+ "churned_within_30d": true,
+ "source_dataset": "support_tickets (real)",
+ "language": "en"
+ }
data/cases/case-e2a80316.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "case_id": "case-e2a80316",
+ "ticket_text": "[Multiple Device Connection Problems]\nDear Customer Support,\\n\\nWe are experiencing extensive connectivity problems impacting numerous devices throughout the office. The issues have been observed with headsets, printers, and workstations all at once, significantly disrupting daily activities. Our initial investigation indicates that the cause may be a network outage or a misconfiguration within the system infrastructure.\\n\\nOur team has already tried several troubleshooting methods, including rebooting affected devices and swapping hardware components, but unfortunately, these efforts did not resolve the disruptions.",
+ "email_thread": [],
+ "conversation_snippet": "Thank you for providing details about the connectivity problems affecting various devices. To assist you further, could you please specify whether the network outage affects both wired and wireless connections, and if any error messages are displayed on the devices? Also, kindly inform us of any recent modifications to your network configuration or infrastructure. If possible, please share relevant network logs or screenshots. We will prioritize your case and, if necessary, arrange a call at your convenience to accelerate the troubleshooting process.",
+ "vip_tier": "standard",
+ "priority": "medium",
+ "handle_time_minutes": 37.9,
+ "churned_within_30d": true,
+ "source_dataset": "support_tickets (real)",
+ "language": "en"
+ }
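Every `data/cases/case-*.json` file added above follows the same flat schema (`case_id`, `ticket_text`, `email_thread`, `conversation_snippet`, `vip_tier`, `priority`, `handle_time_minutes`, `churned_within_30d`, `source_dataset`, `language`). A minimal sketch of loading these records, assuming only that layout (the `load_cases` helper name is illustrative, not a function in the app):

```python
import json
from pathlib import Path

def load_cases(case_dir="data/cases"):
    """Read every case-*.json record in the cases directory into a list of dicts."""
    cases = []
    for path in sorted(Path(case_dir).glob("case-*.json")):
        with open(path, encoding="utf-8") as f:
            cases.append(json.load(f))
    return cases

# Example filter: high-priority tickets from the support_tickets source
high_priority = [
    c for c in load_cases()
    if c["priority"] == "high" and c["source_dataset"] == "support_tickets (real)"
]
```

Because each record is a standalone JSON object rather than one line of a JSONL file, adding or removing a case is a one-file diff, which keeps commits like this one easy to review.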