Spaces:

AshwinP
/

compounding-test

Sleeping

apingali Claude Opus 4.7 (1M context) commited on 12 days ago

Commit

ebbca73

1 Parent(s): 69d1664

feat(hf-space): user-friendly copy + API key field + 4 sample initiatives

UX polish on AshwinP/compounding-test based on first-look feedback.
The dropdown labels and intro were full of operator vocabulary
(ZeroGPU, HF_TOKEN, SPACE_ID) that a casual visitor wouldn't know.

What changed:
- Provider dropdown labels rewritten as plain free-vs-premium:
Free · Phi-4-mini-instruct (Microsoft) — runs on GPU
Free · Gemma 2 9B (Google) — runs via HuggingFace
Premium · Claude Opus 4.7 (Anthropic) — paste your API key below
Free options listed first; "Premium" framing for paid.
- Default selection is now the first free option (never anthropic
on a public Space — that would error without a key).
- Optional Anthropic API key Textbox (type=password) appears only
when Premium is selected (provider.change toggles visibility).
Per-call use; never persisted. Friendly error if Premium is
picked without a key.
- Intro markdown rewritten: short pitch, four-outcome verdict
names highlighted, one-line guidance on free-vs-premium choice,
link to the framework essay. Dropped the "Default model
provider: zerogpu" auto-detect note (jargon for casual users).
- gr.Examples component with 4 realistic ~300-400-word sample
initiatives — one per verdict quadrant:
compounder — regional commercial insurance carrier
one-shot win — community bank loan document extraction
wrong thing — 3PL warehouse picking-route AI
Roman Candle — QSR chain email/SMS personalization
Each sample pre-fills the description + industry + scale +
budget so the user can click and immediately hit Diagnose.

Tests: 31/31 still pass (parser + provider routing unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (1) hide show

app.py +183 -31

app.py CHANGED Viewed

@@ -391,6 +391,96 @@ SCALES = ["pilot", "department", "business unit", "enterprise"]
 BUDGETS = ["<$100K", "$100K–$1M", "$1M–$10M", ">$10M"]
 def _load_reference():
     """Read the prompt template + reference JSONs from disk at app start."""
     prompt_template = (ROOT / "prompts" / "diagnose.txt").read_text()
@@ -444,14 +534,21 @@ def diagnose(
     scale: Optional[str],
     budget: Optional[str],
     provider: Optional[str] = None,
 ) -> tuple[str, str]:
     """Validate input, call the selected model with the cached system
     block, parse the response, and return (markdown_writeup,
     raw_json_string) for the two Gradio tabs.
-    `provider` (anthropic | huggingface) defaults to DEFAULT_PROVIDER
-    when not supplied — the Gradio dropdown always supplies it on a
-    real submission.
     Per F14 + contract §2, all error paths surface a user-friendly message
     in the markdown tab and an empty JSON tab; nothing leaks a stack trace.
@@ -483,6 +580,24 @@ def diagnose(
             "",
         )
     user_prompt = (
         PROMPT_TEMPLATE
         .replace("{{user_input}}", description)
@@ -543,57 +658,94 @@ def build_demo():
     """Build and return the Gradio Blocks UI. Called only by __main__."""
     import gradio as gr
-    provider_choices = [
-        (f"Anthropic — {ANTHROPIC_MODEL_ID} (requires ANTHROPIC_API_KEY)", "anthropic"),
-        (f"HuggingFace API — {HF_MODEL_ID} (free on HF Spaces; HF_TOKEN locally)", "huggingface"),
-    ]
     if _zerogpu_available():
         provider_choices.append((
-            f"ZeroGPU — {ZEROGPU_MODEL_ID} (free GPU via HuggingFace Pro plan)",
             "zerogpu",
         ))
     with gr.Blocks(title="The Compounding Test") as demo:
         gr.Markdown(
             "# The Compounding Test\n\n"
-            "A diagnostic for AI investments at non-technology companies. Paste a "
-            "description of your AI initiative (200–5000 words); receive a scored "
-            "writeup in one of four quadrants — compounder, one-shot win, compounding "
-            "the wrong thing, or Roman Candle. The framework is at "
-            "https://www.mile-hi.ai/journal/the-berkshire-test\n\n"
-            f"_Default model provider: **{DEFAULT_PROVIDER}** "
-            f"(auto-detected from your environment — pick a different one in the "
-            f"dropdown below to compare)._"
         )
         with gr.Row():
             description = gr.Textbox(
                 label=f"Describe your AI initiative ({MIN_DESCRIPTION_WORDS}–{MAX_DESCRIPTION_WORDS} words)",
                 placeholder=(
-                    "Describe the bottleneck of your operation, the AI investment, "
-                    "what data feeds it, where the labels come from, and how you "
-                    "expect competitors to respond. Be specific about the workflow."
                 ),
                 lines=12,
             )
         with gr.Row():
             industry = gr.Dropdown(INDUSTRIES, label="Industry (optional)", value=None)
             scale = gr.Dropdown(SCALES, label="Scale (optional)", value=None)
             budget = gr.Dropdown(BUDGETS, label="Budget tier (optional)", value=None)
         with gr.Row():
             provider = gr.Dropdown(
                 choices=provider_choices,
-                value=DEFAULT_PROVIDER,
-                label="Model provider",
-                info=(
-                    "Claude gives the highest-quality writeups but needs your "
-                    "own ANTHROPIC_API_KEY. ZeroGPU runs an open-weight model "
-                    "(Phi-4-mini-instruct by default) on the Space's free Pro "
-                    "GPU — no API costs, no inference credits. HuggingFace API "
-                    "routes to an open model through the HF Inference Providers "
-                    "API — works without any keys on a Space via the Space's "
-                    "monthly credits. Switch to compare writeup quality."
-                ),
             )
         submit = gr.Button("Diagnose", variant="primary")
         with gr.Tabs():
             with gr.Tab("Diagnosis"):
@@ -602,7 +754,7 @@ def build_demo():
                 json_out = gr.Code(language="json")
         submit.click(
             diagnose,
-            inputs=[description, industry, scale, budget, provider],
             outputs=[writeup_out, json_out],
         )

 BUDGETS = ["<$100K", "$100K–$1M", "$1M–$10M", ">$10M"]
+# ---------------------------------------------------------------------------
+# Sample initiatives (gr.Examples) — one per verdict quadrant
+# ---------------------------------------------------------------------------
+# Realistic ~250–400-word AI-initiative descriptions that should land in a
+# specific quadrant of the 2×2 verdict matrix. Used to seed user testing
+# and give first-time visitors something concrete to click.
+_SAMPLE_COMPOUNDER = (
+    "We're a regional commercial insurance carrier specializing in restaurant "
+    "general liability. We write about 8,000 policies a year across the "
+    "Midwest. Underwriting is the bottleneck of our business — agents wait "
+    "36 to 48 hours for a quote because our underwriters manually pull "
+    "industry codes, loss runs, and prior-carrier history from three "
+    "different systems and decide whether to bind, decline, or refer. "
+    "We're deploying an LLM-powered underwriting assistant that pulls the "
+    "data automatically, flags risk factors based on patterns in our "
+    "12-year claims database, and proposes a base rate. The underwriter "
+    "reviews and approves. Every policy we write generates new claim "
+    "outcomes — fires, slip-and-falls, liquor-liability claims — and those "
+    "outcomes feed back into the next quarter's model retraining. Our "
+    "competitors mostly use Verisk's industry-standard models, which we "
+    "don't share data with, so our model gets better on our book of "
+    "business while theirs reflects the industry average. Internal goal: "
+    "cut time-to-quote from 36 hours to 4 hours, increase win rate on "
+    "profitable risks by 15%, and progressively shift the loss ratio by "
+    "1–2 points per year as the model learns from each renewal cycle."
+)
+_SAMPLE_ONE_SHOT_WIN = (
+    "We're a community bank with $4B in assets, 38 branches across two "
+    "states. Loan officers spend about 6 hours per commercial loan "
+    "reviewing financial statements, tax returns, and corporate documents "
+    "before they can write the credit memo. We're deploying GPT-4 to "
+    "extract key fields — revenue, EBITDA, debt service coverage ratio, "
+    "ownership structure, related-party transactions — from these "
+    "documents into a structured form. The loan officer reviews the "
+    "extraction and writes the credit memo by hand. We expect to cut "
+    "document review time from 6 hours to about 90 minutes per loan. "
+    "The vendor provides the model and the document templates and is "
+    "selling the same system to four of our peer community banks in the "
+    "region. The AI doesn't learn from the outcome of the loan: defaults, "
+    "prepayments, modifications, restructurings all go into our loan "
+    "servicing system, which doesn't connect back to the extraction "
+    "model. The vendor's roadmap doesn't include any feedback loop. "
+    "We're funding the project from the operations budget; the credit "
+    "team is excited about the time savings."
+)
+_SAMPLE_WRONG_THING = (
+    "We're a third-party logistics provider with 8 warehouses on the "
+    "East Coast. We're investing in computer vision software to optimize "
+    "order picking routes — the AI looks at the warehouse layout, "
+    "current orders, and worker positions and suggests optimized pick "
+    "paths in real time. Pilot results show a 12% reduction in steps "
+    "per order on the test floor. Our operations team has been excited "
+    "about this for 18 months and we just signed a multi-year contract "
+    "with the vendor. Some context on the operation: our warehouses run "
+    "2 shifts. Order volume in shift 1 is around 14,000 orders/day; "
+    "shift 2 is around 6,000. The pick wave finishes by 2pm on shift 1, "
+    "then workers wait 4 to 5 hours for shift 2 trucks to arrive at the "
+    "loading docks. The trucks are scheduled by the customer (a major "
+    "national retailer) and arrive in unpredictable windows. We don't "
+    "control the truck schedule. The CFO has been asking us why total "
+    "throughput per warehouse hasn't moved much in three years; our "
+    "answer has been that the legacy WMS is the constraint, which is "
+    "why we're investing in better picking AI."
+)
+_SAMPLE_ROMAN_CANDLE = (
+    "We run a chain of 220 quick-service restaurants across the "
+    "Southeast. Our gross margin has been under pressure from rising "
+    "ingredient costs and we're rolling out an AI-powered personalized "
+    "marketing platform that sends customized email and SMS offers "
+    "based on customer purchase history, location, and weather. The "
+    "platform is from a major QSR-tech vendor used by several of our "
+    "competitors. Our customer data — names, emails, phone numbers, "
+    "purchase frequency — lives in our point-of-sale provider's cloud, "
+    "which the marketing platform pulls from via the POS provider's "
+    "standard integration. Our purchase data and the modeling are both "
+    "the vendor's stack; we don't see the underlying model and our data "
+    "is commingled with other QSR brands the vendor serves. We expect "
+    "to lift email click-through by 8–12% based on the vendor's "
+    "benchmark studies of similar brands. The marketing team is "
+    "running the rollout; finance signed off on the multi-year "
+    "subscription. We have not measured what's actually constraining "
+    "same-store sales growth — we just know revenue has been flat for "
+    "two years and the board wants action."
+)
 def _load_reference():
     """Read the prompt template + reference JSONs from disk at app start."""
     prompt_template = (ROOT / "prompts" / "diagnose.txt").read_text()
     scale: Optional[str],
     budget: Optional[str],
     provider: Optional[str] = None,
+    anthropic_api_key: Optional[str] = None,
 ) -> tuple[str, str]:
     """Validate input, call the selected model with the cached system
     block, parse the response, and return (markdown_writeup,
     raw_json_string) for the two Gradio tabs.
+    `provider` (anthropic | huggingface | zerogpu) defaults to
+    DEFAULT_PROVIDER when not supplied — the Gradio dropdown always
+    supplies it on a real submission.
+    `anthropic_api_key` is a per-call user-supplied key. When provider
+    is "anthropic" and the key is provided, it overrides any
+    ANTHROPIC_API_KEY env var for this single request. The key is never
+    persisted (Anthropic SDK uses it once and the client object is
+    garbage-collected at function exit).
     Per F14 + contract §2, all error paths surface a user-friendly message
     in the markdown tab and an empty JSON tab; nothing leaks a stack trace.
             "",
         )
+    # If Premium (Anthropic) is selected, the user must supply a key —
+    # either via the page's API-key field (per-call) or via an
+    # ANTHROPIC_API_KEY env var on the Space. Without either, fail fast
+    # with a friendly explanation before we hit the SDK.
+    if provider == "anthropic":
+        env_key = os.environ.get("ANTHROPIC_API_KEY", "").strip()
+        user_key = (anthropic_api_key or "").strip()
+        if not env_key and not user_key:
+            return (
+                "⚠ Premium (Claude Opus) needs an Anthropic API key. Either "
+                "paste your key in the field above, or pick one of the free "
+                "options from the model dropdown.",
+                "",
+            )
+        if user_key:
+            # Per-call override; never persisted beyond this request.
+            os.environ["ANTHROPIC_API_KEY"] = user_key
     user_prompt = (
         PROMPT_TEMPLATE
         .replace("{{user_input}}", description)
     """Build and return the Gradio Blocks UI. Called only by __main__."""
     import gradio as gr
+    # Free options first, premium last. Plain-English labels with no
+    # ANTHROPIC_API_KEY / HF_TOKEN / SPACE_ID jargon — the casual user
+    # shouldn't have to know what any of those mean.
+    provider_choices = []
     if _zerogpu_available():
         provider_choices.append((
+            f"Free · Phi-4-mini-instruct (Microsoft) — runs on GPU",
             "zerogpu",
         ))
+    provider_choices.append((
+            f"Free · Gemma 2 9B (Google) — runs via HuggingFace",
+            "huggingface",
+    ))
+    provider_choices.append((
+        f"Premium · Claude Opus 4.7 (Anthropic) — paste your API key below",
+        "anthropic",
+    ))
+    # Default to the first free option; user can pick Premium if they
+    # have a key. Never default to anthropic on a public Space.
+    default_choice = provider_choices[0][1]
     with gr.Blocks(title="The Compounding Test") as demo:
         gr.Markdown(
             "# The Compounding Test\n\n"
+            "A diagnostic for AI investments at non-technology companies. "
+            "Describe your AI initiative — get a scored writeup in one of "
+            "four outcomes: **compounder**, **one-shot win**, **compounding "
+            "the wrong thing**, or **Roman Candle**.\n\n"
+            "**The default model is free.** Pick **Premium · Claude Opus** "
+            "from the dropdown if you have an Anthropic API key and want "
+            "the highest-quality writeup. Read the full framework at "
+            "[mile-hi.ai/journal/the-berkshire-test]("
+            "https://www.mile-hi.ai/journal/the-berkshire-test)."
         )
         with gr.Row():
             description = gr.Textbox(
                 label=f"Describe your AI initiative ({MIN_DESCRIPTION_WORDS}–{MAX_DESCRIPTION_WORDS} words)",
                 placeholder=(
+                    "Describe the bottleneck of your operation, the AI "
+                    "investment, what data feeds it, where the labels come "
+                    "from, and how you expect competitors to respond. Be "
+                    "specific about the workflow.\n\n"
+                    "Or pick a sample below to see how it works."
                 ),
                 lines=12,
             )
         with gr.Row():
             industry = gr.Dropdown(INDUSTRIES, label="Industry (optional)", value=None)
             scale = gr.Dropdown(SCALES, label="Scale (optional)", value=None)
             budget = gr.Dropdown(BUDGETS, label="Budget tier (optional)", value=None)
+        gr.Examples(
+            examples=[
+                [_SAMPLE_COMPOUNDER, "insurance", "business unit", "$1M–$10M"],
+                [_SAMPLE_ONE_SHOT_WIN, "banking", "business unit", "$100K–$1M"],
+                [_SAMPLE_WRONG_THING, "logistics", "enterprise", "$1M–$10M"],
+                [_SAMPLE_ROMAN_CANDLE, "retail", "enterprise", "$100K–$1M"],
+            ],
+            inputs=[description, industry, scale, budget],
+            label="Sample initiatives — click one to load it (then click Diagnose)",
+            examples_per_page=4,
+        )
         with gr.Row():
             provider = gr.Dropdown(
                 choices=provider_choices,
+                value=default_choice,
+                label="Choose a model",
             )
+        # The API-key field appears only when Premium is selected. The
+        # key is used per-request and never stored.
+        api_key = gr.Textbox(
+            label="Anthropic API key",
+            placeholder="sk-ant-...",
+            type="password",
+            info=(
+                "Used only for this request and never stored. "
+                "Get a key at console.anthropic.com."
+            ),
+            visible=False,
+        )
+        def _toggle_api_key(p):
+            return gr.update(visible=(p == "anthropic"))
+        provider.change(_toggle_api_key, inputs=[provider], outputs=[api_key])
         submit = gr.Button("Diagnose", variant="primary")
         with gr.Tabs():
             with gr.Tab("Diagnosis"):
                 json_out = gr.Code(language="json")
         submit.click(
             diagnose,
+            inputs=[description, industry, scale, budget, provider, api_key],
             outputs=[writeup_out, json_out],
         )