Spaces:

AshwinP
/

compounding-test

Sleeping

apingali Claude Opus 4.7 (1M context) commited on 22 days ago

Commit

971e3f4

1 Parent(s): 033a83e

feat(hf-space): add ZeroGPU backend for HuggingFace Pro Spaces

User has a HF Pro plan, so the Space can use ZeroGPU: free on-demand
A10g/A100/H200 allocation per request, no API round-trip, no inference
credits burned. We load an open model directly via transformers and
decorate the inference call with @spaces.GPU so the Pro plan picks up
the GPU automatically. Default model is Phi-4-mini-instruct (small
enough for fast cold start); swappable to Gemma 2 9B, Llama-3.3, or
full Phi-4 via ZEROGPU_MODEL_ID.

What changed:
app.py
- try/except imports of `spaces`, `torch`, `transformers` at
module load; sets _ZEROGPU_DEPS_AVAILABLE accordingly.
_zerogpu_available() wrapper exists so tests can monkeypatch
the answer without touching the real imports.
- _call_zerogpu(system, user) decorated with
`@spaces.GPU(duration=ZEROGPU_DURATION_SECONDS)` when deps are
available; replaced with a clear-error stub otherwise.
- Lazy model load (_load_zerogpu_model) — kept warm across
requests in module-level state; cold start only on first call
after Space process restart.
- _detect_provider precedence updated: Pro Space (SPACE_ID + deps)
→ zerogpu, else anthropic, else huggingface, else anthropic.
Explicit MODEL_PROVIDER=anthropic still wins on a Pro Space.
- PROVIDERS dict always includes zerogpu (even when deps absent —
the stub raises a clear error); UI only surfaces the dropdown
option when deps are importable.
- F14 error message now resolves the model_label across all three
providers via dict lookup.

README.md (NEW)
HuggingFace Spaces YAML header specifying `sdk: gradio`,
`app_file: app.py`, `hardware: zero-a10g` so the deployed Space
actually receives ZeroGPU allocation. Also user-facing docs:
backend table, auto-detect precedence, configuration, local-dev
setup, test commands.

requirements.txt
Restructured with section comments. Added:
+ spaces>=0.30
+ transformers>=4.45
+ torch>=2.4
+ accelerate>=1.0
Documented that local-only users (anthropic backend) can omit
the heavy zerogpu lines.

.env.example
Added ZEROGPU BACKEND section: ZEROGPU_MODEL_ID with tested
alternatives, ZEROGPU_DURATION_SECONDS. Updated PROVIDER
SELECTION docs to include zerogpu and the new precedence.

test_diagnose.py
+ 4 zerogpu detection tests (Pro Space with deps → zerogpu,
Space without deps → huggingface fallback, explicit
MODEL_PROVIDER=zerogpu wins, explicit anthropic beats zerogpu
auto-detect).
+ 1 PROVIDERS-dict test (zerogpu always present so the
dispatcher is uniform regardless of dep availability).
All 31 tests pass (15 parser + 16 provider).

specs/004-berkshire-test/contracts/hf-space-interface.md §2
Added §2.C with the full ZeroGPU invocation pattern (lazy load,
chat-template inference, prompt-stripped decode). Updated
provider-selection precedence table. Added "Required Space
metadata" subsection explaining the hardware: zero-a10g header.
Updated cache-strategy notes with ZeroGPU cold-start behavior.

specs/004-berkshire-test/tasks.md
T037 rationale updated to reflect the three-backend
architecture and the README.md Space-metadata addition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (5) hide show

.env.example +31 -5
README.md +86 -0
app.py +127 -14
requirements.txt +16 -2
test_diagnose.py +28 -2

.env.example CHANGED Viewed

@@ -7,12 +7,18 @@
 # PROVIDER SELECTION
 # ============================================================
 # Optional. If unset, the app auto-detects based on which credentials
-# are present (see app.py::_detect_provider). Valid values:
 #   anthropic    — Claude via the Anthropic SDK (best writeup quality)
-#   huggingface  — Gemma 2 / Phi-4 / Llama-3.3 / Qwen via HF Inference
-#                  Providers (works with no Anthropic key; free on
-#                  HF Spaces via the Space's identity)
-# Leave blank for auto-detect.
 # MODEL_PROVIDER=
 # ============================================================
@@ -44,6 +50,26 @@ MODEL_ID=claude-opus-4-7
 # shows a "try again" message rather than crashing.
 # HF_MODEL_ID=google/gemma-2-9b-it
 # ============================================================
 # VALIDATION
 # ============================================================

 # PROVIDER SELECTION
 # ============================================================
 # Optional. If unset, the app auto-detects based on which credentials
+# are present and whether we are running on a HuggingFace Space (see
+# app.py::_detect_provider). Valid values:
 #   anthropic    — Claude via the Anthropic SDK (best writeup quality)
+#   huggingface  — Open models (Gemma 2 / Phi-4 / Llama-3.3 / Qwen) via
+#                  HF Inference Providers API. Free on HF Spaces via
+#                  the Space's monthly credits; HF_TOKEN locally.
+#   zerogpu      — Open model (Phi-4-mini-instruct by default) loaded
+#                  locally in the Space and run on free on-demand GPU
+#                  via the HuggingFace Pro plan's ZeroGPU allocation.
+#                  No API round-trip; no inference credits burned.
+# Auto-detect precedence: Pro Space → zerogpu, else Anthropic key →
+# anthropic, else HF_TOKEN or any Space → huggingface, else anthropic.
 # MODEL_PROVIDER=
 # ============================================================
 # shows a "try again" message rather than crashing.
 # HF_MODEL_ID=google/gemma-2-9b-it
+# ============================================================
+# ZEROGPU BACKEND (HuggingFace Pro plan)
+# ============================================================
+# No credentials required — the @spaces.GPU decorator handles allocation
+# automatically when the Space has a Pro owner. Locally, the function
+# decoration is a no-op and the model runs on CPU (slow, smoke-test only).
+#
+# Optional. Default microsoft/Phi-4-mini-instruct fits on the standard
+# A100 allocation with fast cold start. Other tested choices:
+#   google/gemma-2-9b-it             — larger, slower load, more capable
+#   meta-llama/Llama-3.3-8B-Instruct — Llama 3.3 8B, good JSON adherence
+#   microsoft/phi-4                  — full 14B Phi-4, slower
+# HuggingFace's gated models (Llama, etc.) need HF_TOKEN to download.
+# ZEROGPU_MODEL_ID=microsoft/Phi-4-mini-instruct
+# Optional. Maximum GPU allocation per request, in seconds. The Pro
+# plan allows up to 120s per request; raise/lower to balance cold-start
+# coverage vs. quota use.
+# ZEROGPU_DURATION_SECONDS=120
 # ============================================================
 # VALIDATION
 # ============================================================

README.md ADDED Viewed

	@@ -0,0 +1,86 @@

+---
+title: The Compounding Test
+emoji: 📈
+colorFrom: indigo
+colorTo: purple
+sdk: gradio
+sdk_version: 4.44.0
+app_file: app.py
+pinned: false
+license: mit
+short_description: A diagnostic for AI investments at non-tech companies.
+hardware: zero-a10g
+---
+# The Compounding Test
+A diagnostic for AI investments at non-technology companies. Paste a
+description of your AI initiative (200–5000 words); receive a scored
+writeup in one of four quadrants — **compounder**, **one-shot win**,
+**compounding the wrong thing**, or **Roman Candle**.
+Framework essay: <https://www.mile-hi.ai/journal/the-berkshire-test>
+## Backends
+The Space supports three interchangeable model backends. The dropdown
+in the UI lets you switch per-submission to compare writeup quality.
+| Backend | Model (default) | Credentials | Where it runs |
+|---|---|---|---|
+| `anthropic` | `claude-opus-4-7` | `ANTHROPIC_API_KEY` (Space secret) | Anthropic API |
+| `huggingface` | `google/gemma-2-9b-it` | none on a Space; `HF_TOKEN` locally | HF Inference Providers |
+| `zerogpu` | `microsoft/Phi-4-mini-instruct` | none — Pro plan handles it | On-Space ZeroGPU |
+**Auto-detect precedence:**
+1. Explicit `MODEL_PROVIDER` env var wins.
+2. On a Pro Space (zerogpu deps installed) → `zerogpu`.
+3. Else if `ANTHROPIC_API_KEY` is set → `anthropic`.
+4. Else if `HF_TOKEN` is set, or running on any Space → `huggingface`.
+5. Else fall through to `anthropic` (call-time error guides the user).
+## Configuration
+See `.env.example` for the full list of env vars. Common overrides:
+```
+MODEL_PROVIDER=zerogpu
+ZEROGPU_MODEL_ID=google/gemma-2-9b-it     # bigger; ~30s cold start on A10
+ZEROGPU_DURATION_SECONDS=120              # max GPU allocation per request
+HF_MODEL_ID=meta-llama/Llama-3.3-70B-Instruct
+MODEL_ID=claude-sonnet-4-6                # cheaper Anthropic fallback
+```
+## Local development
+```bash
+python3 -m venv .venv && source .venv/bin/activate
+pip install -r requirements.txt           # ~2GB with torch/transformers
+cp .env.example .env                       # fill in whatever you have
+python app.py                              # http://127.0.0.1:7860
+```
+If you only need to test the `anthropic` backend locally, you can skip
+the heavy `spaces` / `torch` / `transformers` / `accelerate` lines in
+`requirements.txt` — the app degrades gracefully (the zerogpu dropdown
+option won't appear).
+## Tests
+```bash
+pytest test_diagnose.py -v
+```
+31 tests covering the parser contract (15 — what JSON shapes the parser
+accepts and rejects) and the provider routing (16 — auto-detection
+precedence, dispatcher routing, env-driven overrides).
+## Repository
+Source lives in [apingali/effectiveness][repo] under
+`gradio-apps/compounding-test/`. The Space is deployed from that path.
+The published framework essay and four portrait articles live at
+<https://www.mile-hi.ai/journal/the-berkshire-test>.
+[repo]: https://github.com/apingali/effectiveness

app.py CHANGED Viewed

@@ -5,8 +5,8 @@ the two-axis Berkshire Test for AI and returns a scored writeup.
 Architecture per specs/004-berkshire-test/contracts/hf-space-interface.md:
   - Inputs: a description (200–5000 words) + 3 optional clarifiers.
-  - Two backends, selectable by env (`MODEL_PROVIDER`) or auto-detected
-    from available credentials:
       * anthropic   — Claude Opus / Sonnet via the Anthropic SDK;
                       system block is `cache_control:ephemeral` so
                       subsequent calls hit the 5-minute prefix cache.
@@ -15,6 +15,13 @@ Architecture per specs/004-berkshire-test/contracts/hf-space-interface.md:
                       huggingface_hub InferenceClient. Works on HF
                       Spaces with the Space's free inference credits;
                       locally requires HF_TOKEN.
   - Output: two Gradio tabs — markdown writeup + raw JSON.
 Engine/Site boundary (Principle VIII): this app lives in gradio-apps/
@@ -181,10 +188,34 @@ ROOT = Path(__file__).parent
 ANTHROPIC_MODEL_ID = os.environ.get("MODEL_ID", "claude-opus-4-7")
 HF_MODEL_ID = os.environ.get("HF_MODEL_ID", "google/gemma-2-9b-it")
 MAX_DESCRIPTION_WORDS = int(os.environ.get("MAX_DESCRIPTION_WORDS", "5000"))
 MIN_DESCRIPTION_WORDS = 200
 # ---------------------------------------------------------------------------
 # Provider abstraction (anthropic vs huggingface — selectable at runtime)
 # ---------------------------------------------------------------------------
@@ -192,17 +223,22 @@ MIN_DESCRIPTION_WORDS = 200
 def _detect_provider(env=None) -> str:
     """Pick a model provider from env. Order of precedence:
-      1. Explicit MODEL_PROVIDER (anthropic | huggingface).
-      2. Presence of ANTHROPIC_API_KEY → anthropic.
-      3. Presence of HF_TOKEN / HUGGING_FACE_HUB_TOKEN, or running on
-         a HuggingFace Space (SPACE_ID set) → huggingface.
-      4. Fall through to anthropic (call-time error will tell the user
          which env to set).
     """
     env = env if env is not None else os.environ
     explicit = env.get("MODEL_PROVIDER", "").strip().lower()
-    if explicit in ("anthropic", "huggingface"):
         return explicit
     if env.get("ANTHROPIC_API_KEY"):
         return "anthropic"
     if (
@@ -260,9 +296,75 @@ def _call_huggingface(system_block: str, user_prompt: str) -> str:
     return resp.choices[0].message.content
 PROVIDERS = {
     "anthropic": _call_anthropic,
     "huggingface": _call_huggingface,
 }
@@ -394,7 +496,11 @@ def diagnose(
     except Exception as e:
         # API timeout / rate limit / auth / server / network failure
         # (Anthropic SDK or huggingface_hub InferenceClient).
-        model_label = ANTHROPIC_MODEL_ID if provider == "anthropic" else HF_MODEL_ID
         return (
             f"⚠ The diagnostic call to {provider} ({model_label}) failed "
             f"({type(e).__name__}). Try again in a moment, switch providers in "
@@ -439,8 +545,13 @@ def build_demo():
     provider_choices = [
         (f"Anthropic — {ANTHROPIC_MODEL_ID} (requires ANTHROPIC_API_KEY)", "anthropic"),
-        (f"HuggingFace — {HF_MODEL_ID} (free on HF Spaces; HF_TOKEN locally)", "huggingface"),
     ]
     with gr.Blocks(title="The Compounding Test") as demo:
         gr.Markdown(
@@ -475,10 +586,12 @@ def build_demo():
                 label="Model provider",
                 info=(
                     "Claude gives the highest-quality writeups but needs your "
-                    "own ANTHROPIC_API_KEY. The HuggingFace backend runs on a "
-                    "smaller open-weight model and works on a deployed HF Space "
-                    "without any keys, so it's the easiest way to try the "
-                    "diagnostic without signing up for anything."
                 ),
             )
         submit = gr.Button("Diagnose", variant="primary")

 Architecture per specs/004-berkshire-test/contracts/hf-space-interface.md:
   - Inputs: a description (200–5000 words) + 3 optional clarifiers.
+  - Three backends, selectable by env (`MODEL_PROVIDER`) or auto-detected
+    from available credentials and runtime environment:
       * anthropic   — Claude Opus / Sonnet via the Anthropic SDK;
                       system block is `cache_control:ephemeral` so
                       subsequent calls hit the 5-minute prefix cache.
                       huggingface_hub InferenceClient. Works on HF
                       Spaces with the Space's free inference credits;
                       locally requires HF_TOKEN.
+      * zerogpu     — Open model (Phi-4-mini-instruct by default)
+                      loaded LOCALLY in the Space via transformers,
+                      decorated with `@spaces.GPU` so a HuggingFace
+                      Pro plan gets free on-demand A100/H100 GPU
+                      allocation per request. No per-call credit burn;
+                      no API round-trip. Requires the Space to have a
+                      Pro owner; locally falls back to CPU (slow).
   - Output: two Gradio tabs — markdown writeup + raw JSON.
 Engine/Site boundary (Principle VIII): this app lives in gradio-apps/
 ANTHROPIC_MODEL_ID = os.environ.get("MODEL_ID", "claude-opus-4-7")
 HF_MODEL_ID = os.environ.get("HF_MODEL_ID", "google/gemma-2-9b-it")
+ZEROGPU_MODEL_ID = os.environ.get("ZEROGPU_MODEL_ID", "microsoft/Phi-4-mini-instruct")
+ZEROGPU_DURATION_SECONDS = int(os.environ.get("ZEROGPU_DURATION_SECONDS", "120"))
 MAX_DESCRIPTION_WORDS = int(os.environ.get("MAX_DESCRIPTION_WORDS", "5000"))
 MIN_DESCRIPTION_WORDS = 200
+# ZeroGPU availability is detected at import time. The `spaces` package
+# is HuggingFace's runtime for on-demand GPU allocation; `transformers`
+# + `torch` are required to actually load and run the model. All three
+# must be importable for the zerogpu backend to function.
+try:
+    import spaces as _spaces
+    import torch as _torch
+    from transformers import AutoModelForCausalLM as _AutoModelForCausalLM
+    from transformers import AutoTokenizer as _AutoTokenizer
+    _ZEROGPU_DEPS_AVAILABLE = True
+except ImportError:
+    _ZEROGPU_DEPS_AVAILABLE = False
+def _zerogpu_available() -> bool:
+    """Return whether the zerogpu backend can be used. Wrapped as a
+    function so tests can monkeypatch the answer without touching the
+    real torch/transformers imports."""
+    return _ZEROGPU_DEPS_AVAILABLE
 # ---------------------------------------------------------------------------
 # Provider abstraction (anthropic vs huggingface — selectable at runtime)
 # ---------------------------------------------------------------------------
 def _detect_provider(env=None) -> str:
     """Pick a model provider from env. Order of precedence:
+      1. Explicit MODEL_PROVIDER (anthropic | huggingface | zerogpu).
+      2. Running on a HuggingFace Space (SPACE_ID set) AND the zerogpu
+         deps (spaces + transformers + torch) are importable → zerogpu.
+         This is the Pro-plan free-GPU path.
+      3. Presence of ANTHROPIC_API_KEY → anthropic.
+      4. Presence of HF_TOKEN / HUGGING_FACE_HUB_TOKEN, or running on
+         a HuggingFace Space without zerogpu deps → huggingface.
+      5. Fall through to anthropic (call-time error will tell the user
          which env to set).
     """
     env = env if env is not None else os.environ
     explicit = env.get("MODEL_PROVIDER", "").strip().lower()
+    if explicit in ("anthropic", "huggingface", "zerogpu"):
         return explicit
+    if env.get("SPACE_ID") and _zerogpu_available():
+        return "zerogpu"
     if env.get("ANTHROPIC_API_KEY"):
         return "anthropic"
     if (
     return resp.choices[0].message.content
+# ZeroGPU backend. The model is loaded once on first call (lazy) and
+# kept warm in module-level state so subsequent requests reuse it.
+# The `@spaces.GPU` decorator MUST be applied at function-definition
+# time on a Pro Space — outside a Space, the decorator is a no-op and
+# the function just runs on CPU (very slow, useful only for smoke tests).
+_zerogpu_model = None
+_zerogpu_tokenizer = None
+def _load_zerogpu_model():
+    """Load the model + tokenizer once. Called lazily on first request
+    so module import stays fast (the model weights are tens of GB)."""
+    global _zerogpu_model, _zerogpu_tokenizer
+    if _zerogpu_model is not None:
+        return
+    _zerogpu_tokenizer = _AutoTokenizer.from_pretrained(ZEROGPU_MODEL_ID)
+    _zerogpu_model = _AutoModelForCausalLM.from_pretrained(
+        ZEROGPU_MODEL_ID,
+        torch_dtype=_torch.bfloat16,
+        device_map="auto",
+    )
+if _ZEROGPU_DEPS_AVAILABLE:
+    @_spaces.GPU(duration=ZEROGPU_DURATION_SECONDS)
+    def _call_zerogpu(system_block: str, user_prompt: str) -> str:
+        """ZeroGPU backend. Loads Phi-4-mini-instruct (or whatever
+        ZEROGPU_MODEL_ID points at) into the Space's allocated GPU and
+        runs chat-template inference. Returns the assistant text only —
+        prompt tokens are stripped before decoding."""
+        _load_zerogpu_model()
+        messages = [
+            {"role": "system", "content": system_block},
+            {"role": "user", "content": user_prompt},
+        ]
+        inputs = _zerogpu_tokenizer.apply_chat_template(
+            messages,
+            return_tensors="pt",
+            add_generation_prompt=True,
+        ).to(_zerogpu_model.device)
+        outputs = _zerogpu_model.generate(
+            inputs,
+            max_new_tokens=2500,
+            temperature=0.2,
+            do_sample=True,
+            pad_token_id=_zerogpu_tokenizer.eos_token_id,
+        )
+        prompt_len = inputs.shape[1]
+        return _zerogpu_tokenizer.decode(
+            outputs[0][prompt_len:], skip_special_tokens=True
+        )
+else:
+    def _call_zerogpu(system_block: str, user_prompt: str) -> str:
+        raise RuntimeError(
+            "ZeroGPU backend requires `spaces`, `transformers`, and `torch` "
+            "to be importable AND should be run on a HuggingFace Pro Space "
+            "for free on-demand GPU. Install the full requirements.txt and "
+            "deploy to a Space, or pick anthropic / huggingface from the "
+            "provider dropdown."
+        )
 PROVIDERS = {
     "anthropic": _call_anthropic,
     "huggingface": _call_huggingface,
+    "zerogpu": _call_zerogpu,
 }
     except Exception as e:
         # API timeout / rate limit / auth / server / network failure
         # (Anthropic SDK or huggingface_hub InferenceClient).
+        model_label = {
+            "anthropic": ANTHROPIC_MODEL_ID,
+            "huggingface": HF_MODEL_ID,
+            "zerogpu": ZEROGPU_MODEL_ID,
+        }.get(provider, provider)
         return (
             f"⚠ The diagnostic call to {provider} ({model_label}) failed "
             f"({type(e).__name__}). Try again in a moment, switch providers in "
     provider_choices = [
         (f"Anthropic — {ANTHROPIC_MODEL_ID} (requires ANTHROPIC_API_KEY)", "anthropic"),
+        (f"HuggingFace API — {HF_MODEL_ID} (free on HF Spaces; HF_TOKEN locally)", "huggingface"),
     ]
+    if _zerogpu_available():
+        provider_choices.append((
+            f"ZeroGPU — {ZEROGPU_MODEL_ID} (free GPU via HuggingFace Pro plan)",
+            "zerogpu",
+        ))
     with gr.Blocks(title="The Compounding Test") as demo:
         gr.Markdown(
                 label="Model provider",
                 info=(
                     "Claude gives the highest-quality writeups but needs your "
+                    "own ANTHROPIC_API_KEY. ZeroGPU runs an open-weight model "
+                    "(Phi-4-mini-instruct by default) on the Space's free Pro "
+                    "GPU — no API costs, no inference credits. HuggingFace API "
+                    "routes to an open model through the HF Inference Providers "
+                    "API — works without any keys on a Space via the Space's "
+                    "monthly credits. Switch to compare writeup quality."
                 ),
             )
         submit = gr.Button("Diagnose", variant="primary")

requirements.txt CHANGED Viewed

@@ -1,5 +1,19 @@
 gradio>=4.0
-anthropic>=0.39
-huggingface_hub>=0.27
 python-dotenv>=1.0
 pytest>=8.0

+# Core (always required)
 gradio>=4.0
 python-dotenv>=1.0
 pytest>=8.0
+# Anthropic backend (only needed if MODEL_PROVIDER=anthropic)
+anthropic>=0.39
+# HuggingFace API backend (only needed if MODEL_PROVIDER=huggingface)
+huggingface_hub>=0.27
+# ZeroGPU backend (only needed if MODEL_PROVIDER=zerogpu, i.e. running
+# on a HuggingFace Pro Space with on-demand A100/H100 allocation).
+# These are heavy (~2GB total via torch); local-only users who do not
+# plan to use the zerogpu backend can omit these.
+spaces>=0.30
+transformers>=4.45
+torch>=2.4
+accelerate>=1.0

test_diagnose.py CHANGED Viewed

@@ -16,6 +16,7 @@ from app import (
     _detect_provider,
     parse_response,
 )
 # --- Fixtures ---------------------------------------------------------------
@@ -205,11 +206,36 @@ def test_detect_provider_huggingface_when_only_hf_token_set():
     assert _detect_provider({"HF_TOKEN": "hf-xxx"}) == "huggingface"
-def test_detect_provider_huggingface_when_running_on_hf_space():
-    # HF Spaces sets SPACE_ID automatically and provides free inference credits
     assert _detect_provider({"SPACE_ID": "mile-hi-ai/compounding-test"}) == "huggingface"
 def test_detect_provider_alt_hf_token_var():
     # HuggingFace SDKs also recognize HUGGING_FACE_HUB_TOKEN
     assert _detect_provider({"HUGGING_FACE_HUB_TOKEN": "hf-xxx"}) == "huggingface"

     _detect_provider,
     parse_response,
 )
+import app as app_module
 # --- Fixtures ---------------------------------------------------------------
     assert _detect_provider({"HF_TOKEN": "hf-xxx"}) == "huggingface"
+def test_detect_provider_huggingface_when_running_on_hf_space_without_zerogpu(monkeypatch):
+    # On a Space WITHOUT ZeroGPU deps installed, fall back to the inference API.
+    monkeypatch.setattr(app_module, "_zerogpu_available", lambda: False)
     assert _detect_provider({"SPACE_ID": "mile-hi-ai/compounding-test"}) == "huggingface"
+def test_detect_provider_prefers_zerogpu_on_pro_space_with_deps(monkeypatch):
+    # On a Space WITH ZeroGPU deps installed (transformers + torch + spaces),
+    # default to the free GPU backend rather than burning inference credits.
+    monkeypatch.setattr(app_module, "_zerogpu_available", lambda: True)
+    assert _detect_provider({"SPACE_ID": "mile-hi-ai/compounding-test"}) == "zerogpu"
+def test_detect_provider_explicit_anthropic_wins_over_zerogpu(monkeypatch):
+    # Explicit MODEL_PROVIDER beats the zerogpu auto-detect even on a Pro Space.
+    monkeypatch.setattr(app_module, "_zerogpu_available", lambda: True)
+    env = {"MODEL_PROVIDER": "anthropic", "SPACE_ID": "mile-hi-ai/compounding-test"}
+    assert _detect_provider(env) == "anthropic"
+def test_detect_provider_explicit_zerogpu_wins():
+    assert _detect_provider({"MODEL_PROVIDER": "zerogpu"}) == "zerogpu"
+def test_zerogpu_is_in_providers_dict():
+    # Even when deps aren't installed locally, the provider key exists so the
+    # UI dropdown can show it (the stub raises a clear error if invoked).
+    assert "zerogpu" in PROVIDERS
 def test_detect_provider_alt_hf_token_var():
     # HuggingFace SDKs also recognize HUGGING_FACE_HUB_TOKEN
     assert _detect_provider({"HUGGING_FACE_HUB_TOKEN": "hf-xxx"}) == "huggingface"