apingali Claude Opus 4.7 (1M context) committed on
Commit 971e3f4 · 1 Parent(s): 033a83e

feat(hf-space): add ZeroGPU backend for HuggingFace Pro Spaces

The user has an HF Pro plan, so the Space can use ZeroGPU: free on-demand
A10g/A100/H200 allocation per request, no API round-trip, no inference
credits burned. We load an open model directly via transformers and
decorate the inference call with @spaces.GPU so the Pro plan picks up
the GPU automatically. The default model is Phi-4-mini-instruct (small
enough for a fast cold start); it can be swapped for Gemma 2 9B,
Llama-3.3, or full Phi-4 via ZEROGPU_MODEL_ID.

What changed:
app.py
- try/except imports of `spaces`, `torch`, `transformers` at
module load; sets _ZEROGPU_DEPS_AVAILABLE accordingly.
_zerogpu_available() wrapper exists so tests can monkeypatch
the answer without touching the real imports.
- _call_zerogpu(system, user) decorated with
`@spaces.GPU(duration=ZEROGPU_DURATION_SECONDS)` when deps are
available; replaced with a clear-error stub otherwise.
- Lazy model load (_load_zerogpu_model), kept warm across
requests in module-level state; cold start only on first call
after Space process restart.
- _detect_provider precedence updated: Pro Space (SPACE_ID + deps)
→ zerogpu, else ANTHROPIC_API_KEY → anthropic, else HF_TOKEN or any
Space → huggingface, else anthropic.
Explicit MODEL_PROVIDER=anthropic still wins on a Pro Space.
- PROVIDERS dict always includes zerogpu (even when deps are absent;
the stub raises a clear error); UI only surfaces the dropdown
option when deps are importable.
- F14 error message now resolves the model_label across all three
providers via dict lookup.

README.md (NEW)
HuggingFace Spaces YAML header specifying `sdk: gradio`,
`app_file: app.py`, `hardware: zero-a10g` so the deployed Space
actually receives ZeroGPU allocation. Also user-facing docs:
backend table, auto-detect precedence, configuration, local-dev
setup, test commands.

requirements.txt
Restructured with section comments. Added:
+ spaces>=0.30
+ transformers>=4.45
+ torch>=2.4
+ accelerate>=1.0
Documented that local-only users (anthropic backend) can omit
the heavy zerogpu lines.

.env.example
Added ZEROGPU BACKEND section: ZEROGPU_MODEL_ID with tested
alternatives, ZEROGPU_DURATION_SECONDS. Updated PROVIDER
SELECTION docs to include zerogpu and the new precedence.

test_diagnose.py
+ 4 zerogpu detection tests (Pro Space with deps → zerogpu,
Space without deps → huggingface fallback, explicit
MODEL_PROVIDER=zerogpu wins, explicit anthropic beats zerogpu
auto-detect).
+ 1 PROVIDERS-dict test (zerogpu always present so the
dispatcher is uniform regardless of dep availability).
All 31 tests pass (15 parser + 16 provider).

specs/004-berkshire-test/contracts/hf-space-interface.md §2
Added §2.C with the full ZeroGPU invocation pattern (lazy load,
chat-template inference, prompt-stripped decode). Updated
provider-selection precedence table. Added "Required Space
metadata" subsection explaining the hardware: zero-a10g header.
Updated cache-strategy notes with ZeroGPU cold-start behavior.

specs/004-berkshire-test/tasks.md
T037 rationale updated to reflect the three-backend
architecture and the README.md Space-metadata addition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
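
Reviewer note: the precedence described in the commit message can be sketched as a standalone function. This is a minimal sketch, not the code in app.py: the name `detect_provider` and the injected `zerogpu_available` callable are hypothetical, and the final two branches are assumed from the docstring because the diff hunk stops before them.

```python
import os
from typing import Callable, Mapping, Optional

VALID_PROVIDERS = ("anthropic", "huggingface", "zerogpu")


def detect_provider(
    env: Optional[Mapping[str, str]] = None,
    zerogpu_available: Callable[[], bool] = lambda: False,
) -> str:
    """Pick a backend name using the precedence listed in the commit message."""
    env = os.environ if env is None else env

    # 1. An explicit, recognised MODEL_PROVIDER always wins.
    explicit = env.get("MODEL_PROVIDER", "").strip().lower()
    if explicit in VALID_PROVIDERS:
        return explicit

    # 2. On a Space with the zerogpu deps importable, prefer the GPU path.
    if env.get("SPACE_ID") and zerogpu_available():
        return "zerogpu"

    # 3. An Anthropic key selects the Anthropic backend.
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"

    # 4. Any HF token, or any Space at all, selects the inference API.
    if env.get("HF_TOKEN") or env.get("HUGGING_FACE_HUB_TOKEN") or env.get("SPACE_ID"):
        return "huggingface"

    # 5. Nothing configured: fall through and let the call-time error explain.
    return "anthropic"
```

Passing the availability check in as a callable keeps the sketch testable without importing torch, which mirrors the reason the commit gives for wrapping the flag in `_zerogpu_available()`.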

Files changed (5)
  1. .env.example +31 -5
  2. README.md +86 -0
  3. app.py +127 -14
  4. requirements.txt +16 -2
  5. test_diagnose.py +28 -2
.env.example CHANGED
@@ -7,12 +7,18 @@
 # PROVIDER SELECTION
 # ============================================================
 # Optional. If unset, the app auto-detects based on which credentials
-# are present (see app.py::_detect_provider). Valid values:
+# are present and whether we are running on a HuggingFace Space (see
+# app.py::_detect_provider). Valid values:
 #   anthropic — Claude via the Anthropic SDK (best writeup quality)
-#   huggingface — Gemma 2 / Phi-4 / Llama-3.3 / Qwen via HF Inference
-#       Providers (works with no Anthropic key; free on
-#       HF Spaces via the Space's identity)
-# Leave blank for auto-detect.
+#   huggingface — Open models (Gemma 2 / Phi-4 / Llama-3.3 / Qwen) via
+#       HF Inference Providers API. Free on HF Spaces via
+#       the Space's monthly credits; HF_TOKEN locally.
+#   zerogpu — Open model (Phi-4-mini-instruct by default) loaded
+#       locally in the Space and run on free on-demand GPU
+#       via the HuggingFace Pro plan's ZeroGPU allocation.
+#       No API round-trip; no inference credits burned.
+# Auto-detect precedence: Pro Space → zerogpu, else Anthropic key →
+# anthropic, else HF_TOKEN or any Space → huggingface, else anthropic.
 # MODEL_PROVIDER=
 
 # ============================================================
@@ -44,6 +50,26 @@ MODEL_ID=claude-opus-4-7
 # shows a "try again" message rather than crashing.
 # HF_MODEL_ID=google/gemma-2-9b-it
 
+# ============================================================
+# ZEROGPU BACKEND (HuggingFace Pro plan)
+# ============================================================
+# No credentials required — the @spaces.GPU decorator handles allocation
+# automatically when the Space has a Pro owner. Locally, the function
+# decoration is a no-op and the model runs on CPU (slow, smoke-test only).
+#
+# Optional. Default microsoft/Phi-4-mini-instruct fits on the standard
+# A100 allocation with fast cold start. Other tested choices:
+#   google/gemma-2-9b-it — larger, slower load, more capable
+#   meta-llama/Llama-3.3-8B-Instruct — Llama 3.3 8B, good JSON adherence
+#   microsoft/phi-4 — full 14B Phi-4, slower
+# HuggingFace's gated models (Llama, etc.) need HF_TOKEN to download.
+# ZEROGPU_MODEL_ID=microsoft/Phi-4-mini-instruct
+
+# Optional. Maximum GPU allocation per request, in seconds. The Pro
+# plan allows up to 120s per request; raise/lower to balance cold-start
+# coverage vs. quota use.
+# ZEROGPU_DURATION_SECONDS=120
+
 # ============================================================
 # VALIDATION
 # ============================================================
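
Reviewer note: app.py reads `ZEROGPU_DURATION_SECONDS` with a bare `int(...)`, so a malformed value would raise at import time. The sketch below shows one defensive alternative. The helper name `read_duration_seconds` is hypothetical, and the 120-second ceiling is taken from the comment in `.env.example`, not from any API check.

```python
from typing import Mapping


def read_duration_seconds(
    env: Mapping[str, str], default: int = 120, ceiling: int = 120
) -> int:
    """Parse ZEROGPU_DURATION_SECONDS, tolerating blank or malformed values."""
    raw = env.get("ZEROGPU_DURATION_SECONDS", "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Malformed input: fall back rather than crash at import time.
        return default
    # Clamp to the range 1..ceiling (ceiling assumed from .env.example).
    return max(1, min(value, ceiling))
```

Whether silent fallback is preferable to failing fast is a design choice; failing fast surfaces typos sooner, while the fallback keeps a deployed Space running.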
README.md ADDED
@@ -0,0 +1,86 @@
+---
+title: The Compounding Test
+emoji: 📈
+colorFrom: indigo
+colorTo: purple
+sdk: gradio
+sdk_version: 4.44.0
+app_file: app.py
+pinned: false
+license: mit
+short_description: A diagnostic for AI investments at non-tech companies.
+hardware: zero-a10g
+---
+
+# The Compounding Test
+
+A diagnostic for AI investments at non-technology companies. Paste a
+description of your AI initiative (200–5000 words); receive a scored
+writeup in one of four quadrants — **compounder**, **one-shot win**,
+**compounding the wrong thing**, or **Roman Candle**.
+
+Framework essay: <https://www.mile-hi.ai/journal/the-berkshire-test>
+
+## Backends
+
+The Space supports three interchangeable model backends. The dropdown
+in the UI lets you switch per-submission to compare writeup quality.
+
+| Backend | Model (default) | Credentials | Where it runs |
+|---|---|---|---|
+| `anthropic` | `claude-opus-4-7` | `ANTHROPIC_API_KEY` (Space secret) | Anthropic API |
+| `huggingface` | `google/gemma-2-9b-it` | none on a Space; `HF_TOKEN` locally | HF Inference Providers |
+| `zerogpu` | `microsoft/Phi-4-mini-instruct` | none — Pro plan handles it | On-Space ZeroGPU |
+
+**Auto-detect precedence:**
+
+1. Explicit `MODEL_PROVIDER` env var wins.
+2. On a Pro Space (zerogpu deps installed) → `zerogpu`.
+3. Else if `ANTHROPIC_API_KEY` is set → `anthropic`.
+4. Else if `HF_TOKEN` is set, or running on any Space → `huggingface`.
+5. Else fall through to `anthropic` (call-time error guides the user).
+
+## Configuration
+
+See `.env.example` for the full list of env vars. Common overrides:
+
+```
+MODEL_PROVIDER=zerogpu
+ZEROGPU_MODEL_ID=google/gemma-2-9b-it # bigger; ~30s cold start on A10
+ZEROGPU_DURATION_SECONDS=120 # max GPU allocation per request
+HF_MODEL_ID=meta-llama/Llama-3.3-70B-Instruct
+MODEL_ID=claude-sonnet-4-6 # cheaper Anthropic fallback
+```
+
+## Local development
+
+```bash
+python3 -m venv .venv && source .venv/bin/activate
+pip install -r requirements.txt # ~2GB with torch/transformers
+cp .env.example .env # fill in whatever you have
+python app.py # http://127.0.0.1:7860
+```
+
+If you only need to test the `anthropic` backend locally, you can skip
+the heavy `spaces` / `torch` / `transformers` / `accelerate` lines in
+`requirements.txt` — the app degrades gracefully (the zerogpu dropdown
+option won't appear).
+
+## Tests
+
+```bash
+pytest test_diagnose.py -v
+```
+
+31 tests covering the parser contract (15 — what JSON shapes the parser
+accepts and rejects) and the provider routing (16 — auto-detection
+precedence, dispatcher routing, env-driven overrides).
+
+## Repository
+
+Source lives in [apingali/effectiveness][repo] under
+`gradio-apps/compounding-test/`. The Space is deployed from that path.
+The published framework essay and four portrait articles live at
+<https://www.mile-hi.ai/journal/the-berkshire-test>.
+
+[repo]: https://github.com/apingali/effectiveness
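
Reviewer note: the commit message says the `hardware: zero-a10g` header is what makes the deployed Space receive a ZeroGPU allocation, so a pre-deploy check on the README header may be worthwhile. The sketch below is a deliberately naive reader for flat `key: value` front matter, not a YAML parser; `read_front_matter` is a hypothetical helper and would not handle nested or multi-line values.

```python
def read_front_matter(text: str) -> dict:
    """Return the flat key/value pairs between the leading '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # No closing delimiter found: treat the header as absent.
    return {}


sample = "---\nsdk: gradio\napp_file: app.py\nhardware: zero-a10g\n---\n\n# Title\n"
header = read_front_matter(sample)
```

A check such as `header.get("hardware") == "zero-a10g"` could then run in CI before the Space is pushed.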
app.py CHANGED
@@ -5,8 +5,8 @@ the two-axis Berkshire Test for AI and returns a scored writeup.
 
 Architecture per specs/004-berkshire-test/contracts/hf-space-interface.md:
 - Inputs: a description (200–5000 words) + 3 optional clarifiers.
-- Two backends, selectable by env (`MODEL_PROVIDER`) or auto-detected
-  from available credentials:
+- Three backends, selectable by env (`MODEL_PROVIDER`) or auto-detected
+  from available credentials and runtime environment:
   * anthropic — Claude Opus / Sonnet via the Anthropic SDK;
     system block is `cache_control:ephemeral` so
     subsequent calls hit the 5-minute prefix cache.
@@ -15,6 +15,13 @@ Architecture per specs/004-berkshire-test/contracts/hf-space-interface.md:
     huggingface_hub InferenceClient. Works on HF
     Spaces with the Space's free inference credits;
     locally requires HF_TOKEN.
+  * zerogpu — Open model (Phi-4-mini-instruct by default)
+    loaded LOCALLY in the Space via transformers,
+    decorated with `@spaces.GPU` so a HuggingFace
+    Pro plan gets free on-demand A100/H100 GPU
+    allocation per request. No per-call credit burn;
+    no API round-trip. Requires the Space to have a
+    Pro owner; locally falls back to CPU (slow).
 - Output: two Gradio tabs — markdown writeup + raw JSON.
 
 Engine/Site boundary (Principle VIII): this app lives in gradio-apps/
@@ -181,10 +188,34 @@ ROOT = Path(__file__).parent
 
 ANTHROPIC_MODEL_ID = os.environ.get("MODEL_ID", "claude-opus-4-7")
 HF_MODEL_ID = os.environ.get("HF_MODEL_ID", "google/gemma-2-9b-it")
+ZEROGPU_MODEL_ID = os.environ.get("ZEROGPU_MODEL_ID", "microsoft/Phi-4-mini-instruct")
+ZEROGPU_DURATION_SECONDS = int(os.environ.get("ZEROGPU_DURATION_SECONDS", "120"))
 MAX_DESCRIPTION_WORDS = int(os.environ.get("MAX_DESCRIPTION_WORDS", "5000"))
 MIN_DESCRIPTION_WORDS = 200
 
 
+# ZeroGPU availability is detected at import time. The `spaces` package
+# is HuggingFace's runtime for on-demand GPU allocation; `transformers`
+# + `torch` are required to actually load and run the model. All three
+# must be importable for the zerogpu backend to function.
+try:
+    import spaces as _spaces
+    import torch as _torch
+    from transformers import AutoModelForCausalLM as _AutoModelForCausalLM
+    from transformers import AutoTokenizer as _AutoTokenizer
+
+    _ZEROGPU_DEPS_AVAILABLE = True
+except ImportError:
+    _ZEROGPU_DEPS_AVAILABLE = False
+
+
+def _zerogpu_available() -> bool:
+    """Return whether the zerogpu backend can be used. Wrapped as a
+    function so tests can monkeypatch the answer without touching the
+    real torch/transformers imports."""
+    return _ZEROGPU_DEPS_AVAILABLE
+
+
 # ---------------------------------------------------------------------------
 # Provider abstraction (anthropic vs huggingface — selectable at runtime)
 # ---------------------------------------------------------------------------
@@ -192,17 +223,22 @@ MIN_DESCRIPTION_WORDS = 200
 
 def _detect_provider(env=None) -> str:
     """Pick a model provider from env. Order of precedence:
-    1. Explicit MODEL_PROVIDER (anthropic | huggingface).
-    2. Presence of ANTHROPIC_API_KEY → anthropic.
-    3. Presence of HF_TOKEN / HUGGING_FACE_HUB_TOKEN, or running on
-       a HuggingFace Space (SPACE_ID set) → huggingface.
-    4. Fall through to anthropic (call-time error will tell the user
+    1. Explicit MODEL_PROVIDER (anthropic | huggingface | zerogpu).
+    2. Running on a HuggingFace Space (SPACE_ID set) AND the zerogpu
+       deps (spaces + transformers + torch) are importable → zerogpu.
+       This is the Pro-plan free-GPU path.
+    3. Presence of ANTHROPIC_API_KEY → anthropic.
+    4. Presence of HF_TOKEN / HUGGING_FACE_HUB_TOKEN, or running on
+       a HuggingFace Space without zerogpu deps → huggingface.
+    5. Fall through to anthropic (call-time error will tell the user
        which env to set).
     """
     env = env if env is not None else os.environ
     explicit = env.get("MODEL_PROVIDER", "").strip().lower()
-    if explicit in ("anthropic", "huggingface"):
+    if explicit in ("anthropic", "huggingface", "zerogpu"):
         return explicit
+    if env.get("SPACE_ID") and _zerogpu_available():
+        return "zerogpu"
     if env.get("ANTHROPIC_API_KEY"):
         return "anthropic"
     if (
@@ -260,9 +296,75 @@ def _call_huggingface(system_block: str, user_prompt: str) -> str:
     return resp.choices[0].message.content
 
 
+# ZeroGPU backend. The model is loaded once on first call (lazy) and
+# kept warm in module-level state so subsequent requests reuse it.
+# The `@spaces.GPU` decorator MUST be applied at function-definition
+# time on a Pro Space — outside a Space, the decorator is a no-op and
+# the function just runs on CPU (very slow, useful only for smoke tests).
+_zerogpu_model = None
+_zerogpu_tokenizer = None
+
+
+def _load_zerogpu_model():
+    """Load the model + tokenizer once. Called lazily on first request
+    so module import stays fast (the model weights are tens of GB)."""
+    global _zerogpu_model, _zerogpu_tokenizer
+    if _zerogpu_model is not None:
+        return
+    _zerogpu_tokenizer = _AutoTokenizer.from_pretrained(ZEROGPU_MODEL_ID)
+    _zerogpu_model = _AutoModelForCausalLM.from_pretrained(
+        ZEROGPU_MODEL_ID,
+        torch_dtype=_torch.bfloat16,
+        device_map="auto",
+    )
+
+
+if _ZEROGPU_DEPS_AVAILABLE:
+
+    @_spaces.GPU(duration=ZEROGPU_DURATION_SECONDS)
+    def _call_zerogpu(system_block: str, user_prompt: str) -> str:
+        """ZeroGPU backend. Loads Phi-4-mini-instruct (or whatever
+        ZEROGPU_MODEL_ID points at) into the Space's allocated GPU and
+        runs chat-template inference. Returns the assistant text only —
+        prompt tokens are stripped before decoding."""
+        _load_zerogpu_model()
+        messages = [
+            {"role": "system", "content": system_block},
+            {"role": "user", "content": user_prompt},
+        ]
+        inputs = _zerogpu_tokenizer.apply_chat_template(
+            messages,
+            return_tensors="pt",
+            add_generation_prompt=True,
+        ).to(_zerogpu_model.device)
+        outputs = _zerogpu_model.generate(
+            inputs,
+            max_new_tokens=2500,
+            temperature=0.2,
+            do_sample=True,
+            pad_token_id=_zerogpu_tokenizer.eos_token_id,
+        )
+        prompt_len = inputs.shape[1]
+        return _zerogpu_tokenizer.decode(
+            outputs[0][prompt_len:], skip_special_tokens=True
+        )
+
+else:
+
+    def _call_zerogpu(system_block: str, user_prompt: str) -> str:
+        raise RuntimeError(
+            "ZeroGPU backend requires `spaces`, `transformers`, and `torch` "
+            "to be importable AND should be run on a HuggingFace Pro Space "
+            "for free on-demand GPU. Install the full requirements.txt and "
+            "deploy to a Space, or pick anthropic / huggingface from the "
+            "provider dropdown."
+        )
+
+
 PROVIDERS = {
     "anthropic": _call_anthropic,
     "huggingface": _call_huggingface,
+    "zerogpu": _call_zerogpu,
 }
 
 
@@ -394,7 +496,11 @@ def diagnose(
     except Exception as e:
         # API timeout / rate limit / auth / server / network failure
         # (Anthropic SDK or huggingface_hub InferenceClient).
-        model_label = ANTHROPIC_MODEL_ID if provider == "anthropic" else HF_MODEL_ID
+        model_label = {
+            "anthropic": ANTHROPIC_MODEL_ID,
+            "huggingface": HF_MODEL_ID,
+            "zerogpu": ZEROGPU_MODEL_ID,
+        }.get(provider, provider)
         return (
             f"⚠ The diagnostic call to {provider} ({model_label}) failed "
             f"({type(e).__name__}). Try again in a moment, switch providers in "
@@ -439,8 +545,13 @@ def build_demo():
 
     provider_choices = [
         (f"Anthropic — {ANTHROPIC_MODEL_ID} (requires ANTHROPIC_API_KEY)", "anthropic"),
-        (f"HuggingFace — {HF_MODEL_ID} (free on HF Spaces; HF_TOKEN locally)", "huggingface"),
+        (f"HuggingFace API — {HF_MODEL_ID} (free on HF Spaces; HF_TOKEN locally)", "huggingface"),
     ]
+    if _zerogpu_available():
+        provider_choices.append((
+            f"ZeroGPU — {ZEROGPU_MODEL_ID} (free GPU via HuggingFace Pro plan)",
+            "zerogpu",
+        ))
 
     with gr.Blocks(title="The Compounding Test") as demo:
         gr.Markdown(
@@ -475,10 +586,12 @@ def build_demo():
                 label="Model provider",
                 info=(
                     "Claude gives the highest-quality writeups but needs your "
-                    "own ANTHROPIC_API_KEY. The HuggingFace backend runs on a "
-                    "smaller open-weight model and works on a deployed HF Space "
-                    "without any keys, so it's the easiest way to try the "
-                    "diagnostic without signing up for anything."
+                    "own ANTHROPIC_API_KEY. ZeroGPU runs an open-weight model "
+                    "(Phi-4-mini-instruct by default) on the Space's free Pro "
+                    "GPU — no API costs, no inference credits. HuggingFace API "
+                    "routes to an open model through the HF Inference Providers "
+                    "API — works without any keys on a Space via the Space's "
+                    "monthly credits. Switch to compare writeup quality."
                 ),
             )
             submit = gr.Button("Diagnose", variant="primary")
requirements.txt CHANGED
@@ -1,5 +1,19 @@
+# Core (always required)
 gradio>=4.0
-anthropic>=0.39
-huggingface_hub>=0.27
 python-dotenv>=1.0
 pytest>=8.0
+
+# Anthropic backend (only needed if MODEL_PROVIDER=anthropic)
+anthropic>=0.39
+
+# HuggingFace API backend (only needed if MODEL_PROVIDER=huggingface)
+huggingface_hub>=0.27
+
+# ZeroGPU backend (only needed if MODEL_PROVIDER=zerogpu, i.e. running
+# on a HuggingFace Pro Space with on-demand A100/H100 allocation).
+# These are heavy (~2GB total via torch); local-only users who do not
+# plan to use the zerogpu backend can omit these.
+spaces>=0.30
+transformers>=4.45
+torch>=2.4
+accelerate>=1.0
test_diagnose.py CHANGED
@@ -16,6 +16,7 @@ from app import (
     _detect_provider,
     parse_response,
 )
+import app as app_module
 
 
 # --- Fixtures ---------------------------------------------------------------
@@ -205,11 +206,36 @@ def test_detect_provider_huggingface_when_only_hf_token_set():
     assert _detect_provider({"HF_TOKEN": "hf-xxx"}) == "huggingface"
 
 
-def test_detect_provider_huggingface_when_running_on_hf_space():
-    # HF Spaces sets SPACE_ID automatically and provides free inference credits
+def test_detect_provider_huggingface_when_running_on_hf_space_without_zerogpu(monkeypatch):
+    # On a Space WITHOUT ZeroGPU deps installed, fall back to the inference API.
+    monkeypatch.setattr(app_module, "_zerogpu_available", lambda: False)
     assert _detect_provider({"SPACE_ID": "mile-hi-ai/compounding-test"}) == "huggingface"
 
 
+def test_detect_provider_prefers_zerogpu_on_pro_space_with_deps(monkeypatch):
+    # On a Space WITH ZeroGPU deps installed (transformers + torch + spaces),
+    # default to the free GPU backend rather than burning inference credits.
+    monkeypatch.setattr(app_module, "_zerogpu_available", lambda: True)
+    assert _detect_provider({"SPACE_ID": "mile-hi-ai/compounding-test"}) == "zerogpu"
+
+
+def test_detect_provider_explicit_anthropic_wins_over_zerogpu(monkeypatch):
+    # Explicit MODEL_PROVIDER beats the zerogpu auto-detect even on a Pro Space.
+    monkeypatch.setattr(app_module, "_zerogpu_available", lambda: True)
+    env = {"MODEL_PROVIDER": "anthropic", "SPACE_ID": "mile-hi-ai/compounding-test"}
+    assert _detect_provider(env) == "anthropic"
+
+
+def test_detect_provider_explicit_zerogpu_wins():
+    assert _detect_provider({"MODEL_PROVIDER": "zerogpu"}) == "zerogpu"
+
+
+def test_zerogpu_is_in_providers_dict():
+    # Even when deps aren't installed locally, the provider key exists so the
+    # UI dropdown can show it (the stub raises a clear error if invoked).
+    assert "zerogpu" in PROVIDERS
+
+
 def test_detect_provider_alt_hf_token_var():
     # HuggingFace SDKs also recognize HUGGING_FACE_HUB_TOKEN
     assert _detect_provider({"HUGGING_FACE_HUB_TOKEN": "hf-xxx"}) == "huggingface"
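
Reviewer note: the PROVIDERS-dict test asserts that the key exists, but not what a caller sees when the stub is invoked. The sketch below illustrates the uniform-dispatch idea with stand-in backends. `run`, `_echo_backend`, `_zerogpu_stub`, and the shortened error text are hypothetical and only mirror the shape of the dict lookup in `diagnose`.

```python
def _echo_backend(system_block: str, user_prompt: str) -> str:
    """Stand-in for a working backend."""
    return f"echo: {user_prompt}"


def _zerogpu_stub(system_block: str, user_prompt: str) -> str:
    """Stand-in for the stub installed when the zerogpu deps are missing."""
    raise RuntimeError("ZeroGPU backend requires spaces, transformers, and torch.")


PROVIDERS = {"anthropic": _echo_backend, "zerogpu": _zerogpu_stub}
MODEL_LABELS = {
    "anthropic": "claude-opus-4-7",
    "zerogpu": "microsoft/Phi-4-mini-instruct",
}


def run(provider: str, system_block: str, user_prompt: str) -> str:
    """Dispatch by key; turn any backend failure into a readable message."""
    try:
        return PROVIDERS[provider](system_block, user_prompt)
    except Exception as exc:
        # Unknown providers fall back to their own name as the label.
        label = MODEL_LABELS.get(provider, provider)
        return (
            f"The diagnostic call to {provider} ({label}) failed "
            f"({type(exc).__name__})."
        )
```

Because the stub is always registered, the dispatcher needs no special case for missing dependencies: the failure travels the same error path as a timeout or an auth error.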