Spaces: AshwinP/compounding-test
apingali Claude Opus 4.7 (1M context) committed 5 days ago

Commit e0a9313

1 Parent(s): 6fc254e

fix(hf-space): cap ZeroGPU duration at 120s + actionable HF_TOKEN error

User-reported live errors (verbose reporting from prior commit caught
them cleanly):

1. ZeroGPU: 'The requested GPU duration (300s) is larger than the
   maximum allowed.' The Pro-tier per-request cap is 120s in practice,
   not 300s as I'd assumed. Reverted the default to 120s; the comment
   now explains that the cap depends on the quota tier and that the
   env var can override it if the owner has a higher allocation.

2. HuggingFace API: 'You must provide an api_key to work with auto
   API or log in with hf auth login.' Public HF Spaces do NOT
   auto-inject HF_TOKEN; the user has to add it as a Space secret
   manually. _call_huggingface now:
   - Checks HF_TOKEN, HUGGING_FACE_HUB_TOKEN, AND get_token() (which
     reads from `hf auth login`'s cached token)
   - If all three sources are empty, raises a RuntimeError that
     quotes the exact UI path: 'Settings → Repository secrets → New
     secret → name: HF_TOKEN' and the URL to generate one. Tells
     the user to pick a different model in the meantime.
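
The two fixes above can be sketched in isolation as follows. This is a minimal illustration, not the code from app.py: the helper names `resolve_duration` and `resolve_token` are hypothetical, and the environment is passed in as a plain mapping so the fallback order is easy to check without a real Space. In the app itself, the cached-token lookup would presumably be `huggingface_hub.get_token`.

```python
from typing import Callable, Mapping, Optional

# Default taken from the commit; the real cap depends on the owner's quota tier.
DEFAULT_DURATION_SECONDS = 120


def resolve_duration(env: Mapping[str, str],
                     default: int = DEFAULT_DURATION_SECONDS) -> int:
    """Hypothetical helper: read ZEROGPU_DURATION_SECONDS, else use the default."""
    return int(env.get("ZEROGPU_DURATION_SECONDS", str(default)))


def resolve_token(env: Mapping[str, str],
                  cached_token_lookup: Callable[[], Optional[str]]) -> str:
    """Hypothetical helper: try both env vars, then the cached login token."""
    token = (
        env.get("HF_TOKEN")
        or env.get("HUGGING_FACE_HUB_TOKEN")
        or cached_token_lookup()  # e.g. huggingface_hub.get_token in the app
    )
    if not token:
        # Fail early with instructions instead of a generic API error.
        raise RuntimeError(
            "No HuggingFace token found. Add HF_TOKEN as a Space secret "
            "(Settings → Repository secrets → New secret), then restart."
        )
    return token
```

Called as `resolve_token(os.environ, get_token)`, this raises before any network request is made, so the user sees the setup instructions rather than the raw API error.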

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (1)

app.py +20 -2

@@ -189,7 +189,10 @@ ROOT = Path(__file__).parent
 ANTHROPIC_MODEL_ID = os.environ.get("MODEL_ID", "claude-opus-4-7")
 HF_MODEL_ID = os.environ.get("HF_MODEL_ID", "google/gemma-2-9b-it")
 ZEROGPU_MODEL_ID = os.environ.get("ZEROGPU_MODEL_ID", "microsoft/Phi-4-mini-instruct")
-ZEROGPU_DURATION_SECONDS = int(os.environ.get("ZEROGPU_DURATION_SECONDS", "300"))
+# ZeroGPU's per-request duration cap depends on the Space owner's quota
+# tier. 120s is the Pro-tier default; we found 300s exceeds the limit.
+# Override via env var if your tier allows longer.
+ZEROGPU_DURATION_SECONDS = int(os.environ.get("ZEROGPU_DURATION_SECONDS", "120"))
 MAX_DESCRIPTION_WORDS = int(os.environ.get("MAX_DESCRIPTION_WORDS", "5000"))
 MIN_DESCRIPTION_WORDS = 200
@@ -277,13 +280,28 @@ def _call_huggingface(system_block: str, user_prompt: str) -> str:
     Phi-4-mini-instruct, Llama-3.3, Qwen 2.5, and many others. Lower
     temperature (0.2) than the SDK default to keep JSON output stable —
     smaller open models can be looser than Claude on schema adherence.
+    Requires an HF token: HF_TOKEN env var, HUGGING_FACE_HUB_TOKEN env
+    var, or a `hf auth login`-stored token (huggingface_hub.get_token()
+    checks all three sources). HF Spaces do NOT auto-inject a token on
+    public Spaces — the Space owner has to add it as a Space secret.
+    Raise a clear, actionable error if missing.
     """
-    from huggingface_hub import InferenceClient
+    from huggingface_hub import InferenceClient, get_token
     token = (
         os.environ.get("HF_TOKEN")
         or os.environ.get("HUGGING_FACE_HUB_TOKEN")
+        or get_token()  # checks ~/.cache/huggingface/token from `hf auth login`
     )
+    if not token:
+        raise RuntimeError(
+            "No HuggingFace token found. The Space owner needs to add HF_TOKEN "
+            "as a Space secret (Settings → Repository secrets → New secret → "
+            "name: HF_TOKEN, value: a User Access Token from "
+            "https://huggingface.co/settings/tokens). Then restart the Space. "
+            "Until then, pick a different model from the dropdown."
+        )
     client = InferenceClient(model=HF_MODEL_ID, token=token, timeout=120)
     resp = client.chat_completion(
         messages=[