Spaces: osunlp/QUEST

TomLii and Claude Sonnet 4.6 committed on Apr 21

Commit d6a02bb (1 parent: 3dd5ddd)

Raise endpoint request timeout to 600s (configurable via QUEST_REQUEST_TIMEOUT)

Reason: 120s wasn't enough for long-horizon Quest-4B research runs on the
private HF Inference Endpoint — queries that span many turns or visit
large pages were getting killed by the urllib3 read timeout before the
model could finish a turn.

How to apply: the 10-minute default is plenty for most runs; set the
QUEST_REQUEST_TIMEOUT Space secret to override if a specific deployment
needs more or less.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
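The override described above reads an integer number of seconds from the environment. The committed line, `int(os.getenv("QUEST_REQUEST_TIMEOUT", "600"))`, raises `ValueError` when a client is built if the secret holds something like `"10m"` or `"600.0"`. Here is a minimal sketch of a more forgiving parser. It assumes you would rather fall back to the 600 s default than fail. The helper name `read_timeout_seconds` and its fallback behavior are illustrative assumptions and are not part of app.py.

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT_S = 600  # the 10-minute default chosen in this commit


def read_timeout_seconds(
    env: Optional[Mapping[str, str]] = None,
    name: str = "QUEST_REQUEST_TIMEOUT",
    default: int = DEFAULT_TIMEOUT_S,
) -> int:
    """Return a positive integer timeout (seconds) from `env`, else `default`.

    Hypothetical helper: unlike a bare int(os.getenv(...)), it returns
    `default` when the variable is unset, empty, non-integer, or <= 0.
    """
    source = os.environ if env is None else env
    raw = source.get(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        # e.g. "10m" or "600.0": not a plain integer, keep the default
        return default
    return value if value > 0 else default


print(read_timeout_seconds({}))                                # 600
print(read_timeout_seconds({"QUEST_REQUEST_TIMEOUT": "900"}))  # 900
print(read_timeout_seconds({"QUEST_REQUEST_TIMEOUT": "10m"}))  # 600
```

Passing a mapping instead of always reading `os.environ` keeps the helper easy to test. Whether to fall back silently or log a warning on a malformed value is a judgment call.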

Files changed (1)

app.py +3 -2

app.py CHANGED

@@ -1210,14 +1210,15 @@ def _build_client_for_model(model: str) -> Tuple[InferenceClient, str, List[str]
     shared HF Inference API and let the starter fall back across free models.
     """
     token = os.getenv("HF_TOKEN")
+    quest_timeout = int(os.getenv("QUEST_REQUEST_TIMEOUT", "600"))
     if model == QUEST_MODEL_ID and QUEST_BASE_URL:
         client = InferenceClient(
             base_url=QUEST_BASE_URL,
             token=token,
-            timeout=120,
+            timeout=quest_timeout,
         )
         return client, QUEST_ENDPOINT_MODEL, []
-    client = InferenceClient(token=token, timeout=60)
+    client = InferenceClient(token=token, timeout=quest_timeout)
     return client, model, []
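The diff has one side effect. The fallback client for the shared HF Inference API now also gets `quest_timeout`, which is 600 s by default and was previously 60 s. Before this change, only the private endpoint client used the longer timeout. The sketch below separates this routing decision from client construction so that it can be checked without network access. `ClientConfig`, `client_config_for_model`, and the constant values are hypothetical placeholders. The real `QUEST_MODEL_ID`, `QUEST_BASE_URL`, and `QUEST_ENDPOINT_MODEL` are defined elsewhere in app.py and do not appear in this diff.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder values; the real constants live elsewhere in app.py.
QUEST_MODEL_ID = "example-org/Quest-4B"
QUEST_BASE_URL = "https://example.invalid/v1/"
QUEST_ENDPOINT_MODEL = "quest-endpoint"


@dataclass(frozen=True)
class ClientConfig:
    base_url: Optional[str]  # None means "use the shared HF Inference API"
    model: str
    timeout: int  # seconds


def client_config_for_model(model: str, quest_timeout: int = 600) -> ClientConfig:
    """Mirror the branch in _build_client_for_model without creating a client."""
    if model == QUEST_MODEL_ID and QUEST_BASE_URL:
        # Private endpoint path: dedicated base URL and endpoint model name.
        return ClientConfig(QUEST_BASE_URL, QUEST_ENDPOINT_MODEL, quest_timeout)
    # Shared-API fallback: after this commit it uses the same timeout.
    return ClientConfig(None, model, quest_timeout)


cfg = client_config_for_model(QUEST_MODEL_ID)
print(cfg.base_url, cfg.model, cfg.timeout)
```

In app.py, the resulting fields would be passed to `InferenceClient(base_url=..., token=..., timeout=...)` as the diff does. For the shared-API branch, `base_url` is simply omitted. If the shared API should keep a shorter limit, the two branches would need separate timeout values.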