TomLii and Claude Sonnet 4.6 committed
Commit d6a02bb · 1 parent: 3dd5ddd

Raise endpoint request timeout to 600s (configurable via QUEST_REQUEST_TIMEOUT)

Reason: 120s wasn't enough for long-horizon Quest-4B research runs on the
private HF Inference Endpoint. Queries that span many turns or visit
large pages were being killed by the urllib3 read timeout before the
model could finish a turn.

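How to apply: the 10-minute (600s) default is plenty for most runs; set
the QUEST_REQUEST_TIMEOUT Space secret (an integer number of seconds) to
override it if a specific deployment needs more or less. The same value
also replaces the previous 60s timeout on the shared HF Inference API
client.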
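The commit parses the secret with a bare `int(os.getenv("QUEST_REQUEST_TIMEOUT", "600"))`, so a value such as `10m` or an empty string raises `ValueError` when a client is built. A more defensive variant might look like the sketch below; `read_quest_timeout` and `DEFAULT_TIMEOUT_S` are hypothetical names, not part of app.py.

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT_S = 600  # same default the commit uses


def read_quest_timeout(
    env: Optional[Mapping[str, str]] = None, default: int = DEFAULT_TIMEOUT_S
) -> int:
    """Return the request timeout in seconds from QUEST_REQUEST_TIMEOUT.

    Hypothetical helper: instead of raising, it falls back to `default`
    when the variable is unset, empty, not an integer, or not positive.
    """
    env = os.environ if env is None else env
    raw = env.get("QUEST_REQUEST_TIMEOUT", "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default  # e.g. "10m" is not accepted; use the default
    return value if value > 0 else default


print(read_quest_timeout({}))                                # 600
print(read_quest_timeout({"QUEST_REQUEST_TIMEOUT": "900"}))  # 900
print(read_quest_timeout({"QUEST_REQUEST_TIMEOUT": "10m"}))  # 600
```

Silently falling back keeps the Space booting on a typo, at the cost of hiding the misconfiguration; a real deployment might prefer to log a warning at that point.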

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (1)
  1. app.py +3 -2
app.py CHANGED
@@ -1210,14 +1210,15 @@ def _build_client_for_model(model: str) -> Tuple[InferenceClient, str, List[str]
     shared HF Inference API and let the starter fall back across free models.
     """
     token = os.getenv("HF_TOKEN")
+    quest_timeout = int(os.getenv("QUEST_REQUEST_TIMEOUT", "600"))
     if model == QUEST_MODEL_ID and QUEST_BASE_URL:
         client = InferenceClient(
             base_url=QUEST_BASE_URL,
             token=token,
-            timeout=120,
+            timeout=quest_timeout,
         )
         return client, QUEST_ENDPOINT_MODEL, []
-    client = InferenceClient(token=token, timeout=60)
+    client = InferenceClient(token=token, timeout=quest_timeout)
     return client, model, []
 
 
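To see the effect of the patch without calling Hugging Face, the selection logic can be exercised against a stand-in client. This is a sketch under assumptions: `StubClient` replaces `huggingface_hub.InferenceClient` and only records its arguments, and the values of `QUEST_MODEL_ID`, `QUEST_ENDPOINT_MODEL`, and `QUEST_BASE_URL` are hypothetical placeholders, not the ones app.py defines.

```python
import os
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical placeholders for constants defined elsewhere in app.py.
QUEST_MODEL_ID = "example/quest-4b"
QUEST_ENDPOINT_MODEL = "tgi"
QUEST_BASE_URL = "https://example.endpoints.invalid"


@dataclass
class StubClient:
    """Records constructor arguments instead of making HTTP calls."""
    token: Optional[str]
    timeout: float
    base_url: Optional[str] = None


def build_client_for_model(model: str) -> Tuple[StubClient, str, List[str]]:
    token = os.getenv("HF_TOKEN")
    # As in the patch: read the timeout once and use it on both branches.
    quest_timeout = int(os.getenv("QUEST_REQUEST_TIMEOUT", "600"))
    if model == QUEST_MODEL_ID and QUEST_BASE_URL:
        # Private endpoint branch (previously timeout=120).
        client = StubClient(token=token, timeout=quest_timeout, base_url=QUEST_BASE_URL)
        return client, QUEST_ENDPOINT_MODEL, []
    # Shared HF Inference API branch (previously timeout=60).
    client = StubClient(token=token, timeout=quest_timeout)
    return client, model, []
```

Because the sketch reads the environment on every call, setting `os.environ["QUEST_REQUEST_TIMEOUT"] = "900"` before the next `build_client_for_model(...)` call changes the timeout on both the endpoint client and the shared-API client.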