Raise endpoint request timeout to 600s (configurable via QUEST_REQUEST_TIMEOUT)
Reason: 120s wasn't enough for long-horizon Quest-4B research runs on the
private HF Inference Endpoint. Queries that span many turns or visit
large pages were being killed by the urllib3 read timeout before the
model could finish a turn.
How to apply: the 10-minute default is plenty for most runs. To override
it for a specific deployment, set the QUEST_REQUEST_TIMEOUT Space secret
to a whole number of seconds. The same value also applies to the fallback
client for the shared HF Inference API.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
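The committed code parses the override with a bare `int(...)`, so a secret set to `"600s"` or to an empty string raises `ValueError` when the client is built. A minimal sketch of a more forgiving parser follows, assuming you would rather fall back to the default than fail at startup; `read_quest_timeout` and `DEFAULT_QUEST_TIMEOUT` are hypothetical names, not part of app.py:

```python
import math
import os
from typing import Mapping, Optional

# Hypothetical name; 600 s matches the default chosen in this commit.
DEFAULT_QUEST_TIMEOUT = 600.0


def read_quest_timeout(
    env: Optional[Mapping[str, str]] = None,
    default: float = DEFAULT_QUEST_TIMEOUT,
) -> float:
    """Return QUEST_REQUEST_TIMEOUT in seconds, or `default` if it is unset, blank, or invalid."""
    env = os.environ if env is None else env
    raw = env.get("QUEST_REQUEST_TIMEOUT", "").strip()
    if not raw:
        return default
    try:
        value = float(raw)
    except ValueError:
        # e.g. "600s" or "ten minutes": keep the default instead of crashing.
        return default
    # Reject zero, negative, NaN, and infinite values.
    if not math.isfinite(value) or value <= 0:
        return default
    return value
```

For example, `read_quest_timeout({"QUEST_REQUEST_TIMEOUT": "900"})` returns `900.0`. Parsing with `float` rather than `int` is a deliberate choice in this sketch so fractional seconds are accepted; whether silently falling back is preferable to failing loudly on a typo is a judgment call for the deployment.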
app.py
```diff
@@ -1210,14 +1210,15 @@ def _build_client_for_model(model: str) -> Tuple[InferenceClient, str, List[str]
     shared HF Inference API and let the starter fall back across free models.
     """
     token = os.getenv("HF_TOKEN")
+    quest_timeout = int(os.getenv("QUEST_REQUEST_TIMEOUT", "600"))
     if model == QUEST_MODEL_ID and QUEST_BASE_URL:
         client = InferenceClient(
             base_url=QUEST_BASE_URL,
             token=token,
-            timeout=
+            timeout=quest_timeout,
         )
         return client, QUEST_ENDPOINT_MODEL, []
-    client = InferenceClient(token=token, timeout=
+    client = InferenceClient(token=token, timeout=quest_timeout)
     return client, model, []
```
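One way to check that the override reaches both branches of `_build_client_for_model` is a small test run outside the Space, sketched here under assumptions. `FakeClient`, `build_client`, and the three `QUEST_*` constants below are hypothetical stand-ins for `InferenceClient` and the app's real values. `build_client` copies the post-commit control flow, which reads `QUEST_REQUEST_TIMEOUT` on every call:

```python
import os
from dataclasses import dataclass
from typing import List, Optional, Tuple
from unittest import mock


@dataclass
class FakeClient:
    """Hypothetical stand-in for huggingface_hub.InferenceClient; only records its arguments."""

    token: Optional[str] = None
    timeout: Optional[float] = None
    base_url: Optional[str] = None


# Hypothetical placeholder values; app.py defines its own.
QUEST_MODEL_ID = "quest-4b"
QUEST_BASE_URL = "https://example.invalid/endpoint"
QUEST_ENDPOINT_MODEL = "placeholder-endpoint-model"


def build_client(model: str) -> Tuple[FakeClient, str, List[str]]:
    """Same control flow as the post-commit _build_client_for_model, using FakeClient."""
    token = os.getenv("HF_TOKEN")
    # Read on every call, so a changed environment affects the next client built.
    quest_timeout = int(os.getenv("QUEST_REQUEST_TIMEOUT", "600"))
    if model == QUEST_MODEL_ID and QUEST_BASE_URL:
        client = FakeClient(base_url=QUEST_BASE_URL, token=token, timeout=quest_timeout)
        return client, QUEST_ENDPOINT_MODEL, []
    client = FakeClient(token=token, timeout=quest_timeout)
    return client, model, []


if __name__ == "__main__":
    # patch.dict restores os.environ when each block exits.
    with mock.patch.dict(os.environ, {"QUEST_REQUEST_TIMEOUT": "1200"}):
        print(build_client(QUEST_MODEL_ID)[0].timeout)  # 1200
        print(build_client("some/other-model")[0].timeout)  # 1200
    with mock.patch.dict(os.environ):
        os.environ.pop("QUEST_REQUEST_TIMEOUT", None)
        print(build_client(QUEST_MODEL_ID)[0].timeout)  # 600
```

Using `mock.patch.dict` keeps the change to `os.environ` scoped to each `with` block, so the check does not leak a timeout value into other tests.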