---
title: QUEST
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
---

# DeepResearch Space

An interactive Hugging Face Space for a **Quest DeepResearch** agent. The app
either talks to **`osunlp/QUEST-35B`** (our own fine-tuned research model,
routed through a private HF Inference Endpoint) or falls back to open-weights
models served through the shared HF Inference API.

Supported tools and behaviors:

- `search` (DuckDuckGo, multi-query)
- `visit` (HTTP fetch + text extraction, multi-URL)
- a lightweight research-state summary to cut repeated work
- `<answer>` extraction for the final response
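
To make the list above concrete, here is a purely hypothetical sketch of what
an array-based tool call could look like. The field names (`query`, `url`,
`goal`) are assumptions for illustration, not the exact schema QUEST-35B was
trained on (see the architecture notes below):

```python
# Hypothetical illustration only -- field names are assumptions,
# not the exact schema QUEST-35B emits.
search_call = {
    "name": "search",
    "arguments": {
        "query": [                         # multi-query: several searches
            "QUEST deep research agent",   # in a single tool round
            "ReAct tool-use agents",
        ],
        "goal": "Collect background on DeepResearch-style agents.",
    },
}

visit_call = {
    "name": "visit",
    "arguments": {
        "url": [                           # multi-URL: fetch several pages
            "https://example.com/a",       # in a single tool round
            "https://example.com/b",
        ],
        "goal": "Extract concrete details from the top search hits.",
    },
}
```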

---

## 1) Use our own `osunlp/QUEST-35B` model (recommended)

Because the model is **private** during the beta, it is not available on the
free Inference API. You host it yourself on a dedicated HF Inference Endpoint
(pay-as-you-go, scale-to-zero) and point this Space at it.

### 1a) Create the endpoint once

1. Open <https://ui.endpoints.huggingface.co/> and click **"New endpoint"**.
2. **Model repository**: `osunlp/QUEST-35B` (use a token with access).
3. **Hardware**: `1x Nvidia L4 (24GB)` is usually the sweet spot for a 35B
   model. `Nvidia T4 small (16GB)` works too and is cheaper.
4. **Advanced → Container Type**: keep `Text Generation Inference` (TGI) or
   pick `vLLM`. Both expose an OpenAI-compatible `/v1/` route.
5. **Autoscaling → Scale-to-Zero**: enable it so you only pay while the
   endpoint is actively serving traffic.
6. Hit **Create endpoint**. After ~1-2 minutes it turns `Running` and shows a
   base URL like `https://abcdef.us-east-1.aws.endpoints.huggingface.cloud`.
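
If you prefer to script this instead of clicking through the UI,
`huggingface_hub` can create the endpoint programmatically. A minimal sketch,
assuming an L4 on AWS `us-east-1`; the exact `instance_type` / `instance_size`
strings available to your account are listed in the endpoint UI:

```python
from huggingface_hub import create_inference_endpoint

# Sketch only: instance names, region, and quota vary by account.
endpoint = create_inference_endpoint(
    "quest-35b-beta",                  # endpoint name (your choice)
    repository="osunlp/QUEST-35B",     # private repo; token needs read access
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",                  # callable only with a valid HF token
    instance_size="x1",
    instance_type="nvidia-l4",
    min_replica=0,                     # scale-to-zero: no traffic, no charge
    max_replica=1,
)
endpoint.wait()                        # block until the status is "running"
print(endpoint.url)                    # append /v1/ for the OpenAI-style route
```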

### 1b) Tell the Space how to reach it

In this Space's **Settings → Secrets / Variables**, set:

| Name | Value | Why |
|---|---|---|
| `HF_TOKEN` | your personal HF token with read access to `osunlp/QUEST-35B` | pulls private weights & authenticates the endpoint call |
| `QUEST_BASE_URL` | the endpoint URL **ending with `/v1/`** (e.g. `https://abcdef.us-east-1.aws.endpoints.huggingface.cloud/v1/`) | tells the app to route chat completions to your endpoint |
| `QUEST_ENDPOINT_MODEL` | `tgi` (default; set it to the repo id `osunlp/QUEST-35B` if you deployed with vLLM) | some containers need the exact model name |
| `DEFAULT_MODEL` | `osunlp/QUEST-35B` | preselects the right option in the UI |
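
You can also set these from a script via `HfApi.add_space_secret` instead of
the web UI. A small sketch; the Space repo id below is an assumption, so
substitute your own `<user>/<space>`:

```python
from huggingface_hub import HfApi

api = HfApi()  # uses your cached login or the HF_TOKEN env var

# Hypothetical Space id -- replace with your actual repo.
space_id = "your-username/quest-deepresearch"

api.add_space_secret(space_id, "HF_TOKEN", "hf_...")  # your token (elided)
api.add_space_secret(
    space_id,
    "QUEST_BASE_URL",
    "https://abcdef.us-east-1.aws.endpoints.huggingface.cloud/v1/",
)
api.add_space_secret(space_id, "QUEST_ENDPOINT_MODEL", "tgi")
api.add_space_secret(space_id, "DEFAULT_MODEL", "osunlp/QUEST-35B")
```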

Click **Restart this Space**. The `Model` dropdown now shows
`osunlp/QUEST-35B` at the top; selecting it routes requests through your
endpoint.
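
Before restarting, you can sanity-check the endpoint from any machine with
`huggingface_hub`'s `InferenceClient`, the same client the app uses (see the
architecture notes). A minimal check, assuming a TGI container (hence
`model="tgi"`):

```python
import os

from huggingface_hub import InferenceClient

# Point the client at the endpoint's OpenAI-compatible /v1/ route.
client = InferenceClient(
    base_url=os.environ["QUEST_BASE_URL"],  # must end with /v1/
    token=os.environ["HF_TOKEN"],
)

resp = client.chat_completion(
    messages=[{"role": "user", "content": "Reply with the word: ready"}],
    model="tgi",        # use "osunlp/QUEST-35B" for a vLLM container
    max_tokens=16,
)
print(resp.choices[0].message.content)
```

With Scale-to-Zero enabled, the first call after an idle period takes extra
time (and may return a transient error) while the endpoint scales back up.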

> Cost reality-check: on a 1× L4 at `$0.80/hr` with Scale-to-Zero, staying
> under **\$100/month** means roughly 125 billed hours, i.e. about 4 hours of
> active traffic per day. A small internal beta (a handful of testers, dozens
> of queries per day) typically stays well inside that. You can also stop the
> endpoint manually from the UI at any time to freeze costs.

---

## 2) Fallback: free open-weights models

If you just want to try the UI without spinning up an endpoint, pick any of
these in the dropdown. They run through the shared HF Inference API.

- `Qwen/Qwen3-8B`
- `google/gemma-3-12b-it`
- `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`
- `Qwen/Qwen2.5-7B-Instruct`
- `meta-llama/Llama-3.1-8B-Instruct`

Only `HF_TOKEN` is required for this path.
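
This path is the same client call without a `base_url`, so requests go to the
shared Inference API instead of your endpoint. For instance (a sketch; any
model id from the list above works):

```python
import os

from huggingface_hub import InferenceClient

# No base_url: the client talks to the shared HF Inference API.
client = InferenceClient(token=os.environ["HF_TOKEN"])

resp = client.chat_completion(
    messages=[{"role": "user", "content": "Say hello."}],
    model="Qwen/Qwen2.5-7B-Instruct",
    max_tokens=32,
)
print(resp.choices[0].message.content)
```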

---

## 3) Share the beta with org members (without paying for Team)

Option A (simplest, **\$0** for access; the Space's hardware stays on free CPU):

1. Keep the Space under your personal account.
2. **Settings → Visibility → Private**.
3. **Settings → Collaborators** → add each tester by HF username.
4. The endpoint lives under your personal namespace too, so the bill goes to
   your personal payment method (you can expense invoices from
   <https://huggingface.co/settings/billing>).

Option B (org-level billing): upgrade the organization to a Team plan and
recreate both the Space and the endpoint under the org namespace.

---

## 4) Local development

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export HF_TOKEN=...                     # required
export QUEST_BASE_URL=https://.../v1/   # optional; only if testing against the endpoint
python app.py
```

---

## 5) Architecture notes

- `app.py` uses `huggingface_hub.InferenceClient(base_url=QUEST_BASE_URL, ...)`
  for the private-endpoint path and the same client without `base_url` for the
  shared API path.
- The system prompt matches the schema QUEST-35B was trained on (array-based
  `search` / `visit` with an explicit `goal`), so the private model stays
  in-distribution. The open-weights fallbacks also follow the same schema.
- Visited URLs and search queries are cached in-process, so repeated tool
  calls don't re-hit the network.
- `<answer>...</answer>` terminates the ReAct loop.
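
As a rough mental model of how those pieces fit together, here is a
much-simplified sketch of the loop. This is not the actual `app.py` code: the
`run_tool` stub, the placeholder system prompt, and the turn budget are
illustrative assumptions.

```python
import re

from huggingface_hub import InferenceClient

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

# Stand-in for the real system prompt that encodes the trained tool schema.
SYSTEM_PROMPT = (
    "You are a research agent. Emit tool calls, or wrap the final "
    "response in <answer>...</answer> when done."
)

# In-process caches: repeated queries/URLs are served from memory.
search_cache: dict[str, str] = {}
visit_cache: dict[str, str] = {}


def run_tool(assistant_text: str) -> str:
    """Illustrative stub: parse the tool call in `assistant_text`, consult
    the caches, hit DuckDuckGo / fetch URLs on a miss, and return the
    observation text fed back to the model."""
    return "(tool output would go here)"


def react_loop(client: InferenceClient, model: str, question: str,
               max_turns: int = 10) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    for _ in range(max_turns):
        resp = client.chat_completion(messages=messages, model=model,
                                      max_tokens=1024)
        text = resp.choices[0].message.content
        # <answer>...</answer> terminates the loop.
        if match := ANSWER_RE.search(text):
            return match.group(1).strip()
        # No final answer yet: run the tool call and feed the observation
        # back in for the next ReAct turn.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": run_tool(text)})
    return "No answer produced within the turn budget."
```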