Spaces:

cmboulanger
/

tei-annotator

Runtime error

App Files Files Community

tei-annotator / docs /huggingface-deployment.md

cmboulanger

feat: Add batch size configuration in api and frontends

b530e33 about 1 month ago

preview code

raw

history blame contribute delete

2.53 kB

	# Deploying to HuggingFace Spaces

	`app.py` at the repository root is a Gradio app ready for deployment on HuggingFace Spaces.

	How billing works: The Space owner sets their `HF_TOKEN` as a Space secret. All inference calls use that token; visitors use the app without any login or token input. HF PRO accounts include a generous free inference quota on `router.huggingface.co`.

	---

	## Step 1 — Create the Space

	On [huggingface.co/new-space](https://huggingface.co/new-space), choose Gradio as the SDK.

	HF generates a `README.md` with YAML frontmatter. Make sure it contains at minimum:

	```yaml
	---
	sdk: gradio
	sdk_version: "6.9.0"
	python_version: "3.12"
	app_file: app.py
	hardware: cpu-basic
	---
	```

	> Why `cpu-basic`? The app makes HTTP calls to external LLM APIs — it does not run any local GPU workloads. Using `cpu-basic` avoids the GPU-slot allocation overhead (5–15 s per request) and GPU-task timeout issues that come with ZeroGPU (`zero-a10g`) hardware.

	## Step 2 — Push the repository

	```bash
	git remote add space https://huggingface.co/spaces/<your-username>/<space-name>
	git push space main
	```

	HF Spaces reads `requirements.txt` at the repo root and installs dependencies automatically.

	## Step 3 — Set the HF_TOKEN secret

	In your Space's Settings → Variables and Secrets, add a Secret:

	\| Secret name \| Value \|
	\| --- \| --- \|
	\| `HF_TOKEN` \| Your HuggingFace API token ([create one here](https://huggingface.co/settings/tokens)) \|

	> Token permissions required: The token must have the "Make calls to Inference Providers" scope enabled (under "Inference" when creating/editing the token at https://huggingface.co/settings/tokens). Without this scope, all annotation and evaluation calls will return HTTP 403.

	The app shows a setup warning if this secret is missing.

	## Step 4 — Verify

	Once the Space has built, open its URL and annotate a sample text.

	---

	## Model list

	Models are defined in `app.py` (`_HF_MODELS`), mirrored in `webservice/main.py`. All are pinned to inference providers that work from AWS-hosted Spaces (nscale, scaleway). Providers blocked from AWS — groq, cerebras, together-ai, sambanova — are avoided.

	---

	## Local development

	```bash
	uv sync --extra gradio
	HF_TOKEN=hf_... uv run task gradio
	# opens at http://localhost:7860
	```

	Set `HF_TOKEN` to a token with the "Make calls to Inference Providers" scope. You can also put it in a `.env` file at the repo root:

	```bash
	echo "HF_TOKEN=hf_..." > .env
	uv run task gradio
	```