tei-annotator / docs /huggingface-deployment.md
cmboulanger's picture
feat: Add batch size configuration in api and frontends
b530e33

A newer version of the Gradio SDK is available: 6.12.0

Upgrade

Deploying to HuggingFace Spaces

app.py at the repository root is a Gradio app ready for deployment on HuggingFace Spaces.

How billing works: The Space owner sets their HF_TOKEN as a Space secret. All inference calls use that token; visitors use the app without any login or token input. HF PRO accounts include a generous free inference quota on router.huggingface.co.


Step 1 β€” Create the Space

On huggingface.co/new-space, choose Gradio as the SDK.

HF generates a README.md with YAML frontmatter. Make sure it contains at minimum:

---
sdk: gradio
sdk_version: "6.9.0"
python_version: "3.12"
app_file: app.py
hardware: cpu-basic
---

Why cpu-basic? The app makes HTTP calls to external LLM APIs β€” it does not run any local GPU workloads. Using cpu-basic avoids the GPU-slot allocation overhead (5–15 s per request) and GPU-task timeout issues that come with ZeroGPU (zero-a10g) hardware.

Step 2 β€” Push the repository

git remote add space https://huggingface.co/spaces/<your-username>/<space-name>
git push space main

HF Spaces reads requirements.txt at the repo root and installs dependencies automatically.

Step 3 β€” Set the HF_TOKEN secret

In your Space's Settings β†’ Variables and Secrets, add a Secret:

Secret name Value
HF_TOKEN Your HuggingFace API token (create one here)

Token permissions required: The token must have the "Make calls to Inference Providers" scope enabled (under "Inference" when creating/editing the token at https://huggingface.co/settings/tokens). Without this scope, all annotation and evaluation calls will return HTTP 403.

The app shows a setup warning if this secret is missing.

Step 4 β€” Verify

Once the Space has built, open its URL and annotate a sample text.


Model list

Models are defined in app.py (_HF_MODELS), mirrored in webservice/main.py. All are pinned to inference providers that work from AWS-hosted Spaces (nscale, scaleway). Providers blocked from AWS β€” groq, cerebras, together-ai, sambanova β€” are avoided.


Local development

uv sync --extra gradio
HF_TOKEN=hf_... uv run task gradio
# opens at http://localhost:7860

Set HF_TOKEN to a token with the "Make calls to Inference Providers" scope. You can also put it in a .env file at the repo root:

echo "HF_TOKEN=hf_..." > .env
uv run task gradio