# Deploying to HuggingFace Spaces
`app.py` at the repository root is a Gradio app ready for deployment on HuggingFace Spaces.
**How billing works:** The Space owner sets their `HF_TOKEN` as a Space secret. All inference calls use that token; visitors use the app without any login or token input. HF PRO accounts include a generous free inference quota on `router.huggingface.co`.
---
## Step 1: Create the Space
On [huggingface.co/new-space](https://huggingface.co/new-space), choose **Gradio** as the SDK.
HF generates a `README.md` with YAML frontmatter. Make sure it contains at minimum:
```yaml
---
sdk: gradio
sdk_version: "6.9.0"
python_version: "3.12"
app_file: app.py
hardware: cpu-basic
---
```
> **Why `cpu-basic`?** The app makes HTTP calls to external LLM APIs; it does not run any local GPU workloads. Using `cpu-basic` avoids the GPU-slot allocation overhead (5–15 s per request) and GPU-task timeout issues that come with ZeroGPU (`zero-a10g`) hardware.
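Before pushing, the frontmatter can be checked locally. The following is a minimal sketch, not part of the repository: the helper names `read_frontmatter` and `missing_keys` are hypothetical, and the parser only handles flat `key: value` pairs like the ones listed above, not full YAML.

```python
# Hypothetical helper, not part of tei-annotator: checks that a Space README
# carries the frontmatter keys listed above. Handles flat `key: value` only.
REQUIRED_KEYS = {"sdk", "sdk_version", "python_version", "app_file", "hardware"}


def read_frontmatter(text: str) -> dict:
    """Return the key/value pairs between the leading pair of `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        key, sep, value = line.partition(":")
        # Skip comments, indented (nested) lines, and lines without a colon.
        if sep and key.strip() and not key.startswith((" ", "#")):
            fields[key.strip()] = value.strip().strip("\"'")
    return {}  # no closing delimiter: treat as missing frontmatter


def missing_keys(text: str) -> set:
    """Return the required keys that the README frontmatter does not define."""
    return REQUIRED_KEYS - set(read_frontmatter(text))


if __name__ == "__main__":
    from pathlib import Path

    readme = Path("README.md")
    if readme.is_file():
        missing = missing_keys(readme.read_text(encoding="utf-8"))
        print("missing keys:", sorted(missing) if missing else "none")
```

Run from the repository root, it prints any keys that still need to be added to `README.md`.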
## Step 2: Push the repository
```bash
git remote add space https://huggingface.co/spaces/<your-username>/<space-name>
git push space main
```
HF Spaces reads `requirements.txt` at the repo root and installs dependencies automatically.
## Step 3: Set the HF_TOKEN secret
In your Space's **Settings → Variables and Secrets**, add a **Secret**:
| Secret name | Value |
| --- | --- |
| `HF_TOKEN` | Your HuggingFace API token ([create one here](https://huggingface.co/settings/tokens)) |
> **Token permissions required:** The token must have the **"Make calls to Inference Providers"** scope enabled (under "Inference" when creating/editing the token at https://huggingface.co/settings/tokens). Without this scope, all annotation and evaluation calls will return HTTP 403.
The app shows a setup warning if this secret is missing.
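Space secrets are exposed to the running app as environment variables, so a check of this kind can be sketched as below. This is an illustration under assumptions, not the code in `app.py`: the function names `setup_warning` and `hint_for_status` and the message texts are hypothetical.

```python
import os
from typing import Mapping, Optional


def setup_warning(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a warning message if HF_TOKEN is missing or blank, else None."""
    env = os.environ if env is None else env
    if not env.get("HF_TOKEN", "").strip():
        return (
            "HF_TOKEN is not set. Add it as a Secret under "
            "Settings → Variables and Secrets."
        )
    return None


def hint_for_status(status_code: int) -> Optional[str]:
    """Map an HTTP status from an inference call to a troubleshooting hint."""
    if status_code == 403:
        return (
            "HTTP 403: check that the token has the "
            '"Make calls to Inference Providers" scope.'
        )
    return None


if __name__ == "__main__":
    print(setup_warning() or "HF_TOKEN is set")
```

Passing the environment in as a mapping keeps the check easy to exercise without touching the real process environment.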
## Step 4: Verify
Once the Space has built, open its URL and annotate a sample text.
---
## Model list
Models are defined in `app.py` (`_HF_MODELS`), mirrored in `webservice/main.py`. All are pinned to inference providers that work from AWS-hosted Spaces (nscale, scaleway). Providers blocked from AWS (groq, cerebras, together-ai, sambanova) are avoided.
---
## Local development
```bash
uv sync --extra gradio
HF_TOKEN=hf_... uv run task gradio
# opens at http://localhost:7860
```
Set `HF_TOKEN` to a token with the "Make calls to Inference Providers" scope. You can also put it in a `.env` file at the repo root:
```bash
# Note: '>' overwrites any existing .env; use '>>' to append instead
echo "HF_TOKEN=hf_..." > .env
uv run task gradio
```
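How the app reads `.env` is not described here. For illustration only, a loader for simple `KEY=VALUE` files can be sketched with the standard library; `parse_env` and `load_env_file` are hypothetical names, and variables already set in the environment take precedence over the file.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and bad lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy values from the file into os.environ without overriding existing ones."""
    file = Path(path)
    if file.is_file():
        for key, value in parse_env(file.read_text(encoding="utf-8")).items():
            os.environ.setdefault(key, value)


if __name__ == "__main__":
    load_env_file()
    print("HF_TOKEN set:", bool(os.environ.get("HF_TOKEN")))
```

Using `setdefault` means `HF_TOKEN=hf_... uv run task gradio` on the command line still wins over the value stored in `.env`.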