Spaces:

AshwinP
/

compounding-test

Sleeping

apingali

feat(hf-space): add ZeroGPU backend for HuggingFace Pro Spaces

971e3f4 about 2 months ago

4.08 kB

	# Copy to .env (gitignored). The Space supports TWO model backends; you
	# only need credentials for whichever one(s) you want to use. On the
	# deployed HuggingFace Space, leave .env empty and set these as Space
	# secrets in the Settings panel instead.
	#
	# ============================================================
	# PROVIDER SELECTION
	# ============================================================
	# Optional. If unset, the app auto-detects based on which credentials
	# are present and whether we are running on a HuggingFace Space (see
	# app.py::_detect_provider). Valid values:
	# anthropic — Claude via the Anthropic SDK (best writeup quality)
	# huggingface — Open models (Gemma 2 / Phi-4 / Llama-3.3 / Qwen) via
	# HF Inference Providers API. Free on HF Spaces via
	# the Space's monthly credits; HF_TOKEN locally.
	# zerogpu — Open model (Phi-4-mini-instruct by default) loaded
	# locally in the Space and run on free on-demand GPU
	# via the HuggingFace Pro plan's ZeroGPU allocation.
	# No API round-trip; no inference credits burned.
	# Auto-detect precedence: Pro Space → zerogpu, else Anthropic key →
	# anthropic, else HF_TOKEN or any Space → huggingface, else anthropic.
	# MODEL_PROVIDER=

	# ============================================================
	# ANTHROPIC BACKEND
	# ============================================================
	# Required for the anthropic backend. Get one at console.anthropic.com.
	ANTHROPIC_API_KEY=your-anthropic-api-key-here

	# Optional. claude-opus-4-7 is the default — produces materially better
	# diagnostic writeups. claude-sonnet-4-6 is a cost-optimized fallback;
	# benchmark before flipping (research.md R15).
	MODEL_ID=claude-opus-4-7

	# ============================================================
	# HUGGINGFACE BACKEND
	# ============================================================
	# Optional locally — get one at huggingface.co/settings/tokens. NOT
	# required on a deployed HuggingFace Space (the Space identity is used
	# automatically and includes free monthly inference credits).
	# HF_TOKEN=your-hf-token-here

	# Optional. Default google/gemma-2-9b-it works well and is widely
	# available on HF Inference Providers. Other tested choices:
	# microsoft/Phi-4-mini-instruct — smaller, faster, decent JSON
	# meta-llama/Llama-3.3-70B-Instruct — slower, very high quality
	# Qwen/Qwen2.5-72B-Instruct — strong on structured output
	# Smaller open models can be looser than Claude on schema adherence;
	# the parser raises MalformedResponseError on bad output and the UI
	# shows a "try again" message rather than crashing.
	# HF_MODEL_ID=google/gemma-2-9b-it

	# ============================================================
	# ZEROGPU BACKEND (HuggingFace Pro plan)
	# ============================================================
	# No credentials required — the @spaces.GPU decorator handles allocation
	# automatically when the Space has a Pro owner. Locally, the function
	# decoration is a no-op and the model runs on CPU (slow, smoke-test only).
	#
	# Optional. Default microsoft/Phi-4-mini-instruct fits on the standard
	# A100 allocation with fast cold start. Other tested choices:
	# google/gemma-2-9b-it — larger, slower load, more capable
	# meta-llama/Llama-3.3-8B-Instruct — Llama 3.3 8B, good JSON adherence
	# microsoft/phi-4 — full 14B Phi-4, slower
	# HuggingFace's gated models (Llama, etc.) need HF_TOKEN to download.
	# ZEROGPU_MODEL_ID=microsoft/Phi-4-mini-instruct

	# Optional. Maximum GPU allocation per request, in seconds. The Pro
	# plan allows up to 120s per request; raise/lower to balance cold-start
	# coverage vs. quota use.
	# ZEROGPU_DURATION_SECONDS=120

	# ============================================================
	# VALIDATION
	# ============================================================
	# Word-count cap on the description Textbox. The Gradio validator
	# rejects submissions outside 200–MAX_DESCRIPTION_WORDS.
	MAX_DESCRIPTION_WORDS=5000