# Copy to .env (gitignored). The Space supports THREE model backends;
# you only need credentials for whichever one(s) you want to use. On
# the deployed HuggingFace Space, leave .env empty and set these as
# Space secrets in the Settings panel instead.
#
# ============================================================
# PROVIDER SELECTION
# ============================================================
# Optional. If unset, the app auto-detects based on which credentials
# are present and whether we are running on a HuggingFace Space (see
# app.py::_detect_provider). Valid values:
#   anthropic   - Claude via the Anthropic SDK (best writeup quality)
#   huggingface - Open models (Gemma 2 / Phi-4 / Llama-3.3 / Qwen) via
#                 HF Inference Providers API. Free on HF Spaces via
#                 the Space's monthly credits; HF_TOKEN locally.
#   zerogpu     - Open model (Phi-4-mini-instruct by default) loaded
#                 locally in the Space and run on free on-demand GPU
#                 via the HuggingFace Pro plan's ZeroGPU allocation.
#                 No API round-trip; no inference credits burned.
# Auto-detect precedence: Pro Space → zerogpu, else Anthropic key →
# anthropic, else HF_TOKEN or any Space → huggingface, else anthropic.
# MODEL_PROVIDER=
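# Example (illustrative only, not a recommendation): to force the
# HuggingFace backend even when ANTHROPIC_API_KEY is also set, which
# auto-detection would otherwise resolve to anthropic, one could set
#   MODEL_PROVIDER=huggingface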
# ============================================================
# ANTHROPIC BACKEND
# ============================================================
# Required for the anthropic backend. Get one at console.anthropic.com.
ANTHROPIC_API_KEY=your-anthropic-api-key-here
# Optional. claude-opus-4-7 is the default; it produces materially
# better diagnostic writeups. claude-sonnet-4-6 is a cost-optimized
# fallback; benchmark before flipping (research.md R15).
MODEL_ID=claude-opus-4-7
# ============================================================
# HUGGINGFACE BACKEND
# ============================================================
# Optional locally; get one at huggingface.co/settings/tokens. NOT
# required on a deployed HuggingFace Space (the Space identity is used
# automatically and includes free monthly inference credits).
# HF_TOKEN=your-hf-token-here
# Optional. Default google/gemma-2-9b-it works well and is widely
# available on HF Inference Providers. Other tested choices:
#   microsoft/Phi-4-mini-instruct     - smaller, faster, decent JSON
#   meta-llama/Llama-3.3-70B-Instruct - slower, very high quality
#   Qwen/Qwen2.5-72B-Instruct         - strong on structured output
# Smaller open models can be looser than Claude on schema adherence;
# the parser raises MalformedResponseError on bad output and the UI
# shows a "try again" message rather than crashing.
# HF_MODEL_ID=google/gemma-2-9b-it
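# Example (illustrative only; this pairing is an assumption, not a
# tested recommendation): a local run against one of the larger
# models listed above might set
#   HF_TOKEN=your-hf-token-here
#   HF_MODEL_ID=Qwen/Qwen2.5-72B-Instruct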
# ============================================================
# ZEROGPU BACKEND (HuggingFace Pro plan)
# ============================================================
# No credentials required: the @spaces.GPU decorator handles allocation
# automatically when the Space has a Pro owner. Locally, the decorator
# is a no-op and the model runs on CPU (slow, smoke-test only).
#
# Optional. Default microsoft/Phi-4-mini-instruct fits on the standard
# ZeroGPU allocation with fast cold start. Other tested choices:
#   google/gemma-2-9b-it             - larger, slower load, more capable
#   meta-llama/Llama-3.1-8B-Instruct - Llama 3.1 8B, good JSON adherence
#   microsoft/phi-4                  - full 14B Phi-4, slower
# Gated models (Llama, Gemma, etc.) need an HF_TOKEN with access granted
# in order to download.
# ZEROGPU_MODEL_ID=microsoft/Phi-4-mini-instruct
# Optional. Maximum GPU allocation per request, in seconds. The Pro
# plan allows up to 120s per request; raise/lower to balance cold-start
# coverage vs. quota use.
# ZEROGPU_DURATION_SECONDS=120
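# Example (a sketch, assuming MODEL_PROVIDER overrides auto-detection
# as described under PROVIDER SELECTION): a local CPU smoke test of
# this backend might set
#   MODEL_PROVIDER=zerogpu
#   ZEROGPU_MODEL_ID=microsoft/Phi-4-mini-instruct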
# ============================================================
# VALIDATION
# ============================================================
# Word-count cap on the description Textbox. The Gradio validator
# rejects submissions with fewer than 200 or more than
# MAX_DESCRIPTION_WORDS words.
MAX_DESCRIPTION_WORDS=5000