# Hugging Face Deployment Primary target: Hugging Face `Docker Space` on upgraded GPU hardware. ## What Gets Deployed - FastAPI backend - static web frontend - Hugging Face OAuth routes - ephemeral SQLite-backed session queue ## Required Space Settings - SDK: `Docker` - Port: `7860` - OAuth: enabled via README metadata - Hardware: upgraded GPU recommended ## Recommended Runtime Variables Core: - `MODEL_NAME=sapientinc/HRM-Text-1B` - `DEVICE_PREFERENCE=auto` - `DTYPE_PREFERENCE=auto` - `ATTN_IMPLEMENTATION=eager` - `LOW_CPU_MEM_USAGE=true` - `TRUST_REMOTE_CODE=true` - `PRELOAD_MODEL=true` Traffic limits: - `MAX_TRACE_TOKENS=256` - `MAX_SENTENCES=16` - `JOB_WORKERS=1` - `MAX_QUEUED_JOBS=8` - `MAX_ACTIVE_JOBS_PER_USER=2` - `REQUIRE_AUTH=true` ## Deploy Flow 1. Create new Hugging Face Space with `Docker` SDK. 2. Push repo contents. 3. Set runtime variables in Space settings. 4. Upgrade hardware. 5. Wait for build. 6. Verify: - `GET /healthz` - sign-in works - one short analysis completes - JSON / CSV export works ## Operational Notes - Local disk is ephemeral. Session history disappears on restart. - OAuth helper is mocked locally but real inside Space. - Keep public defaults conservative. Long traces can OOM small GPUs. - If queue pressure grows, lower token caps before increasing worker count. ## Common Failure Modes - `attn_implementation` not eager: - attribution disabled for model - unsupported model layout: - generation may work, attribution fails early with clear error - OOM: - reduce `MAX_TRACE_TOKENS`, `MAX_SENTENCES`, or choose larger GPU - cold start slow: - keep `PRELOAD_MODEL=true`