# Hugging Face Deployment

Primary target: a Hugging Face `Docker` Space on upgraded GPU hardware.
## What Gets Deployed

- FastAPI backend
- static web frontend
- Hugging Face OAuth routes
- ephemeral SQLite-backed session queue
## Required Space Settings

- SDK: `Docker`
- Port: `7860`
- OAuth: enabled via README metadata
- Hardware: upgraded GPU recommended
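
The SDK, port, and OAuth settings above live in the YAML front matter of the Space's `README.md`. The sketch below embeds an example front matter and checks it for the three required values. It assumes the standard Spaces metadata keys `sdk`, `app_port`, and `hf_oauth`; the `title` value and the helper names are placeholders, not part of this repo. Hardware is not covered here because it is chosen in the Space settings.

```python
# Example README.md front matter for a Docker Space (title is a placeholder).
EXAMPLE_README = """\
---
title: Example Space
sdk: docker
app_port: 7860
hf_oauth: true
---
# Example Space
"""

# Values this deployment expects, compared as plain strings.
REQUIRED = {"sdk": "docker", "app_port": "7860", "hf_oauth": "true"}


def parse_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs from a leading `---` block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_settings(text: str) -> list[str]:
    """Return the required keys that are absent or carry a different value."""
    fields = parse_front_matter(text)
    return [key for key, value in REQUIRED.items() if fields.get(key) != value]
```

Running `missing_settings` on the repo's README before pushing is a cheap way to catch a Space that would build with the wrong SDK or port.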
## Recommended Runtime Variables

Core:

- `MODEL_NAME=sapientinc/HRM-Text-1B`
- `DEVICE_PREFERENCE=auto`
- `DTYPE_PREFERENCE=auto`
- `ATTN_IMPLEMENTATION=eager`
- `LOW_CPU_MEM_USAGE=true`
- `TRUST_REMOTE_CODE=true`
- `PRELOAD_MODEL=true`

Traffic limits:

- `MAX_TRACE_TOKENS=256`
- `MAX_SENTENCES=16`
- `JOB_WORKERS=1`
- `MAX_QUEUED_JOBS=8`
- `MAX_ACTIVE_JOBS_PER_USER=2`
- `REQUIRE_AUTH=true`
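
One way a backend could read these variables is sketched below. The `Settings` class and `load_settings` function are hypothetical names, not the project's actual configuration code; only the variable names and default values come from the lists above. The accepted boolean spellings are also an assumption.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping


def _as_bool(raw: str) -> bool:
    """Treat common truthy spellings as True; everything else is False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Core
    model_name: str = "sapientinc/HRM-Text-1B"
    device_preference: str = "auto"
    dtype_preference: str = "auto"
    attn_implementation: str = "eager"
    low_cpu_mem_usage: bool = True
    trust_remote_code: bool = True
    preload_model: bool = True
    # Traffic limits
    max_trace_tokens: int = 256
    max_sentences: int = 16
    job_workers: int = 1
    max_queued_jobs: int = 8
    max_active_jobs_per_user: int = 2
    require_auth: bool = True


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from upper-cased variable names, keeping defaults otherwise."""
    overrides = {}
    for field in fields(Settings):
        raw = env.get(field.name.upper())
        if raw is None:
            continue
        # Check bool before int: bool is a subclass of int in Python.
        if isinstance(field.default, bool):
            overrides[field.name] = _as_bool(raw)
        elif isinstance(field.default, int):
            overrides[field.name] = int(raw)
        else:
            overrides[field.name] = raw
    return Settings(**overrides)
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise locally, for example `load_settings({"MAX_TRACE_TOKENS": "128"})`.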
## Deploy Flow

1. Create a new Hugging Face Space with the `Docker` SDK.
2. Push the repo contents.
3. Set the runtime variables in the Space settings.
4. Upgrade the hardware.
5. Wait for the build to finish.
6. Verify:
   - `GET /healthz` succeeds
   - sign-in works
   - one short analysis completes
   - JSON / CSV export works
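
The first verification item can be scripted. The sketch below assumes `/healthz` answers with HTTP 200 when the app is healthy, which the source does not state explicitly; `check_health` is a hypothetical helper and the URL in the usage note is a placeholder. The sign-in, analysis, and export checks still need a manual pass in the browser.

```python
import urllib.request
from typing import Callable


def check_health(
    base_url: str,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 10.0,
) -> bool:
    """Return True if GET <base_url>/healthz responds with status 200."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib's URLError and HTTPError both derive from OSError, so this
        # covers connection failures as well as non-2xx responses.
        return False
```

Usage would look like `check_health("https://<owner>-<space>.hf.space")`, with the placeholders replaced by the real Space address. The `opener` parameter exists only so the function can be tested without network access.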
## Operational Notes

- Local disk is ephemeral. Session history disappears on restart.
- The OAuth helper is mocked when running locally and real inside a Space.
- Keep public defaults conservative. Long traces can OOM small GPUs.
- If queue pressure grows, lower the token caps before increasing the worker count.
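
To make the queue-pressure note concrete, here is a minimal sketch of how the traffic limits might gate incoming work. The function names and rejection messages are hypothetical, and the real queue may enforce its limits differently; only the default values mirror `MAX_QUEUED_JOBS`, `MAX_ACTIVE_JOBS_PER_USER`, and `MAX_TRACE_TOKENS`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueLimits:
    max_queued_jobs: int = 8
    max_active_jobs_per_user: int = 2
    max_trace_tokens: int = 256


def admit_job(
    queued_jobs: int,
    user_active_jobs: int,
    limits: QueueLimits = QueueLimits(),
) -> tuple[bool, str]:
    """Decide whether a new job may enter the queue, with a reason."""
    if user_active_jobs >= limits.max_active_jobs_per_user:
        return False, "user has too many active jobs"
    if queued_jobs >= limits.max_queued_jobs:
        return False, "queue is full"
    return True, "accepted"


def clamp_trace_tokens(requested: int, limits: QueueLimits = QueueLimits()) -> int:
    """Cap a requested trace length at the configured maximum (minimum 1)."""
    return max(1, min(requested, limits.max_trace_tokens))
```

Lowering `max_trace_tokens` shortens every job, so the queue drains faster without adding GPU memory pressure. Raising the worker count instead runs more jobs at once on the same GPU, which is the riskier change on small hardware.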
## Common Failure Modes

- `attn_implementation` is not `eager`:
  - attribution is disabled for the model
- unsupported model layout:
  - generation may work, but attribution fails early with a clear error
- OOM:
  - reduce `MAX_TRACE_TOKENS` or `MAX_SENTENCES`, or choose a larger GPU
- slow cold start:
  - keep `PRELOAD_MODEL=true`
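
Two of these failure modes lend themselves to a small guard. The sketch below is illustrative only: `attribution_enabled` and `caps_after_oom` are hypothetical helpers, and halving the caps is an assumed policy rather than something the project prescribes.

```python
def attribution_enabled(attn_implementation: str) -> bool:
    """Attribution is only offered when attention runs in eager mode."""
    return attn_implementation.strip().lower() == "eager"


def caps_after_oom(max_trace_tokens: int, max_sentences: int) -> tuple[int, int]:
    """Halve both caps (never below 1) as a first response to an OOM."""
    return max(1, max_trace_tokens // 2), max(1, max_sentences // 2)
```

With the documented defaults, `caps_after_oom(256, 16)` suggests retrying with `MAX_TRACE_TOKENS=128` and `MAX_SENTENCES=8` before moving to a larger GPU.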