# Hugging Face Deployment
Primary target: Hugging Face `Docker Space` on upgraded GPU hardware.
## What Gets Deployed
- FastAPI backend
- static web frontend
- Hugging Face OAuth routes
- ephemeral SQLite-backed session queue
## Required Space Settings
- SDK: `Docker`
- Port: `7860`
- OAuth: enabled via README metadata
- Hardware: upgraded GPU recommended
## Recommended Runtime Variables
Core:
- `MODEL_NAME=sapientinc/HRM-Text-1B`
- `DEVICE_PREFERENCE=auto`
- `DTYPE_PREFERENCE=auto`
- `ATTN_IMPLEMENTATION=eager`
- `LOW_CPU_MEM_USAGE=true`
- `TRUST_REMOTE_CODE=true`
- `PRELOAD_MODEL=true`
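
As a rough illustration of how the backend might consume the core variables, the sketch below reads them from the environment and falls back to the values listed above. The `CoreSettings` class, `env_bool`, and `load_core_settings` are hypothetical names introduced here, and the set of strings accepted as "true" is an assumption; the actual loader in the repo may differ.

```python
import os
from dataclasses import dataclass

# Assumption: which strings count as "true" is a choice made for this sketch.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(env, name, default):
    """Parse a boolean-like variable, falling back to `default` when unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


@dataclass(frozen=True)
class CoreSettings:
    # Hypothetical container; field names mirror the variables listed above.
    model_name: str
    device_preference: str
    dtype_preference: str
    attn_implementation: str
    low_cpu_mem_usage: bool
    trust_remote_code: bool
    preload_model: bool


def load_core_settings(env=None):
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return CoreSettings(
        model_name=env.get("MODEL_NAME", "sapientinc/HRM-Text-1B"),
        device_preference=env.get("DEVICE_PREFERENCE", "auto"),
        dtype_preference=env.get("DTYPE_PREFERENCE", "auto"),
        attn_implementation=env.get("ATTN_IMPLEMENTATION", "eager"),
        low_cpu_mem_usage=env_bool(env, "LOW_CPU_MEM_USAGE", True),
        trust_remote_code=env_bool(env, "TRUST_REMOTE_CODE", True),
        preload_model=env_bool(env, "PRELOAD_MODEL", True),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests without touching the process environment.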
Traffic limits:
- `MAX_TRACE_TOKENS=256`
- `MAX_SENTENCES=16`
- `JOB_WORKERS=1`
- `MAX_QUEUED_JOBS=8`
- `MAX_ACTIVE_JOBS_PER_USER=2`
- `REQUIRE_AUTH=true`
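
One way to picture how `MAX_QUEUED_JOBS` and `MAX_ACTIVE_JOBS_PER_USER` could interact is the admission check below. This is a minimal sketch, not the backend's actual queue: `AdmissionGate` is a hypothetical class, and it assumes both limits count jobs that have been accepted but not yet finished.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class AdmissionGate:
    """Hypothetical in-memory gate enforcing the two queue limits."""

    max_queued_jobs: int = 8
    max_active_jobs_per_user: int = 2
    _pending: int = 0  # jobs accepted and not yet released
    _per_user: Counter = field(default_factory=Counter)

    def try_admit(self, user_id: str) -> Tuple[bool, str]:
        """Accept the job only if both global and per-user limits allow it."""
        if self._pending >= self.max_queued_jobs:
            return False, "queue full"
        if self._per_user[user_id] >= self.max_active_jobs_per_user:
            return False, "per-user limit reached"
        self._pending += 1
        self._per_user[user_id] += 1
        return True, "accepted"

    def release(self, user_id: str) -> None:
        """Call when a job finishes or fails, freeing its slot."""
        if self._per_user[user_id] > 0:
            self._per_user[user_id] -= 1
            self._pending -= 1
```

Checking the global limit first means a full queue is reported as such even to users who are also at their own limit.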
## Deploy Flow
1. Create new Hugging Face Space with `Docker` SDK.
2. Push repo contents.
3. Set runtime variables in Space settings.
4. Upgrade hardware.
5. Wait for build.
6. Verify:
   - `GET /healthz`
   - sign-in works
   - one short analysis completes
   - JSON / CSV export works
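
The first verification item can be scripted. The sketch below assumes `/healthz` answers with HTTP 200 when the app is healthy, which the document does not state explicitly; `check_health` and its `opener` parameter are hypothetical names, and the base URL is whatever address the Space is served from.

```python
import urllib.request


def check_health(base_url, opener=urllib.request.urlopen, timeout=10.0):
    """Return True if GET <base_url>/healthz responds with HTTP 200.

    `opener` is injectable so the check can be tested without a network.
    Assumption: a healthy app answers 200; adjust if the endpoint differs.
    """
    url = base_url.rstrip("/") + "/healthz"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # urllib's URLError and HTTPError both derive from OSError,
        # so connection failures and non-2xx responses land here.
        return False
```

Because a Space can take a while to build, it may be worth calling this in a retry loop with a short sleep rather than treating the first failure as final.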
## Operational Notes
- Local disk is ephemeral. Session history disappears on restart.
- The OAuth helper is mocked when running locally; inside a Space it uses real Hugging Face OAuth.
- Keep public defaults conservative. Long traces can OOM small GPUs.
- If queue pressure grows, lower token caps before increasing worker count.
## Common Failure Modes
- `attn_implementation` not `eager`:
  - attribution is disabled for the model
- unsupported model layout:
  - generation may work, but attribution fails early with a clear error
- OOM:
  - reduce `MAX_TRACE_TOKENS` or `MAX_SENTENCES`, or choose a larger GPU
- slow cold start:
  - keep `PRELOAD_MODEL=true`
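
Two of these failure modes come down to configuration, so they could be flagged before the first request arrives. The following is only a sketch of such a preflight check: `preflight_warnings` is a hypothetical helper, the warning texts are illustrative, and treating only the literal string `true` as enabled is an assumption.

```python
def preflight_warnings(env):
    """Return human-readable warnings for risky settings in `env` (a mapping)."""
    warnings = []

    # Attribution is described above as requiring eager attention.
    attn = env.get("ATTN_IMPLEMENTATION", "eager")
    if attn != "eager":
        warnings.append(
            f"ATTN_IMPLEMENTATION={attn}: attribution will be disabled"
        )

    # Without preloading, the first request pays the model load cost.
    preload = env.get("PRELOAD_MODEL", "true").strip().lower()
    if preload != "true":
        warnings.append("PRELOAD_MODEL is not true: expect a slow cold start")

    return warnings
```

Logging the returned list at startup would surface these issues in the Space logs instead of at request time.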