Hugging Face Deployment
Primary target: Hugging Face Docker Space on upgraded GPU hardware.
What Gets Deployed
- FastAPI backend
- static web frontend
- Hugging Face OAuth routes
- ephemeral SQLite-backed session queue
Required Space Settings
- SDK: Docker
- Port: 7860
- OAuth: enabled via README metadata
- Hardware: upgraded GPU recommended
Recommended Runtime Variables
Core:
MODEL_NAME=sapientinc/HRM-Text-1B
DEVICE_PREFERENCE=auto
DTYPE_PREFERENCE=auto
ATTN_IMPLEMENTATION=eager
LOW_CPU_MEM_USAGE=true
TRUST_REMOTE_CODE=true
PRELOAD_MODEL=true
Traffic limits:
MAX_TRACE_TOKENS=256
MAX_SENTENCES=16
JOB_WORKERS=1
MAX_QUEUED_JOBS=8
MAX_ACTIVE_JOBS_PER_USER=2
REQUIRE_AUTH=true
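As a minimal sketch of how a backend might read these variables, the snippet below parses them into a typed settings object. The `Settings` class, `load_settings`, and `_as_bool` are hypothetical names, not taken from the repo, and the fallback values simply mirror the recommendations above; the real application's defaults may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Treat common truthy spellings as True; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Core
    model_name: str
    device_preference: str
    dtype_preference: str
    attn_implementation: str
    low_cpu_mem_usage: bool
    trust_remote_code: bool
    preload_model: bool
    # Traffic limits
    max_trace_tokens: int
    max_sentences: int
    job_workers: int
    max_queued_jobs: int
    max_active_jobs_per_user: int
    require_auth: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ when none is given).

    Fallbacks are the recommended values from this document (an assumption,
    not necessarily the application's own defaults).
    """
    env = os.environ if env is None else env
    return Settings(
        model_name=env.get("MODEL_NAME", "sapientinc/HRM-Text-1B"),
        device_preference=env.get("DEVICE_PREFERENCE", "auto"),
        dtype_preference=env.get("DTYPE_PREFERENCE", "auto"),
        attn_implementation=env.get("ATTN_IMPLEMENTATION", "eager"),
        low_cpu_mem_usage=_as_bool(env.get("LOW_CPU_MEM_USAGE", "true")),
        trust_remote_code=_as_bool(env.get("TRUST_REMOTE_CODE", "true")),
        preload_model=_as_bool(env.get("PRELOAD_MODEL", "true")),
        max_trace_tokens=int(env.get("MAX_TRACE_TOKENS", "256")),
        max_sentences=int(env.get("MAX_SENTENCES", "16")),
        job_workers=int(env.get("JOB_WORKERS", "1")),
        max_queued_jobs=int(env.get("MAX_QUEUED_JOBS", "8")),
        max_active_jobs_per_user=int(env.get("MAX_ACTIVE_JOBS_PER_USER", "2")),
        require_auth=_as_bool(env.get("REQUIRE_AUTH", "true")),
    )
```

Calling `load_settings()` with no argument reads the process environment, which is where Space runtime variables end up inside the container.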
Deploy Flow
- Create new Hugging Face Space with Docker SDK.
- Push repo contents.
- Set runtime variables in Space settings.
- Upgrade hardware.
- Wait for build.
- Verify:
  - GET /healthz
  - sign-in works
  - one short analysis completes
  - JSON / CSV export works
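The first verification step can be scripted. The sketch below only checks that `GET /healthz` returns HTTP 200; the response body format is not documented here, so it is ignored. `healthz_url`, `check_health`, and the injectable `opener` parameter are hypothetical helpers for illustration, and the base URL is a placeholder to replace with the Space's own address.

```python
import urllib.request
from typing import Callable


def healthz_url(base_url: str) -> str:
    """Join the base URL and the /healthz path without doubling slashes."""
    return base_url.rstrip("/") + "/healthz"


def check_health(
    base_url: str,
    timeout: float = 10.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Return True if GET /healthz answers with HTTP 200, else False.

    urlopen raises HTTPError for 4xx/5xx and URLError for connection
    problems; both are subclasses of OSError, so one except clause
    covers them.
    """
    try:
        with opener(healthz_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder address: substitute the deployed Space URL.
    print(check_health("http://localhost:7860"))
```

Sign-in, a short analysis run, and export still need a manual pass, since they depend on OAuth and on the loaded model.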
Operational Notes
- Local disk is ephemeral. Session history disappears on restart.
- OAuth helper is mocked locally but real inside Space.
- Keep public defaults conservative. Long traces can OOM small GPUs.
- If queue pressure grows, lower token caps before increasing worker count.
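To illustrate how limits such as `MAX_QUEUED_JOBS` and `MAX_ACTIVE_JOBS_PER_USER` could keep queue pressure bounded, here is a minimal admission-control sketch. It is one plausible reading of those variables, not the application's actual queue logic; `AdmissionControl`, `try_admit`, and `release` are hypothetical names, and the sketch counts every admitted job against both limits until it is released.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class AdmissionControl:
    max_queued_jobs: int = 8
    max_active_jobs_per_user: int = 2
    _total: int = 0
    _per_user: Counter = field(default_factory=Counter)

    def try_admit(self, user_id: str) -> Tuple[bool, str]:
        """Accept a job only if both the global and per-user caps allow it."""
        if self._total >= self.max_queued_jobs:
            return False, "queue full"
        if self._per_user[user_id] >= self.max_active_jobs_per_user:
            return False, "per-user limit reached"
        self._total += 1
        self._per_user[user_id] += 1
        return True, "accepted"

    def release(self, user_id: str) -> None:
        """Free one slot when a user's job finishes or is cancelled."""
        if self._per_user[user_id] > 0:
            self._per_user[user_id] -= 1
            self._total -= 1
```

Rejecting early keeps the single worker from accumulating a backlog of long traces, which fits the advice to lower token caps before adding workers.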
Common Failure Modes
- attn_implementation not eager:
  - attribution disabled for model
- unsupported model layout:
  - generation may work, attribution fails early with clear error
- OOM:
  - reduce MAX_TRACE_TOKENS, MAX_SENTENCES, or choose larger GPU
- cold start slow:
  - keep PRELOAD_MODEL=true
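Two of the failure modes above come down to configuration, so they can be caught before deploying. The following is a small pre-flight sketch under stated assumptions: `preflight_warnings` is a hypothetical helper, and it only flags variables that are explicitly set to a non-recommended value, because the application's behaviour for unset variables is not documented here.

```python
from typing import List, Mapping


def preflight_warnings(env: Mapping[str, str]) -> List[str]:
    """Return human-readable warnings for settings tied to known failure modes."""
    warnings: List[str] = []

    attn = env.get("ATTN_IMPLEMENTATION")
    if attn is not None and attn.strip().lower() != "eager":
        warnings.append(
            f"ATTN_IMPLEMENTATION={attn}: attribution is disabled unless this is 'eager'"
        )

    preload = env.get("PRELOAD_MODEL")
    if preload is not None and preload.strip().lower() != "true":
        warnings.append(
            f"PRELOAD_MODEL={preload}: expect a slow first request after cold start"
        )

    return warnings
```

For example, `preflight_warnings(dict(os.environ))` (after `import os`) could run at container start and log whatever it returns.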