
Hugging Face Deployment

Primary target: Hugging Face Docker Space on upgraded GPU hardware.

What Gets Deployed

FastAPI backend
static web frontend
Hugging Face OAuth routes
ephemeral SQLite-backed session queue

Required Space Settings

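SDK: Docker (sdk: docker in README metadata)
Port: 7860 (the Docker Space default; app_port in README metadata overrides it)
OAuth: enabled via README metadata (hf_oauth: true)
Hardware: upgraded GPU recommended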

Recommended Runtime Variables

Core:

MODEL_NAME=sapientinc/HRM-Text-1B
DEVICE_PREFERENCE=auto
DTYPE_PREFERENCE=auto
ATTN_IMPLEMENTATION=eager
LOW_CPU_MEM_USAGE=true
TRUST_REMOTE_CODE=true
PRELOAD_MODEL=true
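
To make the role of the core variables concrete, here is a minimal sketch of how a backend could read them and turn them into model-loading arguments. The names `ModelSettings`, `load_model_settings`, and `from_pretrained_kwargs` are hypothetical and not taken from the cot-anc code; the fallback values simply mirror the recommendations above. The three keyword arguments are real `from_pretrained()` parameters in the transformers library; device and dtype handling is left out because the app's own mapping is not documented here.

```python
import os
from dataclasses import dataclass


def _env_bool(env, name, default):
    """Read a true/false flag; an unset variable falls back to the default."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical container for the core runtime variables."""
    model_name: str
    device_preference: str
    dtype_preference: str
    attn_implementation: str
    low_cpu_mem_usage: bool
    trust_remote_code: bool
    preload_model: bool


def load_model_settings(env=None):
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return ModelSettings(
        model_name=env.get("MODEL_NAME", "sapientinc/HRM-Text-1B"),
        device_preference=env.get("DEVICE_PREFERENCE", "auto"),
        dtype_preference=env.get("DTYPE_PREFERENCE", "auto"),
        attn_implementation=env.get("ATTN_IMPLEMENTATION", "eager"),
        low_cpu_mem_usage=_env_bool(env, "LOW_CPU_MEM_USAGE", True),
        trust_remote_code=_env_bool(env, "TRUST_REMOTE_CODE", True),
        preload_model=_env_bool(env, "PRELOAD_MODEL", True),
    )


def from_pretrained_kwargs(settings):
    """Keyword arguments that could be passed to transformers' from_pretrained()."""
    return {
        "attn_implementation": settings.attn_implementation,
        "low_cpu_mem_usage": settings.low_cpu_mem_usage,
        "trust_remote_code": settings.trust_remote_code,
    }
```

Passing a plain dict instead of `os.environ` keeps the parsing easy to check locally before the values are set in Space settings.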

Traffic limits:

MAX_TRACE_TOKENS=256
MAX_SENTENCES=16
JOB_WORKERS=1
MAX_QUEUED_JOBS=8
MAX_ACTIVE_JOBS_PER_USER=2
REQUIRE_AUTH=true
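
The queue limits can be read as an admission check that runs before a job is accepted. The sketch below is an illustration only: `JobAdmission` and both exception classes are hypothetical names, and it assumes that every unfinished job counts toward both MAX_QUEUED_JOBS and the per-user cap, which may differ from how the app distinguishes queued from active jobs.

```python
import threading
from collections import Counter


class QueueFull(Exception):
    """Raised when the global cap on unfinished jobs is reached."""


class UserLimitReached(Exception):
    """Raised when one user already holds the per-user maximum."""


class JobAdmission:
    """In-memory admission gate; defaults mirror the values listed above."""

    def __init__(self, max_queued_jobs=8, max_active_jobs_per_user=2):
        self.max_queued_jobs = max_queued_jobs
        self.max_active_jobs_per_user = max_active_jobs_per_user
        self._lock = threading.Lock()
        self._total = 0
        self._per_user = Counter()

    def admit(self, user_id):
        """Reserve a slot for user_id or raise if a limit is hit."""
        with self._lock:
            if self._per_user[user_id] >= self.max_active_jobs_per_user:
                raise UserLimitReached(user_id)
            if self._total >= self.max_queued_jobs:
                raise QueueFull()
            self._total += 1
            self._per_user[user_id] += 1

    def release(self, user_id):
        """Free the slot once the job finishes or fails."""
        with self._lock:
            if self._per_user[user_id] > 0:
                self._per_user[user_id] -= 1
                self._total -= 1
```

Checking the per-user cap first means a single heavy user sees their own limit rather than a generic "queue full" response.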

Deploy Flow

Create a new Hugging Face Space with the Docker SDK.
Push the repo contents.
Set the runtime variables in Space settings.
Upgrade the Space hardware to a GPU tier.
Wait for the build to finish.
Verify:
- GET /healthz
- sign-in works
- one short analysis completes
- JSON / CSV export works
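
The first verification step can be scripted. This is a small sketch using only the standard library; `check_healthz` is a hypothetical helper, and it assumes that /healthz answers with HTTP 200 when the app is healthy. The base URL is whatever address the Space is served from and is not specified in this document. Sign-in, analysis, and export checks still need a browser session.

```python
import urllib.error
import urllib.request


def check_healthz(base_url, timeout=10.0):
    """Return True if GET <base_url>/healthz answers with HTTP 200."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False
```

For a local container the call would look like `check_healthz("http://localhost:7860")`, since 7860 is the configured port.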

Operational Notes

Local disk is ephemeral. Session history disappears on restart.
The OAuth helper is mocked in local runs; inside a Space it uses real Hugging Face OAuth.
Keep public defaults conservative: long traces can exhaust memory on small GPUs.
If queue pressure grows, lower token caps before increasing worker count.

Common Failure Modes

attn_implementation is not eager:
- attribution is disabled for the model
unsupported model layout:
- generation may work, but attribution fails early with a clear error
OOM:
- reduce MAX_TRACE_TOKENS or MAX_SENTENCES, or choose a larger GPU
slow cold start:
- keep PRELOAD_MODEL=true
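
Some of these failure modes can be caught from configuration alone, before the Space is restarted. The following is a rough sketch rather than part of the app: `preflight_warnings` is a hypothetical helper, unset variables are assumed to take the recommended values from this document, and the 256-token threshold is simply the recommended MAX_TRACE_TOKENS, not a measured memory limit.

```python
def preflight_warnings(env):
    """Return human-readable warnings for settings tied to known failure modes."""
    warnings = []

    attn = env.get("ATTN_IMPLEMENTATION", "eager")
    if attn != "eager":
        warnings.append(
            f"ATTN_IMPLEMENTATION={attn}: attribution will be disabled for the model"
        )

    if env.get("PRELOAD_MODEL", "true").strip().lower() != "true":
        warnings.append("PRELOAD_MODEL is not true: expect a slow cold start")

    max_tokens = int(env.get("MAX_TRACE_TOKENS", "256"))
    if max_tokens > 256:
        warnings.append(
            f"MAX_TRACE_TOKENS={max_tokens} exceeds the recommended 256: "
            "higher OOM risk on small GPUs"
        )

    return warnings
```

Calling `preflight_warnings(os.environ)` at startup and logging the result would surface these issues in the Space logs; an unsupported model layout cannot be detected this way and still shows up only when attribution runs.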