Phase 2C: Public Deployment Runbook
This runbook captures every step needed to (re)deploy the Image Captioning System to its public hosts: weights to the HuggingFace Hub, backend to a HuggingFace Space, frontend to Vercel, and the CI/CD chain wiring it all together. It is written so a future maintainer (or the author six months from now) can rebuild the public deployment from a cold start without reading commit history.
0. Topology
GitHub (apoorvrajdev/image-captioning-system, main)
├── Actions: CI → Deploy backend to HuggingFace Space (workflow_run chained)
└── Vercel Git Integration → image-captioning-system.vercel.app
HuggingFace Hub
├── Model repo: apoorvrajdev/captioning-inceptionv3-transformer (weights + vocab, tag v1.0.0)
└── Space: apoorvrajdev/image-captioning-api (Docker SDK, cpu-basic, port 7860)
The Space pulls weights from the model repo at lifespan startup via
huggingface_hub.snapshot_download, so the Space's git tree never contains
model.h5, only the code that knows how to fetch it.
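A minimal sketch of that startup fetch, assuming the backend reads the BACKEND_WEIGHTS_HUB_* variables listed in §4. The helper name `resolve_weights_path`, the fallback defaults, and the injectable `download` parameter are illustrative and not taken from the project's code.

```python
from pathlib import Path
from typing import Callable, Mapping, Optional


def resolve_weights_path(
    env: Mapping[str, str],
    download: Optional[Callable[..., str]] = None,
) -> Path:
    """Fetch the pinned Hub snapshot and return the path to the weights file.

    Hypothetical helper: only the variable names come from the runbook.
    """
    repo_id = env["BACKEND_WEIGHTS_HUB_REPO"]
    # Fallback values below are assumptions for illustration only.
    revision = env.get("BACKEND_WEIGHTS_HUB_REVISION", "main")
    filename = env.get("BACKEND_WEIGHTS_HUB_FILENAME", "model.h5")

    if download is None:
        # Imported lazily so the helper can be exercised without the library.
        from huggingface_hub import snapshot_download as download

    # snapshot_download returns the local folder that holds the cached snapshot.
    snapshot_dir = download(repo_id=repo_id, revision=revision)
    weights = Path(snapshot_dir) / filename
    if not weights.is_file():
        raise FileNotFoundError(f"{filename} missing from {repo_id}@{revision}")
    return weights
```

Calling `resolve_weights_path(os.environ)` from the application's lifespan hook would keep the download out of import time. Failing fast on a missing file makes a wrong revision visible in the Space logs before the first request arrives.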
1. Live URLs
| Component | URL |
|---|---|
| Frontend SPA | https://image-captioning-system.vercel.app |
| Backend API | https://apoorvrajdev-image-captioning-api.hf.space |
| Backend health | https://apoorvrajdev-image-captioning-api.hf.space/healthz |
| Backend docs (Swagger) | https://apoorvrajdev-image-captioning-api.hf.space/docs |
| Weights repo | https://huggingface.co/apoorvrajdev/captioning-inceptionv3-transformer |
| Space console | https://huggingface.co/spaces/apoorvrajdev/image-captioning-api |
2. Prerequisites
- Local git working tree on main, clean
- Python 3.11 venv with requirements.txt + requirements-dev.txt installed
- A HuggingFace account and a personal access token with Write scope (Settings → Access Tokens). Used both in the local shell (huggingface-cli login) and as a GitHub Actions secret named HF_TOKEN
- A Vercel account connected to the GitHub repo
3. Weights upload (WS-B): only when shipping a new checkpoint
The Space's BACKEND_WEIGHTS_HUB_REVISION variable pins which Hub revision
the backend pulls at startup, so weights and code can be versioned
independently.
# 1. Login (token cached at ~/.cache/huggingface/token)
huggingface-cli login
# 2. Upload the contents of models/vX.Y.Z/ to the Hub repo
python - <<'PY'
from huggingface_hub import HfApi
api = HfApi()
api.upload_folder(
    repo_id="apoorvrajdev/captioning-inceptionv3-transformer",
    folder_path="models/v1.0.0",
    path_in_repo=".",
    commit_message="upload v1.0.0 weights + vocab",
)
api.create_tag(
    repo_id="apoorvrajdev/captioning-inceptionv3-transformer",
    tag="v1.0.0",
    tag_message="v1.0.0 dev-scaffold weights",
)
PY
# 3. Verify the snapshot round-trips byte-for-byte
HF_HUB_DISABLE_SYMLINKS=1 python - <<'PY'
import hashlib, pathlib
from huggingface_hub import snapshot_download
local = snapshot_download(
    repo_id="apoorvrajdev/captioning-inceptionv3-transformer",
    revision="v1.0.0",
)
# Compare each local file against its downloaded copy
for f in ("model.h5", "vocab.json"):
    src = hashlib.sha256(pathlib.Path("models/v1.0.0", f).read_bytes()).hexdigest()
    dst = hashlib.sha256(pathlib.Path(local, f).read_bytes()).hexdigest()
    assert src == dst, f
    print(f, "OK", src)
PY
To promote a new checkpoint after this: bump the Space variable
BACKEND_WEIGHTS_HUB_REVISION from v1.0.0 to the new tag (e.g. v2.0.0)
and the Space restarts with the new weights. No code change required.
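The round-trip check in step 3 reads each file fully into memory, which is fine for small scaffolds but can be heavy for a large trained checkpoint. Below is a sketch of a chunked alternative; the helper names `sha256_file` and `mismatched_files` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(src_dir: Path, dst_dir: Path, names: Iterable[str]) -> List[str]:
    """Return the file names whose digests differ between the two folders."""
    return [
        name
        for name in names
        if sha256_file(Path(src_dir, name)) != sha256_file(Path(dst_dir, name))
    ]
```

An empty list from `mismatched_files("models/v1.0.0", local, ["model.h5", "vocab.json"])` would correspond to the "OK" lines printed by the original check.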
4. Backend Space (WS-C): one-time setup
1. Create the Space at https://huggingface.co/new-space
   - Owner: apoorvrajdev · Name: image-captioning-api
   - SDK: Docker (blank template) · Hardware: cpu-basic (free) · Public
2. In the Space's Settings → Variables and secrets, add Variables (not secrets; these are non-sensitive):

   | Name | Value |
   |---|---|
   | BACKEND_WEIGHTS_HUB_REPO | apoorvrajdev/captioning-inceptionv3-transformer |
   | BACKEND_WEIGHTS_HUB_REVISION | v1.0.0 |
   | BACKEND_WEIGHTS_HUB_FILENAME | model.h5 |
   | BACKEND_WARMUP | true |
   | CAPTIONING__SERVE__CORS_ALLOWED_ORIGINS | ["https://image-captioning-system.vercel.app","http://localhost:5173","http://localhost:5174","http://127.0.0.1:5173","http://127.0.0.1:5174"] |

3. Add a space git remote and push main:
   git remote add space https://huggingface.co/spaces/apoorvrajdev/image-captioning-api
   git push space main
4. Watch the Space's Logs tab. First build takes ~8-12 min (Docker base pull, apt-get, pip install -r requirements.txt with TensorFlow, weight download via snapshot_download, predictor warmup).
5. When the badge in the Space header turns Running, verify:
   curl https://apoorvrajdev-image-captioning-api.hf.space/healthz
   # {"status":"ok","model_loaded":true,"model_version":"v1.0.0",...}
The README YAML frontmatter (title, emoji, sdk: docker, app_port: 7860,
etc.) is what tells the Space how to build. It must remain at the literal top
of README.md. GitHub auto-hides the frontmatter when rendering the README, so
the same file serves both audiences.
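Because a misplaced frontmatter block breaks the Space build, a small pre-push check can help. The sketch below assumes flat `key: value` frontmatter and only checks the two fields named above; `read_frontmatter` and `check_space_readme` are hypothetical names, and a real YAML parser would be needed for nested values.

```python
from typing import Dict


def read_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs from a frontmatter block at the top of a file."""
    lines = [line.rstrip() for line in text.splitlines()]
    # The opening delimiter must be the very first line, with no leading blank line.
    if not lines or lines[0] != "---":
        raise ValueError("frontmatter must start on the first line of README.md")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("frontmatter is not closed with '---'") from None
    fields: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_space_readme(text: str) -> None:
    """Raise ValueError if the README would not describe a Docker Space on port 7860."""
    fields = read_frontmatter(text)
    if fields.get("sdk") != "docker":
        raise ValueError("expected 'sdk: docker' in the frontmatter")
    if fields.get("app_port") != "7860":
        raise ValueError("expected 'app_port: 7860' in the frontmatter")
```

Running `check_space_readme(Path("README.md").read_text(encoding="utf-8"))` before `git push space main` would catch a heading or blank line that slipped above the frontmatter.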
5. Frontend (WS-E): Vercel one-time setup
- https://vercel.com/new → import apoorvrajdev/image-captioning-system
- Configure:
  - Framework Preset: Vite (auto-detected from frontend/package.json)
  - Root Directory: frontend
  - Build / Output / Install commands: leave on defaults
- Environment variable (Production + Preview):
  VITE_API_BASE=https://apoorvrajdev-image-captioning-api.hf.space
- Deploy. First build is ~90 sec. Production alias becomes https://image-captioning-system.vercel.app.
After the initial import, every push to main triggers an automatic Vercel
build via the GitHub integration; no separate GitHub Action is required.
6. CORS (WS-F)
backend/app/main.py registers CORSMiddleware with
config.serve.cors_allowed_origins. The defaults in
configs/base.yaml cover localhost dev. Production
origins are added via the Space's CAPTIONING__SERVE__CORS_ALLOWED_ORIGINS
variable (JSON array, see §4). To add a new origin (e.g. a custom domain):
edit that variable, save, and the Space restarts (~30 sec, no rebuild).
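Since the variable is a hand-edited JSON array, it is easy to break with a stray quote or a trailing slash. The sketch below validates the value before it is pasted into the Space settings. The helper names are hypothetical, and the no-path rule reflects that browsers send the Origin header as scheme, host, and port only.

```python
import json
from typing import List
from urllib.parse import urlsplit


def parse_origins(raw: str) -> List[str]:
    """Parse and validate a JSON array of CORS origins."""
    origins = json.loads(raw)
    if not isinstance(origins, list) or not all(isinstance(o, str) for o in origins):
        raise ValueError("expected a JSON array of strings")
    for origin in origins:
        parts = urlsplit(origin)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an http(s) origin: {origin!r}")
        # An origin has no path, query, or fragment; even a trailing "/" is rejected.
        if parts.path or parts.query or parts.fragment:
            raise ValueError(f"origin must not include a path: {origin!r}")
    return origins


def add_origin(raw: str, new_origin: str) -> str:
    """Return the updated JSON array with new_origin appended if it is absent."""
    origins = parse_origins(raw)
    candidate = parse_origins(json.dumps([new_origin]))[0]
    if candidate not in origins:
        origins.append(candidate)
    return json.dumps(origins, separators=(",", ":"))
```

For example, `add_origin(current_value, "https://captions.example.com")` (a placeholder domain) returns a compact array in the same style as the value shown in §4.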
7. CI/CD (WS-G)
Two workflows under .github/workflows/:
- ci.yml: runs on every push and PR to main:
  - python-quality: ruff lint + format, mypy strict
  - python-tests: pytest matrix on 3.10 / 3.11 / 3.12
  - notebook-freeze: SHA-256 freeze check on the IEEE notebook
  - frontend: npm ci && npm run lint && npm run build
- deploy-backend.yml: chained via workflow_run, runs only after a successful CI run on main. Pushes HEAD:main to the Space remote using the HF_TOKEN repository secret. Also supports workflow_dispatch for manual redeploys.
Required GitHub secret
HF_TOKEN (repo Settings → Secrets and variables → Actions → New repository
secret). Scope: Write. Used only for git push to the Space remote.
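The gating rule for deploy-backend.yml can be restated as a small predicate. This is an illustration in Python rather than the workflow's actual YAML. It assumes the CI workflow is named "CI" and uses the `name`, `conclusion`, and `head_branch` fields that GitHub includes in a workflow_run event payload. The function name is hypothetical.

```python
from typing import Any, Mapping


def should_deploy(event_name: str, payload: Mapping[str, Any]) -> bool:
    """Mirror the deploy gate: manual dispatch, or a green CI run on main."""
    if event_name == "workflow_dispatch":
        # Manual redeploys bypass the CI gate.
        return True
    if event_name != "workflow_run":
        return False
    run = payload.get("workflow_run", {})
    return (
        run.get("name") == "CI"
        and run.get("conclusion") == "success"
        and run.get("head_branch") == "main"
    )
```

A failed or cancelled CI run, or a green run on a feature branch, yields `False`, so nothing is pushed to the Space in those cases.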
8. End-to-end smoke test
After any redeploy, verify in this order:
# 1. Backend liveness + readiness
curl https://apoorvrajdev-image-captioning-api.hf.space/healthz
# 2. Backend caption round-trip (replace path with any local JPG/PNG)
curl -X POST https://apoorvrajdev-image-captioning-api.hf.space/v1/captions \
-F "image=@assets/sample.jpg"
# 3. Frontend loads + status badge flips to green
open https://image-captioning-system.vercel.app # macOS
# start https://image-captioning-system.vercel.app # Windows
# 4. Frontend → backend integration (in the browser)
# Upload an image → expect a 200 caption response from /v1/captions
# DevTools → Network → check no CORS errors
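Step 1 can also be scripted so a redeploy is only declared healthy once the model is loaded. The sketch below uses only the standard library and assumes the /healthz response carries the `status`, `model_loaded`, and `model_version` fields shown in §4. The function names, retry count, and delay are illustrative choices.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTHZ = "https://apoorvrajdev-image-captioning-api.hf.space/healthz"


def fetch_health(url: str = HEALTHZ, timeout: float = 10.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def is_ready(payload: dict, expected_version: str) -> bool:
    """True only when the backend reports a loaded model of the expected version."""
    return (
        payload.get("status") == "ok"
        and payload.get("model_loaded") is True
        and payload.get("model_version") == expected_version
    )


def wait_until_ready(
    expected_version: str,
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the backend is ready or the attempts run out."""
    for _ in range(attempts):
        try:
            if is_ready(fetch(), expected_version):
                return True
        except (OSError, ValueError):
            # Network errors (URLError is an OSError) and bad JSON count as "not yet".
            pass
        sleep(delay)
    return False
```

`wait_until_ready("v1.0.0")` returns `True` once the Space has finished its lifespan startup. Checking `model_version` also confirms that a revision bump actually took effect.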
9. Known operational quirks
- Status badge briefly flips to "offline" while a /v1/captions request is in flight on the single uvicorn worker. The /healthz poll queues behind inference and the frontend's 3 s timeout expires. The next 10 s poll recovers. Cosmetic only; the backend never actually goes down.
- First request after Space idle is slow (~5-10 s extra). HF Spaces sleep idle containers; the next call wakes the container, which then runs the lifespan startup (snapshot_download cache hit + predictor rewarmup).
- Caption quality is gibberish by design at v1.0.0. The shipped weights are dev scaffolds from scripts/bootstrap_dev_artifacts.py. A real trained checkpoint will be uploaded as v2.0.0 and promoted via the Space variable bump described in §3.
10. Rollback
- Bad code on the Space: git push space <known-good-sha>:main --force (from a local checkout). Space rebuilds from that SHA.
- Bad weights on the Hub: bump the Space's BACKEND_WEIGHTS_HUB_REVISION back to the previous tag (e.g. v1.0.0) and save. Space restarts in ~30 s with the previous weights.
- Bad frontend on Vercel: dashboard → Deployments → previous green deployment → "Promote to Production" (one click, no rebuild).
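The weights rollback can also be done without the web UI. The sketch below assumes a huggingface_hub version that provides `HfApi.add_space_variable` and a token with Write scope. The helper name and the `vX.Y.Z` tag pattern are assumptions based on the tags used in this runbook.

```python
import re

SPACE_ID = "apoorvrajdev/image-captioning-api"
# Assumed tag convention, inferred from v1.0.0 / v2.0.0 in this runbook.
TAG_PATTERN = re.compile(r"^v\d+\.\d+\.\d+$")


def set_weights_revision(api, tag: str, space_id: str = SPACE_ID) -> None:
    """Point the Space at a given weights tag via its variables.

    `api` is expected to be a huggingface_hub.HfApi instance (or a stand-in
    exposing the same add_space_variable method).
    """
    if not TAG_PATTERN.match(tag):
        raise ValueError(f"unexpected tag format: {tag!r}")
    # add_space_variable creates the variable or overwrites its current value.
    api.add_space_variable(
        repo_id=space_id,
        key="BACKEND_WEIGHTS_HUB_REVISION",
        value=tag,
    )
```

With `from huggingface_hub import HfApi`, calling `set_weights_revision(HfApi(), "v1.0.0")` would apply the same change as editing the variable in the Space settings. Confirm the restart and the reported `model_version` via /healthz afterwards.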