Deployment topology
Riprap is composed of two HF Spaces in production. The UI Space is CPU-only and contains the FastAPI + SvelteKit front-end; the inference Space is an L4 GPU and runs vLLM (Granite 4.1 8B FP8) plus the EO model stack co-resident.
ββββββββββββββββββββββββββββββββββββββββββββββββ
β msradam/riprap-vllm (NVIDIA L4, 24 GB) β
β β
β :7860 proxy.py bearer-auth FastAPI β
β ββ /v1/chat/* /v1/embeddings β :8000 β
β ββ /v1/{prithvi,terramind,...} β :7861 β
β ββ /v1/power NVML readings β
β β
β :8000 vLLM Granite 4.1 8B FP8 β
β :7861 riprap-models β
β Prithvi-EO 2.0 NYC-Pluvial β
β TerraMind LULC + Buildings β
β Granite TTM r2 β
β GLiNER + Granite Embedding 278M β
ββββββββββββββββββββββ²ββββββββββββββββββββββββββ
β bearer-auth HTTPS
β
βββββββββββββββββββββββββββββββ΄βββββββββββββββββββββββββββ
β lablab-ai-amd-developer-hackathon/riprap-nyc β
β Hackathon submission UI Β· cpu-basic β
β β
β FastAPI (web/main.py) + SvelteKit static build β
β Burr FSM (app/fsm.py) β
β β
β RIPRAP_LLM_BASE_URL = β¦/v1 β
β RIPRAP_ML_BASE_URL = β¦ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
The UI Space holds no GPU weights and contacts no commercial APIs. Every model call routes through the bearer-authenticated proxy on the inference Space.
Hugging Face Spaces
lablab-ai-amd-developer-hackathon/riprap-nyc β UI Space
The hackathon submission. CPU-basic tier. Image built from the root
Dockerfile. Holds no model weights β every inference call goes
remote via env vars.
Required Space variables:
RIPRAP_LLM_PRIMARY = vllm
RIPRAP_LLM_BASE_URL = https://msradam-riprap-vllm.hf.space/v1
RIPRAP_LLM_VLLM_8B_NAME = granite4.1:8b
RIPRAP_ML_BACKEND = remote
RIPRAP_ML_BASE_URL = https://msradam-riprap-vllm.hf.space
RIPRAP_NYCHA_REGISTERS = 1
RIPRAP_HEAVY_SPECIALISTS = 1
RIPRAP_PRITHVI_LIVE_ENABLE= 1
RIPRAP_TERRAMIND_ENABLE = 1
RIPRAP_EO_CHIP_ENABLE = 1
Required secrets (set via Settings β Variables and secrets):
RIPRAP_LLM_API_KEY bearer token shared with the inference Space
RIPRAP_ML_API_KEY bearer token shared with the inference Space
HF_TOKEN for register / catalog downloads
msradam/riprap-vllm β Inference Space
L4 (l4x1) tier. Image built from inference-vllm/Dockerfile.
Bakes Granite 4.1 8B FP8 weights and the EO model dependencies
(terratorch + peft + diffusers + segmentation-models-pytorch +
nvidia-ml-py for NVML power sampling).
Required secret:
RIPRAP_PROXY_TOKEN bearer token; must match RIPRAP_LLM_API_KEY /
RIPRAP_ML_API_KEY on the UI Spaces
Endpoints:
| Path | Routes to | Notes |
|---|---|---|
POST /v1/chat/completions |
vLLM | Granite 4.1 8B FP8, OpenAI-compat |
POST /v1/completions |
vLLM | OpenAI-compat |
GET /v1/models |
vLLM | served-model-name family |
POST /v1/embeddings |
riprap-models | Granite Embedding 278M |
POST /v1/prithvi-pluvial |
riprap-models | Prithvi-EO 2.0 NYC-Pluvial |
POST /v1/terramind |
riprap-models | TerraMind LULC / Buildings / synthesis |
POST /v1/ttm-forecast |
riprap-models | Granite TTM r2 + Battery surge |
POST /v1/gliner-extract |
riprap-models | GLiNER typed-entity |
GET /v1/power |
proxy | Real NVML power (W) β see docs/EMISSIONS.md |
GET /healthz |
proxy + both backends | Aggregates health status |
All /v1/* endpoints require Authorization: Bearer <PROXY_TOKEN>.
/v1/power and the bracket-sampling LLM client path are described
in docs/EMISSIONS.md.
Personal mirror β msradam/riprap
Self-contained L4 mirror that runs the full stack (UI + vLLM + EO
models) in a single container. Used for parallel demos when the
shared inference Space is busy. Built from Dockerfile.l4.
scripts/deploy_personal_space.sh
This is paused by default for the hackathon period to keep the L4 budget on the primary inference Space.
Local development
Pure local (Ollama)
uv venv && uv pip install -r requirements.txt
cd web/sveltekit && npm ci && npm run build && cd ../..
ollama pull granite4.1:3b
ollama pull granite4.1:8b
.venv/bin/uvicorn web.main:app --host 127.0.0.1 --port 7860
Visit http://127.0.0.1:7860. Inference runs locally β no GPU
power readings (the chip will display the data-sheet estimate with
a ~ icon).
Local UI, remote inference
RIPRAP_LLM_PRIMARY=vllm \
RIPRAP_LLM_BASE_URL=https://msradam-riprap-vllm.hf.space/v1 \
RIPRAP_LLM_API_KEY=<token> \
RIPRAP_ML_BACKEND=remote \
RIPRAP_ML_BASE_URL=https://msradam-riprap-vllm.hf.space \
RIPRAP_ML_API_KEY=<token> \
.venv/bin/uvicorn web.main:app --host 127.0.0.1 --port 7860
Same flow as the hosted UI Space, but rendered locally. Real NVML power readings come back through the proxy headers and bracket samples just like in production.
Deploy commands
| Target | Script | Notes |
|---|---|---|
Inference Space (msradam/riprap-vllm) |
scripts/deploy_vllm_space.sh |
Orphan-branch push from inference-vllm/ |
UI Space (lablab-ai-amd-developer-hackathon/riprap-nyc) |
cherry-pick onto huggingface/main then git push huggingface |
HF Spaces' xet hook rejects pushes that walk through commits with binaries; cherry-picking from a clean ancestor avoids it |
Personal mirror (msradam/riprap) |
scripts/deploy_personal_space.sh |
Orphan-branch push from Dockerfile.l4 |
Inference fallback (msradam/riprap-inference) |
scripts/deploy_inference_space.sh |
Ollama-backed mirror; redundant when riprap-vllm is up |
Verifying a deploy
PYTHONPATH=. uv run python scripts/probe_stones_fire.py --timeout 600
Asserts: all five Stones fire, no torchvision/terratorch dep
regression, the emissions block reports nvidia_l4 hardware, and
real NVML measurements come through (n_measured β n_calls).
The address probe sweeps the full canonical set (5 NYC addresses):
.venv/bin/python scripts/probe_addresses.py \
--base https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space
Historical notes
The hackathon submission was originally built against an AMD MI300X
DigitalOcean droplet (running both vLLM and the EO model service).
The droplet was decommissioned 2026-05-06 and inference moved
to the L4 HF Spaces above. The bring-up runbook for the MI300X
droplet is preserved in docs/DROPLET-RUNBOOK.md
for anyone reproducing the original AMD-judging setup; setting
RIPRAP_HARDWARE_LABEL=AMD MI300X on a droplet redeploy will swap
the emissions ledger back to the MI300X data-sheet figures.