replicalab / docs /max /deployment.md
maxxie114's picture
Initial HF Spaces deployment
80d8c84

Deployment Guide (Max / Person C)


Local Development

# Create and activate virtualenv
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install server deps
pip install -r server/requirements.txt

# Install replicalab package
pip install -e . --no-deps

# Run the server
uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload

Server should be available at http://localhost:7860.

Quick smoke test:

curl http://localhost:7860/health

curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"seed": 42, "scenario": "math_reasoning", "difficulty": "easy"}'

Docker (Local)

docker build -f server/Dockerfile -t replicalab .
docker run -p 7860:7860 replicalab

Verified endpoints (API 08 sign-off, 2026-03-08)

After docker run -p 7860:7860 replicalab, the following were verified against the real env (not stub):

curl http://localhost:7860/health
# β†’ {"status":"ok","env":"real"}

curl http://localhost:7860/scenarios
# β†’ {"scenarios":[{"family":"math_reasoning",...}, ...]}

curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"seed":42,"scenario":"math_reasoning","difficulty":"easy"}'
# β†’ {"session_id":"...","episode_id":"...","observation":{...}}

# Use session_id from reset response:
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"session_id":"<SESSION_ID>","action":{"action_type":"propose_protocol","sample_size":3,"controls":["baseline"],"technique":"algebraic_proof","duration_days":1,"required_equipment":[],"required_reagents":[],"questions":[],"rationale":"Test."}}'
# β†’ {"observation":{...},"reward":0.0,"done":false,"info":{...}}

With optional hosted-model secrets:

docker run -p 7860:7860 \
  -e MODEL_API_KEY=replace-me \
  replicalab

Hugging Face Spaces Deployment

What is already configured (API 09)

The repo is now deployment-ready for HF Spaces:

  • Root Dockerfile β€” HF Spaces requires the Dockerfile at repo root. The root-level Dockerfile is identical to server/Dockerfile. Keep them in sync, or delete server/Dockerfile once the team standardizes.
  • README.md frontmatter β€” The root README now contains the required YAML frontmatter that HF Spaces parses on push: ```yaml

    title: ReplicaLab emoji: πŸ§ͺ colorFrom: blue colorTo: green sdk: docker app_port: 7860 pinned: false

    
    
  • Non-root user β€” The Dockerfile creates and runs as appuser (UID 1000), which HF Spaces requires for security.
  • Port 7860 β€” Both the EXPOSE directive and the uvicorn CMD use 7860, matching the app_port in the frontmatter.

Step-by-step deployment (for Max)

1. Create the Space

  1. Go to https://huggingface.co/new-space
  2. Fill in:
    • Owner: your HF username or the team org
    • Space name: replicalab (or replicalab-demo)
    • License: MIT
    • SDK: Docker
    • Hardware: CPU Basic (free tier is fine for the server)
    • Visibility: Public
  3. Click Create Space

2. Add the Space as a git remote

# From the repo root
git remote add hf https://huggingface.co/spaces/<YOUR_HF_USERNAME>/replicalab

# If the org is different:
# git remote add hf https://huggingface.co/spaces/<ORG>/replicalab

3. Push the repo

# Push the current branch to the Space
git push hf ayush:main

# Or if deploying from master:
# git push hf master:main

HF Spaces will automatically detect the Dockerfile, build the image, and start the container.

4. Monitor the build

  1. Go to https://huggingface.co/spaces/\<YOUR_HF_USERNAME>/replicalab
  2. Click the Logs tab (or Build tab during first deploy)
  3. Wait for the build to complete (typically 2-5 minutes)
  4. The Space status should change from "Building" to "Running"

5. Verify the deployment (API 10 scope)

Once the Space is running:

# Health check
curl https://ayushozha-replicalab.hf.space/health

# Reset an episode
curl -X POST https://ayushozha-replicalab.hf.space/reset \
  -H "Content-Type: application/json" \
  -d '{"seed": 42, "scenario": "math_reasoning", "difficulty": "easy"}'

# List scenarios
curl https://ayushozha-replicalab.hf.space/scenarios

WebSocket test (using websocat or wscat):

wscat -c wss://ayushozha-replicalab.hf.space/ws
# Then type: {"type": "ping"}
# Expect: {"type": "pong"}

Verified live deployment (API 10 sign-off, 2026-03-08)

Public Space URL: https://huggingface.co/spaces/ayushozha/replicalab API base URL: https://ayushozha-replicalab.hf.space

All four endpoints verified against the live Space with real env:

GET  /health    β†’ 200 {"status":"ok","env":"real"}
GET  /scenarios β†’ 200 {"scenarios":[...3 families...]}
POST /reset     β†’ 200 {"session_id":"...","episode_id":"...","observation":{...}}
POST /step      β†’ 200 {"reward":2.312798,"done":true,"info":{"verdict":"accept",...}}

Full episode verified: reset β†’ propose_protocol β†’ accept β†’ terminal reward with real judge scoring (rigor=0.465, feasibility=1.000, fidelity=0.325, total_reward=2.313, verdict=accept).


Secrets and API Key Management (API 17)

Current state

The server is fully self-contained with no external API calls. No secrets or API keys are required to run the environment, judge, or scoring pipeline. All reward computation is deterministic and local.

Where secrets live (by context)

Context Location What to set Required?
HF Space Space Settings β†’ Repository secrets Nothing currently No
Local dev Shell env vars or .env file (gitignored) Nothing currently No
Docker -e KEY=value flags on docker run Nothing currently No
Colab notebook google.colab.userdata or env vars HF_TOKEN for model downloads, REPLICALAB_URL for hosted env Yes for training

Colab notebook secrets

When running the training notebook, the following are needed:

Secret Purpose Where to set Required?
HF_TOKEN Download gated models (Qwen3-4B) from HF Hub Colab Secrets panel (key icon) Yes
REPLICALAB_URL URL of the hosted environment Hardcode or Colab secret Optional β€” defaults to https://ayushozha-replicalab.hf.space

To set in Colab:

  1. Click the key icon in the left sidebar
  2. Add HF_TOKEN with your Hugging Face access token
  3. Access in code:
from google.colab import userdata
hf_token = userdata.get("HF_TOKEN")

Future secrets (not currently needed)

If a frontier hosted evaluator is added later:

Secret name Purpose Required?
MODEL_API_KEY Hosted evaluator access key Only if a hosted evaluator is added
MODEL_BASE_URL Alternate provider endpoint Only if using a proxy

These would be set in HF Space Settings β†’ Repository secrets, and accessed via os.environ.get("MODEL_API_KEY") in server code.

Re-deploying after code changes

# Just push again β€” HF rebuilds automatically
git push hf ayush:main

To force a full rebuild (e.g. after dependency changes):

  1. Go to Space Settings
  2. Click Factory reboot under the Danger zone section

Known limitations

  • Free CPU tier has 2 vCPU and 16 GB RAM. This is sufficient for the FastAPI server but NOT for running RL training. Training happens in Colab.
  • Cold starts β€” Free-tier Spaces sleep after 48 hours of inactivity. The first request after sleep takes 30-60 seconds to rebuild.
  • Persistent storage β€” Episode replays and logs are in-memory only. They reset when the container restarts. This is acceptable for the hackathon demo.
  • Heavy hosted models require billing-enabled hardware β€” as of 2026-03-09, the checked HF token authenticates successfully but the backing account reports canPay=false and has no org attached, so it is currently suitable for model downloads but not for provisioning paid large-model serving through HF Spaces hardware or Inference Endpoints.

Environment URLs Reference

Service Local Hosted
FastAPI app http://localhost:7860 https://ayushozha-replicalab.hf.space
Health http://localhost:7860/health https://ayushozha-replicalab.hf.space/health
WebSocket ws://localhost:7860/ws wss://ayushozha-replicalab.hf.space/ws
Scenarios http://localhost:7860/scenarios https://ayushozha-replicalab.hf.space/scenarios

Northflank CLI Access

Local verification (2026-03-08)

  • Installed globally with npm i -g @northflank/cli
  • Verified locally with northflank --version
  • Current verified version: 0.10.16

Login

northflank login -n <context-name> -t <token>

<token> must come from the user's Northflank account or team secret manager. Do not commit it to the repo.

Service access commands for replica-labs/replicalab-ai

northflank forward service --projectId replica-labs --serviceId replicalab-ai
northflank get service logs --tail --projectId replica-labs --serviceId replicalab-ai
northflank ssh service --projectId replica-labs --serviceId replicalab-ai
northflank exec service --projectId replica-labs --serviceId replicalab-ai
northflank upload service file --projectId replica-labs --serviceId replicalab-ai --localPath dir/file.txt --remotePath /home/file.txt
northflank download service file --projectId replica-labs --serviceId replicalab-ai --localPath dir/file.txt --remotePath /home/file.txt

Current Northflank runtime findings (2026-03-09)

  • The manual training job replicalab-train exists in replica-labs, but northflank start job run --projectId replica-labs --jobId replicalab-train currently fails with 409 No deployment configured.
  • The job still has runtime variables configured, including the older remote MODEL_NAME=Qwen/Qwen3-8B, so even after the missing deployment is fixed the runtime config should be reviewed before launching training.
  • The live service replicalab-ai is deployed on the same nf-gpu-hack-16-64 billing plan, but a direct probe from inside the container found no nvidia-smi binary and no /dev/nvidia* device nodes. Treat GPU/H100 availability as unverified until a container can prove hardware visibility from inside the runtime.

Current Northflank notebook findings (2026-03-09)

  • There is a separate live notebook service in project notebook-openport: jupyter-pytorch.
  • The active public notebook DNS is app--jupyter-pytorch--9y6g97v7czb9.code.run on port 8888 (/lab for the Jupyter UI).
  • Northflank reports that service with GPU config gpuType=h100-80, gpuCount=1, and an in-container probe confirmed NVIDIA H100 80GB HBM3.
  • The notebook image is quay.io/jupyter/pytorch-notebook:cuda12-2025-08-18.
  • The notebook currently contains a repo clone and GRPO outputs, but the saved notebook/log state is not clean: training produced adapter checkpoints through step 200, then later notebook evaluation/inference failed with a string indices must be integers, not 'str' content-format error.

Windows note

Global npm binaries resolve from C:\Users\ayush\AppData\Roaming\npm on this machine. If northflank is not found in a new shell, reopen the terminal so the updated PATH is reloaded.


Hand-off To Ayush

Local server:

  • WebSocket: ws://localhost:7860/ws
  • REST health: http://localhost:7860/health
  • Running against: real env (not stub)

Hosted deployment (verified 2026-03-08):

  • Base URL: https://ayushozha-replicalab.hf.space
  • /health returns 200 with {"status":"ok","env":"real"}
  • WebSocket path: wss://ayushozha-replicalab.hf.space/ws
  • Full episode tested: propose β†’ accept β†’ reward with real judge scores

Troubleshooting

Issue Fix
ReplicaLabEnv not found warning at startup The real env is now available; ensure replicalab/scoring/rubric.py is present and httpx + websocket-client are in server/requirements.txt
Docker build fails Re-check server/requirements.txt and the Docker build context
CORS error from the frontend Re-check allowed origins in server/app.py
WebSocket closes after idle time Send periodic ping messages or reconnect
Session not found (REST) Call /reset again to create a new session