# Deployment Guide (Max / Person C)
---
## Local Development
```bash
# Create and activate virtualenv
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install server deps
pip install -r server/requirements.txt
# Install replicalab package
pip install -e . --no-deps
# Run the server
uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload
```
The server should now be available at `http://localhost:7860`.
Quick smoke test:
```bash
curl http://localhost:7860/health
curl -X POST http://localhost:7860/reset \
-H "Content-Type: application/json" \
-d '{"seed": 42, "scenario": "math_reasoning", "difficulty": "easy"}'
```
---
## Docker (Local)
```bash
docker build -f server/Dockerfile -t replicalab .
docker run -p 7860:7860 replicalab
```
### Verified endpoints (API 08 sign-off, 2026-03-08)
After `docker run -p 7860:7860 replicalab`, the following were verified
against the **real env** (not stub):
```bash
curl http://localhost:7860/health
# → {"status":"ok","env":"real"}
curl http://localhost:7860/scenarios
# → {"scenarios":[{"family":"math_reasoning",...}, ...]}
curl -X POST http://localhost:7860/reset \
-H "Content-Type: application/json" \
-d '{"seed":42,"scenario":"math_reasoning","difficulty":"easy"}'
# → {"session_id":"...","episode_id":"...","observation":{...}}
# Use session_id from reset response:
curl -X POST http://localhost:7860/step \
-H "Content-Type: application/json" \
-d '{"session_id":"<SESSION_ID>","action":{"action_type":"propose_protocol","sample_size":3,"controls":["baseline"],"technique":"algebraic_proof","duration_days":1,"required_equipment":[],"required_reagents":[],"questions":[],"rationale":"Test."}}'
# → {"observation":{...},"reward":0.0,"done":false,"info":{...}}
```
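To script the reset-then-step flow above instead of pasting `<SESSION_ID>` by hand, the id can be extracted with `jq` (assumed installed; it is not listed in this repo's requirements). A minimal sketch: the live `curl` is shown in comments, and the sample JSON mirrors the response shape documented above.

```bash
# Extract session_id from a /reset response so it can be fed into /step.
# Live usage, assuming the container from above is running:
#   RESP=$(curl -s -X POST http://localhost:7860/reset \
#     -H "Content-Type: application/json" \
#     -d '{"seed":42,"scenario":"math_reasoning","difficulty":"easy"}')
# Sample response with the shape shown in the verified output above:
RESP='{"session_id":"sess-123","episode_id":"ep-1","observation":{}}'
SESSION_ID=$(printf '%s' "$RESP" | jq -r '.session_id')
echo "$SESSION_ID"   # -> sess-123
```

The extracted value can then be substituted into the `/step` payload, e.g. with `-d "{\"session_id\":\"$SESSION_ID\",...}"`.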
With optional hosted-model secrets:
```bash
docker run -p 7860:7860 \
-e MODEL_API_KEY=replace-me \
replicalab
```
---
## Hugging Face Spaces Deployment
### What is already configured (API 09)
The repo is now deployment-ready for HF Spaces:
- **Root `Dockerfile`** — HF Spaces requires the Dockerfile at repo root.
The root-level `Dockerfile` is identical to `server/Dockerfile`. Keep them
in sync, or delete `server/Dockerfile` once the team standardizes.
- **`README.md` frontmatter** — The root README now contains the required
YAML frontmatter that HF Spaces parses on push:
```yaml
---
title: ReplicaLab
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---
```
- **Non-root user** — The Dockerfile creates and runs as `appuser` (UID 1000),
which HF Spaces requires for security.
- **Port 7860** — Both the `EXPOSE` directive and the `uvicorn` CMD use 7860,
matching the `app_port` in the frontmatter.
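Because the root and `server/` Dockerfiles are duplicated by hand, a quick drift check before pushing can catch divergence early. A minimal sketch, run from the repo root:

```bash
# Compare the root Dockerfile against server/Dockerfile.
# diff -q exits 0 only when both files exist and are byte-identical.
if diff -q Dockerfile server/Dockerfile >/dev/null 2>&1; then
  echo "Dockerfiles in sync"
else
  echo "Dockerfiles differ (or one is missing) -- reconcile before pushing"
fi
```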
### Step-by-step deployment (for Max)
#### 1. Create the Space
1. Go to https://huggingface.co/new-space
2. Fill in:
- **Owner:** your HF username or the team org
- **Space name:** `replicalab` (or `replicalab-demo`)
- **License:** MIT
- **SDK:** Docker
- **Hardware:** CPU Basic (free tier is fine for the server)
- **Visibility:** Public
3. Click **Create Space**
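As an alternative to the web form, the Space can also be created from the command line. This is a sketch assuming `huggingface_hub` is installed and that its CLI flags match recent releases; verify `--type`/`--space_sdk` against `huggingface-cli repo create --help` before relying on it.

```bash
# One-time setup (interactive; needs an HF access token with write scope):
#   pip install -U huggingface_hub
#   huggingface-cli login
# Create a Docker-SDK Space without the web UI:
huggingface-cli repo create replicalab --type space --space_sdk docker
```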
#### 2. Add the Space as a git remote
```bash
# From the repo root
git remote add hf https://huggingface.co/spaces/<YOUR_HF_USERNAME>/replicalab
# If the org is different:
# git remote add hf https://huggingface.co/spaces/<ORG>/replicalab
```
#### 3. Push the repo
```bash
# Push the current branch to the Space
git push hf ayush:main
# Or if deploying from master:
# git push hf master:main
```
HF Spaces will automatically detect the `Dockerfile`, build the image, and
start the container.
#### 4. Monitor the build
1. Go to https://huggingface.co/spaces/\<YOUR_HF_USERNAME\>/replicalab
2. Click the **Logs** tab (or **Build** tab during first deploy)
3. Wait for the build to complete (typically 2-5 minutes)
4. The Space status should change from "Building" to "Running"
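The build state can also be polled from a terminal. This sketch assumes the public Spaces runtime endpoint (`/api/spaces/<owner>/<name>/runtime`) and a JSON `stage` field, which is what `huggingface_hub`'s `get_space_runtime` consumes; the live `curl` is in the comment, and the sample payload shape is an assumption to verify against a real response.

```bash
# Poll the Space build/run state instead of watching the web UI.
# Live usage (assumed endpoint):
#   curl -s https://huggingface.co/api/spaces/ayushozha/replicalab/runtime
# Sample payload with the assumed shape:
RUNTIME='{"stage":"RUNNING","hardware":{"current":"cpu-basic"}}'
printf '%s' "$RUNTIME" | jq -r '.stage'   # -> RUNNING
```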
#### 5. Verify the deployment (API 10 scope)
Once the Space is running:
```bash
# Health check
curl https://ayushozha-replicalab.hf.space/health
# Reset an episode
curl -X POST https://ayushozha-replicalab.hf.space/reset \
-H "Content-Type: application/json" \
-d '{"seed": 42, "scenario": "math_reasoning", "difficulty": "easy"}'
# List scenarios
curl https://ayushozha-replicalab.hf.space/scenarios
```
WebSocket test (using websocat or wscat):
```bash
wscat -c wss://ayushozha-replicalab.hf.space/ws
# Then type: {"type": "ping"}
# Expect: {"type": "pong"}
```
### Verified live deployment (API 10 sign-off, 2026-03-08)
**Public Space URL:** https://huggingface.co/spaces/ayushozha/replicalab
**API base URL:** `https://ayushozha-replicalab.hf.space`
All four endpoints verified against the live Space with real env:
```
GET /health → 200 {"status":"ok","env":"real"}
GET /scenarios → 200 {"scenarios":[...3 families...]}
POST /reset → 200 {"session_id":"...","episode_id":"...","observation":{...}}
POST /step → 200 {"reward":2.312798,"done":true,"info":{"verdict":"accept",...}}
```
Full episode verified: reset → propose_protocol → accept → terminal reward
with real judge scoring (rigor=0.465, feasibility=1.000, fidelity=0.325,
total_reward=2.313, verdict=accept).
---
## Secrets and API Key Management (API 17)
### Current state
The server is **fully self-contained with no external API calls**.
No secrets or API keys are required to run the environment, judge, or
scoring pipeline. All reward computation is deterministic and local.
### Where secrets live (by context)
| Context | Location | What to set | Required? |
|---------|----------|-------------|-----------|
| **HF Space** | Space Settings → Repository secrets | Nothing currently | No |
| **Local dev** | Shell env vars or `.env` file (gitignored) | Nothing currently | No |
| **Docker** | `-e KEY=value` flags on `docker run` | Nothing currently | No |
| **Colab notebook** | `google.colab.userdata` or env vars | `HF_TOKEN` for model downloads, `REPLICALAB_URL` for hosted env | Yes for training |
### Colab notebook secrets
When running the training notebook, the following are needed:
| Secret | Purpose | Where to set | Required? |
|--------|---------|-------------|-----------|
| `HF_TOKEN` | Download gated models (Qwen3-4B) from HF Hub | Colab Secrets panel (key icon) | Yes |
| `REPLICALAB_URL` | URL of the hosted environment | Hardcode or Colab secret | Optional — defaults to `https://ayushozha-replicalab.hf.space` |
To set in Colab:
1. Click the key icon in the left sidebar
2. Add `HF_TOKEN` with your Hugging Face access token
3. Access in code:
```python
from google.colab import userdata
hf_token = userdata.get("HF_TOKEN")
```
### Future secrets (not currently needed)
If a frontier hosted evaluator is added later:
| Secret name | Purpose | Required? |
|-------------|---------|-----------|
| `MODEL_API_KEY` | Hosted evaluator access key | Only if a hosted evaluator is added |
| `MODEL_BASE_URL` | Alternate provider endpoint | Only if using a proxy |
These would be set in HF Space Settings → Repository secrets, and
accessed via `os.environ.get("MODEL_API_KEY")` in server code.
### Re-deploying after code changes
```bash
# Just push again β€” HF rebuilds automatically
git push hf ayush:main
```
To force a full rebuild (e.g. after dependency changes):
1. Go to Space **Settings**
2. Click **Factory reboot** under the Danger zone section
### Known limitations
- **Free CPU tier** has 2 vCPU and 16 GB RAM. This is sufficient for the
FastAPI server but NOT for running RL training. Training happens in Colab.
- **Cold starts** — Free-tier Spaces sleep after 48 hours of inactivity.
  The first request after sleep takes 30-60 seconds while the container restarts.
- **Persistent storage** — Episode replays and logs are in-memory only.
They reset when the container restarts. This is acceptable for the
hackathon demo.
- **Heavy hosted models require billing-enabled hardware** — as of
  2026-03-09, the HF token that was checked authenticates successfully, but
  the backing account reports `canPay=false` and has no org attached. It is
  therefore usable for model downloads, but not for provisioning paid
  large-model serving through HF Spaces hardware or Inference Endpoints.
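Given the cold-start behavior above, it helps to warm the Space before a demo rather than letting the first real request eat the wake-up delay. A minimal retry sketch; the URL, timeout, and retry budget are all assumptions to tune:

```bash
# Poll /health until the slept Space responds, or give up after 6 tries.
BASE=https://ayushozha-replicalab.hf.space
awake=false
for i in 1 2 3 4 5 6; do
  if curl -sf --max-time 10 "$BASE/health" >/dev/null; then
    awake=true
    break
  fi
  echo "attempt $i: still waking..."
  sleep 5
done
if [ "$awake" = true ]; then
  echo "Space is awake"
else
  echo "Space did not respond -- check the Logs tab"
fi
```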
---
## Environment URLs Reference
| Service | Local | Hosted |
|---------|-------|--------|
| FastAPI app | `http://localhost:7860` | `https://ayushozha-replicalab.hf.space` |
| Health | `http://localhost:7860/health` | `https://ayushozha-replicalab.hf.space/health` |
| WebSocket | `ws://localhost:7860/ws` | `wss://ayushozha-replicalab.hf.space/ws` |
| Scenarios | `http://localhost:7860/scenarios` | `https://ayushozha-replicalab.hf.space/scenarios` |
---
## Northflank CLI Access
### Local verification (2026-03-08)
- Installed globally with `npm i -g @northflank/cli`
- Verified locally with `northflank --version`
- Current verified version: `0.10.16`
### Login
```bash
northflank login -n <context-name> -t <token>
```
`<token>` must come from the user's Northflank account or team secret
manager. Do not commit it to the repo.
### Service access commands for `replica-labs/replicalab-ai`
```bash
northflank forward service --projectId replica-labs --serviceId replicalab-ai
northflank get service logs --tail --projectId replica-labs --serviceId replicalab-ai
northflank ssh service --projectId replica-labs --serviceId replicalab-ai
northflank exec service --projectId replica-labs --serviceId replicalab-ai
northflank upload service file --projectId replica-labs --serviceId replicalab-ai --localPath dir/file.txt --remotePath /home/file.txt
northflank download service file --projectId replica-labs --serviceId replicalab-ai --localPath dir/file.txt --remotePath /home/file.txt
```
### Current Northflank runtime findings (2026-03-09)
- The manual training job `replicalab-train` exists in `replica-labs`, but
`northflank start job run --projectId replica-labs --jobId replicalab-train`
currently fails with `409 No deployment configured`.
- The job still has runtime variables configured, including the older remote
`MODEL_NAME=Qwen/Qwen3-8B`, so even after the missing deployment is fixed the
runtime config should be reviewed before launching training.
- The live service `replicalab-ai` is deployed on the same
`nf-gpu-hack-16-64` billing plan, but a direct probe from inside the
container found no `nvidia-smi` binary and no `/dev/nvidia*` device nodes.
Treat GPU/H100 availability as unverified until a container can prove
hardware visibility from inside the runtime.
### Current Northflank notebook findings (2026-03-09)
- There is a separate live notebook service in project `notebook-openport`:
`jupyter-pytorch`.
- The active public notebook DNS is
`app--jupyter-pytorch--9y6g97v7czb9.code.run` on port `8888` (`/lab` for the
Jupyter UI).
- Northflank reports that service as having GPU config `gpuType=h100-80` and
  `gpuCount=1`, and an in-container probe confirmed `NVIDIA H100 80GB HBM3`.
- The notebook image is `quay.io/jupyter/pytorch-notebook:cuda12-2025-08-18`.
- The notebook currently contains a repo clone and GRPO outputs, but the saved
notebook/log state is not clean: training produced adapter checkpoints
through step 200, then later notebook evaluation/inference failed with a
`string indices must be integers, not 'str'` content-format error.
### Windows note
Global npm binaries resolve from `C:\Users\ayush\AppData\Roaming\npm` on this
machine. If `northflank` is not found in a new shell, reopen the terminal so
the updated PATH is reloaded.
---
## Hand-off To Ayush
**Local server:**
- WebSocket: `ws://localhost:7860/ws`
- REST health: `http://localhost:7860/health`
- Running against: **real env** (not stub)
**Hosted deployment (verified 2026-03-08):**
- Base URL: `https://ayushozha-replicalab.hf.space`
- `/health` returns `200` with `{"status":"ok","env":"real"}`
- WebSocket path: `wss://ayushozha-replicalab.hf.space/ws`
- Full episode tested: propose → accept → reward with real judge scores
---
## Troubleshooting
| Issue | Fix |
|-------|-----|
| `ReplicaLabEnv not found` warning at startup | The real env is now available; ensure `replicalab/scoring/rubric.py` is present and `httpx` + `websocket-client` are in `server/requirements.txt` |
| Docker build fails | Re-check `server/requirements.txt` and the Docker build context |
| CORS error from the frontend | Re-check allowed origins in `server/app.py` |
| WebSocket closes after idle time | Send periodic ping messages or reconnect |
| Session not found (REST) | Call `/reset` again to create a new session |