| # Deployment Guide (Max / Person C) | |
| --- | |
| ## Local Development | |
| ```bash | |
| # Create and activate virtualenv | |
| python -m venv .venv | |
| source .venv/bin/activate # Windows: .venv\Scripts\activate | |
| # Install server deps | |
| pip install -r server/requirements.txt | |
| # Install replicalab package | |
| pip install -e . --no-deps | |
| # Run the server | |
| uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload | |
| ``` | |
| Server should be available at `http://localhost:7860`. | |
| Quick smoke test: | |
| ```bash | |
| curl http://localhost:7860/health | |
| curl -X POST http://localhost:7860/reset \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"seed": 42, "scenario": "math_reasoning", "difficulty": "easy"}' | |
| ``` | |
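The same smoke test can be scripted. The following is a minimal sketch using only the Python standard library; the helper names `build_reset_request` and `send_json` are illustrative and not part of the repo, and the payload fields are copied from the curl example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # local server started by the uvicorn command above


def build_reset_request(base_url: str, seed: int, scenario: str, difficulty: str) -> urllib.request.Request:
    """Build (but do not send) the POST /reset request used in the smoke test."""
    body = json.dumps({"seed": seed, "scenario": scenario, "difficulty": difficulty}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/reset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_json(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def smoke_test(base_url: str = BASE_URL) -> None:
    """Hit /health and /reset; requires the server to be running."""
    print(send_json(urllib.request.Request(f"{base_url}/health")))
    print(send_json(build_reset_request(base_url, 42, "math_reasoning", "easy")))
```

Call `smoke_test()` once the server is up; it raises `urllib.error.URLError` if nothing is listening on the port.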
| --- | |
| ## Docker (Local) | |
| ```bash | |
| docker build -f server/Dockerfile -t replicalab . | |
| docker run -p 7860:7860 replicalab | |
| ``` | |
| ### Verified endpoints (API 08 sign-off, 2026-03-08) | |
| After `docker run -p 7860:7860 replicalab`, the following were verified | |
| against the **real env** (not stub): | |
| ```bash | |
| curl http://localhost:7860/health | |
| # → {"status":"ok","env":"real"} | |
| curl http://localhost:7860/scenarios | |
| # → {"scenarios":[{"family":"math_reasoning",...}, ...]} | |
| curl -X POST http://localhost:7860/reset \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"seed":42,"scenario":"math_reasoning","difficulty":"easy"}' | |
| # → {"session_id":"...","episode_id":"...","observation":{...}} | |
| # Use session_id from reset response: | |
| curl -X POST http://localhost:7860/step \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"session_id":"<SESSION_ID>","action":{"action_type":"propose_protocol","sample_size":3,"controls":["baseline"],"technique":"algebraic_proof","duration_days":1,"required_equipment":[],"required_reagents":[],"questions":[],"rationale":"Test."}}' | |
| # → {"observation":{...},"reward":0.0,"done":false,"info":{...}} | |
| ``` | |
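The reset-then-step sequence above depends on passing the `session_id` from the `/reset` response into `/step`. A rough sketch of that hand-off follows; the transport is injected as a `post(path, payload)` callable (a hypothetical signature, not a repo API) so the flow can be exercised without a running server, and the action fields are taken verbatim from the curl example.

```python
from typing import Any, Callable, Dict

JsonDict = Dict[str, Any]
# Hypothetical transport signature: (path, payload) -> decoded JSON response.
PostFn = Callable[[str, JsonDict], JsonDict]


def build_propose_action() -> JsonDict:
    """The propose_protocol action used in the verified curl example."""
    return {
        "action_type": "propose_protocol",
        "sample_size": 3,
        "controls": ["baseline"],
        "technique": "algebraic_proof",
        "duration_days": 1,
        "required_equipment": [],
        "required_reagents": [],
        "questions": [],
        "rationale": "Test.",
    }


def reset_then_step(post: PostFn) -> JsonDict:
    """Create a session with /reset, then submit one action to /step."""
    reset = post("/reset", {"seed": 42, "scenario": "math_reasoning", "difficulty": "easy"})
    session_id = reset["session_id"]  # /step is keyed on this value
    return post("/step", {"session_id": session_id, "action": build_propose_action()})
```

In practice `post` would wrap whichever HTTP client the caller already uses.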
| With optional hosted-model secrets: | |
| ```bash | |
| docker run -p 7860:7860 \ | |
| -e MODEL_API_KEY=replace-me \ | |
| replicalab | |
| ``` | |
| --- | |
| ## Hugging Face Spaces Deployment | |
| ### What is already configured (API 09) | |
| The repo is now deployment-ready for HF Spaces: | |
| - **Root `Dockerfile`**: HF Spaces requires the Dockerfile at repo root. | |
| The root-level `Dockerfile` is identical to `server/Dockerfile`. Keep them | |
| in sync, or delete `server/Dockerfile` once the team standardizes. | |
| - **`README.md` frontmatter**: The root README now contains the required | |
| YAML frontmatter that HF Spaces parses on push: | |
| ```yaml | |
| --- | |
| title: ReplicaLab | |
| emoji: 🧪 | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| --- | |
| ``` | |
| - **Non-root user**: The Dockerfile creates and runs as `appuser` (UID 1000), | |
| which HF Spaces requires for security. | |
| - **Port 7860**: Both the `EXPOSE` directive and the `uvicorn` CMD use 7860, | |
| matching the `app_port` in the frontmatter. | |
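Because the port must agree between the README frontmatter and the Dockerfile, a small pre-push check can catch drift. This is only a sketch: it uses regular expressions rather than a YAML parser, assumes `app_port` and `EXPOSE` each appear on their own line, and the function names are made up for illustration.

```python
import re
from typing import List, Optional


def frontmatter_app_port(readme_text: str) -> Optional[int]:
    """Return app_port from the leading YAML frontmatter, or None if absent."""
    match = re.match(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", readme_text, flags=re.DOTALL)
    if match is None:
        return None
    port = re.search(r"^app_port:\s*(\d+)\s*$", match.group(1), flags=re.MULTILINE)
    return int(port.group(1)) if port else None


def dockerfile_exposed_ports(dockerfile_text: str) -> List[int]:
    """Collect the numeric port from each single-port EXPOSE line."""
    pattern = r"^\s*EXPOSE\s+(\d+)"
    return [int(p) for p in re.findall(pattern, dockerfile_text, flags=re.MULTILINE | re.IGNORECASE)]


def ports_match(readme_text: str, dockerfile_text: str) -> bool:
    """True when the frontmatter app_port is among the Dockerfile's EXPOSE ports."""
    port = frontmatter_app_port(readme_text)
    return port is not None and port in dockerfile_exposed_ports(dockerfile_text)
```

Reading `README.md` and `Dockerfile` from the repo root and asserting `ports_match(...)` would be enough for a local or CI guard.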
| ### Step-by-step deployment (for Max) | |
| #### 1. Create the Space | |
| 1. Go to https://huggingface.co/new-space | |
| 2. Fill in: | |
| - **Owner:** your HF username or the team org | |
| - **Space name:** `replicalab` (or `replicalab-demo`) | |
| - **License:** MIT | |
| - **SDK:** Docker | |
| - **Hardware:** CPU Basic (free tier is fine for the server) | |
| - **Visibility:** Public | |
| 3. Click **Create Space** | |
| #### 2. Add the Space as a git remote | |
| ```bash | |
| # From the repo root | |
| git remote add hf https://huggingface.co/spaces/<YOUR_HF_USERNAME>/replicalab | |
| # If the org is different: | |
| # git remote add hf https://huggingface.co/spaces/<ORG>/replicalab | |
| ``` | |
| #### 3. Push the repo | |
| ```bash | |
| # Push the current branch to the Space | |
| git push hf ayush:main | |
| # Or if deploying from master: | |
| # git push hf master:main | |
| ``` | |
| HF Spaces will automatically detect the `Dockerfile`, build the image, and | |
| start the container. | |
| #### 4. Monitor the build | |
| 1. Go to https://huggingface.co/spaces/\<YOUR_HF_USERNAME\>/replicalab | |
| 2. Click the **Logs** tab (or **Build** tab during first deploy) | |
| 3. Wait for the build to complete (typically 2-5 minutes) | |
| 4. The Space status should change from "Building" to "Running" | |
| #### 5. Verify the deployment (API 10 scope) | |
| Once the Space is running: | |
| ```bash | |
| # Health check | |
| curl https://ayushozha-replicalab.hf.space/health | |
| # Reset an episode | |
| curl -X POST https://ayushozha-replicalab.hf.space/reset \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"seed": 42, "scenario": "math_reasoning", "difficulty": "easy"}' | |
| # List scenarios | |
| curl https://ayushozha-replicalab.hf.space/scenarios | |
| ``` | |
| WebSocket test (using websocat or wscat): | |
| ```bash | |
| wscat -c wss://ayushozha-replicalab.hf.space/ws | |
| # Then type: {"type": "ping"} | |
| # Expect: {"type": "pong"} | |
| ``` | |
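The ping/pong exchange can also be checked from Python. The sketch below works with any connection object exposing `send` and `recv`; the `TextConnection` protocol and the `ping` helper are illustrative names, and the commented usage assumes the `websocket-client` package is installed.

```python
import json
from typing import Protocol


class TextConnection(Protocol):
    """Minimal interface this sketch needs from a WebSocket connection."""

    def send(self, payload: str) -> object: ...

    def recv(self) -> str: ...


def ping(connection: TextConnection) -> bool:
    """Send {"type": "ping"} and report whether the reply is {"type": "pong"}."""
    connection.send(json.dumps({"type": "ping"}))
    reply = json.loads(connection.recv())
    return isinstance(reply, dict) and reply.get("type") == "pong"


# Usage with websocket-client (assumed installed; not executed here):
#   from websocket import create_connection
#   conn = create_connection("wss://ayushozha-replicalab.hf.space/ws")
#   try:
#       print(ping(conn))
#   finally:
#       conn.close()
```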
| ### Verified live deployment (API 10 sign-off, 2026-03-08) | |
| **Public Space URL:** https://huggingface.co/spaces/ayushozha/replicalab | |
| **API base URL:** `https://ayushozha-replicalab.hf.space` | |
| All four endpoints verified against the live Space with real env: | |
| ``` | |
| GET /health → 200 {"status":"ok","env":"real"} | |
| GET /scenarios → 200 {"scenarios":[...3 families...]} | |
| POST /reset → 200 {"session_id":"...","episode_id":"...","observation":{...}} | |
| POST /step → 200 {"reward":2.312798,"done":true,"info":{"verdict":"accept",...}} | |
| ``` | |
| Full episode verified: reset → propose_protocol → accept → terminal reward | |
| with real judge scoring (rigor=0.465, feasibility=1.000, fidelity=0.325, | |
| total_reward=2.313, verdict=accept). | |
| --- | |
| ## Secrets and API Key Management (API 17) | |
| ### Current state | |
| The server is **fully self-contained with no external API calls**. | |
| No secrets or API keys are required to run the environment, judge, or | |
| scoring pipeline. All reward computation is deterministic and local. | |
| ### Where secrets live (by context) | |
| | Context | Location | What to set | Required? | | |
| |---------|----------|-------------|-----------| | |
| | **HF Space** | Space Settings → Repository secrets | Nothing currently | No | | |
| | **Local dev** | Shell env vars or `.env` file (gitignored) | Nothing currently | No | | |
| | **Docker** | `-e KEY=value` flags on `docker run` | Nothing currently | No | | |
| | **Colab notebook** | `google.colab.userdata` or env vars | `HF_TOKEN` for model downloads, `REPLICALAB_URL` for hosted env | Yes for training | | |
| ### Colab notebook secrets | |
| When running the training notebook, the following are needed: | |
| | Secret | Purpose | Where to set | Required? | | |
| |--------|---------|-------------|-----------| | |
| | `HF_TOKEN` | Download gated models (Qwen3-4B) from HF Hub | Colab Secrets panel (key icon) | Yes | | |
| | `REPLICALAB_URL` | URL of the hosted environment | Hardcode or Colab secret | Optional; defaults to `https://ayushozha-replicalab.hf.space` | | |
| To set in Colab: | |
| 1. Click the key icon in the left sidebar | |
| 2. Add `HF_TOKEN` with your Hugging Face access token | |
| 3. Access in code: | |
| ```python | |
| from google.colab import userdata | |
| hf_token = userdata.get("HF_TOKEN") | |
| ``` | |
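To make the same notebook code work both inside and outside Colab, the lookup can fall back to environment variables and then to the documented default URL. This is a sketch under that assumption; `read_setting` is a hypothetical helper, not something the training notebook is known to define.

```python
import os
from typing import Optional

DEFAULT_REPLICALAB_URL = "https://ayushozha-replicalab.hf.space"


def read_setting(name: str, default: Optional[str] = None) -> Optional[str]:
    """Look up a setting in Colab secrets first, then in environment variables."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None
    if userdata is not None:
        try:
            value = userdata.get(name)
        except Exception:  # secret missing, or notebook access not granted
            value = None
        if value:
            return value
    return os.environ.get(name, default)


# HF_TOKEN has no default; REPLICALAB_URL falls back to the hosted Space.
hf_token = read_setting("HF_TOKEN")
replicalab_url = read_setting("REPLICALAB_URL", DEFAULT_REPLICALAB_URL)
```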
| ### Future secrets (not currently needed) | |
| If a frontier hosted evaluator is added later: | |
| | Secret name | Purpose | Required? | | |
| |-------------|---------|-----------| | |
| | `MODEL_API_KEY` | Hosted evaluator access key | Only if a hosted evaluator is added | | |
| | `MODEL_BASE_URL` | Alternate provider endpoint | Only if using a proxy | | |
| These would be set in HF Space Settings → Repository secrets, and | |
| accessed via `os.environ.get("MODEL_API_KEY")` in server code. | |
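If these secrets are ever introduced, one way to keep them optional is to load them into a small config object and treat a missing key as "hosted evaluator disabled". The class and function names below are hypothetical; nothing like this is confirmed to exist in `server/app.py` today.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HostedEvaluatorConfig:  # hypothetical name, for illustration only
    api_key: Optional[str]
    base_url: Optional[str]

    @property
    def enabled(self) -> bool:
        """The hosted evaluator is usable only when an API key is present."""
        return bool(self.api_key)


def load_hosted_evaluator_config(env: Mapping[str, str] = os.environ) -> HostedEvaluatorConfig:
    """Read the optional secrets; empty strings are treated as unset."""
    return HostedEvaluatorConfig(
        api_key=env.get("MODEL_API_KEY") or None,
        base_url=env.get("MODEL_BASE_URL") or None,
    )
```

With neither variable set, `load_hosted_evaluator_config().enabled` is `False`, which matches the current no-secrets behavior.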
| ### Re-deploying after code changes | |
| ```bash | |
| # Just push again; HF rebuilds automatically | |
| git push hf ayush:main | |
| ``` | |
| To force a full rebuild (e.g. after dependency changes): | |
| 1. Go to Space **Settings** | |
| 2. Click **Factory reboot** under the Danger zone section | |
| ### Known limitations | |
| - **Free CPU tier** has 2 vCPU and 16 GB RAM. This is sufficient for the | |
| FastAPI server but NOT for running RL training. Training happens in Colab. | |
| - **Cold starts**: Free-tier Spaces sleep after 48 hours of inactivity. | |
| The first request after sleep takes 30-60 seconds while the container restarts. | |
| - **Persistent storage**: Episode replays and logs are in-memory only. | |
| They reset when the container restarts. This is acceptable for the | |
| hackathon demo. | |
| - **Heavy hosted models require billing-enabled hardware**: as of | |
| 2026-03-09, the checked HF token authenticates successfully but the backing | |
| account reports `canPay=false` and has no org attached, so it is currently | |
| suitable for model downloads but not for provisioning paid large-model | |
| serving through HF Spaces hardware or Inference Endpoints. | |
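Clients that may hit a sleeping Space can poll the health endpoint before starting an episode. The following is a sketch of that pattern with an injected `check` callable; the helper name, the 120-second timeout, and the 5-second interval are assumptions chosen to sit comfortably above the 30-60 second wake-up time noted above.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if check():
                return True
        except OSError:
            # Covers connection refused, socket timeouts, and urllib's URLError.
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A suitable `check` would request `/health` and return whether the decoded body has `"status": "ok"`.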
| --- | |
| ## Environment URLs Reference | |
| | Service | Local | Hosted | | |
| |---------|-------|--------| | |
| | FastAPI app | `http://localhost:7860` | `https://ayushozha-replicalab.hf.space` | | |
| | Health | `http://localhost:7860/health` | `https://ayushozha-replicalab.hf.space/health` | | |
| | WebSocket | `ws://localhost:7860/ws` | `wss://ayushozha-replicalab.hf.space/ws` | | |
| | Scenarios | `http://localhost:7860/scenarios` | `https://ayushozha-replicalab.hf.space/scenarios` | | |
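Every row in the table follows from the base URL: the REST paths are appended directly, and the WebSocket URL swaps `http` for `ws` (or `https` for `wss`). A small helper can derive them so that clients only configure one value; `endpoint_urls` is an illustrative name, not a repo function.

```python
from urllib.parse import urlsplit, urlunsplit


def endpoint_urls(base_url: str) -> dict:
    """Derive the REST and WebSocket endpoints from a single base URL."""
    parts = urlsplit(base_url.rstrip("/"))
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) base URL, got {base_url!r}")
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    root = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    ws_root = urlunsplit((ws_scheme, parts.netloc, parts.path, "", ""))
    return {
        "app": root,
        "health": f"{root}/health",
        "websocket": f"{ws_root}/ws",
        "scenarios": f"{root}/scenarios",
    }
```

For example, `endpoint_urls("http://localhost:7860")["websocket"]` gives `ws://localhost:7860/ws`, matching the Local column.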
| --- | |
| ## Northflank CLI Access | |
| ### Local verification (2026-03-08) | |
| - Installed globally with `npm i -g @northflank/cli` | |
| - Verified locally with `northflank --version` | |
| - Current verified version: `0.10.16` | |
| ### Login | |
| ```bash | |
| northflank login -n <context-name> -t <token> | |
| ``` | |
| `<token>` must come from the user's Northflank account or team secret | |
| manager. Do not commit it to the repo. | |
| ### Service access commands for `replica-labs/replicalab-ai` | |
| ```bash | |
| northflank forward service --projectId replica-labs --serviceId replicalab-ai | |
| northflank get service logs --tail --projectId replica-labs --serviceId replicalab-ai | |
| northflank ssh service --projectId replica-labs --serviceId replicalab-ai | |
| northflank exec service --projectId replica-labs --serviceId replicalab-ai | |
| northflank upload service file --projectId replica-labs --serviceId replicalab-ai --localPath dir/file.txt --remotePath /home/file.txt | |
| northflank download service file --projectId replica-labs --serviceId replicalab-ai --localPath dir/file.txt --remotePath /home/file.txt | |
| ``` | |
| ### Current Northflank runtime findings (2026-03-09) | |
| - The manual training job `replicalab-train` exists in `replica-labs`, but | |
| `northflank start job run --projectId replica-labs --jobId replicalab-train` | |
| currently fails with `409 No deployment configured`. | |
| - The job still has runtime variables configured, including the older remote | |
| `MODEL_NAME=Qwen/Qwen3-8B`, so even after the missing deployment is fixed the | |
| runtime config should be reviewed before launching training. | |
| - The live service `replicalab-ai` is deployed on the same | |
| `nf-gpu-hack-16-64` billing plan, but a direct probe from inside the | |
| container found no `nvidia-smi` binary and no `/dev/nvidia*` device nodes. | |
| Treat GPU/H100 availability as unverified until a container can prove | |
| hardware visibility from inside the runtime. | |
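The in-container probe described above can be approximated with the standard library. This sketch checks only the two signals mentioned (an `nvidia-smi` binary on `PATH` and `/dev/nvidia*` device nodes); the function names are illustrative, and a positive result is a necessary hint rather than proof that a GPU is usable.

```python
import glob
import shutil
from typing import Any, Dict


def probe_gpu_visibility() -> Dict[str, Any]:
    """Report whether nvidia-smi is on PATH and which /dev/nvidia* nodes exist."""
    return {
        "nvidia_smi_path": shutil.which("nvidia-smi"),  # None when not found
        "device_nodes": sorted(glob.glob("/dev/nvidia*")),
    }


def gpu_plausibly_visible(probe: Dict[str, Any]) -> bool:
    """True only when both the binary and at least one device node are present."""
    return probe["nvidia_smi_path"] is not None and bool(probe["device_nodes"])
```

Running `probe_gpu_visibility()` inside the service container (for example through `northflank exec service`) gives a result that can be recorded alongside the findings.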
| ### Current Northflank notebook findings (2026-03-09) | |
| - There is a separate live notebook service in project `notebook-openport`: | |
| `jupyter-pytorch`. | |
| - The active public notebook DNS is | |
| `app--jupyter-pytorch--9y6g97v7czb9.code.run` on port `8888` (`/lab` for the | |
| Jupyter UI). | |
| - Northflank reports that service with GPU config | |
| `gpuType=h100-80`, `gpuCount=1`, and an in-container probe confirmed | |
| `NVIDIA H100 80GB HBM3`. | |
| - The notebook image is `quay.io/jupyter/pytorch-notebook:cuda12-2025-08-18`. | |
| - The notebook currently contains a repo clone and GRPO outputs, but the saved | |
| notebook/log state is not clean: training produced adapter checkpoints | |
| through step 200, then later notebook evaluation/inference failed with a | |
| `string indices must be integers, not 'str'` content-format error. | |
| ### Windows note | |
| Global npm binaries resolve from `C:\Users\ayush\AppData\Roaming\npm` on this | |
| machine. If `northflank` is not found in a new shell, reopen the terminal so | |
| the updated PATH is reloaded. | |
| --- | |
| ## Hand-off To Ayush | |
| **Local server:** | |
| - WebSocket: `ws://localhost:7860/ws` | |
| - REST health: `http://localhost:7860/health` | |
| - Running against: **real env** (not stub) | |
| **Hosted deployment (verified 2026-03-08):** | |
| - Base URL: `https://ayushozha-replicalab.hf.space` | |
| - `/health` returns `200` with `{"status":"ok","env":"real"}` | |
| - WebSocket path: `wss://ayushozha-replicalab.hf.space/ws` | |
| - Full episode tested: propose → accept → reward with real judge scores | |
| --- | |
| ## Troubleshooting | |
| | Issue | Fix | | |
| |-------|-----| | |
| | `ReplicaLabEnv not found` warning at startup | The real env is now available; ensure `replicalab/scoring/rubric.py` is present and `httpx` + `websocket-client` are in `server/requirements.txt` | | |
| | Docker build fails | Re-check `server/requirements.txt` and the Docker build context | | |
| | CORS error from the frontend | Re-check allowed origins in `server/app.py` | | |
| | WebSocket closes after idle time | Send periodic ping messages or reconnect | | |
| | Session not found (REST) | Call `/reset` again to create a new session | | |