Update Claude code and hf upload file

by BladeSzaSza - opened 23 days ago

base: refs/heads/main ← from: refs/pr/4

+476 -95

Files changed (11)

.gitattributes +0 -1
.hfignore +37 -0
CLAUDE.md +26 -10
MODEL_BUDGET.md +5 -1
README.md +118 -51
formscout/agents/classifier.py +0 -1
formscout/config.py +16 -2
formscout/serving/llama_cpp.py +39 -25
scripts/hf_upload.sh +97 -0
scripts/serve_judge.sh +35 -0
tests/test_phase2.py +103 -4

.gitattributes CHANGED

@@ -35,4 +35,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 docs/FormScout-FMS-Spec.md.pdf filter=lfs diff=lfs merge=lfs -text
 docs/plans/FormScout-Build-Prompt.md.pdf filter=lfs diff=lfs merge=lfs -text
-checkpoints/mediapipe/pose_landmarker_full.task filter=lfs diff=lfs merge=lfs -text

 *tfevents* filter=lfs diff=lfs merge=lfs -text
 docs/FormScout-FMS-Spec.md.pdf filter=lfs diff=lfs merge=lfs -text
 docs/plans/FormScout-Build-Prompt.md.pdf filter=lfs diff=lfs merge=lfs -text

.hfignore ADDED

	@@ -0,0 +1,37 @@

+# Python
+__pycache__/
+*.py[cod]
+*.egg-info/
+dist/
+build/
+.eggs/
+*.egg
+# Virtual environments
+.venv/
+venv/
+env/
+# Secrets / local config
+.env
+.env.*
+# Model weights (managed separately)
+checkpoints/
+*.pt
+*.pth
+*.gguf
+*.bin
+# Run artifacts
+traces/
+*.mp4
+# Dev tooling
+.pytest_cache/
+.ruff_cache/
+.DS_Store
+.claude/
+# Git
+.git/

CLAUDE.md CHANGED

@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 FormScout is a Gradio app (Hugging Face Space) that scores Functional Movement Screen (FMS) videos 0–3 per test with a written rationale and an annotated overlay. It is a **screening aid** — not a diagnosis, not an injury predictor. Built for the Build Small Hackathon (Backyard AI track). Full product spec is in `docs/FormScout-FMS-Spec.md`; the engineering contract is in `docs/plans/FormScout-Build-Prompt.md`.
-**Current status:** Phase 2 complete. All 7 FMS test rubric scorers, JudgeAgent, MovementClassifierAgent, and ReportAgent are implemented and tested (45/46 passing). Phase 3 is next (ST-GCN fine-tune + RAG retrieval).
 ## Common commands
@@ -27,6 +27,12 @@ pytest tests/test_biomechanics.py::TestBiomechanicsAgent::test_deep_squat_score
 # Lint / format
 ruff check . && ruff format .
 # Run Svelte component tests (when frontend work is added)
 npx vitest run
 ```
@@ -43,7 +49,9 @@ IngestAgent → Pose2DAgent → [Body3DAgent — optional]
 → rubric/score_test() → JudgeAgent → ReportAgent
 ```
-The **Director** (`pipeline.py`) owns the flow. `app.py` creates one `Director()` instance and calls `director.run(video_path, test_name, side)` per submission. The Gradio UI passes `test_name` directly (from dropdown), bypassing the classifier.
 ### The tiering rule (most important invariant)
@@ -53,7 +61,7 @@ The **Director** (`pipeline.py`) owns the flow. `app.py` creates one `Director()
 | Flag | Default | Meaning |
 |------|---------|---------|
-| `ENABLE_JUDGE` | `False` | When False, JudgeAgent falls back to rubric score — no llama.cpp needed |
 | `ENABLE_3D` | `False` | When False, Body3DAgent returns `used=False` immediately |
 | `ENABLE_STGCN` | `False` | Phase 3 — ST-GCN learned scoring head |
 | `ENABLE_RAG` | `False` | Phase 3 — RetrievalAgent exemplar lookup |
@@ -66,6 +74,8 @@ All model IDs, thresholds, k-values, and feature flags live in `config.py` — n
 2. `ENABLE_JUDGE=True` + llama.cpp server unreachable → same fallback, logs a warning
 3. `ENABLE_JUDGE=True` + server available → calls Qwen3-VL-8B-Instruct at `127.0.0.1:8080`
 This means the app is **fully functional without any GPU or llama.cpp** — rubric scoring is pure Python.
 ### Rubric scorers
@@ -99,14 +109,20 @@ Every agent I/O is a frozen dataclass from `formscout/types.py`. Key types:
 `MovementResult` and `JudgeResult` validate their fields in `__post_init__` — passing invalid values raises immediately.
-### YOLO checkpoint location
-`config.YOLO_POSE_MODEL` points to `checkpoints/yolo26/yolo26l-pose.pt` (absolute path). Both `yolo26l-pose.pt` and `yolo26x-pose.pt` are committed to the repo. Models load once at module scope via `_get_model()` in `pose2d.py`.
 ### llama.cpp serving
 `formscout/serving/llama_cpp.py` provides `LlamaCppClient` (VLM, port 8080) and `EmbeddingClient` (embeddings, port 8081). Both check `/health` before use and return safe error dicts when unavailable. Only active when the corresponding `ENABLE_*` flag is True.
 ## Key constraints and invariants
 - **No cloud model APIs.** All inference runs on-Space (ZeroGPU). No OpenAI/Anthropic/Gemini calls.
@@ -132,13 +148,13 @@ Every agent I/O is a frozen dataclass from `formscout/types.py`. Key types:
 | Component | Model | Params | Status |
 |---|---|---|---|
-| 2D pose (primary) | YOLO26l-Pose | 0.026B | Ready (checkpoint committed) |
-| 2D pose (HQ alt) | YOLO26x-Pose | 0.058B | Ready (checkpoint committed) |
-| 2D pose (fallback) | `noahcao/sapiens-pose-coco` | ~0.6B | Access accepted |
 | Segmentation | SAM 3.1 base | ~0.85B | Access accepted |
 | 3D biomechanics | `facebook/sam-3d-body-dinov3` | ~0.84B | **Access ACCEPTED Jun 4 2026** |
 | Learned scoring | ST-GCN (pyskl) | ~0.03B | Phase 3 |
-| Judge + Classifier | Qwen3-VL-8B-Instruct (llama.cpp) | 8B | Ready (ENABLE_JUDGE=False for now) |
 | Retrieval | Qwen3-VL-Embedding-8B (llama.cpp) | 8B | Phase 3 |
 Track the running sum in `MODEL_BUDGET.md`. The two Qwen3-VL-8B models share a backbone.
@@ -156,7 +172,7 @@ The UI uses **Gradio `gr.Blocks`** with custom CSS/theme (`formscout/ui/theme.py
 2. **Phase 1 — Spine:** ✅ Complete. Deep Squat end-to-end.
 3. **Phase 2 — All 7 tests:** ✅ Complete. Classifier, Judge, Report agents; all rubric scorers; Gradio UI.
 4. **Phase 3 — Learned scoring + retrieval:** ST-GCN fine-tune on physio clips, publish to Hub. RetrievalAgent with embedding index.
-5. **Phase 4 — Polish + ship:** Custom Svelte UI components, overlay video, PDF export, agent trace to Hub, blog post.
 ## Known issues

 FormScout is a Gradio app (Hugging Face Space) that scores Functional Movement Screen (FMS) videos 0–3 per test with a written rationale and an annotated overlay. It is a **screening aid** — not a diagnosis, not an injury predictor. Built for the Build Small Hackathon (Backyard AI track). Full product spec is in `docs/FormScout-FMS-Spec.md`; the engineering contract is in `docs/plans/FormScout-Build-Prompt.md`.
+**Current status:** Phase 2 complete. All 7 FMS test rubric scorers, JudgeAgent, MovementClassifierAgent, ReportAgent, PoseVisualizer (overlay video), and a user-selectable pose-model registry are implemented and tested (86/87 passing). Phase 3 is next (ST-GCN fine-tune + RAG retrieval).
 ## Common commands
 # Lint / format
 ruff check . && ruff format .
+# Start the local VLM judge server (llama.cpp, port 8080)
+./scripts/serve_judge.sh
+# Push source tree to the HF model repo + Space (PRs; message from last commit)
+./scripts/hf_upload.sh
 # Run Svelte component tests (when frontend work is added)
 npx vitest run
 ```
 → rubric/score_test() → JudgeAgent → ReportAgent
 ```
+The **Director** (`pipeline.py`) owns the flow. `app.py` creates one `Director()` instance and calls `director.run(video_path, test_name, side, model_key)` per submission. The Gradio UI passes `test_name` directly (from dropdown), bypassing the classifier; `model_key` selects the pose backend from `config.POSE_MODELS`.
+`PoseVisualizer` (`formscout/agents/visualizer.py`) renders the annotated overlay video (skeleton, trails, velocity arrows) from `IngestResult` + `Pose2DResult`. It is called from `app.py` after the pipeline run — it is a UI-layer component, not a Director stage. It returns `None` on failure, never raises.
 ### The tiering rule (most important invariant)
 | Flag | Default | Meaning |
 |------|---------|---------|
+| `ENABLE_JUDGE` | `True` | Judge/Classifier call Qwen3-VL via llama-server; graceful rubric fallback when the server is down |
 | `ENABLE_3D` | `False` | When False, Body3DAgent returns `used=False` immediately |
 | `ENABLE_STGCN` | `False` | Phase 3 — ST-GCN learned scoring head |
 | `ENABLE_RAG` | `False` | Phase 3 — RetrievalAgent exemplar lookup |
 2. `ENABLE_JUDGE=True` + llama.cpp server unreachable → same fallback, logs a warning
 3. `ENABLE_JUDGE=True` + server available → calls Qwen3-VL-8B-Instruct at `127.0.0.1:8080`
+Start the VLM server with `scripts/serve_judge.sh` (downloads live in `checkpoints/qwen3-vl/`, gitignored). To use a fine-tuned GGUF, set `FORMSCOUT_JUDGE_GGUF` (and `FORMSCOUT_JUDGE_MMPROJ` if it ships its own projector) — no code change needed. Multimodal requests go through the OpenAI-compatible `/v1/chat/completions` endpoint (the legacy `/completion` + `image_data` path does not work with modern llama-server).
 This means the app is **fully functional without any GPU or llama.cpp** — rubric scoring is pure Python.
 ### Rubric scorers
 `MovementResult` and `JudgeResult` validate their fields in `__post_init__` — passing invalid values raises immediately.
+### Pose model selection and checkpoints
+`config.POSE_MODELS` is a registry of pose backends: MediaPipe (CPU-friendly), five YOLO26 sizes (n/s/m/l/x), and Sapiens2 variants (Phase 3, need the custom `sapiens` repo installed). `config.DEFAULT_POSE_MODEL` is YOLO26n. The Gradio UI exposes a dropdown built from `config.available_pose_models()` (filters to checkpoints actually present) and passes the chosen `model_key` through `Director.run` to `Pose2DAgent`. `config.YOLO_POSE_MODEL` is a backward-compat alias only.
+Checkpoints are **not** committed (`checkpoints/` is gitignored). `formscout/startup.py:ensure_checkpoints()` downloads missing YOLO26/MediaPipe files from the `silas-therapy/formscout-checkpoints` HF repo once at app startup. Models load once per process and are cached — never inside the inference hot path.
 ### llama.cpp serving
 `formscout/serving/llama_cpp.py` provides `LlamaCppClient` (VLM, port 8080) and `EmbeddingClient` (embeddings, port 8081). Both check `/health` before use and return safe error dicts when unavailable. Only active when the corresponding `ENABLE_*` flag is True.
+### Deploying to Hugging Face
+The repo deploys to both `silas-therapy/small-functional-movement-screening` (model repo) and the Space of the same name (README frontmatter is the Space config). Use `./scripts/hf_upload.sh` — never raw `hf upload .`: the `hf` CLI does **not** read `.hfignore`, so a raw upload hashes the entire `.venv` (~44k files) and pushes torch binaries. The script parses `.hfignore` into `--exclude` globs, preflights the file count, creates PRs on both repos, and auto-switches to `hf upload-large-folder` (resumable, but no PR / no commit message) above 500 files.
 ## Key constraints and invariants
 - **No cloud model APIs.** All inference runs on-Space (ZeroGPU). No OpenAI/Anthropic/Gemini calls.
 | Component | Model | Params | Status |
 |---|---|---|---|
+| 2D pose (primary) | YOLO26-Pose n/s/m/l/x (default: n) | 0.0007–0.058B | Ready (auto-downloaded at startup) |
+| 2D pose (CPU alt) | MediaPipe Pose Landmarker (full) | ~0.004B | Ready (auto-downloaded at startup) |
+| 2D pose (HQ alt) | `facebook/sapiens2-pose-0.4b/0.8b/1b/5b` | 0.4–5B | Phase 3 — needs custom `sapiens` repo |
 | Segmentation | SAM 3.1 base | ~0.85B | Access accepted |
 | 3D biomechanics | `facebook/sam-3d-body-dinov3` | ~0.84B | **Access ACCEPTED Jun 4 2026** |
 | Learned scoring | ST-GCN (pyskl) | ~0.03B | Phase 3 |
+| Judge + Classifier | Qwen3-VL-8B-Instruct (llama.cpp) | 8B | **Online** — `scripts/serve_judge.sh`, ENABLE_JUDGE=True |
 | Retrieval | Qwen3-VL-Embedding-8B (llama.cpp) | 8B | Phase 3 |
 Track the running sum in `MODEL_BUDGET.md`. The two Qwen3-VL-8B models share a backbone.
 2. **Phase 1 — Spine:** ✅ Complete. Deep Squat end-to-end.
 3. **Phase 2 — All 7 tests:** ✅ Complete. Classifier, Judge, Report agents; all rubric scorers; Gradio UI.
 4. **Phase 3 — Learned scoring + retrieval:** ST-GCN fine-tune on physio clips, publish to Hub. RetrievalAgent with embedding index.
+5. **Phase 4 — Polish + ship:** Custom Svelte UI components, PDF export, agent trace to Hub, blog post. (Overlay video already done via `PoseVisualizer`.)
 ## Known issues

MODEL_BUDGET.md CHANGED

@@ -10,7 +10,7 @@ Running sum must stay ≤ 32B params.
 | Segmentation | SAM 3.1 base | 0.85B |
 | 3D Body (optional) | SAM 3D Body DINOv3-H+ | 0.84B |
 | Scoring Head | ST-GCN (pyskl) | 0.03B |
-| Judge/Classifier | Qwen3-VL-8B-Instruct | 8B |
 | Retrieval | Qwen3-VL-Embedding-8B | 8B |
 | **Total** | | **~18.37B** |
@@ -18,3 +18,7 @@ Headroom: ~13.63B under 32B cap.
 Note: The two Qwen3-VL-8B models share a backbone (counted separately here for safety).
 Only one pose backend runs at a time (YOLO or Sapiens2, not both).

 | Segmentation | SAM 3.1 base | 0.85B |
 | 3D Body (optional) | SAM 3D Body DINOv3-H+ | 0.84B |
 | Scoring Head | ST-GCN (pyskl) | 0.03B |
+| Judge/Classifier | Qwen3-VL-8B-Instruct (Q4_K_M GGUF + F16 mmproj, llama.cpp) | 8B |
 | Retrieval | Qwen3-VL-Embedding-8B | 8B |
 | **Total** | | **~18.37B** |
 Note: The two Qwen3-VL-8B models share a backbone (counted separately here for safety).
 Only one pose backend runs at a time (YOLO or Sapiens2, not both).
+Judge/Classifier serving: `scripts/serve_judge.sh` (llama-server, port 8080).
+Default GGUF: `Qwen/Qwen3-VL-8B-Instruct-GGUF` → `checkpoints/qwen3-vl/` (gitignored).
+Fine-tuned swap: set `FORMSCOUT_JUDGE_GGUF` (+ `FORMSCOUT_JUDGE_MMPROJ`) — no code change.

README.md CHANGED

@@ -1,51 +1,118 @@
----
-title: FormScout
-emoji: 🏔️
-colorFrom: green
-colorTo: green
-sdk: gradio
-app_file: app.py
-pinned: false
-license: apache-2.0
-short_description: FMS video scoring — movement screen aid
----
-# FormScout
-FMS (Functional Movement Screen) scoring pipeline — a screening aid that scores movement videos 0–3 per test with a written rationale and annotated overlay.
-**⚠️ Screening aid — not a diagnosis. Pain or clearing tests require a clinician.**
-## Quick Start
-```bash
-# Install dependencies
-pip install -r requirements.txt
-# Run headless on a video
-python -m formscout.run sample.mp4
-# Launch Gradio app
-python app.py
-# Run tests
-pytest tests/ -v
-```
-## Architecture
-Typed specialist agents orchestrated by a deterministic Director:
-```
-Ingest → Pose2D → [Body3D optional] → Biomechanics → Rubric Score → [Judge] → Report
-```
-See [CLAUDE.md](CLAUDE.md) for full architecture details.
-## Model Budget
-~18B params total (under 32B cap). See [MODEL_BUDGET.md](MODEL_BUDGET.md).
-## License
-Built for the Build Small Hackathon (Backyard AI track).

+---
+title: FormScout
+emoji: 🏔️
+colorFrom: green
+colorTo: green
+sdk: gradio
+app_file: app.py
+pinned: false
+license: apache-2.0
+short_description: FMS video scoring — movement screen aid
+---
+# FormScout
+FMS (Functional Movement Screen) scoring pipeline — a screening aid that scores movement videos 0–3 per test with a written rationale and annotated overlay.
+**⚠️ Screening aid — not a diagnosis. Pain or clearing tests require a clinician.**
+## Running locally
+### 1. Clone and install
+```bash
+git clone https://huggingface.co/silas-therapy/small-functional-movement-screening
+cd small-functional-movement-screening
+python3 -m venv .venv && source .venv/bin/activate
+pip install -r requirements.txt
+```
+### 2. Start the VLM judge (optional but recommended)
+The judge uses Qwen3-VL-8B-Instruct via llama.cpp. Without it the app falls back to the deterministic rubric score — fully functional, no GPU needed.
+```bash
+# Install llama.cpp once
+brew install llama.cpp
+# Download the model (one-time, ~6 GB)
+python3 -c "
+from huggingface_hub import hf_hub_download
+for f in ['Qwen3VL-8B-Instruct-Q4_K_M.gguf', 'mmproj-Qwen3VL-8B-Instruct-F16.gguf']:
+    hf_hub_download('Qwen/Qwen3-VL-8B-Instruct-GGUF', f, local_dir='checkpoints/qwen3-vl')
+"
+# Start the server (keep this terminal open)
+./scripts/serve_judge.sh
+```
+To use a fine-tuned GGUF instead of the default:
+```bash
+FORMSCOUT_JUDGE_GGUF=/path/to/finetuned.gguf ./scripts/serve_judge.sh
+```
+### 3. Launch the Gradio app
+```bash
+python3 app.py
+# → http://127.0.0.1:7860
+```
+Upload a video, select the FMS test from the dropdown, and click **Analyze**.
+### 4. Headless pipeline (no Gradio)
+```bash
+python3 -m formscout.run sample.mp4
+```
+### 5. Tests
+```bash
+pytest tests/ -v
+```
+### 6. Upload to Hugging Face
+```bash
+# Pushes source to both model repo and Space, opens a PR on each
+./scripts/hf_upload.sh
+# Or with a custom commit message
+./scripts/hf_upload.sh "feat: my change"
+```
+## Architecture
+Typed specialist agents orchestrated by a deterministic Director:
+```
+Ingest → Pose2D → [Body3D optional] → Biomechanics → Rubric Score → [Judge] → Report
+```
+| Agent | Model | Status |
+|---|---|---|
+| Pose2D | YOLO26l-Pose (0.026B) + MediaPipe fallback | ✅ |
+| Body3D | SAM 3D Body DINOv3 (0.84B) | gated, off by default |
+| Judge + Classifier | Qwen3-VL-8B-Instruct via llama.cpp (8B) | ✅ |
+| Scoring Head | ST-GCN (0.03B) | Phase 3 |
+| Retrieval | Qwen3-VL-Embedding-8B (8B) | Phase 3 |
+See [CLAUDE.md](CLAUDE.md) for full architecture and invariants.
+## Feature flags (`formscout/config.py`)
+| Flag | Default | Meaning |
+|---|---|---|
+| `ENABLE_JUDGE` | `True` | VLM judge via llama-server; rubric fallback when server is down |
+| `ENABLE_3D` | `False` | SAM 3D Body — off until integrated |
+| `ENABLE_STGCN` | `False` | Phase 3 |
+| `ENABLE_RAG` | `False` | Phase 3 |
+## Model budget
+~18B params total (under 32B cap). See [MODEL_BUDGET.md](MODEL_BUDGET.md).
+## License
+Apache-2.0. Built for the Build Small Hackathon (Backyard AI track).

formscout/agents/classifier.py CHANGED

@@ -9,7 +9,6 @@ Gated:  No.
 """
 from __future__ import annotations
-import json
 import logging
 from pathlib import Path

 """
 from __future__ import annotations
 import logging
 from pathlib import Path

formscout/config.py CHANGED

@@ -3,6 +3,7 @@ FormScout pipeline configuration.
 All model IDs, thresholds, k-values, and feature flags live here.
 No scattered literals elsewhere in the codebase.
 """
 from pathlib import Path
 ROOT = Path(__file__).parent.parent
@@ -95,7 +96,20 @@ SAM_CHECKPOINT = "sam2.1_hiera_base_plus.pt"
 SAM_3D_CHECKPOINT = ROOT / "checkpoints" / "sam-3d-body-dinov3" / "model.ckpt"
 SAM_3D_HF_REPO = "facebook/sam-3d-body-dinov3"
 SAM_3D_MHR_PATH = ROOT / "checkpoints" / "sam-3d-body-dinov3" / "assets" / "mhr_model.pt"
-QWEN_VLM_GGUF = "Qwen3-VL-8B-Instruct-Q4_K_M.gguf"
 QWEN_EMBED_GGUF = "Qwen3-VL-Embedding-8B-Q4_K_M.gguf"
 STGCN_CHECKPOINT = ROOT / "checkpoints" / "stgcn_fms.pth"
@@ -103,7 +117,7 @@ STGCN_CHECKPOINT = ROOT / "checkpoints" / "stgcn_fms.pth"
 ENABLE_3D = False           # SAM 3D Body — access granted Jun 2026, off until integrated
 ENABLE_STGCN = False        # Phase 3
 ENABLE_RAG = False          # Phase 3
-ENABLE_JUDGE = False        # Phase 2
 # ─── Thresholds ──────────────────────────────────────────────────────────────
 MIN_CONFIDENCE = 0.6

 All model IDs, thresholds, k-values, and feature flags live here.
 No scattered literals elsewhere in the codebase.
 """
+import os
 from pathlib import Path
 ROOT = Path(__file__).parent.parent
 SAM_3D_CHECKPOINT = ROOT / "checkpoints" / "sam-3d-body-dinov3" / "model.ckpt"
 SAM_3D_HF_REPO = "facebook/sam-3d-body-dinov3"
 SAM_3D_MHR_PATH = ROOT / "checkpoints" / "sam-3d-body-dinov3" / "assets" / "mhr_model.pt"
+# ─── Judge / Classifier VLM (Qwen3-VL-8B-Instruct via llama.cpp) ────────────
+# Default: stock Qwen3-VL-8B-Instruct Q4_K_M. To swap in a fine-tuned GGUF,
+# set FORMSCOUT_JUDGE_GGUF (and FORMSCOUT_JUDGE_MMPROJ if it has its own
+# projector) — no code change needed.
+_QWEN_DIR = ROOT / "checkpoints" / "qwen3-vl"
+JUDGE_GGUF = Path(os.environ.get(
+    "FORMSCOUT_JUDGE_GGUF", _QWEN_DIR / "Qwen3VL-8B-Instruct-Q4_K_M.gguf"
+))
+JUDGE_MMPROJ = Path(os.environ.get(
+    "FORMSCOUT_JUDGE_MMPROJ", _QWEN_DIR / "mmproj-Qwen3VL-8B-Instruct-F16.gguf"
+))
+JUDGE_HF_REPO = "Qwen/Qwen3-VL-8B-Instruct-GGUF"
+QWEN_VLM_GGUF = str(JUDGE_GGUF)  # backward-compat alias
 QWEN_EMBED_GGUF = "Qwen3-VL-Embedding-8B-Q4_K_M.gguf"
 STGCN_CHECKPOINT = ROOT / "checkpoints" / "stgcn_fms.pth"
 ENABLE_3D = False           # SAM 3D Body — access granted Jun 2026, off until integrated
 ENABLE_STGCN = False        # Phase 3
 ENABLE_RAG = False          # Phase 3
+ENABLE_JUDGE = True         # VLM judge/classifier — falls back to rubric when llama-server is down
 # ─── Thresholds ──────────────────────────────────────────────────────────────
 MIN_CONFIDENCE = 0.6
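
The environment-variable override added to `config.py` above can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `resolve_judge_paths` is a hypothetical helper, and it takes the environment as a plain mapping (rather than reading `os.environ` at import time, as `config.py` does) only so the behavior is easy to exercise.

```python
from pathlib import Path
from typing import Mapping


def resolve_judge_paths(env: Mapping[str, str], root: Path) -> tuple[Path, Path]:
    """Hypothetical helper mirroring the JUDGE_GGUF / JUDGE_MMPROJ logic.

    Each path falls back to the stock file under checkpoints/qwen3-vl/
    unless the matching FORMSCOUT_* variable is set.
    """
    qwen_dir = root / "checkpoints" / "qwen3-vl"
    gguf = Path(env.get("FORMSCOUT_JUDGE_GGUF", qwen_dir / "Qwen3VL-8B-Instruct-Q4_K_M.gguf"))
    mmproj = Path(
        env.get("FORMSCOUT_JUDGE_MMPROJ", qwen_dir / "mmproj-Qwen3VL-8B-Instruct-F16.gguf")
    )
    return gguf, mmproj
```

Calling it with an empty mapping yields the two default files; setting only `FORMSCOUT_JUDGE_GGUF` swaps the model while keeping the stock projector, which matches the "only if it ships its own" note on `FORMSCOUT_JUDGE_MMPROJ`.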

formscout/serving/llama_cpp.py CHANGED

@@ -52,49 +52,48 @@ class LlamaCppClient:
         stop: list[str] | None = None,
     ) -> dict[str, Any]:
         """
-        Send a completion request. Returns parsed JSON if the response is JSON,
         otherwise returns {"text": raw_text}.
         Args:
             prompt: The text prompt (system + user combined).
-            images: Optional list of base64-encoded images or file paths.
             max_tokens: Max generation tokens.
             temperature: Sampling temperature.
-            stop: Stop sequences.
         """
         payload: dict[str, Any] = {
-            "prompt": prompt,
-            "n_predict": max_tokens,
             "temperature": temperature,
-            "stop": stop or ["\n\n"],
         }
-        # Add images for multimodal (Qwen3-VL via llama.cpp mmproj)
-        if images:
-            image_data = []
-            for img in images:
-                if Path(img).exists():
-                    with open(img, "rb") as f:
-                        image_data.append({"data": base64.b64encode(f.read()).decode()})
-                else:
-                    # Assume already base64
-                    image_data.append({"data": img})
-            payload["image_data"] = image_data
         try:
             r = requests.post(
-                f"{self.base_url}/completion",
                 json=payload,
                 timeout=_TIMEOUT,
             )
             r.raise_for_status()
             result = r.json()
-            content = result.get("content", "")
-            # Try to parse as JSON
-            try:
-                return json.loads(content)
-            except (json.JSONDecodeError, TypeError):
-                return {"text": content}
         except requests.ConnectionError:
             return {"error": "llama.cpp server not available", "text": ""}
         except requests.Timeout:
@@ -102,6 +101,21 @@ class LlamaCppClient:
         except Exception as e:
             return {"error": str(e), "text": ""}
 class EmbeddingClient:
     """HTTP client for the llama.cpp embedding server."""

         stop: list[str] | None = None,
     ) -> dict[str, Any]:
         """
+        Send a chat-completion request (OpenAI-compatible /v1/chat/completions —
+        required for multimodal: llama-server routes images through the mmproj
+        only on this endpoint). Returns parsed JSON if the response is JSON,
         otherwise returns {"text": raw_text}.
         Args:
             prompt: The text prompt (system + user combined).
+            images: Optional list of base64-encoded JPEGs or file paths.
             max_tokens: Max generation tokens.
             temperature: Sampling temperature.
+            stop: Stop sequences (default: none — JSON output must not be truncated).
         """
+        content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
+        for img in images or []:
+            if len(img) < 4096 and Path(img).exists():
+                with open(img, "rb") as f:
+                    b64 = base64.b64encode(f.read()).decode()
+            else:
+                b64 = img  # already base64
+            content.append({
+                "type": "image_url",
+                "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
+            })
         payload: dict[str, Any] = {
+            "messages": [{"role": "user", "content": content}],
+            "max_tokens": max_tokens,
             "temperature": temperature,
         }
+        if stop:
+            payload["stop"] = stop
         try:
             r = requests.post(
+                f"{self.base_url}/v1/chat/completions",
                 json=payload,
                 timeout=_TIMEOUT,
             )
             r.raise_for_status()
             result = r.json()
+            text = result["choices"][0]["message"]["content"] or ""
+            return self._parse_json_reply(text)
         except requests.ConnectionError:
             return {"error": "llama.cpp server not available", "text": ""}
         except requests.Timeout:
         except Exception as e:
             return {"error": str(e), "text": ""}
+    @staticmethod
+    def _parse_json_reply(text: str) -> dict[str, Any]:
+        """Parse model output as JSON, tolerating markdown fences."""
+        stripped = text.strip()
+        if stripped.startswith("```"):
+            stripped = stripped.split("\n", 1)[-1]
+            stripped = stripped.rsplit("```", 1)[0].strip()
+        try:
+            parsed = json.loads(stripped)
+            if isinstance(parsed, dict):
+                return parsed
+        except (json.JSONDecodeError, TypeError):
+            pass
+        return {"text": text}
 class EmbeddingClient:
     """HTTP client for the llama.cpp embedding server."""

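The request and reply handling changed in `llama_cpp.py` above can be tried without a running server. The sketch below is a standalone approximation under stated assumptions: `build_chat_payload` and `parse_json_reply` are free functions written for illustration (the diff implements this inside `LlamaCppClient`), images are assumed to be base64 strings already, and the `max_tokens` and `temperature` defaults are placeholders because the original signature's defaults are not visible in the diff.

```python
import json
from typing import Any

FENCE = "`" * 3  # markdown code fence, built indirectly to keep this block fence-safe


def build_chat_payload(
    prompt: str,
    images_b64: list[str],
    max_tokens: int = 512,      # assumed default, not taken from the diff
    temperature: float = 0.0,   # assumed default, not taken from the diff
) -> dict[str, Any]:
    """Build an OpenAI-style chat payload with one text part plus image parts."""
    content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
    for b64 in images_b64:
        content.append(
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
        )
    return {
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def parse_json_reply(text: str) -> dict[str, Any]:
    """Parse model output as a JSON object, tolerating a surrounding markdown fence."""
    stripped = text.strip()
    if stripped.startswith(FENCE):
        stripped = stripped.split("\n", 1)[-1]             # drop the opening fence line
        stripped = stripped.rsplit(FENCE, 1)[0].strip()    # drop the closing fence
    try:
        parsed = json.loads(stripped)
        if isinstance(parsed, dict):
            return parsed
    except (json.JSONDecodeError, TypeError):
        pass
    return {"text": text}  # non-JSON or non-object replies fall back to raw text
```

The fallback to `{"text": ...}` for anything that is not a JSON object means callers can always treat the result as a dict, which is the same contract the diff keeps for error cases.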
scripts/hf_upload.sh ADDED

	@@ -0,0 +1,97 @@

+#!/usr/bin/env bash
+# Upload the FormScout source tree to both the model repo and the Space.
+#
+# Usage:
+#   ./scripts/hf_upload.sh                     # message from last git commit
+#   ./scripts/hf_upload.sh "feat: my change"   # custom message
+#
+# Pushes to:
+#   silas-therapy/small-functional-movement-screening          (model repo)
+#   spaces/silas-therapy/small-functional-movement-screening   (Gradio Space)
+#
+# `hf upload` does NOT read .hfignore — it only honors .gitignore, and only at
+# commit time (after hashing and pre-uploading everything). So we parse
+# .hfignore ourselves into --exclude globs and pass them explicitly.
+#
+# If the filtered file count still exceeds LARGE_THRESHOLD, we fall back to
+# `hf upload-large-folder` (resumable, multi-threaded). Caveats of that mode:
+# no --create-pr and no custom commit message — it commits directly to main
+# in multiple commits.
+set -euo pipefail
+cd "$(dirname "$0")/.."
+MODEL_REPO="silas-therapy/small-functional-movement-screening"
+SPACE_REPO="spaces/silas-therapy/small-functional-movement-screening"
+MSG="${1:-$(git log -1 --pretty=%s)}"
+LARGE_THRESHOLD="${FORMSCOUT_HF_LARGE_THRESHOLD:-500}"
+# Belt-and-suspenders extras on top of .hfignore. `.cache/` is the resume
+# state upload-large-folder writes into the folder being uploaded.
+PATTERNS=(
+    "*.pdf"
+    "**/node_modules/**"
+    ".cache/**"
+)
+# Parse .hfignore into fnmatch-style globs. fnmatch's `*` crosses `/`, but a
+# bare name like `.DS_Store` or `dir/` only matches at the root, so emit both
+# the rooted and `**/`-prefixed forms.
+while IFS= read -r line; do
+    line="${line%%#*}"
+    line="${line#"${line%%[![:space:]]*}"}"
+    line="${line%"${line##*[![:space:]]}"}"
+    [[ -z "$line" ]] && continue
+    if [[ "$line" == */ ]]; then
+        PATTERNS+=("${line}**" "**/${line}**")
+    else
+        PATTERNS+=("$line" "**/$line")
+    fi
+done < .hfignore
+EXCLUDES=()
+for p in "${PATTERNS[@]}"; do
+    EXCLUDES+=(--exclude="$p")
+done
+# Count what would actually be uploaded, using the same filter the hub client
+# applies, so the mode decision matches reality.
+N_FILES=$(python3 - "${PATTERNS[@]}" <<'EOF'
+import sys
+from pathlib import Path
+from huggingface_hub.utils import filter_repo_objects
+patterns = sys.argv[1:]
+files = (
+    str(p) for p in Path(".").rglob("*")
+    if p.is_file() and p.parts[0] != ".git"
+)
+print(len(list(filter_repo_objects(files, ignore_patterns=patterns))))
+EOF
+)
+echo "── $N_FILES files to upload after .hfignore filtering"
+if (( N_FILES == 0 )); then
+    echo "✗ nothing to upload — check .hfignore" >&2
+    exit 1
+fi
+upload_repo() {
+    local repo="$1"
+    if (( N_FILES > LARGE_THRESHOLD )); then
+        echo "── $repo: $N_FILES files > $LARGE_THRESHOLD, using upload-large-folder"
+        echo "   (resumable; commits directly to main — no PR, no custom message)"
+        hf upload-large-folder "$repo" . "${EXCLUDES[@]}"
+    else
+        echo "── uploading to: $repo"
+        hf upload "$repo" . . \
+            "${EXCLUDES[@]}" \
+            --create-pr \
+            --commit-message="$MSG"
+    fi
+}
+upload_repo "$MODEL_REPO"
+upload_repo "$SPACE_REPO"
+echo "✓ done"
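
The `.hfignore` parsing loop in the script above can be mirrored in Python to check which paths a given ignore file would exclude. This is a rough sketch, not part of the PR: `hfignore_to_globs` and `is_excluded` are hypothetical names, and the matching uses the standard library's `fnmatchcase` as a stand-in for the hub client's filter, which the script's comments describe as fnmatch-style.

```python
from fnmatch import fnmatchcase


def hfignore_to_globs(text: str) -> list[str]:
    """Expand .hfignore lines into rooted and '**/'-prefixed globs."""
    globs: list[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.endswith("/"):
            # Directory entry: match everything beneath it, at the root or nested.
            globs += [f"{line}**", f"**/{line}**"]
        else:
            globs += [line, f"**/{line}"]
    return globs


def is_excluded(path: str, globs: list[str]) -> bool:
    """True if any glob matches the POSIX-style relative path."""
    return any(fnmatchcase(path, g) for g in globs)
```

For example, feeding it the two lines `__pycache__/` and `*.gguf` produces four globs, and a path such as `formscout/__pycache__/x.pyc` is excluded while `formscout/config.py` is kept.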

scripts/serve_judge.sh ADDED

	@@ -0,0 +1,35 @@

+#!/usr/bin/env bash
+# Launch llama-server with the FormScout Judge/Classifier VLM.
+#
+# Default model: Qwen3-VL-8B-Instruct Q4_K_M (checkpoints/qwen3-vl/).
+# To serve a fine-tuned GGUF instead, set:
+#   FORMSCOUT_JUDGE_GGUF=/path/to/finetuned.gguf
+#   FORMSCOUT_JUDGE_MMPROJ=/path/to/mmproj.gguf   (only if it ships its own)
+#
+# Requires: brew install llama.cpp
+set -euo pipefail
+# Homebrew bin may be missing from non-interactive shells
+export PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"
+ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+GGUF="${FORMSCOUT_JUDGE_GGUF:-$ROOT/checkpoints/qwen3-vl/Qwen3VL-8B-Instruct-Q4_K_M.gguf}"
+MMPROJ="${FORMSCOUT_JUDGE_MMPROJ:-$ROOT/checkpoints/qwen3-vl/mmproj-Qwen3VL-8B-Instruct-F16.gguf}"
+HOST="${FORMSCOUT_LLAMA_HOST:-127.0.0.1}"
+PORT="${FORMSCOUT_LLAMA_PORT:-8080}"
+if [[ ! -f "$GGUF" ]]; then
+    echo "Model not found: $GGUF" >&2
+    echo "Download it with:" >&2
+    echo "  python3 -c \"from huggingface_hub import hf_hub_download; [hf_hub_download('Qwen/Qwen3-VL-8B-Instruct-GGUF', f, local_dir='$ROOT/checkpoints/qwen3-vl') for f in ['Qwen3VL-8B-Instruct-Q4_K_M.gguf', 'mmproj-Qwen3VL-8B-Instruct-F16.gguf']]\"" >&2
+    exit 1
+fi
+exec llama-server \
+    --model "$GGUF" \
+    --mmproj "$MMPROJ" \
+    --host "$HOST" \
+    --port "$PORT" \
+    --ctx-size 16384 \
+    --n-gpu-layers 99 \
+    --no-warmup
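
Once the server is started with the script above, its readiness can be probed the way CLAUDE.md describes the clients doing it, via `/health`. The sketch below is an assumption-laden illustration: `server_available` is a hypothetical function, and it uses the standard library's `urllib` rather than the `requests` calls used in `llama_cpp.py`.

```python
import urllib.error
import urllib.request


def server_available(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only if GET {base_url}/health answers with HTTP 200.

    Connection failures, timeouts, and non-2xx replies (for example while
    the model is still loading) are all treated as "not available".
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

A caller could use `server_available("http://127.0.0.1:8080")` to decide between the VLM judge and the rubric-only fallback; returning `False` instead of raising keeps that decision free of exception handling.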

tests/test_phase2.py CHANGED

@@ -1,9 +1,7 @@
 """Tests for all rubric scorers and Phase 2 agents."""
-import pytest
 from formscout.types import (
-    BiomechFeatures, ScoreResult, MovementResult, IngestResult,
-    Pose2DResult, JudgeResult, ReportResult,
 )
 from formscout.rubric import score_test, SCORERS
 from formscout.rubric.hurdle_step import score_hurdle_step
@@ -178,8 +176,10 @@ class TestRotaryStability:
 # ─── JudgeAgent fallback ─────────────────────────────────────────────────────
 class TestJudgeAgent:
-    def test_fallback_when_judge_disabled(self):
         """When ENABLE_JUDGE=False, judge promotes rubric score."""
         agent = JudgeAgent()
         features = _make_features("deep_squat", angles={"left_femur_from_horizontal_deg": 70.0})
         rubric = ScoreResult(score=3, rationale="all good", confidence=0.9)
@@ -189,6 +189,105 @@ class TestJudgeAgent:
         assert result.score == 3
         assert "[rubric-only]" in result.rationale
 # ─── ReportAgent ──────────────────────────────────────────────────────────────

 """Tests for all rubric scorers and Phase 2 agents."""
 from formscout.types import (
+    BiomechFeatures, ScoreResult, MovementResult, JudgeResult, ReportResult,
 )
 from formscout.rubric import score_test, SCORERS
 from formscout.rubric.hurdle_step import score_hurdle_step
 # ─── JudgeAgent fallback ─────────────────────────────────────────────────────
 class TestJudgeAgent:
+    def test_fallback_when_judge_disabled(self, monkeypatch):
         """When ENABLE_JUDGE=False, judge promotes rubric score."""
+        from formscout import config
+        monkeypatch.setattr(config, "ENABLE_JUDGE", False)
         agent = JudgeAgent()
         features = _make_features("deep_squat", angles={"left_femur_from_horizontal_deg": 70.0})
         rubric = ScoreResult(score=3, rationale="all good", confidence=0.9)
         assert result.score == 3
         assert "[rubric-only]" in result.rationale
+    def test_fallback_when_server_unavailable(self, monkeypatch):
+        """ENABLE_JUDGE=True but llama-server down → rubric fallback, never a crash."""
+        from unittest.mock import PropertyMock, patch
+        from formscout import config
+        monkeypatch.setattr(config, "ENABLE_JUDGE", True)
+        agent = JudgeAgent()
+        with patch.object(type(agent._client), "available", new_callable=PropertyMock, return_value=False):
+            features = _make_features("deep_squat")
+            rubric = ScoreResult(score=2, rationale="heels up", confidence=0.8)
+            movement = MovementResult(test_name="deep_squat", side="na", confidence=1.0)
+            result = agent.run(features, rubric, movement)
+        assert result.score == 2
+        assert "[rubric-only]" in result.rationale
+    def test_vlm_response_parsed_into_judge_result(self, monkeypatch):
+        """ENABLE_JUDGE=True with live client → VLM JSON becomes JudgeResult."""
+        from unittest.mock import PropertyMock, patch
+        from formscout import config
+        monkeypatch.setattr(config, "ENABLE_JUDGE", True)
+        agent = JudgeAgent()
+        vlm_json = {
+            "test": "deep_squat", "side": "na", "score": 2, "needs_human": False,
+            "rationale": "Femur 5° above horizontal; 2D estimate.",
+            "compensation_tags": ["forward_lean"], "corrective_hint": "Sit back into heels.",
+            "confidence": 0.78,
+        }
+        with patch.object(type(agent._client), "available", new_callable=PropertyMock, return_value=True), \
+             patch.object(agent._client, "complete", return_value=vlm_json):
+            features = _make_features("deep_squat")
+            rubric = ScoreResult(score=2, rationale="ok", confidence=0.8)
+            movement = MovementResult(test_name="deep_squat", side="na", confidence=1.0)
+            result = agent.run(features, rubric, movement)
+        assert result.score == 2
+        assert result.compensation_tags == ["forward_lean"]
+        assert result.needs_human is False
+    def test_vlm_needs_human_yields_no_score(self, monkeypatch):
+        """needs_human=True from the VLM must produce score=None."""
+        from unittest.mock import PropertyMock, patch
+        from formscout import config
+        monkeypatch.setattr(config, "ENABLE_JUDGE", True)
+        agent = JudgeAgent()
+        vlm_json = {"score": 1, "needs_human": True, "rationale": "Possible pain.", "confidence": 0.9}
+        with patch.object(type(agent._client), "available", new_callable=PropertyMock, return_value=True), \
+             patch.object(agent._client, "complete", return_value=vlm_json):
+            result = agent.run(
+                _make_features("deep_squat"),
+                ScoreResult(score=1, rationale="x", confidence=0.5),
+                MovementResult(test_name="deep_squat", side="na", confidence=1.0),
+            )
+        assert result.needs_human is True
+        assert result.score is None
+# ─── LlamaCppClient (chat-completions endpoint) ──────────────────────────────
+class TestLlamaCppClient:
+    def test_parse_plain_json(self):
+        from formscout.serving.llama_cpp import LlamaCppClient
+        assert LlamaCppClient._parse_json_reply('{"score": 3}') == {"score": 3}
+    def test_parse_fenced_json(self):
+        from formscout.serving.llama_cpp import LlamaCppClient
+        fenced = '```json\n{"score": 2, "needs_human": false}\n```'
+        assert LlamaCppClient._parse_json_reply(fenced) == {"score": 2, "needs_human": False}
+    def test_parse_non_json_returns_text(self):
+        from formscout.serving.llama_cpp import LlamaCppClient
+        assert LlamaCppClient._parse_json_reply("not json") == {"text": "not json"}
+    def test_complete_posts_chat_endpoint_with_images(self):
+        from unittest.mock import MagicMock, patch
+        from formscout.serving.llama_cpp import LlamaCppClient
+        client = LlamaCppClient(port=8080)
+        resp = MagicMock()
+        resp.json.return_value = {"choices": [{"message": {"content": '{"ok": true}'}}]}
+        resp.raise_for_status.return_value = None
+        with patch("formscout.serving.llama_cpp.requests.post", return_value=resp) as mock_post:
+            result = client.complete("score this", images=["aGVsbG8=" * 600])
+        assert result == {"ok": True}
+        url = mock_post.call_args.args[0] if mock_post.call_args.args else mock_post.call_args.kwargs.get("url")
+        assert url.endswith("/v1/chat/completions")
+        payload = mock_post.call_args.kwargs["json"]
+        content = payload["messages"][0]["content"]
+        assert content[0] == {"type": "text", "text": "score this"}
+        assert content[1]["type"] == "image_url"
+        assert content[1]["image_url"]["url"].startswith("data:image/jpeg;base64,")
+    def test_complete_connection_error_returns_safe_dict(self):
+        from unittest.mock import patch
+        import requests as _requests
+        from formscout.serving.llama_cpp import LlamaCppClient
+        client = LlamaCppClient(port=8080)
+        with patch("formscout.serving.llama_cpp.requests.post", side_effect=_requests.ConnectionError):
+            result = client.complete("hello")
+        assert "error" in result
 # ─── ReportAgent ──────────────────────────────────────────────────────────────
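
Note on the `TestLlamaCppClient` cases above: they pin down two behaviours, namely how a model reply is turned into a dict and what the chat-completions request body looks like. The sketch below is one way to satisfy those assertions. It is not the code in `formscout/serving/llama_cpp.py`, which this view does not show in full. The function names `parse_json_reply` and `build_chat_payload` and the `"role": "user"` field are assumptions.

```python
import json
import re
from typing import Any, Optional

# Matches a reply wrapped in a Markdown fence, with or without a "json" tag.
_FENCE_RE = re.compile(r"^```(?:json)?\s*\n(.*?)\n?```\s*$", re.DOTALL)


def parse_json_reply(text: str) -> dict[str, Any]:
    """Parse a model reply as JSON, falling back to {"text": ...}."""
    stripped = text.strip()
    match = _FENCE_RE.match(stripped)
    if match:
        stripped = match.group(1).strip()
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        return {"text": text}
    # Only a JSON object is useful to callers; anything else is kept as text.
    return parsed if isinstance(parsed, dict) else {"text": text}


def build_chat_payload(prompt: str, images: Optional[list[str]] = None) -> dict[str, Any]:
    """Build a chat-completions body: one text part, then one part per image.

    `images` holds base64-encoded JPEG data (assumed), sent as data URLs.
    """
    content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
    for b64 in images or []:
        content.append(
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
        )
    return {"messages": [{"role": "user", "content": content}]}
```

Returning `{"text": ...}` for unparseable replies means the caller always receives a dict, which is what lets the judge fall back to the rubric score instead of raising.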