
FormScout: Functional Movement Screening, scored small

Project specification & architecture documentation
Build Small Hackathon (Gradio × Hugging Face), Track: Backyard AI
Working title; rename freely. Doc version 0.1, June 2026.


1. One-paragraph pitch

A basketball team's physiotherapist screens players with the Functional Movement Screen (FMS): seven movement patterns, each scored 0–3 by eye. The scoring is slow, subjective, and hard to reproduce across raters or across months. FormScout is a Gradio app that takes a video of an athlete performing an FMS test, extracts 2D and 3D body pose, measures the biomechanics the FMS rubric actually cares about, and produces a 0–3 score with a written rationale and an annotated overlay, anchored to the physio's own previously scored clips. It is a screening aid that standardizes and speeds up the physio's first pass, not a diagnosis and not an injury predictor. Everything runs on models that fit on a laptop.


2. The problem, honestly

The FMS is a seven-test battery (Deep Squat, Hurdle Step, In-Line Lunge, Shoulder Mobility, Active Straight-Leg Raise, Trunk Stability Push-Up, Rotary Stability), each scored 0–3 for a composite 0–21. A score of 0 means pain during the movement and is an automatic red flag for clinical referral. Three of the tests have associated clearing tests (shoulder, spinal extension, spinal flexion) that also force a 0 on pain.

Two facts shape this project and should be stated plainly in the demo and the writeup:

  • Inter-rater reliability is decent but not perfect. Composite-score reliability is moderate-to-good (ICC roughly 0.7–0.8), but novice and less-experienced raters grade component scores inconsistently. This is the real, addressable pain point: variance between raters and over time.
  • Predictive validity for injury is weak/mixed. The popular "≤14 = higher injury risk" cutoff is not a reliable predictor on its own. So FormScout must not be sold as injury prediction.

Where FormScout genuinely helps:

  1. A repeatable, objective digital baseline to track an athlete over a season.
  2. Asymmetry detection (left vs. right), which is one of the FMS's most defensible outputs.
  3. A fast, consistent first-pass / second opinion that reduces rater variance.
  4. Explainability: it shows which compensation it saw, not just a number.

This honest framing is also strategic: the Backyard AI track is judged partly on "honest fit between problem and the small-model constraint." Overclaiming clinical power would hurt the submission, not help it.


3. Why this fits the hackathon

| Hackathon rule | How FormScout satisfies it |
|---|---|
| Total params ≤ 32B | Recommended config sums to ~18B. A portfolio of small specialists beats one monolith, which is on-theme for "think small." |
| Built on Gradio, hosted as a HF Space | Gradio app with gr.Video input, a custom-styled results panel, on-Space inference (ZeroGPU or llama.cpp). |
| Show, Don't Tell | Demo video = physio uploads a real player clip, gets a scored overlay in seconds. Social post = before/after of a manual vs. assisted screening session. |
| Track: Backyard AI | The "someone you know" is the team physiotherapist. The deliverable is something they actually use on real players. |

Badge targets (aim for all six):

  • 🔌 Off the Grid: no cloud APIs; all models served on the Space.
  • 🎯 Well-Tuned: the skeletal-temporal scoring head is fine-tuned on the physio's labels and published to the Hub.
  • 🎨 Off-Brand: custom Gradio frontend (scorecard UI, video overlay, per-test rubric panel), pushing past default Gradio.
  • 🦙 Llama Champion: VLM + embedding model served through llama.cpp (GGUF builds exist for both).
  • 📡 Sharing is Caring: publish the agent trace (one full screening run, agent by agent) to the Hub.
  • 📓 Field Notes: a blog post on building a clinical-adjacent AQA pipeline under a 32B budget, with the honesty section front and center.

4. Core technical framing: FMS is Action Quality Assessment

Don't reinvent this from scratch. Action Quality Assessment (AQA) is the established field for "score how well a movement was performed." Skeleton-based AQA (sports scoring, surgical-skill and rehab assessment) is the directly relevant lineage. The "Skeletal-Temporal Transformer" idea maps onto the AQA scoring head.

The key design constraint is the tiny labeled dataset (a couple of physio-scored videos). That rules out training a large score regressor from scratch and dictates a hybrid approach:

  1. Deterministic biomechanics carry most of the load. The FMS rubric is, to a large degree, a set of angle and alignment thresholds (e.g. Deep Squat "3" = femur below horizontal, torso parallel to tibia, knees tracking over feet, dowel over feet). These are computable from 3D pose with zero training and are inherently interpretable, which is exactly what earns a physio's trust.
  2. A small learned head (ST-GCN or a compact temporal transformer) refines the score and captures the patterns rules miss. It is small enough to fine-tune on a few labeled clips, especially if pre-trained on public AQA/pose datasets first.
  3. Retrieval over the physio's labeled clips (RAG) gives the language model few-shot anchors at judgment time. This is the right move when you have examples but not enough to train on.
  4. A VLM as the judge/explainer synthesizes rubric + measurements + retrieved exemplars into a final score and a human-readable rationale, and conservatively flags anything pain-related for a human.

5. Parameter budget (the single most important table)

Assume "total parameters" = sum of all model weights in the pipeline. Design to this; confirm the exact interpretation in the Discord AMA.

Recommended config: "Portfolio of specialists" (~18B)

| Component | Model | Params | Role |
|---|---|---|---|
| 2D pose + tracking | YOLO26-Pose (L/X) | ~0.05B | Per-frame 17-keypoint skeletons, multi-person tracking |
| Segmentation | SAM 3.1 (base) | ~0.85B | Clean athlete mask, occlusion handling, prompt for 3D |
| 3D body | SAM 3D Body | ~0.7–1B* | Single-image 3D mesh → true joint angles, view-invariant |
| Scoring head | ST-GCN / temporal transformer (fine-tuned) | ~0.01–0.05B | Pose-sequence → candidate 0–3 + confidence |
| Judge / explainer | Qwen3-VL-8B-Instruct | 8B | Movement ID, rubric reasoning, final score + rationale |
| Retrieval | Qwen3-VL-Embedding-8B | 8B | Nearest physio-scored reference clips (RAG) |
| Total | | ~17.8B | Comfortable headroom under 32B |

* SAM 3D Body's exact count isn't published prominently; verify on the model card. It's SAM-3-family and around or under one billion parameters, so the budget impact is small either way. The two 8B Qwen models share the Qwen3-VL-8B backbone (the embedder is built on the instruct model), which is conceptually clean and operationally efficient.
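The budget arithmetic can be checked with a few lines of Python. This is only a sketch: the figures are the estimates from the table above (ranges kept as low/high pairs), and the names `BUDGET_B`, `LIMIT_B` and `budget_range` are illustrative, not part of any library.

```python
# Parameter estimates in billions, copied from the table above.
# Each entry is a (low, high) pair so that ranges such as ~0.7-1B are preserved.
BUDGET_B = {
    "YOLO26-Pose": (0.05, 0.05),
    "SAM 3.1": (0.85, 0.85),
    "SAM 3D Body": (0.7, 1.0),            # unverified estimate, see footnote
    "Scoring head": (0.01, 0.05),
    "Qwen3-VL-8B-Instruct": (8.0, 8.0),
    "Qwen3-VL-Embedding-8B": (8.0, 8.0),
}
LIMIT_B = 32.0  # hackathon cap, assuming "total params" means the plain sum


def budget_range(budget):
    """Return (low, high) totals in billions of parameters."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


low, high = budget_range(BUDGET_B)
print(f"total: {low:.2f}B to {high:.2f}B, headroom at least {LIMIT_B - high:.2f}B")
```

Under these estimates both ends of the range round to the ~18B quoted above, so the conclusion does not depend on the exact SAM 3D Body count.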

Alternative config: "Heavy reasoner" (~28.7B)

Swap the 8B judge for Qwen3.6-27B (multimodal, strong tool-calling, MTP speedups on llama.cpp). Budget then = 27 + ~0.85 + ~0.7–1 + small (pose and scoring head, up to ~0.1) ≈ 28.7–29B. This leaves no room for the 8B embedder, so you'd drop RAG (or replace it with a sub-0.5B embedder, or use pose-feature similarity for retrieval). Note: Qwen3.6-27B's MTP speculative decoding currently can't run simultaneously with image input (--mmproj), so for vision you run it without MTP.

Recommendation: ship the ~18B portfolio config. RAG over the physio's few labeled clips is worth more than raw reasoning horsepower on this task, the headroom de-risks the budget, and "many small specialists" is the better hackathon story.


6. Model selection rationale

YOLO26-Pose: current-generation YOLO pose model; single forward pass for detection + keypoints, NMS-free, real-time even on edge. Tiny param cost. It also handles multiple people in frame (important: team videos often have other players/staff visible) and feeds keypoints downstream. Off-the-shelf it predicts COCO human keypoints; it can be fine-tuned for custom landmarks (e.g. dowel endpoints) if needed.

SAM 3.1: gives a clean athlete mask and stable multi-object video tracking (Object Multiplex makes it fast). Two jobs: (a) isolate the target athlete from teammates/background so pose and 3D aren't polluted, (b) provide the mask prompt that SAM 3D Body consumes. Concept prompts ("the person in the blue jersey performing the squat") are a bonus for disambiguation.

SAM 3D Body: the addition that makes the scores trustworthy. FMS criteria are joint angles and symmetry; 2D pose can't measure these reliably across camera angles (projection ambiguity). 3D mesh recovery from a single image, promptable with the 2D keypoints + mask you already have, yields view-invariant joint angles (the MHR rig even separates skeletal structure from soft-tissue shape, which is convenient for angle extraction). This is the difference between "looks bent" and "femur is 4° above horizontal → not a 3."
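As a rough illustration of what "femur is 4° above horizontal" means computationally, here is a minimal sketch of two angle helpers working on 3D joint positions. It assumes a world frame whose `up` direction is the unit vector (0, 1, 0) and joints given as (x, y, z) tuples; the function names are hypothetical and do not come from SAM 3D Body.

```python
import math


def segment_elevation_deg(proximal, distal, up=(0.0, 1.0, 0.0)):
    """Signed angle (degrees) between a limb segment and the horizontal plane.

    Positive means the proximal joint sits above the distal joint, e.g. the
    hip above the knee ("femur above horizontal"). `up` must be a unit vector.
    """
    v = [p - d for p, d in zip(proximal, distal)]
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("proximal and distal joints coincide")
    vertical = sum(a * b for a, b in zip(v, up))
    # Clamp to guard against tiny floating-point overshoot before asin.
    return math.degrees(math.asin(max(-1.0, min(1.0, vertical / norm))))


def joint_angle_deg(a, b, c):
    """Interior angle at joint b (degrees) formed by the points a-b-c."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    nba = math.sqrt(sum(x * x for x in ba))
    nbc = math.sqrt(sum(x * x for x in bc))
    if nba == 0.0 or nbc == 0.0:
        raise ValueError("degenerate joint triple")
    cos = sum(x * y for x, y in zip(ba, bc)) / (nba * nbc)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))
```

For the Deep Squat check, `segment_elevation_deg(hip, knee)` would need to be negative (hip below knee) for the "femur below horizontal" criterion to hold; any tolerance band around zero is a design decision left to the physio.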

Skeletal-temporal scoring head: your AQA component and your Well-Tuned badge. Recommend a compact ST-GCN (graph conv over the skeleton, temporal conv over frames) over a from-scratch transformer, because it's far more data-efficient on a tiny labeled set. Pre-train on public AQA / pose-action data, then fine-tune on the physio's labels. Output: per-test candidate score + a confidence the judge can weigh.

Qwen3-VL-8B-Instruct: the judge. Strong video temporal modeling (Interleaved-MRoPE, timestamp alignment) suits movement clips. It identifies which of the 7 tests is being performed, reads the biomechanics, considers retrieved exemplars and the head's candidate, and emits the final score + rationale + detected compensation. GGUF → llama.cpp → Llama Champion.

Qwen3-VL-Embedding-8B: retrieval. Embeds the query clip (or its keyframes/pose-render) and finds the physio's most similar already-scored clips to anchor the judge. Top multimodal retriever on MMEB-V2; same backbone as the judge; GGUF available.
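The retrieval step itself is simple once embeddings exist. The sketch below assumes each labeled clip has already been embedded into a plain list of floats (how the embedding model is called is out of scope here) and ranks clips by cosine similarity; `nearest_exemplars` and the index layout are illustrative names, not an existing API.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_exemplars(query_embedding, index, k=3):
    """Return the k most similar physio-scored clips.

    `index` is a list of dicts with keys clip_id, score and embedding
    (a hypothetical layout that mirrors the RetrievalAgent contract).
    """
    ranked = [
        {
            "clip_id": entry["clip_id"],
            "score": entry["score"],
            "similarity": cosine(query_embedding, entry["embedding"]),
        }
        for entry in index
    ]
    ranked.sort(key=lambda e: e["similarity"], reverse=True)
    return ranked[:k]
```

With only a handful of labeled clips a brute-force scan like this is enough; a vector database would add moving parts without a measurable benefit at this scale.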


7. Architecture: an agentic pipeline

Structured as cooperating specialist agents (maps naturally onto an OFP-style orchestration, with a Director coordinating and quality-gating). Each agent has one job and a typed output.

                         ┌───────────────────────────────────────────────┐
   video upload  ───────▶│  IngestAgent                                  │
                         │  decode, normalize FPS, sample frames         │
                         └───────────────┬───────────────────────────────┘
                                         ▼
                         ┌───────────────────────────────────────────────┐
                         │  SegmentationAgent  (SAM 3.1)                 │
                         │  athlete mask + track id (reject teammates)   │
                         └───────────────┬───────────────────────────────┘
                                         ▼
              ┌──────────────────────────┴──────────────────────────┐
              ▼                                                     ▼
  ┌────────────────────────────┐                      ┌────────────────────────────┐
  │ PoseAgent (YOLO26-Pose)    │                      │ Body3DAgent (SAM 3D Body)  │
  │ 2D keypoints per frame     │ ───keypoints+mask──▶ │ 3D mesh / joint angles     │
  └───────────────┬────────────┘                      └───────────────┬────────────┘
                  └─────────────────────┬─────────────────────────────┘
                                        ▼
                         ┌───────────────────────────────────────────────┐
                         │  MovementClassifierAgent                      │
                         │  which of the 7 FMS tests? (VLM or small CLS) │
                         └───────────────┬───────────────────────────────┘
                                         ▼
              ┌──────────────────────────┴──────────────────────────┐
              ▼                          ▼                          ▼
  ┌────────────────────┐   ┌──────────────────────────┐   ┌─────────────────────────┐
  │ BiomechanicsAgent  │   │ ScoringAgent (ST-GCN)    │   │ RetrievalAgent          │
  │ rubric angles,     │   │ candidate 0–3 + conf     │   │ (Qwen3-VL-Embedding)    │
  │ ROM, symmetry,     │   │ from pose sequence       │   │ k nearest physio clips  │
  │ alignment, timing  │   │                          │   │ + their scores          │
  └─────────┬──────────┘   └────────────┬─────────────┘   └───────────┬─────────────┘
            └───────────────────────────┼─────────────────────────────┘
                                        ▼
                         ┌───────────────────────────────────────────────┐
                         │  JudgeAgent  (Qwen3-VL-8B)                    │
                         │  rubric + measurements + exemplars + candidate│
                         │  → final 0–3, rationale, compensation tag,    │
                         │    corrective hint, PAIN/CLEARING → defer     │
                         └───────────────┬───────────────────────────────┘
                                         ▼
                         ┌───────────────────────────────────────────────┐
                         │  ReportAgent                                  │
                         │  per-test card, composite 0–21, asymmetry     │
                         │  flags, annotated video, exportable PDF       │
                         └───────────────────────────────────────────────┘

Agent contracts (sketch):

  • IngestAgent → {frames[], fps, duration, n_people}
  • SegmentationAgent → {athlete_track_id, masks[]}
  • PoseAgent → {keypoints_2d[frame][joint]={x,y,conf}}
  • Body3DAgent → {joints_3d[frame][joint]={x,y,z}, mesh_optional}
  • MovementClassifierAgent → {test_name, side: left|right|n/a, confidence}
  • BiomechanicsAgent → {features: {torso_tibia_angle, hip_flexion_deg, knee_valgus_deg, dowel_alignment, L_R_symmetry, ...}}
  • ScoringAgent → {candidate_score: 0–3, confidence}
  • RetrievalAgent → {exemplars: [{clip_id, score, similarity}]}
  • JudgeAgent → {score: 0–3, rationale, compensation_tags[], corrective_hint, needs_human: bool}
  • ReportAgent → {per_test[], composite, asymmetries[], overlay_video, pdf}
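A few of these contracts, written as Python dataclasses, might look like the sketch below. Field names follow the list above; the class names, the validation rules, and the assumption that confidences live in [0, 1] are illustrative choices, not something the spec fixes.

```python
from dataclasses import dataclass, field
from typing import List, Literal

Side = Literal["left", "right", "n/a"]


def _check_score(value: int) -> None:
    """FMS component scores are integers from 0 to 3."""
    if value not in (0, 1, 2, 3):
        raise ValueError(f"score must be 0-3, got {value!r}")


@dataclass
class MovementClass:
    """MovementClassifierAgent output."""
    test_name: str
    side: Side
    confidence: float


@dataclass
class ScoreCandidate:
    """ScoringAgent output; confidence is assumed to lie in [0, 1]."""
    candidate_score: int
    confidence: float

    def __post_init__(self) -> None:
        _check_score(self.candidate_score)
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


@dataclass
class Judgement:
    """JudgeAgent output."""
    score: int
    rationale: str
    compensation_tags: List[str] = field(default_factory=list)
    corrective_hint: str = ""
    needs_human: bool = False

    def __post_init__(self) -> None:
        _check_score(self.score)
```

Validating at construction time means a malformed agent output fails loudly at the agent boundary instead of surfacing later as a wrong number in the report.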

Quality gating: if the ST-GCN candidate and the JudgeAgent disagree by ≥1 point, or any agent confidence is low, the report marks the test "low confidence: physio review recommended." This keeps the human in the loop and is itself a selling point.
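The gating rule can be expressed as one small predicate. In this sketch the 0.6 confidence floor is a placeholder assumption (the spec only says "low"), and `needs_physio_review` is a hypothetical helper name.

```python
def needs_physio_review(candidate_score, judge_score, confidences, min_confidence=0.6):
    """Return True when a test should be marked for physio review.

    candidate_score: ST-GCN candidate (0-3)
    judge_score:     JudgeAgent final score (0-3)
    confidences:     iterable of per-agent confidences
    min_confidence:  assumed threshold; tune it with the physio
    """
    disagreement = abs(candidate_score - judge_score) >= 1
    low_confidence = any(c < min_confidence for c in confidences)
    return disagreement or low_confidence
```

Because scores are integers, "disagree by ≥1 point" is the same as "disagree at all"; keeping the explicit difference makes it easy to loosen the rule later if the learned head turns out to be noisy.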


8. Scoring methodology, per test

The seven tests reduce to measurable quantities. Build a small rubric module (one scoring function per test) that consumes the 3D features and returns a score with the triggering reason. Examples:

  • Deep Squat (3): femur below horizontal AND torso parallel to tibia AND knees tracking over feet AND dowel over feet. (2): same but achieved only with heels elevated. (1): criteria unmet even with heels elevated. → all four conditions are angle/alignment checks on the 3D pose.
  • Hurdle Step / In-Line Lunge / Shoulder Mobility / ASLR: bilateral. Score each side, record the lower as the test score, and always emit the asymmetry even when the score is the same.
  • Trunk Stability Push-Up / Rotary Stability: trunk rigidity / timing of limb movement, i.e. temporal features from the pose sequence; the ST-GCN head is most valuable here. (Rotary Stability is also performed on both sides, so the same lower-side rule applies to it.)
  • Pain / clearing tests (0): the system cannot detect pain. Any clearing test, or a visible distress/abort, sets needs_human = true and the test is not auto-scored. Defer to the physio. State this loudly.
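A minimal sketch of two rubric functions follows, assuming the four Deep Squat conditions have already been reduced to booleans by the BiomechanicsAgent. The names `SquatChecks`, `score_deep_squat` and `score_bilateral` are hypothetical; pain and clearing tests are deliberately not handled here because they are never auto-scored.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SquatChecks:
    """Outcome of the four Deep Squat criteria for one attempt."""
    femur_below_horizontal: bool
    torso_parallel_to_tibia: bool
    knees_over_feet: bool
    dowel_over_feet: bool

    def unmet(self):
        """Names of the criteria that failed."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def score_deep_squat(flat: SquatChecks, heels_elevated: Optional[SquatChecks] = None):
    """Return (score, reason). Score is None when a second attempt is needed."""
    if not flat.unmet():
        return 3, "all four criteria met with heels on the floor"
    if heels_elevated is None:
        return None, "heels-elevated attempt needed; unmet flat: " + ", ".join(flat.unmet())
    if not heels_elevated.unmet():
        return 2, "criteria met only with heels elevated"
    return 1, "unmet even with heels elevated: " + ", ".join(heels_elevated.unmet())


def score_bilateral(left: int, right: int) -> dict:
    """Lower side is the test score; the asymmetry is always reported."""
    return {
        "score": min(left, right),
        "left": left,
        "right": right,
        "asymmetry": abs(left - right),
    }
```

Returning the reason alongside the number keeps the rule that a score is never shown without its rationale, and returning `None` rather than guessing avoids silently scoring a clip that lacks the heels-elevated attempt.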

Final composite = sum of seven test scores (0–21), plus an asymmetry summary. The number is never shown without its rationale.
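One way to compute the composite while respecting the deferral rule is sketched below. It assumes that a test awaiting physio review is stored as `None`, and that the composite is withheld until every test has a score; both conventions, and the test identifiers, are illustrative assumptions.

```python
FMS_TESTS = (
    "deep_squat",
    "hurdle_step",
    "inline_lunge",
    "shoulder_mobility",
    "active_straight_leg_raise",
    "trunk_stability_pushup",
    "rotary_stability",
)


def composite_score(per_test):
    """Sum the seven test scores (0-21).

    per_test maps test name -> int 0-3, or None when the test was deferred
    to the physio (pain, clearing test, low confidence). The composite is
    withheld until every test has a score.
    """
    pending = [t for t in FMS_TESTS if per_test.get(t) is None]
    if pending:
        return {"composite": None, "pending_review": pending}
    for name in FMS_TESTS:
        if per_test[name] not in (0, 1, 2, 3):
            raise ValueError(f"{name}: score must be 0-3")
    return {"composite": sum(per_test[t] for t in FMS_TESTS), "pending_review": []}
```

A physio-entered 0 (pain) is a valid score and counts toward the sum; only tests that nobody has scored yet block the composite.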


9. Data & fine-tuning plan (tiny-dataset survival guide)

You have "a couple" of physio-scored clips. Treat them as gold, not as a training set.

  1. Deterministic backbone first. Get the biomechanics rubric working with no training. Validate the measured angles against the physio's scores qualitatively. This alone may be demo-ready.
  2. Pre-train the ST-GCN on public pose-action / AQA data (action recognition or generic AQA) so it learns temporal movement structure, not FMS labels.
  3. Fine-tune on the physio's clips with heavy augmentation: temporal crops/speed jitter, mirror (left↔right, doubles your bilateral data), camera-angle perturbation in 3D, joint noise. Few-shot, regularized, early-stopped.
  4. Hold out at least one physio-scored clip as a sanity check the judge never sees.
  5. RAG instead of more training. Every labeled clip goes into the embedding index as a scoring anchor. New clips added later improve the system with no retraining, which is a nice longitudinal story for the physio.
  6. Publish the fine-tuned head to the Hub with a model card (→ Well-Tuned badge). Include the augmentation recipe and the honest "trained on N clips, treat as assistive" caveat.
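The augmentations in step 3 are easy to prototype on raw joint sequences. The sketch below assumes each frame is a list of 17 (x, y, z) joints in COCO keypoint order, with x as the left-right axis centred on the athlete; if the 3D rig uses a different joint set, the swap table must change. All function names are hypothetical.

```python
import random

# COCO 17-keypoint order: nose, eyes, ears, shoulders, elbows, wrists,
# hips, knees, ankles (left before right). Index i maps to its mirror twin.
COCO_SWAP = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]


def mirror_sequence(seq):
    """Flip left and right: negate x and swap paired joints.

    The clip's side label (left/right) must be flipped by the caller too.
    """
    return [
        [(-frame[j][0], frame[j][1], frame[j][2]) for j in COCO_SWAP]
        for frame in seq
    ]


def speed_jitter(seq, factor):
    """Resample by nearest frame; factor > 1 plays faster (fewer frames)."""
    n_out = max(2, round(len(seq) / factor))
    last = len(seq) - 1
    return [seq[min(last, round(i * last / (n_out - 1)))] for i in range(n_out)]


def add_joint_noise(seq, sigma, rng=None):
    """Add zero-mean Gaussian noise (std sigma, same units as the joints)."""
    rng = rng or random.Random()
    return [
        [tuple(c + rng.gauss(0.0, sigma) for c in joint) for joint in frame]
        for frame in seq
    ]
```

Mirroring is the cheapest win for the bilateral tests because it yields a valid clip for the opposite side; noise and speed jitter should stay small enough that a physio would still assign the same score to the augmented movement.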

Label schema to collect from the physio (if you can get a bit more data): clip_id, athlete_id, test_name, side, score (0–3), pain (bool), compensation_notes, camera_view. Even 20–30 well-labeled clips help meaningfully.
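If the physio delivers labels as a CSV file, a small loader with validation could look like this sketch. The column names follow the schema above; the CSV format, the accepted `side` values (taken from the classifier contract) and the rule that pain and a score of 0 must agree are assumptions to confirm with the physio.

```python
import csv
import io
from dataclasses import dataclass

VALID_SIDES = {"left", "right", "n/a"}
TRUE_VALUES = {"1", "true", "yes"}


@dataclass(frozen=True)
class ClipLabel:
    clip_id: str
    athlete_id: str
    test_name: str
    side: str
    score: int
    pain: bool
    compensation_notes: str
    camera_view: str


def parse_labels(csv_text):
    """Parse physio labels from CSV text with a header row."""
    labels = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        score = int(row["score"])
        pain = row["pain"].strip().lower() in TRUE_VALUES
        if score not in (0, 1, 2, 3):
            raise ValueError(f"{row['clip_id']}: score must be 0-3")
        if row["side"] not in VALID_SIDES:
            raise ValueError(f"{row['clip_id']}: unknown side {row['side']!r}")
        if pain != (score == 0):
            raise ValueError(f"{row['clip_id']}: pain and score 0 must agree")
        labels.append(ClipLabel(
            clip_id=row["clip_id"],
            athlete_id=row["athlete_id"],
            test_name=row["test_name"],
            side=row["side"],
            score=score,
            pain=pain,
            compensation_notes=row["compensation_notes"],
            camera_view=row["camera_view"],
        ))
    return labels
```

Rejecting inconsistent rows at load time matters more than usual here: with so few clips, a single mislabeled example can noticeably skew both the fine-tuned head and the retrieval anchors.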


10. Gradio Space & deployment

UI (targets Off-Brand badge):

  • gr.Video upload (or webcam capture) + a test-type selector (auto-detect, with manual override).
  • Results panel: the 0–3 score as a large dial/patch, the composite 0–21, an asymmetry strip (L/R bars), and the rationale text.
  • The annotated overlay video: skeleton + the specific angle that decided the score drawn on the frame where it mattered.
  • A rubric drawer that shows the official 3/2/1 criteria for the detected test, with the met/unmet conditions checked off.
  • A persistent "Screening aid, not a diagnosis. Pain or clearing tests require a clinician." banner.
  • Custom CSS / gr.Server for a non-default look (scout/trail-map theme would rhyme with the hackathon, and with your design instincts).
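The UI elements above could be wired together roughly as follows. This is a sketch under assumptions: `analyze` is a hypothetical callable that runs the pipeline and returns a dict with `score`, `rationale` and `needs_human`; the HTML class names are placeholders for the custom styling; and Gradio is imported lazily so the scorecard renderer can be tested without it.

```python
import html

BANNER = "Screening aid, not a diagnosis. Pain or clearing tests require a clinician."
TEST_CHOICES = [
    "auto-detect", "Deep Squat", "Hurdle Step", "In-Line Lunge",
    "Shoulder Mobility", "Active Straight-Leg Raise",
    "Trunk Stability Push-Up", "Rotary Stability",
]


def render_scorecard(result):
    """Build the results-panel HTML from a pipeline result dict."""
    review = ""
    if result.get("needs_human"):
        review = "<p class='review'>Low confidence: physio review recommended</p>"
    return (
        "<div class='scorecard'>"
        f"<div class='dial'>{int(result['score'])} / 3</div>"
        # Escape model-generated text before placing it in HTML.
        f"<p class='rationale'>{html.escape(result['rationale'])}</p>"
        f"{review}</div>"
    )


def build_demo(analyze):
    """Assemble the Gradio app; `analyze(video_path, test_name)` is assumed."""
    import gradio as gr  # lazy import: only needed when the app is built

    with gr.Blocks(title="FormScout") as demo:
        gr.Markdown(f"**{BANNER}**")
        with gr.Row():
            video = gr.Video(label="FMS clip")
            test = gr.Dropdown(choices=TEST_CHOICES, value="auto-detect", label="Test")
        run = gr.Button("Score")
        card = gr.HTML()
        run.click(
            fn=lambda v, t: render_scorecard(analyze(v, t)),
            inputs=[video, test],
            outputs=card,
        )
    return demo
```

Calling `build_demo(analyze).launch()` would start the app. Keeping the HTML rendering in a plain function makes the scorecard easy to restyle and to unit-test independently of the Space.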

Compute:

  • ZeroGPU (H200 slice) can host the ~18B portfolio; load pose/SAM/3D eagerly, the VLM + embedder via llama.cpp.
  • For Off the Grid, ensure zero external API calls: everything served on-Space.
  • For Llama Champion, route the VLM + embedding through llama.cpp (GGUF builds exist for Qwen3-VL-8B-Instruct, Qwen3-VL-Embedding-8B, and Qwen3.6-27B). On a Space, watch the CUDA/llama-cpp build flags. Recent hackathon Spaces hit libcudart issues; a CPU-only or pinned-CUDA build is the usual fix.
  • Persist the embedding index and accumulated labels in Space storage for the longitudinal baseline.

11. Clinical safety & ethics (bake this in, don't bolt it on)

  • Not a medical device. Screening aid only. No diagnosis, no injury prediction, no treatment advice beyond generic FMS-style correctives.
  • Pain is out of scope for automatic scoring: always defer to the physio.
  • Human-in-the-loop by design: low-confidence and disagreement cases are surfaced, not hidden.
  • Consent & privacy: athlete videos are biometric data. Get consent; don't log/persist clips beyond what the physio approves; document retention in the writeup.
  • Honesty in the demo: show a case the system gets right and one it flags as uncertain. Judges (and physios) trust calibrated tools more than confident ones.

12. Build plan: two weekends (June 5–15)

Weekend 1 (the spine works end to end):

  • Day 1: Space scaffold, gr.Video in → skeleton overlay out (YOLO26-Pose). Ingest + Segmentation + Pose agents.
  • Day 2: SAM 3D Body integrated; BiomechanicsAgent computing Deep-Squat angles; first deterministic score on a real clip.
  • Goal: upload a squat video, get a rationalized 0–3. This alone is a viable demo.

Midweek: wire the JudgeAgent (Qwen3-VL via llama.cpp), MovementClassifier, and the rubric module for all 7 tests. Attend the AMA and confirm the param-sum interpretation.

Weekend 2 (make it sing):

  • ST-GCN pre-train + few-shot fine-tune on physio clips; publish to Hub.
  • RetrievalAgent + embedding index over labeled clips.
  • Custom UI polish, asymmetry view, PDF export, safety banners.
  • Record the demo video (physio uses it on a real player), write the social post, publish the agent trace and the blog post.

13. Risks & open questions

  • Param-sum interpretation: biggest unknown. The ~18B config is safe under either reading; confirm anyway.
  • SAM 3D Body on a Space: verify weights, license, and that it runs within ZeroGPU limits; have a 2D-only fallback (angles from 2D + camera-angle caveats) if it's too heavy.
  • Single-camera angle limits even with 3D: note it; recommend a consistent capture protocol (fixed camera position) for the physio, which also improves the longitudinal baseline.
  • Tiny dataset: the deterministic rubric must stand on its own so the demo doesn't hinge on the learned head generalizing from a few clips.
  • llama.cpp + vision build on Spaces: budget time for the CUDA build dance; CPU fallback for the embedder is fine.
  • Movement misclassification: if the wrong test is detected, scoring is meaningless; keep the manual override prominent.

14. Quick reference β€” the stack

| Layer | Choice | Badge it helps |
|---|---|---|
| 2D pose | YOLO26-Pose | - |
| Segmentation/track | SAM 3.1 | - |
| 3D biomechanics | SAM 3D Body | - |
| Learned scoring | ST-GCN (fine-tuned, published) | Well-Tuned |
| Judge/explainer | Qwen3-VL-8B-Instruct (llama.cpp) | Llama Champion |
| Retrieval | Qwen3-VL-Embedding-8B (llama.cpp) | Llama Champion |
| Serving | On-Space, no cloud APIs | Off the Grid |
| Frontend | Custom Gradio (scout theme) | Off-Brand |
| Trace | Published agent run on Hub | Sharing is Caring |
| Writeup | Blog post w/ honesty section | Field Notes |

Total ≈ 18B params. Honest, explainable, human-in-the-loop, runs on a laptop.