# FormScout: Starter Kit & Resource Pack
Companion to `FormScout-FMS-Spec.md` and `FormScout-Build-Prompt.md`. Every link below was checked. Read §1 first: some items are time-sensitive and block the build if you leave them late.
---
## 1. Do this NOW (before the hack window; some take hours to clear)
- [ ] **Request access to the gated Meta checkpoints today.** Both are gated on Hugging Face and approval isn't instant:
  - SAM 3 / SAM 3.1: request on the SAM 3 repos (you need the latest code for the 3.1 checkpoints).
  - SAM 3D Body: `facebook/sam-3d-body-dinov3` and `facebook/sam-3d-body-vith` both require an access request, then an authenticated download. **Note:** data/checkpoints are blocked in sanctioned jurisdictions. That shouldn't affect SK, but verify.
- [ ] **Put your HF token in the Space secrets** so the Space can pull the gated weights at build time.
- [ ] **Check licenses before you commit to a model** (this affects whether you can even submit):
  - Qwen3-VL-8B / Qwen3-VL-Embedding-8B / Qwen3.6: **Apache-2.0** (clean).
  - SAM 3 / SAM 3.1 / SAM 3D Body: **SAM License** (not Apache; read the terms, there are use restrictions).
  - Ultralytics YOLO26: historically **AGPL-3.0** (open-sourcing obligations; commercial license exists). Verify on the model/repo and make sure an AGPL dependency is OK for your submission. If it's a problem, RTMPose/ViTPose are alternatives.
  - pyskl / MMAction2: Apache-2.0.
  - KIMORE / UI-PRMD: academic/research terms; check before redistributing anything derived.
- [ ] **Confirm the param-counting rule in the Discord AMA.** Specifically: (a) is it summed across the pipeline or per-model? (b) do **frozen** base models count? (c) does a LoRA adapter's base count? Your ~18B config is safe under the strict reading either way, but get it on record.
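
A minimal sketch of how the Space could read the token and pull a gated repo. It assumes the secret is named `HF_TOKEN` (Space secrets are exposed as environment variables) and that access to the repo has already been approved; the helper names are illustrative, not part of any spec.

```python
import os


def resolve_hf_token(env=os.environ):
    """Return the Hugging Face token from the environment, or fail loudly."""
    # Assumption: the Space secret is named HF_TOKEN.
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it to the Space secrets before building."
        )
    return token


def fetch_gated_repo(repo_id, env=os.environ):
    """Download a snapshot of a gated repo such as facebook/sam-3d-body-dinov3."""
    # Imported lazily so the token check works even where huggingface_hub
    # is not installed (for example in a quick local test).
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, token=resolve_hf_token(env))
```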
---
## 2. Literature package
### 2.1 The framing that wins: "evaluate like an FMS reliability study"
The single most credible move in your writeup: evaluate FormScout the way the clinical literature evaluates human FMS raters. Treat the model as a *second rater* and report **weighted Cohen's κ** and **ICC** against the physio, the exact metrics the reliability papers use. That instantly makes your results legible to any sports-medicine reader and is far more honest than a vanity accuracy number.
| Resource | What it gives you | Link |
|---|---|---|
| Physiopedia: FMS | Clean overview of the 7 tests + 0–21 scoring | https://www.physio-pedia.com/Functional_Movement_Screen_(FMS) |
| FMS reliability study (JOSPT 2012) | The ICC/κ numbers and method you'll mirror in your eval | https://www.jospt.org/doi/10.2519/jospt.2012.3838 |
| FMS in elite youth soccer (PMC) | Per-test scores, asymmetries, clearing-test order | https://pmc.ncbi.nlm.nih.gov/articles/PMC5675373/ |
| Clinician's guide to FMS scoring | Per-test 3/2/1 criteria in plain language (rubric source) | https://meloqdevices.com/blogs/meloq-updates/functional-movement-screening |
> **Honesty anchor for the blog post:** the popular "≤14 → injury risk" cutoff has weak/mixed predictive validity. Sell standardization, asymmetry detection, and a repeatable baseline, not prediction.
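
To make the "second rater" framing concrete, here is a small pure-Python sketch of weighted Cohen's κ for 0–3 scores. It assumes quadratic weights; check which weighting scheme the reliability paper you cite actually used before comparing numbers. The function name is illustrative.

```python
def weighted_kappa(rater_a, rater_b, num_levels=4):
    """Quadratic-weighted Cohen's kappa for ordinal scores in 0..num_levels-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(rater_a)
    k = num_levels

    # Observed joint distribution of (physio score, model score).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    disagreement, expected = 0.0, 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 at max distance
            disagreement += weight * observed[i][j]
            expected += weight * row[i] * col[j]

    if expected == 0.0:
        # Both raters used one identical level: kappa is undefined.
        return float("nan")
    return 1.0 - disagreement / expected
```

Usage: `weighted_kappa(physio_scores, model_scores)` with both lists holding integer scores for the same clips in the same order.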
### 2.2 Action Quality Assessment β surveys & living lists
| Resource | Why | Link |
|---|---|---|
| *A Decade of AQA* (survey, 2025, 200+ papers, PRISMA) | The map of the whole field; start here | https://arxiv.org/abs/2502.02817 · code: https://github.com/HaoYin116/Survey_of_AQA |
| *Comprehensive Survey of AQA: Method & Benchmark* (2024) | Taxonomy by modality (video / **skeleton** / multimodal) + unified benchmark | https://arxiv.org/abs/2412.11149 · page: https://zhoukanglei.github.io/AQA-Survey |
| Awesome-AQA (ZhouKanglei) | Curated, **has a Medical-Care/rehab section**: your closest analogues | https://github.com/ZhouKanglei/Awesome-AQA |
| Awesome-AQA (Lyman-Smoker) | Second list; catches papers the other misses (FLEX, ExAct, etc.) | https://github.com/Lyman-Smoker/Awesome-AQA |
### 2.3 Skeleton-based scoring: the methods your head will borrow from
| Paper | Relevance to FormScout | Link |
|---|---|---|
| ST-GCN (original) | The graph-over-skeleton + temporal-conv backbone | https://github.com/open-mmlab/mmaction2/blob/main/configs/skeleton/stgcn/README.md |
| AQA via Hierarchical **Pose-guided** Multi-Stage Contrastive Regression (TIP 2025) | Pose-guided + contrastive regression with few labels; close to your setup | https://arxiv.org/abs/2501.03674 |
| Attention-guided Movement **Quality** Assessment + skeletal augmentation (UI-PRMD/KIMORE) | Transformer MQA on clinician-scored rehab data; **augmentation recipe for tiny sets** | https://arxiv.org/pdf/2204.07840 |
| SSL-Rehab: self-supervised 3D skeleton + **LoRA** fine-tune (KIMORE/UI-PRMD) | Pretrain→LoRA recipe for small clinical datasets (uses your LoRA muscle) | https://www.sciencedirect.com/science/article/abs/pii/S1077314224003564 |
| Skeleton-based AQA w/ anomaly-aware DTW (Sensors 2025) | DTW alignment + anomaly scoring; cheap, label-light baseline | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12693942/ |
---
## 3. Models & tooling (verified)
| Component | Repo / card | Params | License | Gated? |
|---|---|---:|---|---|
| YOLO26-Pose | https://docs.ultralytics.com/tasks/pose | <0.1B | AGPL-3.0* | no |
| SAM 3.1 | https://github.com/facebookresearch/sam3 | ~0.85B | SAM License | **yes** |
| SAM 3D Body | https://github.com/facebookresearch/sam-3d-body · https://huggingface.co/facebook/sam-3d-body-dinov3 | sub-1B† | SAM License | **yes** |
| ST-GCN++ / PoseConv3D | https://github.com/kennymckormick/pyskl | ~0.01–0.05B | Apache-2.0 | no |
| Qwen3-VL-8B-Instruct | https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct | 8B | Apache-2.0 | no |
| Qwen3-VL-Embedding-8B | https://huggingface.co/Qwen/Qwen3-VL-Embedding-8B (GGUF: dam2452/...-GGUF) | 8B | Apache-2.0 | no |
| Qwen3.6-27B (alt brain) | https://huggingface.co/unsloth/Qwen3.6-27B-GGUF | 27B | Apache-2.0 | no |

\* Verify the current YOLO26 license. † Two variants (`dinov3`, `vith`); confirm the exact count on the card (budget impact is small either way). SAM 3 itself is 848M.
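
The ~18B figure can be sanity-checked from the table with a few lines of arithmetic. This sketch uses the table's upper bounds (so "<0.1B" becomes 0.1 and "sub-1B" becomes 1.0), which is an assumption on the conservative side, and reports both the strict summed reading and the per-model reading of the counting rule.

```python
# Upper-bound parameter counts in billions, taken from the table above.
PARAMS_B = {
    "YOLO26-Pose": 0.1,             # listed as <0.1B
    "SAM 3.1": 0.85,
    "SAM 3D Body": 1.0,             # listed as sub-1B; confirm on the card
    "ST-GCN++ / PoseConv3D": 0.05,  # top of the 0.01-0.05B range
    "Qwen3-VL-8B-Instruct": 8.0,
    "Qwen3-VL-Embedding-8B": 8.0,
}


def param_budget(components):
    """Return (summed total, largest single model), both in billions."""
    return sum(components.values()), max(components.values())


total_b, largest_b = param_budget(PARAMS_B)
print(f"strict (summed) reading: {total_b:.2f}B; per-model reading: {largest_b:.2f}B")
```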
**Useful extras:** SAM 3D Body uses a Momentum Human Rig (MHR) that separates skeleton from soft-tissue shape, which is convenient for clean joint-angle extraction. The repo ships a notebook combining SAM 3D Body + SAM 3D Objects in one frame of reference. SAM 3D Body demo: https://www.aidemos.meta.com/segment-anything/editor/convert-body-to-3d
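
Joint-angle extraction itself is model-agnostic: once you have three 3D joint positions, the angle at the middle joint is a dot product away. A minimal sketch follows; the function name is hypothetical, and mapping MHR joints to hip/knee/ankle is left to whatever the SAM 3D Body output actually provides.

```python
import math


def joint_angle_deg(a, b, c):
    """Angle at joint b, in degrees, between segments b->a and b->c."""
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    if norm == 0.0:
        raise ValueError("two of the joints coincide; the angle is undefined")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Example: knee angle from hip, knee, ankle positions (made-up coordinates).
knee_angle = joint_angle_deg((0.0, 1.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.0))
```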
---
## 4. Datasets for transfer / pretraining
You have a couple of labeled clips. Pretrain on clinician-scored movement-quality data first, then few-shot fine-tune. These are the most transferable to FMS (ranked by relevance):
| Dataset | Why it's the closest analogue | Link |
|---|---|---|
| **KIMORE** | Clinician **scores** of low-back-pain rehab exercises (trunk control, multi-plane); same "score movement quality" task as FMS; partially overlaps Deep Squat / Rotary Stability / TSPU mechanics | https://www.researchgate.net/publication/333791841 (search "KIMORE dataset") |
| **UI-PRMD** | 10 rehab movements, correct vs. incorrect executions; standard MQA benchmark, pairs with KIMORE | search "UI-PRMD University of Idaho Physical Rehabilitation Movements" |
| **Fitness-AQA** | Real gym **squat/deadlift form errors**, directly relevant to Deep Squat compensations | https://github.com/ParitoshParmar/MTL-AQA (links Fitness-AQA) |
| **FLEX** | Large multi-modal fitness AQA dataset | via Lyman-Smoker/Awesome-AQA |
| **MTL-AQA / AQA-7 / FineFS** | General sports AQA for backbone pretraining (diving, skating) | https://github.com/ParitoshParmar/MTL-AQA |
**FMS-specific public video data is scarce**, so don't expect a drop-in set. Your physio's clips are the gold; everything above is for pretraining the temporal backbone so it learns movement structure before it ever sees an FMS label.
---
## 5. Build & deploy tooling
| Need | Link |
|---|---|
| Gradio docs (v6) | https://www.gradio.app/docs |
| `gradio.Server`: custom frontend + Gradio backend (Off-Brand badge) | https://www.gradio.app/guides/server-mode · blog: https://huggingface.co/blog/introducing-gradio-server |
| Gradio AI coding-assistant skill | `gradio skills add --claude` (PyPI: https://pypi.org/project/gradio/) |
| Gradio changelog (confirm `gr.Walkthrough`, `gr.Navbar`, `gr.Video.playback_position`) | https://www.gradio.app/changelog |
| HF Spaces ZeroGPU (`@spaces.GPU`) | https://huggingface.co/docs/hub/spaces-zerogpu |
| llama.cpp | https://github.com/ggml-org/llama.cpp |
| pyskl (ST-GCN++/PoseConv3D, custom-video tutorial incl. diving48) | https://github.com/kennymckormick/pyskl |
| MMAction2 (broader video understanding) | https://github.com/open-mmlab/mmaction2 |
| Hackathon's own trailheads (ML Intern, Gradio guides) | https://github.com/huggingface/ml-intern |
> **Hackathon-specific gotcha already seen in the org:** another team's Space hit `libcudart.so.12` errors and had to swap llama.cpp for transformers + `spaces.GPU`. Plan for it: isolate the llama.cpp build (CPU-only or pinned-CUDA) and keep a transformers fallback. For the scoring head, a small hand-rolled ST-GCN may deploy more cleanly on a Space than the full MMAction2/pyskl stack, so prototype with pyskl and ship lean.
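
One way to keep that fallback cheap is to decide the backend once at startup by attempting the import and catching load failures. The sketch below assumes the llama.cpp Python bindings are installed as the `llama_cpp` module and that a missing CUDA runtime surfaces as an `ImportError`, `OSError`, or `RuntimeError` at import time; verify this against the error you actually see on the Space. The `importer` parameter exists only to make the function testable.

```python
import importlib


def pick_backend(importer=importlib.import_module):
    """Return "llama_cpp" if the bindings load, otherwise "transformers"."""
    try:
        importer("llama_cpp")
    except (ImportError, OSError, RuntimeError):
        # Typical cause on a Space: the native library cannot find libcudart.
        return "transformers"
    return "llama_cpp"


BACKEND = pick_backend()  # resolve once, then branch on it when loading the model
```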
---
## 6. Two artifacts you probably haven't made yet
### 6.1 Data & capture protocol (highest-leverage non-code work)
With a tiny dataset, controlling *how* clips are captured beats any model tweak. Give the physio a one-pager:
- **Camera:** one fixed position, tripod, ~3 m back, lens at hip height, landscape, 1080p/30fps+. Same setup every session: this is what makes 3D consistent and the longitudinal baseline meaningful.
- **Framing:** whole body in frame for the whole rep, including the dowel. Plain-ish background, even lighting, no backlight.
- **One athlete in frame** at scoring time (or note who to track). For bilateral tests, capture **both sides** and label each.
- **Label schema (CSV):** `clip_id, athlete_id, date, test_name, side(L/R/NA), score(0–3), pain(bool), compensation_notes(free text), camera_view, consent_on_file(bool)`.
- **One rep per clip** to start (simplest). If sessions are continuous, you'll need temporal segmentation first; flag it to the build agent at Phase 1.
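
A label file this small is worth validating mechanically before every training run. The sketch below checks the schema above under a few assumptions that are not in the protocol itself: column headers are the bare names (`side`, `score`, and so on), dates are ISO `YYYY-MM-DD`, and booleans are written `true`/`false`. It also applies the standard FMS rule that pain during a test scores 0.

```python
import csv
import io
from datetime import date

COLUMNS = [
    "clip_id", "athlete_id", "date", "test_name", "side", "score",
    "pain", "compensation_notes", "camera_view", "consent_on_file",
]


def validate_row(row):
    """Return a list of problems for one label row (empty list means OK)."""
    missing = [c for c in COLUMNS if row.get(c) is None]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    try:
        date.fromisoformat(row["date"])
    except ValueError:
        problems.append("date must be YYYY-MM-DD")
    if row["side"] not in {"L", "R", "NA"}:
        problems.append("side must be L, R, or NA")
    if row["score"] not in {"0", "1", "2", "3"}:
        problems.append("score must be an integer 0-3")
    for col in ("pain", "consent_on_file"):
        if row[col].lower() not in {"true", "false"}:
            problems.append(f"{col} must be true or false")
    if row["pain"].lower() == "true" and row["score"] != "0":
        problems.append("pain reported, so the score should be 0")
    if row["consent_on_file"].lower() != "true":
        problems.append("no consent on file: exclude this clip")
    return problems


def validate_csv(text):
    """Map clip_id to its problems, for every row that has at least one."""
    report = {}
    for row in csv.DictReader(io.StringIO(text)):
        problems = validate_row(row)
        if problems:
            report[row.get("clip_id") or "?"] = problems
    return report
```

Usage: `validate_csv(open("labels.csv").read())` returns an empty dict when every row passes; the file name here is only an example.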
### 6.2 Evaluation plan
Define "good" before you train, given so few labels:
- **Primary:** Spearman ρ between predicted and physio scores (the AQA-standard metric), plus **exact-match** and **±1 accuracy** per test.
- **Clinical credibility:** **weighted Cohen's κ** and **ICC** of model-vs-physio, reported alongside the human inter-rater numbers from the JOSPT study, i.e. "how does FormScout compare to a second human rater?"
- **Asymmetry:** detection rate of L/R asymmetries the physio flagged (this is one of the FMS's most defensible outputs).
- **Validation:** leave-one-clip-out CV (you can't afford a held-out test split). Keep ≥1 clip the judge never sees for the demo.
- **Calibration:** report when the system says "low confidence / physio review" and show it's right to do so. A well-calibrated, humble tool reads as more trustworthy than a confident one.
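
The primary metrics above fit in a few lines of dependency-free Python, which is handy for a Space with a tight image. This is a sketch with illustrative function names: Spearman ρ is computed as the Pearson correlation of average ranks, so tied scores (common on a 0–3 scale) are handled.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation; NaN if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def exact_and_within_one(predicted, physio):
    """Return (exact-match accuracy, within-one accuracy)."""
    n = len(physio)
    exact = sum(p == t for p, t in zip(predicted, physio)) / n
    within = sum(abs(p - t) <= 1 for p, t in zip(predicted, physio)) / n
    return exact, within
```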
---
## 7. Ethics, consent & data handling (EU / Slovakia)
You're filming identifiable athletes, possibly **minors** on a youth team. This is biometric personal data under GDPR, so treat it as first-class, and say so in your submission (judges and physios both reward it):
- **Consent:** written consent from each athlete (and a parent/guardian for anyone under 18) before any footage is used. No consent → not in the dataset, not in the demo.
- **Data minimization & retention:** keep only what you need; don't persist raw clips on the Space beyond what's approved; document a retention/deletion policy. Prefer storing derived skeletons over raw video where possible.
- **Demo footage:** use a consenting adult (you, a teammate) for the public demo video rather than a minor athlete, even if you trained on team data privately.
- **Framing:** screening aid, not a medical device; pain/clearing tests always defer to the clinician; human-in-the-loop by design.
---
## 8. The transfer-learning recipe (ties it together)
1. **Backbone pretrain:** ST-GCN++ on a general skeleton-action set (NTU/Kinetics skeletons via pyskl) so it learns motion structure.
2. **Domain adapt:** continue on **KIMORE + UI-PRMD** (clinician-scored movement quality) so it learns *quality*, not just *what action*.
3. **Few-shot fine-tune:** **LoRA** on the physio's FMS clips with heavy augmentation (temporal jitter, **L↔R mirror** to double bilateral data, 3D camera-angle perturbation, joint noise). The SSL-Rehab paper (§2.3) is your blueprint and it's exactly your LoRA wheelhouse.
4. **Don't over-train the head:** let deterministic biomechanics carry the demo; the learned head and RAG are the refinement and the badges, not the foundation.
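
Two of the augmentations in step 3 can be sketched without any dependencies. The code assumes each clip is a list of frames, each frame a list of 17 `(x, y, z)` joints in COCO keypoint order, with `x` as the left-right axis centred on the body midline. Those are assumptions: SAM 3D Body's MHR output uses its own joint layout, so the left/right index pairs would need remapping.

```python
import random

# Left/right index pairs in COCO-17 order (eyes, ears, shoulders, ..., ankles).
COCO_LR_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16)]


def mirror_clip(frames, side):
    """Mirror a clip left-to-right and flip its side label (L <-> R)."""
    source = list(range(17))
    for left, right in COCO_LR_PAIRS:
        source[left], source[right] = right, left
    mirrored = [
        [(-frame[source[j]][0], frame[source[j]][1], frame[source[j]][2]) for j in range(17)]
        for frame in frames
    ]
    return mirrored, {"L": "R", "R": "L"}.get(side, side)


def jitter_clip(frames, rng, max_shift=2, noise_std=0.01):
    """Drop up to max_shift leading frames and add Gaussian noise to every joint."""
    start = rng.randint(0, min(max_shift, len(frames) - 1))
    return [
        [tuple(coord + rng.gauss(0.0, noise_std) for coord in joint) for joint in frame]
        for frame in frames[start:]
    ]
```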
---
## 9. Demo & submission storyboard (the "make it sing" 30%)
The submission needs a demo video + social post; "Show, Don't Tell" is a literal rule. A tight 60–90s cut:
1. **0–10s.** The problem: physio eyeballing a squat, scribbling a score. "Same player, two raters, two scores."
2. **10–35s.** Upload the clip to FormScout → skeleton overlay → 0–3 with the *deciding angle drawn on the frame* (`playback_position` jump). The "aha" shot.
3. **35–55s.** The scorecard: composite 0–21, the L/R asymmetry strip, a "low confidence → physio review" flag on a borderline case (honesty sells).
4. **55–75s.** The physio reacting / using it on a real player (the Backyard AI "they actually used it" proof).
5. **End card.** "Runs on a laptop. ~18B params. Screening aid, not a diagnosis." Link the Space, the published head, the agent trace, the blog.
Social post: lead with the overlay GIF + the asymmetry-detection angle; tag Gradio/HF; one line of honest framing.
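
For the scorecard shot in step 3, the composite and the asymmetry strip can be derived deterministically from per-side scores. This sketch follows the standard FMS convention that the lower of the two sides counts toward the 0–21 total; the test identifiers and the input layout are assumptions for illustration, not names taken from the spec.

```python
BILATERAL = {
    "hurdle_step", "inline_lunge", "shoulder_mobility",
    "active_straight_leg_raise", "rotary_stability",
}


def build_scorecard(scores):
    """scores maps test name to {"L": int, "R": int} or {"NA": int}."""
    per_test, asymmetries = {}, []
    for test, sides in scores.items():
        if test in BILATERAL:
            left, right = sides["L"], sides["R"]
            per_test[test] = min(left, right)  # the lower side counts
            if left != right:
                asymmetries.append(test)
        else:
            per_test[test] = sides["NA"]
    return {
        "per_test": per_test,
        "composite": sum(per_test.values()),
        "asymmetries": sorted(asymmetries),
    }
```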
---
*Built to give FormScout the best shot. The two things most teams underinvest in, the capture protocol (§6.1) and the honest, clinical-style evaluation (§6.2, §2.1), are exactly where this project can out-class flashier entries. Good luck.*