naifenn committed commit 3ac4030 (verified, 1 parent: 1b22c25): Upload README.md with huggingface_hub
# Pose Encoder

Trains a shared adapter on top of a frozen [PoseFormerV2](https://github.com/QitaoZhao/PoseFormerV2) encoder for two tasks:
1. **Metric rating** — `Good` / `Okay` / `Needs work` per exercise metric
2. **Exercise classification** — which exercise is being performed

The trained adapter weights feed into the multimodal MotiVate pipeline.

## Pipeline

```
MediaPipe 2D
→ H36M remap + pad/crop + normalize
→ PoseFormerV2 (frozen, 27×544)
→ Shared Adapter (trainable, 27×256)
→ mean pool
→ Rating Head + Exercise Head

Loss = rating_loss + exercise_loss_weight × exercise_loss
```
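The adapter-plus-heads portion of the pipeline above can be sketched in PyTorch. Everything here is illustrative: the adapter architecture, hidden sizes, metric/class counts, and the `0.5` loss weight are assumptions for the sketch, not values taken from the repo.

```python
import torch
import torch.nn as nn

class SharedAdapter(nn.Module):
    """Sketch: maps frozen PoseFormerV2 features (27x544) to 27x256."""
    def __init__(self, in_dim=544, out_dim=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.GELU(), nn.Linear(out_dim, out_dim)
        )

    def forward(self, feats):      # feats: (B, 27, 544)
        return self.proj(feats)    # -> (B, 27, 256)

# Illustrative head sizes: 4 metrics x 3 ratings, 6 exercise classes.
n_metrics, n_exercises = 4, 6
adapter = SharedAdapter()
rating_head = nn.Linear(256, n_metrics * 3)
exercise_head = nn.Linear(256, n_exercises)

feats = torch.randn(2, 27, 544)              # stand-in for frozen encoder output
pooled = adapter(feats).mean(dim=1)          # mean pool over 27 frames -> (B, 256)
rating_logits = rating_head(pooled).view(2, n_metrics, 3)
exercise_logits = exercise_head(pooled)

# Combined objective, mirroring: Loss = rating_loss + exercise_loss_weight * exercise_loss
rating_loss = nn.functional.cross_entropy(
    rating_logits.reshape(-1, 3), torch.randint(0, 3, (2 * n_metrics,))
)
exercise_loss = nn.functional.cross_entropy(
    exercise_logits, torch.randint(0, n_exercises, (2,))
)
loss = rating_loss + 0.5 * exercise_loss     # exercise_loss_weight = 0.5 (assumed)
```

Only the adapter and heads receive gradients; the frozen encoder's output is represented here by a random tensor of the right shape.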

## Data Sources

| Source | Path | Used for |
|---|---|---|
| `training.csv` / `validation.csv` | `train/unimodal/` | **Train/val split only** — provides `(dataset, clip_id)` pairs. No labels are read from these CSVs. |
| `pose_data.npz` | `processed/<dataset>/<clip_id>/mediapipe_result/` | **Model input** — MediaPipe 2D pose landmarks per frame. |
| `raw_gt.csv` | `processed/<dataset>/<clip_id>/pose/` | **Rating labels** — raw pose measurements. Fed to `compute_metrics()` at load time to produce `Good` / `Okay` / `Needs work` per metric. |
| `metadata.json` | `processed/<dataset>/` | **Exercise labels** — maps each `clip_id` to its exercise name (fallback: infer from `clip_id` prefix). |

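A minimal loader following the table above might look like this. The `.npz` array key and the `metadata.json` structure are not specified in this README, so both are assumptions here (the sketch just takes the first array in the archive and assumes a flat `clip_id → exercise` mapping):

```python
import json
from pathlib import Path

import numpy as np

def load_clip(root, dataset, clip_id):
    """Hypothetical loader for one clip's pose input and exercise label."""
    clip_dir = Path(root) / "processed" / dataset / clip_id

    # Model input: per-frame MediaPipe 2D landmarks.
    pose = np.load(clip_dir / "mediapipe_result" / "pose_data.npz")
    landmarks = pose[pose.files[0]]  # key name is an assumption

    # Exercise label, with the prefix fallback described in the table.
    meta_path = Path(root) / "processed" / dataset / "metadata.json"
    meta = json.loads(meta_path.read_text())
    exercise = meta.get(clip_id, clip_id.split("_")[0])
    return landmarks, exercise
```

The `(dataset, clip_id)` pairs themselves would come from `training.csv` / `validation.csv`, which only define the split.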
Labels are **not** pre-computed in the CSVs. They are derived on the fly:
- **Rating targets**: `raw_gt.csv` → `compute_metrics(df, exercise)` → `evaluate_rating()` → one of `Good` (0) / `Okay` (1) / `Needs work` (2) per metric. Metrics that can't be evaluated get `IGNORE_INDEX = -100` and are excluded from the loss.
- **Exercise targets**: looked up from `metadata.json` and mapped to a class index via `EXERCISE_TO_ID`.

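The masking behaviour of `IGNORE_INDEX = -100` falls out of PyTorch's `ignore_index` support in `cross_entropy`. A small sketch (the metric names and ratings are made up; `None` stands for a metric that couldn't be evaluated):

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100
RATING_TO_ID = {"Good": 0, "Okay": 1, "Needs work": 2}

# Hypothetical per-metric ratings for one clip.
ratings = {"depth": "Good", "knee_valgus": None, "tempo": "Needs work"}
targets = torch.tensor(
    [RATING_TO_ID[r] if r is not None else IGNORE_INDEX for r in ratings.values()]
)

logits = torch.randn(3, 3)  # (n_metrics, 3 rating classes)
# The unevaluable metric (target -100) contributes nothing to the loss.
loss = F.cross_entropy(logits, targets, ignore_index=IGNORE_INDEX)
```

`-100` is PyTorch's default `ignore_index`, which is presumably why that sentinel was chosen.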
## Setup

```bash
cd MotiVate/train
bash setup_poseformer.sh          # clone + download checkpoint
bash setup_poseformer.sh --force  # re-clone
```

## Training

```bash
uv run python train/pose_encoder/train_shared_adapter.py --config train/pose_encoder/config.json
```

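For orientation, `config.json` might take a shape like the following. Only `exercise_loss_weight` is named anywhere in this README (in the loss formula); every other key and all values are assumptions for illustration:

```json
{
  "exercise_loss_weight": 0.5,
  "batch_size": 32,
  "lr": 1e-4,
  "epochs": 50
}
```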
## Outputs

Saved to `checkpoints/shared_adapter/`:
- `best_shared_adapter.pt` — adapter weights for the multimodal pipeline
- `best_model.pt` / `last_model.pt` — full checkpoints

`val_score = 0.5 × (rating_acc + exercise_acc)` — used for scheduling, early stopping, and best checkpoint selection.
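The best-checkpoint rule reduces to a few lines (the per-epoch accuracy pairs below are illustrative, and the save call is only indicated in a comment):

```python
def val_score(rating_acc, exercise_acc):
    """Equal-weight average of the two validation accuracies."""
    return 0.5 * (rating_acc + exercise_acc)

best = -1.0
for rating_acc, exercise_acc in [(0.70, 0.90), (0.75, 0.85), (0.72, 0.95)]:
    score = val_score(rating_acc, exercise_acc)
    if score > best:
        best = score
        # torch.save(...)  # would write best_shared_adapter.pt / best_model.pt here
```

The same scalar would also drive the LR scheduler and early stopping.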