# Pose Encoder

Trains a shared adapter on top of a frozen [PoseFormerV2](https://github.com/QitaoZhao/PoseFormerV2) encoder for two tasks:
1. **Metric rating** – `Good` / `Okay` / `Needs work` per exercise metric
2. **Exercise classification** – which exercise is being performed

The trained adapter weights feed into the multimodal MotiVate pipeline.

## Pipeline

```
MediaPipe 2D
→ H36M remap + pad/crop + normalize
→ PoseFormerV2 (frozen, 27×544)
→ Shared Adapter (trainable, 27×256)
→ mean pool
→ Rating Head + Exercise Head

Loss = rating_loss + exercise_loss_weight × exercise_loss
```
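The adapter-plus-heads stage can be sketched in PyTorch. This is a minimal illustration, not the repo's actual code: the class name, the single-linear adapter, GELU, and the metric/class counts are assumptions; only the 544→256 width, the 27-frame mean pool, the two heads, `IGNORE_INDEX = -100`, and the weighted loss come from this README.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_METRICS = 4      # assumption: number of per-exercise metrics
NUM_RATINGS = 3      # Good(0) / Okay(1) / Needs work(2)
NUM_EXERCISES = 5    # assumption: size of EXERCISE_TO_ID
IGNORE_INDEX = -100  # unevaluable metrics are masked out of the loss

class SharedAdapterHeads(nn.Module):
    """Hypothetical shape of the trainable part on top of frozen PoseFormerV2."""
    def __init__(self, in_dim=544, adapter_dim=256):
        super().__init__()
        # trainable adapter over the frozen 27×544 encoder features
        self.adapter = nn.Sequential(nn.Linear(in_dim, adapter_dim), nn.GELU())
        self.rating_head = nn.Linear(adapter_dim, NUM_METRICS * NUM_RATINGS)
        self.exercise_head = nn.Linear(adapter_dim, NUM_EXERCISES)

    def forward(self, feats):              # feats: (B, 27, 544)
        h = self.adapter(feats)            # (B, 27, 256)
        pooled = h.mean(dim=1)             # mean pool over 27 frames -> (B, 256)
        rating_logits = self.rating_head(pooled).view(-1, NUM_METRICS, NUM_RATINGS)
        return rating_logits, self.exercise_head(pooled)

def combined_loss(rating_logits, exercise_logits, rating_y, exercise_y,
                  exercise_loss_weight=1.0):
    # per-metric cross-entropy; ignore_index drops metrics that could not be rated
    rating_loss = F.cross_entropy(rating_logits.reshape(-1, NUM_RATINGS),
                                  rating_y.reshape(-1),
                                  ignore_index=IGNORE_INDEX)
    exercise_loss = F.cross_entropy(exercise_logits, exercise_y)
    return rating_loss + exercise_loss_weight * exercise_loss
```

The key design point the diagram implies is that both heads share one pooled adapter embedding, so gradients from both tasks shape the same representation.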

## Data Sources

| Source | Path | Used for |
|---|---|---|
| `training.csv` / `validation.csv` | `train/unimodal/` | **Train/val split only** – provides `(dataset, clip_id)` pairs. No labels are read from these CSVs. |
| `pose_data.npz` | `processed/<dataset>/<clip_id>/mediapipe_result/` | **Model input** – MediaPipe 2D pose landmarks per frame. |
| `raw_gt.csv` | `processed/<dataset>/<clip_id>/pose/` | **Rating labels** – raw pose measurements. Fed to `compute_metrics()` at load time to produce `Good` / `Okay` / `Needs work` per metric. |
| `metadata.json` | `processed/<dataset>/` | **Exercise labels** – maps each `clip_id` to its exercise name (fallback: infer from `clip_id` prefix). |

Labels are **not** pre-computed in the CSVs. They are derived on the fly:
- **Rating targets**: `raw_gt.csv` → `compute_metrics(df, exercise)` → `evaluate_rating()` → one of `Good` (0) / `Okay` (1) / `Needs work` (2) per metric. Metrics that can't be evaluated get `IGNORE_INDEX = -100` and are excluded from the loss.
- **Exercise targets**: looked up from `metadata.json` and mapped to a class index via `EXERCISE_TO_ID`.
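The last step of the rating-target derivation could look roughly like this. The helper and dict below are illustrative, not the repo's actual code; only the three label names, the 0/1/2 indices, and `IGNORE_INDEX = -100` come from this README.

```python
# Hypothetical mapping from evaluate_rating() output strings to class indices.
RATING_TO_ID = {"Good": 0, "Okay": 1, "Needs work": 2}
IGNORE_INDEX = -100  # metrics that cannot be evaluated are excluded from the loss

def ratings_to_targets(ratings):
    """ratings: per-metric rating strings; None means the metric was unevaluable."""
    return [RATING_TO_ID.get(r, IGNORE_INDEX) for r in ratings]
```

With `-100` as the sentinel, the targets can be fed straight to a cross-entropy loss configured with `ignore_index=-100`, so unevaluable metrics contribute no gradient.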

## Setup

```bash
cd MotiVate/train
bash setup_poseformer.sh          # clone + download checkpoint
bash setup_poseformer.sh --force  # re-clone
```

## Training

```bash
uv run python train/pose_encoder/train_shared_adapter.py --config train/pose_encoder/config.json
```
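The referenced `config.json` is not shown in this README. A hypothetical minimal shape is sketched below, purely for illustration: only `exercise_loss_weight` and the checkpoint directory are named anywhere above; every other key and value here is an assumption about what a training config like this typically contains.

```json
{
  "exercise_loss_weight": 0.5,
  "epochs": 50,
  "batch_size": 32,
  "learning_rate": 1e-4,
  "checkpoint_dir": "checkpoints/shared_adapter"
}
```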

## Outputs

Saved to `checkpoints/shared_adapter/`:
- `best_shared_adapter.pt` – adapter weights for the multimodal pipeline
- `best_model.pt` / `last_model.pt` – full checkpoints

`val_score = 0.5 × (rating_acc + exercise_acc)` – used for scheduling, early stopping, and best checkpoint selection.
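That selection criterion is simple enough to state in code. The function matches the formula above; the selection loop and the example accuracies are hypothetical.

```python
def val_score(rating_acc: float, exercise_acc: float) -> float:
    # equal-weight average of the two task accuracies
    return 0.5 * (rating_acc + exercise_acc)

# hypothetical selection loop: track the best-scoring epoch
best = float("-inf")
for rating_acc, exercise_acc in [(0.80, 0.90), (0.85, 0.88)]:  # per-epoch val metrics
    score = val_score(rating_acc, exercise_acc)
    if score > best:
        best = score  # in training, this is where best_model.pt would be saved
```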