SignMotionGPT / README.md
rdz-falcon's picture
Update README.md
a71026f verified
---
title: SignMotionGPT
emoji: ๐Ÿ‘‹
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
---
### 1) Configure setup script (one time)
Run the setup:
```bash
bash setup_env.sh
```
After setup, defaults are:
- `WORK_DIR` = current directory
- `DATA_JSON_PATH` = `./data/motion_llm_dataset.json`
You can override via environment variables if needed:
```bash
export WORK_DIR=/path/to/workdir
export DATA_JSON_PATH=/path/to/motion_llm_dataset.json
```
## Overview
This repository implements a robust 2-stage training pipeline for motion generation, replicating the high-performance "overfit" test setup:
- **Stage 1**: Motion-only Language Model (MLM) - Pre-training on motion token sequences to learn the "language of motion".
- **Stage 2**: Text-to-Motion Fine-Tuning (T2M) - Supervised fine-tuning to align text prompts with motion sequences.
Key features:
- **Integrated Evaluation**: Automatically computes FID, Diversity, and Multimodality (MIM) metrics.
- **Side-by-Side Visualization**: Generates HTML comparisons of Ground Truth vs Generated motions.
- **Test Set Evaluation**: Can optionally run evaluation on a held-out test set (SMPL-X data).
- **Hugging Face Integration**: Automatic checkpointing and resuming from the Hub.
## Installation
```bash
# Clone the repository
git clone https://github.com/rajvizala/SignMotionGPT.git
cd SignMotionGPT
# Setup Everything
bash setup_env.sh
```
## Dataset Format
Your dataset should be a JSON file with the following structure:
```json
[
{
"text_query": "a person walks forward",
"motion_tokens": "42 18 91 ...",
"participant_id": "P001" // Optional
},
...
]
```
## Quick Start
### 1. Configure Training
Edit `config.py` to set your paths and hyperparameters. Key settings include:
- `DATA_JSON_PATH`: Path to your dataset.
- `MODEL_NAME`: Base model (e.g., "Qwen/Qwen3-0.6B").
- `PIPELINE_OUTPUT_DIR`: Directory for checkpoints and results.
- `HF_TOKEN`: Your Hugging Face token (or set via env var).
### 2. Run Full Pipeline
```bash
python train_pipeline.py
```
This script orchestrates the entire process:
1. **Data Loading & Cleaning**: Deduplicates samples and builds vocabulary.
2. **Stage 1 Training**: Motion Language Modeling (Pre-training).
3. **Stage 2 Training**: Text-to-Motion Fine-Tuning.
4. **Evaluation**: Runs inference on specific words, computes metrics (FID, Diversity, MIM), and generates visualizations.
5. **Test Set Evaluation**: (Optional) Runs evaluation on held-out test data if configured.
### 3. Environment Variables
You can control many aspects via environment variables without editing code:
```bash
# Training Config
export PIPELINE_S1_EPOCHS=20
export PIPELINE_S2_EPOCHS=20
export PIPELINE_S1_BATCH=8
export PIPELINE_S2_BATCH=8
# Hugging Face
export HUGGINGFACE_HUB_TOKEN="your_token"
export HF_UPLOAD_INTERVAL_EPOCHS=2
# Evaluation
export EVALUATION_WORDS="passport,send,library"
export TEST_EVAL_SAMPLE_LIMIT=100
```
## Held-out Test Dataset Evaluation
The pipeline includes integration with `test_dataset_eval.py` to measure performance on an unseen SMPL-X test dataset.
To enable this, ensure `TEST_EVAL_DOWNLOAD_DIR` or `TEST_EVAL_EXTRACT_DIR` are configured in `config.py` or via env vars. The pipeline will attempt to run this after training if data is available.
## Visualization
The pipeline automatically generates side-by-side HTML visualizations in the output directory (`html_visualizations` folder). You can open these in any browser to compare Ground Truth motions with the model's generations.
To manually visualize tokens:
```bash
python visualize.py --tokens "<MOT_BEGIN><motion_177>...<MOT_END>" --output my_anim.html
```