---
title: SignMotionGPT
emoji: 👋
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
---
## Setup

1) Configure the setup script (one time). Run:

```bash
bash setup_env.sh
```

After setup, the defaults are:

- `WORK_DIR`: current directory
- `DATA_JSON_PATH`: `./data/motion_llm_dataset.json`

You can override these via environment variables if needed:

```bash
export WORK_DIR=/path/to/workdir
export DATA_JSON_PATH=/path/to/motion_llm_dataset.json
```
## Overview
This repository implements a robust 2-stage training pipeline for motion generation, replicating the high-performance "overfit" test setup:
- Stage 1: Motion-only Language Model (MLM) - Pre-training on motion token sequences to learn the "language of motion".
- Stage 2: Text-to-Motion Fine-Tuning (T2M) - Supervised fine-tuning to align text prompts with motion sequences.
Key features:
- Integrated Evaluation: Automatically computes FID, Diversity, and Multimodality (MIM) metrics.
- Side-by-Side Visualization: Generates HTML comparisons of Ground Truth vs Generated motions.
- Test Set Evaluation: Can optionally run evaluation on a held-out test set (SMPL-X data).
- Hugging Face Integration: Automatic checkpointing and resuming from the Hub.
## Installation

```bash
# Clone the repository
git clone https://github.com/rajvizala/SignMotionGPT.git
cd SignMotionGPT

# Set up everything
bash setup_env.sh
```
## Dataset Format

Your dataset should be a JSON file with the following structure (`participant_id` is optional):

```json
[
  {
    "text_query": "a person walks forward",
    "motion_tokens": "42 18 91 ...",
    "participant_id": "P001"
  },
  ...
]
```
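A minimal loader sketch for this format (the function name and validation logic are illustrative, not the repo's actual code):

```python
import json

# Fields every sample must carry; participant_id is optional.
REQUIRED_FIELDS = {"text_query", "motion_tokens"}

def load_motion_dataset(path):
    """Load the dataset JSON and keep only samples with the required fields."""
    with open(path, "r", encoding="utf-8") as f:
        samples = json.load(f)
    return [s for s in samples if REQUIRED_FIELDS <= s.keys()]
```

`motion_tokens` stays a space-separated string here; the pipeline tokenizes it later.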
## Quick Start
### 1. Configure Training

Edit `config.py` to set your paths and hyperparameters. Key settings include:

- `DATA_JSON_PATH`: Path to your dataset.
- `MODEL_NAME`: Base model (e.g., `"Qwen/Qwen3-0.6B"`).
- `PIPELINE_OUTPUT_DIR`: Directory for checkpoints and results.
- `HF_TOKEN`: Your Hugging Face token (or set via env var).
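The exact contents of `config.py` aren't shown here, but a common pattern for settings that can also be overridden by environment variables looks like this (a sketch; the default values are assumptions, not the repo's actual defaults):

```python
import os

# Hypothetical config.py fragment: names match the settings listed above,
# defaults are illustrative placeholders.
DATA_JSON_PATH = os.environ.get("DATA_JSON_PATH", "./data/motion_llm_dataset.json")
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen3-0.6B")
PIPELINE_OUTPUT_DIR = os.environ.get("PIPELINE_OUTPUT_DIR", "./pipeline_output")
HF_TOKEN = os.environ.get("HUGGINGFACE_HUB_TOKEN")  # may be None if not set
```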
### 2. Run the Full Pipeline

```bash
python train_pipeline.py
```
This script orchestrates the entire process:
- Data Loading & Cleaning: Deduplicates samples and builds vocabulary.
- Stage 1 Training: Motion Language Modeling (Pre-training).
- Stage 2 Training: Text-to-Motion Fine-Tuning.
- Evaluation: Runs inference on specific words, computes metrics (FID, Diversity, MIM), and generates visualizations.
- Test Set Evaluation: (Optional) Runs evaluation on held-out test data if configured.
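The deduplication and vocabulary-building step can be sketched as follows (a minimal illustration under the dataset format above, not the pipeline's actual code):

```python
def deduplicate(samples):
    """Keep the first occurrence of each (text_query, motion_tokens) pair."""
    seen = set()
    unique = []
    for s in samples:
        key = (s["text_query"], s["motion_tokens"])
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique

def build_vocabulary(samples):
    """Collect the motion token ids used across the dataset, sorted numerically."""
    vocab = set()
    for s in samples:
        vocab.update(s["motion_tokens"].split())
    return sorted(vocab, key=int)
```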
### 3. Environment Variables
You can control many aspects via environment variables without editing code:
```bash
# Training config
export PIPELINE_S1_EPOCHS=20
export PIPELINE_S2_EPOCHS=20
export PIPELINE_S1_BATCH=8
export PIPELINE_S2_BATCH=8

# Hugging Face
export HUGGINGFACE_HUB_TOKEN="your_token"
export HF_UPLOAD_INTERVAL_EPOCHS=2

# Evaluation
export EVALUATION_WORDS="passport,send,library"
export TEST_EVAL_SAMPLE_LIMIT=100
```
## Held-out Test Dataset Evaluation

The pipeline integrates with `test_dataset_eval.py` to measure performance on an unseen SMPL-X test dataset.

To enable this, make sure `TEST_EVAL_DOWNLOAD_DIR` or `TEST_EVAL_EXTRACT_DIR` is configured in `config.py` or via env vars. The pipeline will attempt to run this evaluation after training if the data is available.
## Visualization

The pipeline automatically generates side-by-side HTML visualizations in the output directory (`html_visualizations` folder). Open these in any browser to compare ground-truth motions with the model's generations.

To manually visualize a token sequence:

```bash
python visualize.py --tokens "<MOT_BEGIN><motion_177>...<MOT_END>" --output my_anim.html
```
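The token string wraps `<motion_NNN>` tokens between `<MOT_BEGIN>` and `<MOT_END>` markers, as in the command above. A hedged sketch of recovering the numeric ids from such a string (the regex and function name are assumptions, not taken from `visualize.py`):

```python
import re

def extract_motion_ids(token_string):
    """Pull the numeric ids out of <motion_NNN> tokens, in order of appearance."""
    return [int(m) for m in re.findall(r"<motion_(\d+)>", token_string)]
```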