---
title: SignMotionGPT
emoji: 👋
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
---

## 1) Configure the environment (one time)

Run the setup script:

```bash
bash setup_env.sh
```

After setup, the defaults are:

- `WORK_DIR` = current directory
- `DATA_JSON_PATH` = `./data/motion_llm_dataset.json`

You can override these via environment variables if needed:

```bash
export WORK_DIR=/path/to/workdir
export DATA_JSON_PATH=/path/to/motion_llm_dataset.json
```

## Overview

This repository implements a robust two-stage training pipeline for motion generation, replicating the high-performance "overfit" test setup:

- **Stage 1: Motion-only Language Model (MLM)** - Pre-training on motion token sequences to learn the "language of motion".
- **Stage 2: Text-to-Motion Fine-Tuning (T2M)** - Supervised fine-tuning to align text prompts with motion sequences.

Key features:

- **Integrated Evaluation**: Automatically computes FID, Diversity, and Multimodality (MIM) metrics.
- **Side-by-Side Visualization**: Generates HTML comparisons of ground-truth vs. generated motions.
- **Test Set Evaluation**: Optionally runs evaluation on a held-out test set (SMPL-X data).
- **Hugging Face Integration**: Automatic checkpointing and resuming from the Hub.
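For reference, the Diversity metric is commonly computed as the mean distance between randomly sampled pairs of motion features. A minimal illustrative sketch of that idea (this is an assumption about the general technique, not the repository's implementation; `diversity` and its parameters are hypothetical):

```python
import numpy as np

def diversity(features: np.ndarray, num_pairs: int = 100, seed: int = 0) -> float:
    """Mean Euclidean distance between randomly sampled pairs of motion features."""
    rng = np.random.default_rng(seed)
    idx_a = rng.integers(0, len(features), size=num_pairs)
    idx_b = rng.integers(0, len(features), size=num_pairs)
    # Average the pairwise distances over the sampled index pairs.
    return float(np.linalg.norm(features[idx_a] - features[idx_b], axis=1).mean())
```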

## Installation

```bash
# Clone the repository
git clone https://github.com/rajvizala/SignMotionGPT.git
cd SignMotionGPT

# Set up everything
bash setup_env.sh
```

## Dataset Format

Your dataset should be a JSON file with the following structure:

```json
[
  {
    "text_query": "a person walks forward",
    "motion_tokens": "42 18 91 ...",
    "participant_id": "P001"
  },
  ...
]
```

The `participant_id` field is optional.

## Quick Start

### 1. Configure Training

Edit `config.py` to set your paths and hyperparameters. Key settings include:

- `DATA_JSON_PATH`: Path to your dataset.
- `MODEL_NAME`: Base model (e.g., `"Qwen/Qwen3-0.6B"`).
- `PIPELINE_OUTPUT_DIR`: Directory for checkpoints and results.
- `HF_TOKEN`: Your Hugging Face token (or set via an environment variable).
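A minimal `config.py` sketch using the setting names above (the names come from the list; the values and environment-variable fallbacks are illustrative, not the repository's actual file):

```python
import os

# Setting names taken from the documented list; values are example defaults.
DATA_JSON_PATH = os.environ.get("DATA_JSON_PATH", "./data/motion_llm_dataset.json")
MODEL_NAME = "Qwen/Qwen3-0.6B"
PIPELINE_OUTPUT_DIR = "./pipeline_output"
HF_TOKEN = os.environ.get("HUGGINGFACE_HUB_TOKEN", "")
```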

### 2. Run the Full Pipeline

```bash
python train_pipeline.py
```

This script orchestrates the entire process:

1. **Data Loading & Cleaning**: Deduplicates samples and builds the vocabulary.
2. **Stage 1 Training**: Motion Language Modeling (pre-training).
3. **Stage 2 Training**: Text-to-Motion Fine-Tuning.
4. **Evaluation**: Runs inference on specific words, computes metrics (FID, Diversity, MIM), and generates visualizations.
5. **Test Set Evaluation** (optional): Runs evaluation on held-out test data if configured.
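Step 1 (deduplication and vocabulary building) can be sketched roughly as follows; the helper names `deduplicate` and `build_vocab` are hypothetical:

```python
def deduplicate(samples):
    """Drop exact (text, motion) duplicates, keeping the first occurrence."""
    seen, unique = set(), []
    for sample in samples:
        key = (sample["text_query"], sample["motion_tokens"])
        if key not in seen:
            seen.add(key)
            unique.append(sample)
    return unique

def build_vocab(samples):
    """Collect all motion token ids that occur in the dataset, sorted numerically."""
    tokens = set()
    for sample in samples:
        tokens.update(sample["motion_tokens"].split())
    return sorted(tokens, key=int)
```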

### 3. Environment Variables

You can control many aspects via environment variables without editing code:

```bash
# Training config
export PIPELINE_S1_EPOCHS=20
export PIPELINE_S2_EPOCHS=20
export PIPELINE_S1_BATCH=8
export PIPELINE_S2_BATCH=8

# Hugging Face
export HUGGINGFACE_HUB_TOKEN="your_token"
export HF_UPLOAD_INTERVAL_EPOCHS=2

# Evaluation
export EVALUATION_WORDS="passport,send,library"
export TEST_EVAL_SAMPLE_LIMIT=100
```
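Reading integer settings like these with a fallback default is a common pattern; a sketch of how the training code might do it (the helper `env_int` is hypothetical):

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    return int(os.environ.get(name, default))

# Example usage with the documented variable names and defaults.
S1_EPOCHS = env_int("PIPELINE_S1_EPOCHS", 20)
S1_BATCH = env_int("PIPELINE_S1_BATCH", 8)
```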

## Held-out Test Dataset Evaluation

The pipeline integrates with `test_dataset_eval.py` to measure performance on an unseen SMPL-X test dataset.

To enable this, ensure `TEST_EVAL_DOWNLOAD_DIR` or `TEST_EVAL_EXTRACT_DIR` is configured in `config.py` or via environment variables. The pipeline will attempt to run this step after training if the data is available.

## Visualization

The pipeline automatically generates side-by-side HTML visualizations in the output directory (`html_visualizations` folder). You can open these in any browser to compare ground-truth motions with the model's generations.

To visualize tokens manually:

```bash
python visualize.py --tokens "<MOT_BEGIN><motion_177>...<MOT_END>" --output my_anim.html
```
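If you need the raw ids out of a token string like the one above, it can be parsed with a small helper (a sketch; `parse_motion_tokens` is hypothetical and not part of the repository):

```python
import re

def parse_motion_tokens(token_string: str) -> list:
    """Extract integer motion token ids from a <MOT_BEGIN>...<MOT_END> string."""
    match = re.search(r"<MOT_BEGIN>(.*?)<MOT_END>", token_string, re.S)
    if match is None:
        return []
    # Each token is written as <motion_N>; pull out the numeric part.
    return [int(tok) for tok in re.findall(r"<motion_(\d+)>", match.group(1))]
```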