---
title: SignMotionGPT
emoji: 👋
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
---
## Setup

1) Configure the setup script (one time). Run:

```bash
bash setup_env.sh
```

After setup, the defaults are:

- `WORK_DIR`: current directory
- `DATA_JSON_PATH`: `./data/motion_llm_dataset.json`

You can override these via environment variables if needed:

```bash
export WORK_DIR=/path/to/workdir
export DATA_JSON_PATH=/path/to/motion_llm_dataset.json
```
## Overview
This repository implements a robust 2-stage training pipeline for motion generation, replicating the high-performance "overfit" test setup:
- Stage 1: Motion-only Language Model (MLM) - Pre-training on motion token sequences to learn the "language of motion".
- Stage 2: Text-to-Motion Fine-Tuning (T2M) - Supervised fine-tuning to align text prompts with motion sequences.
Key features:
- Integrated Evaluation: Automatically computes FID, Diversity, and Multimodality (MIM) metrics.
- Side-by-Side Visualization: Generates HTML comparisons of Ground Truth vs Generated motions.
- Test Set Evaluation: Can optionally run evaluation on a held-out test set (SMPL-X data).
- Hugging Face Integration: Automatic checkpointing and resuming from the Hub.
## Installation

```bash
# Clone the repository
git clone https://github.com/rajvizala/SignMotionGPT.git
cd SignMotionGPT

# Set up everything
bash setup_env.sh
```
## Dataset Format

Your dataset should be a JSON file with the following structure (`participant_id` is optional):

```json
[
  {
    "text_query": "a person walks forward",
    "motion_tokens": "42 18 91 ...",
    "participant_id": "P001"
  },
  ...
]
```
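A minimal loader sketch for this format (the function name and validation logic are illustrative, not the repo's actual code):

```python
import json

# Fields every sample must carry; participant_id is optional.
REQUIRED_FIELDS = {"text_query", "motion_tokens"}

def load_motion_dataset(path):
    """Load the dataset JSON and keep only samples with the required fields."""
    with open(path, "r", encoding="utf-8") as f:
        samples = json.load(f)
    return [s for s in samples if REQUIRED_FIELDS <= s.keys()]
```

`motion_tokens` stays a space-separated string here; the pipeline tokenizes it later.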
## Quick Start
### 1. Configure Training

Edit `config.py` to set your paths and hyperparameters. Key settings include:

- `DATA_JSON_PATH`: Path to your dataset.
- `MODEL_NAME`: Base model (e.g., `"Qwen/Qwen3-0.6B"`).
- `PIPELINE_OUTPUT_DIR`: Directory for checkpoints and results.
- `HF_TOKEN`: Your Hugging Face token (or set via env var).
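The exact contents of `config.py` aren't shown here, but a common pattern for settings that can also be overridden by environment variables looks like this (a sketch; the default values are assumptions, not the repo's actual defaults):

```python
import os

# Hypothetical config.py fragment: names match the settings listed above,
# defaults are illustrative placeholders.
DATA_JSON_PATH = os.environ.get("DATA_JSON_PATH", "./data/motion_llm_dataset.json")
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen3-0.6B")
PIPELINE_OUTPUT_DIR = os.environ.get("PIPELINE_OUTPUT_DIR", "./pipeline_output")
HF_TOKEN = os.environ.get("HUGGINGFACE_HUB_TOKEN")  # may be None if not set
```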
### 2. Run the Full Pipeline

```bash
python train_pipeline.py
```
This script orchestrates the entire process:
- Data Loading & Cleaning: Deduplicates samples and builds vocabulary.
- Stage 1 Training: Motion Language Modeling (Pre-training).
- Stage 2 Training: Text-to-Motion Fine-Tuning.
- Evaluation: Runs inference on specific words, computes metrics (FID, Diversity, MIM), and generates visualizations.
- Test Set Evaluation: (Optional) Runs evaluation on held-out test data if configured.
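The deduplication and vocabulary-building step can be sketched as follows (a minimal illustration under the dataset format above, not the pipeline's actual code):

```python
def deduplicate(samples):
    """Keep the first occurrence of each (text_query, motion_tokens) pair."""
    seen = set()
    unique = []
    for s in samples:
        key = (s["text_query"], s["motion_tokens"])
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique

def build_vocabulary(samples):
    """Collect the motion token ids used across the dataset, sorted numerically."""
    vocab = set()
    for s in samples:
        vocab.update(s["motion_tokens"].split())
    return sorted(vocab, key=int)
```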
### 3. Environment Variables
You can control many aspects via environment variables without editing code:
```bash
# Training config
export PIPELINE_S1_EPOCHS=20
export PIPELINE_S2_EPOCHS=20
export PIPELINE_S1_BATCH=8
export PIPELINE_S2_BATCH=8

# Hugging Face
export HUGGINGFACE_HUB_TOKEN="your_token"
export HF_UPLOAD_INTERVAL_EPOCHS=2

# Evaluation
export EVALUATION_WORDS="passport,send,library"
export TEST_EVAL_SAMPLE_LIMIT=100
```
## Held-out Test Dataset Evaluation

The pipeline integrates with `test_dataset_eval.py` to measure performance on an unseen SMPL-X test dataset.

To enable this, make sure `TEST_EVAL_DOWNLOAD_DIR` or `TEST_EVAL_EXTRACT_DIR` is configured in `config.py` or via env vars. The pipeline will attempt to run this evaluation after training if the data is available.
## Visualization

The pipeline automatically generates side-by-side HTML visualizations in the output directory (`html_visualizations` folder). Open these in any browser to compare ground-truth motions with the model's generations.

To manually visualize a token sequence:

```bash
python visualize.py --tokens "<MOT_BEGIN><motion_177>...<MOT_END>" --output my_anim.html
```
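The token string wraps `<motion_NNN>` tokens between `<MOT_BEGIN>` and `<MOT_END>` markers, as in the command above. A hedged sketch of recovering the numeric ids from such a string (the regex and function name are assumptions, not taken from `visualize.py`):

```python
import re

def extract_motion_ids(token_string):
    """Pull the numeric ids out of <motion_NNN> tokens, in order of appearance."""
    return [int(m) for m in re.findall(r"<motion_(\d+)>", token_string)]
```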