---
title: SignMotionGPT
emoji: 👋
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
---

### Configure the setup script (one time)

Run the setup:

```bash
bash setup_env.sh
```

After setup, the defaults are:

- `WORK_DIR` = current directory
- `DATA_JSON_PATH` = `./data/motion_llm_dataset.json`

You can override these via environment variables if needed:

```bash
export WORK_DIR=/path/to/workdir
export DATA_JSON_PATH=/path/to/motion_llm_dataset.json
```

## Overview

This repository implements a robust two-stage training pipeline for motion generation, replicating the high-performance "overfit" test setup:

- **Stage 1**: Motion-only Language Modeling (MLM) - pre-training on motion token sequences to learn the "language of motion".
- **Stage 2**: Text-to-Motion Fine-Tuning (T2M) - supervised fine-tuning to align text prompts with motion sequences.

Key features:

- **Integrated Evaluation**: automatically computes FID, Diversity, and Multimodality (MIM) metrics.
- **Side-by-Side Visualization**: generates HTML comparisons of ground-truth vs. generated motions.
- **Test Set Evaluation**: can optionally run evaluation on a held-out test set (SMPL-X data).
- **Hugging Face Integration**: automatic checkpointing and resuming from the Hub.

## Installation

```bash
# Clone the repository
git clone https://github.com/rajvizala/SignMotionGPT.git
cd SignMotionGPT

# Set up everything
bash setup_env.sh
```

## Dataset Format

Your dataset should be a JSON file with the following structure (`participant_id` is optional):

```json
[
  {
    "text_query": "a person walks forward",
    "motion_tokens": "42 18 91 ...",
    "participant_id": "P001"
  },
  ...
]
```

## Quick Start

### 1. Configure Training

Edit `config.py` to set your paths and hyperparameters. Key settings include:

- `DATA_JSON_PATH`: path to your dataset.
- `MODEL_NAME`: base model (e.g., "Qwen/Qwen3-0.6B").
- `PIPELINE_OUTPUT_DIR`: directory for checkpoints and results.
- `HF_TOKEN`: your Hugging Face token (or set it via an environment variable).

### 2. Run Full Pipeline

```bash
python train_pipeline.py
```

This script orchestrates the entire process:

1. **Data Loading & Cleaning**: deduplicates samples and builds the vocabulary.
2. **Stage 1 Training**: Motion Language Modeling (pre-training).
3. **Stage 2 Training**: Text-to-Motion Fine-Tuning.
4. **Evaluation**: runs inference on the configured evaluation words, computes metrics (FID, Diversity, MIM), and generates visualizations.
5. **Test Set Evaluation** (optional): runs evaluation on held-out test data if configured.

### 3. Environment Variables

You can control many aspects via environment variables without editing code:

```bash
# Training config
export PIPELINE_S1_EPOCHS=20
export PIPELINE_S2_EPOCHS=20
export PIPELINE_S1_BATCH=8
export PIPELINE_S2_BATCH=8

# Hugging Face
export HUGGINGFACE_HUB_TOKEN="your_token"
export HF_UPLOAD_INTERVAL_EPOCHS=2

# Evaluation
export EVALUATION_WORDS="passport,send,library"
export TEST_EVAL_SAMPLE_LIMIT=100
```

## Held-out Test Dataset Evaluation

The pipeline integrates with `test_dataset_eval.py` to measure performance on an unseen SMPL-X test dataset. To enable this, ensure `TEST_EVAL_DOWNLOAD_DIR` or `TEST_EVAL_EXTRACT_DIR` is configured in `config.py` or via environment variables; the pipeline will then attempt to run the evaluation after training if the data is available.
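For example, a minimal way to enable it via environment variables before launching the pipeline (the directory paths below are placeholders; only the variable names come from the configuration):

```bash
# Enable held-out test evaluation; replace the paths with your own
export TEST_EVAL_DOWNLOAD_DIR=/path/to/test_data/download
export TEST_EVAL_EXTRACT_DIR=/path/to/test_data/extracted
export TEST_EVAL_SAMPLE_LIMIT=100  # optional cap on evaluated samples

python train_pipeline.py
```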
## Visualization

The pipeline automatically generates side-by-side HTML visualizations in the output directory (`html_visualizations` folder). You can open these in any browser to compare ground-truth motions with the model's generations.

To manually visualize tokens:

```bash
python visualize.py --tokens "..." --output my_anim.html
```
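Rather than typing a token string by hand, you can pull one from your dataset. This is just a sketch, assuming the default `DATA_JSON_PATH` and the dataset format described above:

```bash
# Grab the first sample's motion tokens from the dataset and render them
TOKENS=$(python -c "import json; print(json.load(open('./data/motion_llm_dataset.json'))[0]['motion_tokens'])")
python visualize.py --tokens "$TOKENS" --output my_anim.html
```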