---
license: mit
tags:
  - sign-language
  - diffusion
  - text-to-video
  - asl
  - how2sign
  - lightweight
metrics:
  - fvd
---

# Text2Sign: Lightweight Diffusion Model for Sign Language Video Generation

This repository contains the pretrained checkpoint and inference code for the Text2Sign model, a lightweight diffusion-based architecture for generating sign language videos from text prompts.

## Model Overview

- **Architecture**: 3D UNet backbone with DiT (Diffusion Transformer) blocks and a custom Transformer-based text encoder.
- **Dataset**: trained on How2Sign (ASL) video–text pairs.
- **Resolution**: 64x64 RGB, 16 frames per clip.
- **Checkpoint**: provided at epoch 70.
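To make the generation process concrete, here is a minimal sketch of the reverse-diffusion sampling loop that text-conditioned video diffusion models of this kind run at inference time. Everything here is illustrative: the toy denoiser stands in for the real 3D UNet/DiT backbone, the linear noise schedule and step count are assumptions, and only the tensor shape (16 frames at 64x64 RGB) comes from the model card.

```python
import numpy as np

def toy_denoiser(x, t, cond):
    # Stand-in for the 3D UNet with DiT blocks: predicts the noise in x
    # given the timestep and a text-conditioning vector.
    return 0.1 * x + 0.01 * cond.mean()

def sample(shape, cond, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)          # start from pure Gaussian noise
    betas = np.linspace(1e-4, 0.02, steps)  # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    for t in reversed(range(steps)):
        eps = toy_denoiser(x, t, cond)
        # DDPM posterior mean; the stochastic term is skipped at t == 0
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Video tensor shaped (frames, height, width, channels) per the card
video = sample((16, 64, 64, 3), cond=np.ones(128))
print(video.shape)  # (16, 64, 64, 3)
```

The real model conditions the denoiser on the output of its Transformer text encoder rather than a fixed vector, but the loop structure is the same.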

## Files

- `checkpoint_epoch_70.pt` — pretrained model weights
- `config.py` — model and generation configuration
- `inference.py` — example script for generating sign language videos from text

## Usage

1. Install dependencies:

   ```bash
   pip install torch torchvision pillow matplotlib
   ```

2. Run the inference script:

   ```bash
   python inference.py --prompt "Hello world"
   ```

   This generates a video for the given prompt and saves a filmstrip image.
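The filmstrip output mentioned above can be sketched as follows: tiling the generated frames side by side into one image. The frame count and resolution (16 frames, 64x64) are from the model card; the function name and the dummy frames are illustrative, not the repository's actual API.

```python
from PIL import Image

def make_filmstrip(frames):
    """Concatenate a list of same-size RGB frames horizontally."""
    w, h = frames[0].size
    strip = Image.new("RGB", (w * len(frames), h))
    for i, frame in enumerate(frames):
        strip.paste(frame, (i * w, 0))
    return strip

# 16 dummy 64x64 frames in place of real generated video frames
frames = [Image.new("RGB", (64, 64), (i * 16, 0, 0)) for i in range(16)]
strip = make_filmstrip(frames)
print(strip.size)  # (1024, 64)
# strip.save("filmstrip.png") would write the image to disk
```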

## License

MIT