---
license: mit
tags:
- sign-language
- diffusion
- text-to-video
- asl
- how2sign
- lightweight
metrics:
- fvd
---
# Text2Sign: Lightweight Diffusion Model for Sign Language Video Generation
This repository contains the pretrained checkpoint and inference code for the Text2Sign model, a lightweight diffusion-based architecture for generating sign language videos from text prompts.
## Model Overview
- Architecture: 3D UNet backbone with DiT (Diffusion Transformer) blocks and a custom Transformer-based text encoder.
- Dataset: Trained on How2Sign (ASL) video-text pairs.
- Resolution: 64x64 RGB, 16 frames per clip.
- Checkpoint: Provided at epoch 70.
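For readers unfamiliar with how a diffusion backbone like this produces a sample, here is a minimal pure-Python sketch of DDPM-style ancestral sampling on a single scalar. The schedule length, beta range, and the dummy `denoise_fn` are illustrative assumptions, not values from this checkpoint; the real model denoises a full 16-frame 3x64x64 video tensor conditioned on the text embedding.

```python
import math
import random

def ddpm_sample(denoise_fn, timesteps=50, beta_start=1e-4, beta_end=0.02, seed=0):
    """Illustrative DDPM reverse process on one scalar value.

    denoise_fn(x, t) stands in for the noise-prediction network.
    All schedule values here are hypothetical, not from the checkpoint.
    """
    rng = random.Random(seed)
    # Linear beta schedule and the derived alpha / cumulative-alpha terms.
    betas = [beta_start + (beta_end - beta_start) * t / (timesteps - 1)
             for t in range(timesteps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    x = rng.gauss(0.0, 1.0)  # start from pure Gaussian noise
    for t in reversed(range(timesteps)):
        eps = denoise_fn(x, t)  # predicted noise at step t
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        x = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])
        if t > 0:
            x += math.sqrt(betas[t]) * rng.gauss(0.0, 1.0)  # ancestral sampling noise
    return x

# Dummy predictor that always predicts zero noise (stand-in for the 3D UNet).
sample = ddpm_sample(lambda x, t: 0.0)
print(sample)
```

In the actual model, `denoise_fn` is the 3D UNet with DiT blocks, and the loop runs over video tensors rather than scalars.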
## Files

- `checkpoint_epoch_70.pt` — Pretrained model weights
- `config.py` — Model and generation configuration
- `inference.py` — Example script for generating sign language videos from text
## Usage

- Install dependencies:

  ```bash
  pip install torch torchvision pillow matplotlib
  ```

- Run the inference script:

  ```bash
  python inference.py --prompt "Hello world"
  ```

This will generate a video for the given prompt and save a filmstrip image.
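A filmstrip image is simply the generated frames tiled side by side into one wide image. The following pure-Python sketch shows that layout on tiny nested-list "frames"; it is independent of the actual `inference.py`, whose implementation may differ.

```python
def make_filmstrip(frames):
    """Tile same-height frames (2D row-major lists of pixel values) horizontally."""
    height = len(frames[0])
    strip = []
    for row in range(height):
        strip_row = []
        for frame in frames:
            strip_row.extend(frame[row])  # append this frame's row to the strip
        strip.append(strip_row)
    return strip

# Two 2x2 "frames" become one 2x4 strip.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(make_filmstrip([a, b]))  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```

For this model, 16 frames at 64x64 would yield a 64x1024 strip per clip.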
## License
MIT