---
license: mit
tags:
- sign-language
- diffusion
- text-to-video
- asl
- how2sign
- lightweight
metrics:
- fvd
---
# Text2Sign: Lightweight Diffusion Model for Sign Language Video Generation
This repository contains the pretrained checkpoint and inference code for the Text2Sign model, a lightweight diffusion-based architecture for generating sign language videos from text prompts.
## Model Overview
- **Architecture:** 3D UNet backbone with DiT (Diffusion Transformer) blocks and a custom Transformer-based text encoder.
- **Dataset:** Trained on How2Sign (ASL) video-text pairs.
- **Resolution:** 64x64 RGB, 16 frames per clip.
- **Checkpoint:** Provided at epoch 70.
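The stated output format (64x64 RGB, 16 frames per clip) can be captured in a small configuration sketch. Note this is an illustrative mock-up only: the field names below are assumptions, not the actual attributes defined in this repo's `config.py`.

```python
from dataclasses import dataclass

@dataclass
class GenerationConfig:
    """Hypothetical mirror of the generation settings; names are illustrative."""
    resolution: int = 64   # 64x64 RGB frames, as stated in the model card
    num_frames: int = 16   # frames per generated clip
    channels: int = 3      # RGB

    def clip_shape(self) -> tuple:
        # (frames, channels, height, width), a common layout for video tensors
        return (self.num_frames, self.channels, self.resolution, self.resolution)

cfg = GenerationConfig()
print(cfg.clip_shape())  # (16, 3, 64, 64)
```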
## Files
- `checkpoint_epoch_70.pt` – Pretrained model weights
- `config.py` – Model and generation configuration
- `inference.py` – Example script for generating sign language videos from text
## Usage
1. Install dependencies:
```bash
pip install torch torchvision pillow matplotlib
```
2. Run the inference script:
```bash
python inference.py --prompt "Hello world"
```
This will generate a video for the given prompt and save a filmstrip image.
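A filmstrip is simply the generated frames tiled horizontally into one image. The sketch below shows one way to build it with Pillow (already in the dependency list); the actual layout produced by `inference.py` may differ, and `save_filmstrip` is a hypothetical helper, not a function exported by this repo.

```python
from PIL import Image

def save_filmstrip(frames, path):
    """Tile a list of equally sized PIL frames into one horizontal strip."""
    w, h = frames[0].size
    strip = Image.new("RGB", (w * len(frames), h))
    for i, frame in enumerate(frames):
        strip.paste(frame, (i * w, 0))  # place frame i at x-offset i*w
    strip.save(path)
    return strip

# Example with the model's 16-frame, 64x64 output format:
frames = [Image.new("RGB", (64, 64), (i * 16, 0, 0)) for i in range(16)]
strip = save_filmstrip(frames, "filmstrip.png")
print(strip.size)  # (1024, 64)
```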
## License
MIT