Duplicated from xiaruize/text2sign

	---
	license: mit
	tags:
	- sign-language
	- diffusion
	- text-to-video
	- asl
	- how2sign
	- lightweight
	metrics:
	- fvd
	---

	# Text2Sign: Lightweight Diffusion Model for Sign Language Video Generation

	This repository contains the pretrained checkpoint and inference code for the Text2Sign model, a lightweight diffusion-based architecture for generating sign language videos from text prompts.

	## Model Overview
	- Architecture: 3D UNet backbone with DiT (Diffusion Transformer) blocks and a custom Transformer-based text encoder.
	- Dataset: Trained on How2Sign (ASL) video-text pairs.
	- Resolution: 64x64 RGB, 16 frames per clip.
	- Checkpoint: Provided at epoch 70.
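
To make the output size concrete, the figures in the list above can be turned into a quick shape calculation. This is only a minimal sketch: the `ClipSpec` name and the channels-first `(channels, frames, height, width)` ordering are assumptions for illustration and are not taken from `config.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipSpec:
    """Hypothetical helper describing one generated clip (values from the overview)."""
    frames: int = 16     # 16 frames per clip
    height: int = 64     # 64x64 resolution
    width: int = 64
    channels: int = 3    # RGB

    @property
    def shape(self) -> tuple:
        # Assumed channels-first layout; the real model may order the axes differently.
        return (self.channels, self.frames, self.height, self.width)

    @property
    def num_values(self) -> int:
        # Total number of scalar values in one clip.
        return self.channels * self.frames * self.height * self.width


spec = ClipSpec()
print(spec.shape)       # (3, 16, 64, 64)
print(spec.num_values)  # 196608
```

Each clip therefore holds fewer than 200,000 values, which fits the "lightweight" description of the model.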

	## Files
	- `checkpoint_epoch_70.pt` — Pretrained model weights
	- `config.py` — Model and generation configuration
	- `inference.py` — Example script for generating sign language videos from text
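
Before running anything, it may help to confirm that all three files were actually downloaded. The sketch below uses only the standard library and the file names listed above; the `missing_files` helper is a hypothetical name and is not part of this repository.

```python
from pathlib import Path

# File names as listed in the Files section.
EXPECTED_FILES = ("checkpoint_epoch_70.pt", "config.py", "inference.py")


def missing_files(repo_dir="."):
    """Return the expected files that are not present in repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

Run it from the directory where the repository was cloned, or pass another path to `missing_files`.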

	## Usage
	1. Install dependencies:
	```bash
	pip install torch torchvision pillow matplotlib
	```
	2. Run the inference script:
	```bash
	python inference.py --prompt "Hello world"
	```
	The script generates a video for the given prompt and saves a filmstrip image.
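
To generate clips for several prompts, the command above can be wrapped in a small loop. This is a sketch under assumptions: it relies only on the `--prompt` flag shown above, the helper names `build_command` and `generate_all` are hypothetical, and the example prompts are placeholders.

```python
import subprocess
import sys


def build_command(prompt, script="inference.py"):
    """Build the argument list for one inference run (only --prompt is used)."""
    return [sys.executable, script, "--prompt", prompt]


def generate_all(prompts, dry_run=False):
    """Run the inference script once per prompt.

    With dry_run=True the commands are printed instead of executed.
    """
    for prompt in prompts:
        cmd = build_command(prompt)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if the script exits with an error.
            subprocess.run(cmd, check=True)


# dry_run=True only prints the commands, so nothing is launched here.
generate_all(["Hello world", "Thank you"], dry_run=True)
```

Set `dry_run=False` to actually run the script. This README does not say where the filmstrip is written or whether successive runs overwrite it, so check the output between runs.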


	## License
	MIT