---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-T2V-A14B-Diffusers
base_model_relation: quantized
pipeline_tag: text-to-video
---

# Elastic model: Wan 2.2. Fastest self-serving models.

Elastic models are produced by TheStage AI ANNA, the Automated Neural Networks Accelerator. ANNA lets you control model size, latency, and quality with a simple slider movement. For each model, ANNA produces a series of optimized versions:

* __S__: The fastest model, with accuracy degradation of less than 2%.

__Goals of Elastic Models:__

* Provide the fastest models and service for self-hosting.
* Provide flexibility in cost vs. quality selection for inference.
* Provide clear quality and latency benchmarks.
* Provide a drop-in interface for the HF libraries transformers and diffusers with a single-line code change.
* Provide models supported on a wide range of hardware, pre-compiled and requiring no JIT.

> It's important to note that the exact quality degradation varies from model to model. For instance, an S model can show as little as 0.5% degradation.

-----

Prompt: Massive ocean waves violently crashing and shattering against jagged rocky cliffs during an intense storm with lightning flashes

Resolution: 480x480, Number of frames: 81

| S | Original |
|:-:|:-:|
| https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/7Z1gFce9lMkOfFKrc8UUk.mp4 | https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/b-JwMpD8LhbbvUdU2kjFE.mp4 |

## Inference

> Compiled versions are currently available only for 81-frame generations at 480x480 resolution. Other configurations are not yet accessible. Stay tuned for updates!
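
Since only this configuration is compiled today, it can help to fail fast on unsupported settings. Below is a minimal sketch; the `SUPPORTED_CONFIGS` set and `check_config` helper are hypothetical conveniences, not part of `elastic_models`:

```python
# Hypothetical guard: reject settings that have no compiled Elastic version yet.
SUPPORTED_CONFIGS = {(480, 480, 81)}  # (height, width, num_frames)

def check_config(height: int, width: int, num_frames: int) -> None:
    if (height, width, num_frames) not in SUPPORTED_CONFIGS:
        raise ValueError(
            f"No compiled version for {height}x{width} with {num_frames} frames; "
            "only 480x480 with 81 frames is currently supported."
        )

check_config(480, 480, 81)  # passes silently
```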

To run inference with our models, simply replace the `diffusers` import with `elastic_models.diffusers`:

```python
import torch
from elastic_models.diffusers import WanPipeline
from diffusers.utils import export_to_video

model_name = "Wan-AI/Wan2.2-T2V-A14B-Diffusers"
device = torch.device("cuda")
dtype = torch.bfloat16

# `mode="S"` selects the fastest Elastic version of the model.
pipe = WanPipeline.from_pretrained(
    model_name,
    torch_dtype=dtype,
    mode="S"
)
# Tiled and sliced VAE decoding reduce peak memory when decoding video frames.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()
pipe.to(device)

prompt = "A beautiful woman in a red dress dancing"

with torch.no_grad():
    output = pipe(
        prompt=prompt,
        negative_prompt="",
        height=480,
        width=480,
        num_frames=81,
        num_inference_steps=40,
        guidance_scale=3.0,
        guidance_scale_2=2.0,
        generator=torch.Generator("cuda").manual_seed(42),
    )

video = output.frames[0]
export_to_video(video, "wan_output.mp4", fps=16)
```
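
Because the Elastic pipeline mirrors the diffusers API, switching back to the original model only changes the import line. A sketch of the equivalent stock-diffusers setup, for comparison:

```python
import torch
# Stock diffusers pipeline for the original, non-accelerated model.
from diffusers import WanPipeline

pipe_orig = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers",
    torch_dtype=torch.bfloat16,  # no `mode` argument here
)
pipe_orig.to("cuda")
# The generation call is identical to the Elastic example above.
```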

### Installation

__System requirements:__

* GPUs: H100
* CPU: AMD, Intel
* Python: 3.10-3.12

To work with our models, run these commands in your terminal:

```shell
pip install thestage
pip install 'thestage-elastic-models[nvidia]' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple
pip install transformers==4.52.3
pip install diffusers==0.35.1

pip install flash_attn==2.7.3 --no-build-isolation
pip uninstall -y apex
pip install tensorrt==10.11.0.33 opencv-python==4.11.0.86 imageio-ffmpeg==0.6.0
```
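
After installation, a quick sanity check of the environment can save a failed run later. This is a minimal sketch against the requirements listed above:

```python
import sys

import torch

# Verify the Python version and a visible CUDA device match the requirements.
assert (3, 10) <= sys.version_info[:2] <= (3, 12), "Python 3.10-3.12 required"
assert torch.cuda.is_available(), "No CUDA GPU visible"
print("GPU:", torch.cuda.get_device_name(0))  # expect an H100 for the compiled kernels
```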

Then go to [app.thestage.ai](https://app.thestage.ai), log in, and generate an API token on your profile page. Set the API token as follows:

```shell
thestage config set --api-token <YOUR_API_TOKEN>
```

Congrats, now you can use accelerated models!

----

## Benchmarks

Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for models accelerated with our algorithms.

### Quality benchmarks

We used the [VBench](https://github.com/Vchitect/VBench) benchmark to evaluate quality; a reproduction sketch follows the table below.

| Metric | S | Original |
|-------------------------|------|----------|
| Subject Consistency | 0.96 | 0.96 |
| Background Consistency | 0.96 | 0.96 |
| Motion Smoothness | 0.98 | 0.98 |
| Dynamic Degree | 0.29 | 0.29 |
| Aesthetic Quality | 0.62 | 0.62 |
| Imaging Quality | 0.68 | 0.68 |
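
As referenced above, the scores can be reproduced through VBench's Python entry point. The sketch below follows the pattern in the VBench README; the paths are placeholders, and the exact constructor arguments may differ across VBench versions:

```python
import torch
from vbench import VBench

device = torch.device("cuda")
# Placeholder paths: the metadata JSON ships with the VBench repository.
bench = VBench(device, "VBench_full_info.json", "evaluation_results/")
bench.evaluate(
    videos_path="generated_videos/",  # folder of generated .mp4 files
    name="wan2.2_elastic_S",
    dimension_list=["subject_consistency", "motion_smoothness"],
)
```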

### Latency benchmarks

Generation time in seconds for 480x480 resolution, 81 frames.

| GPU | S | Original |
|------|-----|----------|
| H100 | 90 | 180 |
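
These are end-to-end generation times. A minimal sketch for taking a comparable wall-clock measurement yourself, assuming `pipe` and `prompt` from the inference example above:

```python
import time

import torch

# Warm-up pass so one-time initialization does not skew the measurement.
# `pipe` and `prompt` come from the inference example above.
_ = pipe(prompt=prompt, height=480, width=480, num_frames=81, num_inference_steps=40)

torch.cuda.synchronize()  # make sure all queued GPU work finishes before timing
start = time.perf_counter()
_ = pipe(prompt=prompt, height=480, width=480, num_frames=81, num_inference_steps=40)
torch.cuda.synchronize()
print(f"Generation time: {time.perf_counter() - start:.1f} s")
```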

## Links

* __Platform__: [app.thestage.ai](https://app.thestage.ai)
* __Subscribe for updates__: [TheStageAI X](https://x.com/TheStageAI)
* __Contact email__: contact@thestage.ai