---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-T2V-A14B-Diffusers
base_model_relation: quantized
pipeline_tag: text-to-video
---

# Elastic model: Wan 2.2. Fastest self-serving models.

Elastic models are produced by TheStage AI ANNA, the Automated Neural Networks Accelerator. ANNA lets you control model size, latency, and quality with a simple slider movement. For each model, ANNA produces a series of optimized versions:

* __S__: The fastest model, with accuracy degradation of less than 2%.

__Goals of Elastic Models:__

* Provide the fastest models and service for self-hosting.
* Provide flexibility in cost vs. quality selection for inference.
* Provide clear quality and latency benchmarks.
* Provide a drop-in interface for the HF libraries transformers and diffusers with a single-line code change.
* Provide models supported on a wide range of hardware, pre-compiled and requiring no JIT.

> It's important to note that the exact quality degradation varies from model to model. For instance, an S model can show as little as 0.5% degradation.

-----

Prompt: Massive ocean waves violently crashing and shattering against jagged rocky cliffs during an intense storm with lightning flashes

Resolution: 480x480, Number of frames: 81

| S | Original |
|:-:|:-:|
| https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/7Z1gFce9lMkOfFKrc8UUk.mp4 | https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/b-JwMpD8LhbbvUdU2kjFE.mp4 |

## Inference

> Compiled versions are currently available only for 81-frame generations at 480x480 resolution. Other configurations are not yet accessible. Stay tuned for updates!
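
Since only this configuration is compiled today, it can help to fail fast on unsupported settings. Below is a minimal sketch; the `SUPPORTED_CONFIGS` set and `check_config` helper are hypothetical conveniences, not part of `elastic_models`:

```python
# Hypothetical guard: reject settings that have no compiled Elastic version yet.
SUPPORTED_CONFIGS = {(480, 480, 81)}  # (height, width, num_frames)

def check_config(height: int, width: int, num_frames: int) -> None:
    if (height, width, num_frames) not in SUPPORTED_CONFIGS:
        raise ValueError(
            f"No compiled version for {height}x{width} with {num_frames} frames; "
            "only 480x480 with 81 frames is currently supported."
        )

check_config(480, 480, 81)  # passes silently
```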

To run inference with our models, simply replace the `diffusers` import with `elastic_models.diffusers`:

```python
import torch
from elastic_models.diffusers import WanPipeline
from diffusers.utils import export_to_video

model_name = "Wan-AI/Wan2.2-T2V-A14B-Diffusers"
device = torch.device("cuda")
dtype = torch.bfloat16

# `mode="S"` selects the fastest Elastic version of the model.
pipe = WanPipeline.from_pretrained(
    model_name,
    torch_dtype=dtype,
    mode="S"
)
# Tiled and sliced VAE decoding reduce peak memory when decoding video frames.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()
pipe.to(device)

prompt = "A beautiful woman in a red dress dancing"

with torch.no_grad():
    output = pipe(
        prompt=prompt,
        negative_prompt="",
        height=480,
        width=480,
        num_frames=81,
        num_inference_steps=40,
        guidance_scale=3.0,
        guidance_scale_2=2.0,
        generator=torch.Generator("cuda").manual_seed(42),
    )

video = output.frames[0]
export_to_video(video, "wan_output.mp4", fps=16)
```
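
Because the Elastic pipeline mirrors the diffusers API, switching back to the original model only changes the import line. A sketch of the equivalent stock-diffusers setup, for comparison:

```python
import torch
# Stock diffusers pipeline for the original, non-accelerated model.
from diffusers import WanPipeline

pipe_orig = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers",
    torch_dtype=torch.bfloat16,  # no `mode` argument here
)
pipe_orig.to("cuda")
# The generation call is identical to the Elastic example above.
```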

### Installation

__System requirements:__

* GPUs: H100
* CPU: AMD, Intel
* Python: 3.10-3.12

To work with our models, run these commands in your terminal:

```shell
pip install thestage
pip install 'thestage-elastic-models[nvidia]' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple
pip install transformers==4.52.3
pip install diffusers==0.35.1

pip install flash_attn==2.7.3 --no-build-isolation
pip uninstall -y apex
pip install tensorrt==10.11.0.33 opencv-python==4.11.0.86 imageio-ffmpeg==0.6.0
```
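
After installation, a quick sanity check of the environment can save a failed run later. This is a minimal sketch against the requirements listed above:

```python
import sys

import torch

# Verify the Python version and a visible CUDA device match the requirements.
assert (3, 10) <= sys.version_info[:2] <= (3, 12), "Python 3.10-3.12 required"
assert torch.cuda.is_available(), "No CUDA GPU visible"
print("GPU:", torch.cuda.get_device_name(0))  # expect an H100 for the compiled kernels
```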

Then go to [app.thestage.ai](https://app.thestage.ai), log in, and generate an API token on your profile page. Set the API token as follows:

```shell
thestage config set --api-token <YOUR_API_TOKEN>
```

Congrats, now you can use accelerated models!

----

## Benchmarks

Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for models accelerated with our algorithms.

### Quality benchmarks

We used the [VBench](https://github.com/Vchitect/VBench) benchmark to evaluate quality; a reproduction sketch follows the table below.

| Metric | S | Original |
|-------------------------|------|----------|
| Subject Consistency | 0.96 | 0.96 |
| Background Consistency | 0.96 | 0.96 |
| Motion Smoothness | 0.98 | 0.98 |
| Dynamic Degree | 0.29 | 0.29 |
| Aesthetic Quality | 0.62 | 0.62 |
| Imaging Quality | 0.68 | 0.68 |
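
As referenced above, the scores can be reproduced through VBench's Python entry point. The sketch below follows the pattern in the VBench README; the paths are placeholders, and the exact constructor arguments may differ across VBench versions:

```python
import torch
from vbench import VBench

device = torch.device("cuda")
# Placeholder paths: the metadata JSON ships with the VBench repository.
bench = VBench(device, "VBench_full_info.json", "evaluation_results/")
bench.evaluate(
    videos_path="generated_videos/",  # folder of generated .mp4 files
    name="wan2.2_elastic_S",
    dimension_list=["subject_consistency", "motion_smoothness"],
)
```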

### Latency benchmarks

Generation time in seconds for 480x480 resolution, 81 frames.

| GPU | S | Original |
|------|-----|----------|
| H100 | 90 | 180 |
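
These are end-to-end generation times. A minimal sketch for taking a comparable wall-clock measurement yourself, assuming `pipe` and `prompt` from the inference example above:

```python
import time

import torch

# Warm-up pass so one-time initialization does not skew the measurement.
# `pipe` and `prompt` come from the inference example above.
_ = pipe(prompt=prompt, height=480, width=480, num_frames=81, num_inference_steps=40)

torch.cuda.synchronize()  # make sure all queued GPU work finishes before timing
start = time.perf_counter()
_ = pipe(prompt=prompt, height=480, width=480, num_frames=81, num_inference_steps=40)
torch.cuda.synchronize()
print(f"Generation time: {time.perf_counter() - start:.1f} s")
```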

## Links

* __Platform__: [app.thestage.ai](https://app.thestage.ai)
* __Subscribe for updates__: [TheStageAI X](https://x.com/TheStageAI)
* __Contact email__: contact@thestage.ai