---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-T2V-A14B-Diffusers
base_model_relation: quantized
pipeline_tag: text-to-video
---


# Elastic model: Fastest self-hosted models. Wan 2.2

Elastic models are produced by TheStage AI ANNA (Automated Neural Networks Accelerator). ANNA lets you control model size, latency, and quality with a simple slider movement. For each model, ANNA produces a series of optimized variants:

* __S__: The fastest model, with accuracy degradation of less than 2%.


__Goals of Elastic Models:__

* Provide the fastest models and service for self-hosting.
* Provide flexibility in choosing the cost-versus-quality trade-off for inference.
* Provide clear quality and latency benchmarks.
* Provide a drop-in interface for the HF libraries transformers and diffusers, requiring a single line of code change.
* Provide models supported on a wide range of hardware, pre-compiled and requiring no JIT.

> Note that the exact quality degradation varies from model to model; an S model may show as little as 0.5% degradation.

-----
Prompt: Massive ocean waves violently crashing and shattering against jagged rocky cliffs during an intense storm with lightning flashes

Resolution: 480x480, Number of frames: 81

| S | Original |
|:-:|:-:|
| https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/7Z1gFce9lMkOfFKrc8UUk.mp4 | https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/b-JwMpD8LhbbvUdU2kjFE.mp4 |

## Inference

> Compiled versions are currently available only for 81-frame generations at 480x480 resolution. Other configurations are not yet available. Stay tuned for updates!
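Until more compiled configurations ship, it can help to validate generation settings up front. A minimal sketch (illustrative only; `check_config` and `SUPPORTED_CONFIGS` are hypothetical names, not part of the `elastic_models` API):

```python
# Hypothetical guard: fail fast on configurations with no compiled model yet.
SUPPORTED_CONFIGS = {(480, 480, 81)}  # (height, width, num_frames)

def check_config(height: int, width: int, num_frames: int) -> None:
    if (height, width, num_frames) not in SUPPORTED_CONFIGS:
        raise ValueError(
            f"No compiled model for {height}x{width} with {num_frames} frames; "
            "only 480x480 with 81 frames is currently supported."
        )

check_config(480, 480, 81)  # supported: passes silently
```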

To run inference with our models, simply replace the `diffusers` import with `elastic_models.diffusers`:


```python
import torch
from elastic_models.diffusers import WanPipeline
from diffusers.utils import export_to_video

model_name = "Wan-AI/Wan2.2-T2V-A14B-Diffusers"
device = torch.device("cuda")
dtype = torch.bfloat16

# mode="S" selects the fastest Elastic variant of the model.
pipe = WanPipeline.from_pretrained(
    model_name,
    torch_dtype=dtype,
    mode="S"
)
# Tiled and sliced VAE decoding reduce peak GPU memory for long videos.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()
pipe.to(device)

prompt = "A beautiful woman in a red dress dancing"

with torch.no_grad():
    output = pipe(
        prompt=prompt,
        negative_prompt="",
        height=480,
        width=480,
        num_frames=81,
        num_inference_steps=40,
        guidance_scale=3.0,
        guidance_scale_2=2.0,
        generator=torch.Generator("cuda").manual_seed(42),
    )

    video = output.frames[0]
    export_to_video(video, "wan_output.mp4", fps=16)
```
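As a quick sanity check on the parameters above, the clip duration follows directly from the frame count and the playback rate passed to `export_to_video`:

```python
# Values taken from the generation example above.
num_frames = 81
fps = 16

duration_s = num_frames / fps  # 81 / 16 = 5.0625
print(f"{duration_s:.2f} s")   # → 5.06 s
```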

### Installation


__System requirements:__
* GPUs: H100
* CPU: AMD, Intel
* Python: 3.10-3.12


To work with our models, run the following in your terminal:

```shell
pip install thestage
pip install 'thestage-elastic-models[nvidia]' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple
pip install transformers==4.52.3
pip install diffusers==0.35.1

pip install flash_attn==2.7.3 --no-build-isolation
pip uninstall -y apex
pip install tensorrt==10.11.0.33 opencv-python==4.11.0.86 imageio-ffmpeg==0.6.0
```

Then go to [app.thestage.ai](https://app.thestage.ai), log in, and generate an API token from your profile page. Configure the token as follows:

```shell
thestage config set --api-token <YOUR_API_TOKEN>
```

Congrats, now you can use accelerated models!

----

## Benchmarks

Benchmarking is a key part of model acceleration: we aim to provide clear quality and latency metrics for models optimized with our algorithms.

### Quality benchmarks

We used the [VBench](https://github.com/Vchitect/VBench) benchmark to evaluate quality.

| Metric                 | S    | Original |
|------------------------|------|----------|
| Subject Consistency    | 0.96 | 0.96     |
| Background Consistency | 0.96 | 0.96     |
| Motion Smoothness      | 0.98 | 0.98     |
| Dynamic Degree         | 0.29 | 0.29     |
| Aesthetic Quality      | 0.62 | 0.62     |
| Imaging Quality        | 0.68 | 0.68     |
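The "less than 2% degradation" claim for the S model can be checked directly against these scores (values copied from the table above; at the reported precision the S model matches the original on every metric):

```python
# VBench scores from the table above: metric -> (S, Original).
scores = {
    "Subject Consistency":    (0.96, 0.96),
    "Background Consistency": (0.96, 0.96),
    "Motion Smoothness":      (0.98, 0.98),
    "Dynamic Degree":         (0.29, 0.29),
    "Aesthetic Quality":      (0.62, 0.62),
    "Imaging Quality":        (0.68, 0.68),
}

# Relative degradation of S versus the original model, in percent.
for metric, (s, orig) in scores.items():
    degradation = (orig - s) / orig * 100
    print(f"{metric}: {degradation:.1f}%")  # 0.0% for every metric here
```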

### Latency benchmarks

Generation time in seconds for 480x480 resolution, 81 frames.


| GPU  | S   | Original |
|----------|-----|----------|
| H100     | 90  | 180      |
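From these latencies, the S model's speedup over the original works out as (numbers taken from the table above):

```python
# H100 latencies in seconds from the table above (480x480, 81 frames).
latency_s = {"S": 90, "Original": 180}

speedup = latency_s["Original"] / latency_s["S"]
print(f"{speedup:.1f}x faster")  # → 2.0x faster
```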


## Links

* __Platform__: [app.thestage.ai](https://app.thestage.ai)
* __Subscribe for updates__: [TheStageAI X](https://x.com/TheStageAI)
* __Contact email__: contact@thestage.ai