psynote123 committed on
Commit ffa09be · verified · 1 Parent(s): a8d5634

Update README.md

Files changed (1)
  1. README.md +130 -3
README.md CHANGED
@@ -1,3 +1,130 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model:
+ - Wan-AI/Wan2.2-T2V-A14B-Diffusers
+ base_model_relation: quantized
+ pipeline_tag: text-to-video
+ ---
+
+
+ # Elastic model: Fastest self-serving models. Wan 2.2
+
+ Elastic models are produced by TheStage AI ANNA (Automated Neural Networks Accelerator). ANNA lets you control model size, latency, and quality with a simple slider movement. For each model, ANNA produces a series of optimized models:
+
+ * __S__: The fastest model, with accuracy degradation of less than 2%.
+
+
+ __Goals of Elastic Models:__
+
+ * Provide the fastest models and service for self-hosting.
+ * Provide flexibility in cost vs. quality selection for inference.
+ * Provide clear quality and latency benchmarks.
+ * Provide the interface of the HF libraries, transformers and diffusers, with a single line of code.
+ * Provide models supported on a wide range of hardware, which are pre-compiled and require no JIT.
+
+ > Note that the specific quality degradation can vary from model to model. For instance, an S model can have as little as 0.5% degradation.
+
+ -----
+ Prompt: Massive ocean waves violently crashing and shattering against jagged rocky cliffs during an intense storm with lightning flashes
+
+ Resolution: 480x480, Number of frames: 81
+
+ | S | Original |
+ |:-:|:-:|
+ | https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/fFhpSm1JdZNxnoSmr6tZ0.mp4 | https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/ctx01OzYgKDBd-N6xsE4E.mp4 |
+
+ ## Inference
+
+ > Compiled versions are currently available only for 81-frame generations at 480x480 resolution. Other versions are not yet accessible. Stay tuned for updates!
+
+ To run inference with our models, you just need to replace the `diffusers` import with `elastic_models.diffusers`:
+
+
+ ```python
+ import torch
+ from elastic_models.diffusers import WanPipeline
+ from diffusers.utils import export_to_video
+
+ model_name = "Wan-AI/Wan2.2-T2V-A14B-Diffusers"
+ device = torch.device("cuda")
+ dtype = torch.bfloat16
+
+ # mode="S" selects the S (fastest) Elastic variant
+ pipe = WanPipeline.from_pretrained(
+     model_name,
+     torch_dtype=dtype,
+     mode="S"
+ )
+ pipe.vae.enable_tiling()
+ pipe.vae.enable_slicing()
+ pipe.to(device)
+
+ prompt = "A beautiful woman in a red dress dancing"
+
+ # Compiled versions currently cover only 480x480 resolution with 81 frames
+ with torch.no_grad():
+     output = pipe(
+         prompt=prompt,
+         negative_prompt="",
+         height=480,
+         width=480,
+         num_frames=81,
+         num_inference_steps=40,
+         guidance_scale=3.0,
+         guidance_scale_2=2.0,
+         generator=torch.Generator("cuda").manual_seed(42),
+     )
+
+ # Take the first generated video and write it to disk at 16 fps
+ video = output.frames[0]
+ export_to_video(video, "wan_output.mp4", fps=16)
+ ```
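
Since compiled versions currently cover only 81-frame generations at 480x480, it may help to check the requested settings before calling the pipeline. The following is a minimal sketch, not part of `elastic_models`: the helper name `check_supported_settings` and the `SUPPORTED_CONFIGS` set are hypothetical and encode only the single configuration stated in this README.

```python
# Hypothetical helper, not part of elastic_models: it only encodes the one
# configuration this README says is compiled (480x480, 81 frames).
SUPPORTED_CONFIGS = {(480, 480, 81)}  # (height, width, num_frames)


def check_supported_settings(height: int, width: int, num_frames: int) -> None:
    """Raise ValueError if the requested settings have no compiled version."""
    if (height, width, num_frames) not in SUPPORTED_CONFIGS:
        supported = ", ".join(
            f"{h}x{w} with {n} frames" for h, w, n in sorted(SUPPORTED_CONFIGS)
        )
        raise ValueError(
            f"Unsupported settings {height}x{width} with {num_frames} frames; "
            f"compiled versions exist only for: {supported}"
        )


# Passes silently for the documented configuration.
check_supported_settings(height=480, width=480, num_frames=81)
```

Calling the helper right before `pipe(...)` would surface an unsupported resolution or frame count as a plain `ValueError` rather than a failure inside the pipeline.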
+
+ ### Installation
+
+
+ __System requirements:__
+ * GPUs: H100
+ * CPU: AMD, Intel
+ * Python: 3.10-3.12
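
The Python requirement above can be checked before installing anything. This is a small sketch using only the standard library; the function name `python_version_supported` is illustrative, and the bounds simply restate the 3.10-3.12 range from the list. It does not check the GPU or CPU requirements.

```python
import sys

# Supported range taken from the requirements list above (3.10-3.12, inclusive).
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 12)


def python_version_supported(version: tuple = sys.version_info[:2]) -> bool:
    """Return True if (major, minor) falls within the supported range."""
    return MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON


if not python_version_supported():
    print(
        f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
        "is outside the supported 3.10-3.12 range"
    )
```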
+
+
+ To work with our models, run these commands in your terminal:
+
+ ```shell
+ pip install thestage
+ pip install 'thestage-elastic-models[nvidia]' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple
+
+ pip install flash_attn==2.7.3 --no-build-isolation
+ pip uninstall apex
+ pip install tensorrt==10.11.0.33 opencv-python==4.11.0.86 imageio-ffmpeg==0.6.0
+ ```
100
+ Then go to [app.thestage.ai](https://app.thestage.ai), login and generate API token from your profile page. Set up API token as follows:
101
+
102
+ ```shell
103
+ thestage config set --api-token <YOUR_API_TOKEN>
104
+ ```
105
+
106
+ Congrats, now you can use accelerated models!
107
+
108
+ ----
109
+
+ ## Benchmarks
+
+ Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for models using our algorithms.
+
+
+ ### Latency benchmarks
+
+ Generation time in seconds at 480x480 resolution, 81 frames.
+
+
+ | GPU | S | Original |
+ |----------|-----|----------|
+ | H100 | 90 | 180 |
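
The H100 row can be read as a speedup factor. The arithmetic below is a short sketch that uses only the two latencies from the table; the variable names are illustrative.

```python
# Latencies in seconds on H100, copied from the table above.
latency_original = 180
latency_s_model = 90

# Speedup is the ratio of original to accelerated generation time.
speedup = latency_original / latency_s_model
# Share of generation time saved by the S model, in percent.
time_saved_pct = 100 * (1 - latency_s_model / latency_original)

print(f"S is {speedup:.1f}x faster, saving {time_saved_pct:.0f}% of generation time")
# prints "S is 2.0x faster, saving 50% of generation time"
```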
+
+
+ ## Links
+
+ * __Platform__: [app.thestage.ai](https://app.thestage.ai)
+ <!-- * __Elastic models Github__: [app.thestage.ai](app.thestage.ai) -->
+ * __Subscribe for updates__: [TheStageAI X](https://x.com/TheStageAI)
+ * __Contact email__: contact@thestage.ai