psynote123 committed on
Commit
3c0eac4
·
verified ·
1 Parent(s): bbbadd5

Update README.md

Files changed (1)
  1. README.md +125 -3
README.md CHANGED
@@ -1,3 +1,125 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model:
+ - stabilityai/stable-diffusion-3.5-large
+ base_model_relation: quantized
+ pipeline_tag: text-to-image
+ ---
+
+
+ # Elastic model: Fastest self-serving models. Stable Diffusion 3.5 Large.
+
+ Elastic models are produced by TheStage AI ANNA: Automated Neural Networks Accelerator. ANNA lets you control model size, latency and quality with a simple slider movement. For each model, ANNA produces a series of optimized models:
+
+ * __XL__: Mathematically equivalent neural network, optimized with our DNN compiler.
+
+ * __L__: Near-lossless model, with less than 1% degradation on the corresponding benchmarks.
+
+ * __M__: Faster model, with accuracy degradation of less than 1.5%.
+
+ * __S__: The fastest model, with accuracy degradation of less than 2%.
+
+
+ __Goals of Elastic Models:__
+
+ * Provide the fastest models and service for self-hosting.
+ * Provide flexibility in cost vs quality selection for inference.
+ * Provide clear quality and latency benchmarks.
+ * Provide the interface of the HF libraries (transformers and diffusers) with a single line of code.
+ * Provide models supported on a wide range of hardware, which are pre-compiled and require no JIT.
+
+ > Note that the actual quality degradation varies from model to model. For instance, an S model may show as little as 0.5% degradation.
+
+ -----
+
+ ## Inference
+
+ Currently, our demo model supports 1024x1024 resolution and batch sizes 1-8. This will be updated in the near future.
+ To run inference with our models, you just need to replace the `diffusers` import with `elastic_models.diffusers`:
+
+ ```python
+ import torch
+ from elastic_models.diffusers import StableDiffusion3Pipeline
+
+ model_name = 'stabilityai/stable-diffusion-3.5-large'
+ hf_token = ''  # your Hugging Face access token
+ device = torch.device("cuda")
+
+ # mode selects the Elastic model tier ('S' here, the fastest one)
+ pipeline = StableDiffusion3Pipeline.from_pretrained(
+     model_name,
+     torch_dtype=torch.bfloat16,
+     token=hf_token,
+     mode='S'
+ )
+ pipeline.to(device)
+
+ prompts = ["A cat holding a sign that says hello world"]
+ output = pipeline(prompt=prompts)
+
+ # Save each image under a file name derived from its prompt
+ for prompt, output_image in zip(prompts, output.images):
+     output_image.save(prompt.replace(' ', '_') + '.png')
+ ```
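
Since the demo model is stated to support batch sizes 1-8, a longer prompt list has to be split before it is passed to the pipeline. The following is a minimal sketch of such a split; `MAX_BATCH_SIZE` and `batched` are hypothetical helper names for illustration, not part of `elastic_models`.

```python
from typing import Iterator, List

# Upper bound on the batch size stated for the demo model (assumption: 8 prompts per call).
MAX_BATCH_SIZE = 8


def batched(prompts: List[str], batch_size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `prompts`, each with at most `batch_size` items."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(prompts), batch_size):
        yield prompts[start:start + batch_size]
```

Each yielded chunk could then be passed as `pipeline(prompt=chunk)` in place of the full list.
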
+
+ ### Installation
+
+
+ __System requirements:__
+ * GPUs: H100, B200
+ * CPU: AMD, Intel
+ * Python: 3.10-3.12
+
+
+ To work with our models, run these commands in your terminal:
+
+ ```shell
+ pip install thestage
+ pip install 'thestage-elastic-models[nvidia]' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple
+
+ # or, for Blackwell support
+ pip install 'thestage-elastic-models[blackwell]' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple
+ pip install -U --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
+ pip install -U --pre torchvision --index-url https://download.pytorch.org/whl/nightly/cu128
+
+ pip install flash_attn==2.7.3 --no-build-isolation
+ pip uninstall apex
+ ```
+
+ Then go to [app.thestage.ai](https://app.thestage.ai), log in, and generate an API token on your profile page. Set the API token as follows:
+
+ ```shell
+ thestage config set --api-token <YOUR_API_TOKEN>
+ ```
+
+ Congrats, now you can use accelerated models!
+
+ ----
+
+ ## Benchmarks
+
+ Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for models using our algorithms.
+
+ ### Quality benchmarks
+
+ For quality evaluation we used PSNR, SSIM and CLIP score. PSNR and SSIM were computed against the outputs of the original model.
+
+ | Metric/Model | S | M | L | XL | Original |
+ |---------------|---|---|---|----|----------|
+ | PSNR | TBD | TBD | TBD | inf | inf |
+ | SSIM | TBD | TBD | TBD | 1.0 | 1.0 |
+ | CLIP | TBD | TBD | TBD | TBD | TBD |
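
The `inf` entries in the PSNR row follow from the metric's definition: identical outputs give a mean squared error of zero, so the ratio is unbounded. The sketch below shows the standard PSNR formula on flat sequences of 8-bit pixel values; it is an illustration only, and the `psnr` helper is a hypothetical name, not the evaluation code used for this table.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], candidate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is infinite, as for a mathematically equivalent model.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

In practice the same computation would be applied to image arrays generated by the optimized and original pipelines from the same prompt and seed.
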
+
+
+ ### Latency benchmarks
+
+ Time in seconds to generate one 1024x1024 image:
+
+ | GPU/Model | S | M | L | XL | Original |
+ |-----------|-----|---|---|----|----------|
+ | H100 | TBD | TBD | TBD | 3.80 | 6.55 |
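
Per-image latency of this kind can be measured with a small wall-clock timing helper. The sketch below is one possible way to do it, not necessarily how the numbers in the table were obtained; `measure_latency` and its `warmup` and `runs` parameters are hypothetical.

```python
import statistics
import time
from typing import Callable


def measure_latency(fn: Callable[[], object], warmup: int = 1, runs: int = 5) -> float:
    """Return the median wall-clock time in seconds of calling `fn` over `runs` calls."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

For example, `measure_latency(lambda: pipeline(prompt=prompts))` would time a full generation call. CUDA kernels launch asynchronously, so when timing GPU work that does not copy its result back to the host, call `torch.cuda.synchronize()` inside `fn` before it returns.
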
+
+
+ ## Links
+
+ * __Platform__: [app.thestage.ai](https://app.thestage.ai)
+ <!-- * __Elastic models Github__: [app.thestage.ai](app.thestage.ai) -->
+ * __Subscribe for updates__: [TheStageAI X](https://x.com/TheStageAI)
+ * __Contact email__: contact@thestage.ai