neuroeng committed · verified
Commit 1fbd5f6 · 1 Parent(s): c20103a

Update README.md

Files changed (1): README.md (+56 −20)
@@ -11,6 +11,8 @@ pipeline_tag: text-to-image
 
 ## Overview
 
+---
+
 ElasticModels are the models produced by TheStage AI ANNA: Automated Neural Networks Accelerator. ANNA allows you to control model size, latency and quality with a simple slider movement, routing different compression algorithms to different layers. For each model, we have produced a series of optimized models:
 
 - **XL**: Mathematically equivalent neural network, optimized with our DNN compiler.
@@ -20,12 +22,15 @@ ElasticModels are the models produced by TheStage AI ANNA: Automated Neural Netw
 
 Models can be accessed via TheStage AI Python SDK: ElasticModels, or deployed as Docker containers with REST API endpoints (see Deploy section).
 
+## Installation
+
 ---
 
-## Installation
 
 ### System Requirements
 
+---
+
 | **Property**| **Value** |
 | --- | --- |
 | **GPU** | L40s, RTX 5090, H100, B200 |
@@ -36,6 +41,8 @@ Models can be accessed via TheStage AI Python SDK: ElasticModels, or deployed as
 
 ### TheStage AI Access token setup
 
+---
+
 Install TheStage AI CLI and setup API token:
 
 ```bash
@@ -45,6 +52,8 @@ thestage config set --access-token <YOUR_ACCESS_TOKEN>
 
 ### ElasticModels installation
 
+---
+
 Install TheStage Elastic Models package:
 
 ```bash
@@ -52,9 +61,10 @@ pip install 'thestage-elastic-models[nvidia]' \
 --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple
 ```
 
+## Usage example
+
 ---
 
-## Usage example
 
 Elastic Models provides the same interface as HuggingFace Diffusers. Here is an example of how to use the FLUX.1-dev model:
 
@@ -84,9 +94,10 @@ for prompt, output_image in zip(prompts, output.images):
 ```
 
 
+## Quality Benchmarks
+
 ---
 
-## Quality Benchmarks
 
 We have used PartiPrompts and DrawBench datasets to evaluate the quality of images generated by different sizes of FLUX.1-dev models (S, M, L, XL) compared to the original model. The evaluation metrics include ARNIQA, CLIP IQA, PSNR, SSIM, and VQA Faithfulness.
 
@@ -94,6 +105,8 @@ We have used PartiPrompts and DrawBench datasets to evaluate the quality of imag
 
 ### Quality Benchmark Results
 
+---
+
 | **Metric/Model Size**| **S**| **M**| **L**| **XL**| **Original** |
 | --- | --- | --- | --- | --- | --- |
 | **ARNIQA (PartiPrompts)** | 64.1 | 63.2 | 61.9 | 66.8 | 66.9 |
@@ -106,17 +119,19 @@ We have used PartiPrompts and DrawBench datasets to evaluate the quality of imag
 | **SSIM (PartiPrompts)** | 0.72 | 0.72 | 0.76 | 1.0 | 1.0 |
 
 
+## Datasets
+
 ---
 
-## Datasets
 
 - **PartiPrompts**: A benchmark dataset created by Google Research, containing 1,632 diverse and challenging prompts that test various aspects of text-to-image generation models. It includes categories such as abstract concepts, complex compositions, properties and attributes, counting and numbers, text rendering, artistic styles, and fine-grained details.
 
 - **DrawBench**: A comprehensive benchmark dataset developed by Google Research, containing 200 carefully curated prompts designed to test specific capabilities and challenge areas of diffusion models. It includes categories such as colors, counting, conflicting requirements, DALL-E inspired prompts, detailed descriptions, misspellings, positional relationships, rare words, Reddit user prompts, and text generation.
 
+## Metrics
+
 ---
 
-## Metrics
 
 - **ARNIQA**: No-reference image quality assessment metric that predicts perceptual quality without reference images.
 - **CLIP_IQA**: No-reference image quality metric using contrastive learning to assess image quality without references.
@@ -125,9 +140,10 @@ We have used PartiPrompts and DrawBench datasets to evaluate the quality of imag
 - **SSIM**: Structural Similarity Index measuring perceptual similarity between generated by accelerated model and original model images.
 
 
+## Latency Benchmarks
+
 ---
 
-## Latency Benchmarks
 
 We have measured the latency of different sizes of FLUX.1-dev model (S, M, L, XL, original) on various GPUs. The measurements were taken for generating images of size 1024x1024 pixels.
 
@@ -135,6 +151,8 @@ We have measured the latency of different sizes of FLUX.1-dev model (S, M, L, XL
 
 ### Latency Benchmark Results
 
+---
+
 Latency (in seconds) for generating a 1024x1024 image using different model sizes on various hardware setups.
 
 | **GPU/Model Size**| **S**| **M**| **L**| **XL**| **Original** |
@@ -145,9 +163,10 @@ Latency (in seconds) for generating a 1024x1024 image using different model size
 | **GeForce RTX 5090** | 5.79 | N/A | N/A | N/A | N/A |
 
 
+## Benchmarking Methodology
+
 ---
 
-## Benchmarking Methodology
 
 The benchmarking was performed on a single GPU with a batch size of 1. Each model was run for 10 iterations, and the average latency was calculated.
 
@@ -163,9 +182,10 @@ The benchmarking was performed on a single GPU with a batch size of 1. Each mode
 > - Record the end time and calculate the latency for that iteration.
 > 5. Calculate the average latency over all iterations.
 
+## Reproduce benchmarking
+
 ---
 
-## Reproduce benchmarking
 
 ```python
 import torch
@@ -224,9 +244,10 @@ print(f"Average Latency over {num_runs} runs: {average_latency} seconds")
 ```
 
 
+## Serving with Docker Image
+
 ---
 
-## Serving with Docker Image
 
 For serving with Nvidia GPUs, we provide ready-to-go Docker containers with OpenAI-compatible API endpoints.
 Using our containers you can set up an inference endpoint on any desired cloud/serverless providers as well as on-premise servers.
@@ -234,15 +255,12 @@ You can also use this container to run inference through TheStage AI platform.
 
 ### Prebuilt image from ECR
 
-| **GPU** | **Docker image name** |
-| --- | --- |
-| H100, L40s | `public.ecr.aws/i3f7g5s7/thestage/elastic-models:0.1.2-diffusers-nvidia-24.09b` |
-| B200, RTX 5090 | `public.ecr.aws/i3f7g5s7/thestage/elastic-models:0.1.2-diffusers-blackwell-24.09b` |
+---
 
-Pull docker image for your Nvidia GPU and start inference container:
+Pull docker image and start inference container:
 
 ```bash
-docker pull <IMAGE_NAME>
+docker pull public.ecr.aws/i3f7g5s7/thestage/elastic-models:0.2.0-diffusers-24.09c
 ```
 ```bash
 docker run --rm -ti \
@@ -255,7 +273,7 @@ docker run --rm -ti \
 -e HUGGINGFACE_ACCESS_TOKEN=<HUGGINGFACE_ACCESS_TOKEN> \
 -e THESTAGE_AUTH_TOKEN=<THESTAGE_ACCESS_TOKEN> \
 -v /mnt/hf_cache:/root/.cache/huggingface \
-<IMAGE_NAME_DEPENDING_ON_YOUR_GPU>
+public.ecr.aws/i3f7g5s7/thestage/elastic-models:0.2.0-diffusers-24.09c
 ```
 
 | **Parameter** | **Description** |
@@ -265,11 +283,11 @@ docker run --rm -ti \
 | `<HUGGINGFACE_ACCESS_TOKEN>` | Hugging Face access token. |
 | `<THESTAGE_ACCESS_TOKEN>` | TheStage token generated on the platform (Profile -> Access tokens). |
 | `<AUTH_TOKEN>` | Token for endpoint authentication. You can set it to any random string; it must match the value used by the client. |
-| `<IMAGE_NAME>` | Image name which you have pulled. |
+
+## Invocation
 
 ---
 
-## Invocation
 
 You can invoke the endpoint using CURL as follows:
 
@@ -343,16 +361,21 @@ with open("thestage_image.webp", "wb") as f:
 f.write(response.content)
 ```
 
+## Endpoint Parameters
+
 ---
 
-## Endpoint Parameters
 
 ### Method
 
+---
+
 > **POST** `/v1/images/generations`
 
 ### Header Parameters
 
+---
+
 > `Authorization`: `string`
 >
 > Bearer token for authentication. Should match the `AUTH_TOKEN` set during container startup.
@@ -367,6 +390,8 @@ with open("thestage_image.webp", "wb") as f:
 
 ### Input Body
 
+---
+
 > `prompt` : `string`
 >
 > The text prompt to generate an image for.
@@ -400,14 +425,17 @@ with open("thestage_image.webp", "wb") as f:
 >
 > Guidance scale for classifier-free guidance. Higher values increase adherence to the prompt.
 
+## Deploy on Modal
+
 ---
 
-## Deploy on Modal
 
 For more details please use the tutorial [Modal deployment](https://docs.thestage.ai/tutorials/source/modal_thestage.html)
 
 ### Clone modal serving code
 
+---
+
 ```shell
 git clone https://github.com/TheStageAI/ElasticModels.git
 cd ElasticModels/examples/modal
@@ -415,6 +443,8 @@ cd ElasticModels/examples/modal
 
 ### Configuration of environment variables
 
+---
+
 Set your environment variables in `modal_serving.py`:
 
 ```python
@@ -433,6 +463,8 @@ ENVS = {
 
 ### Configuration of GPUs
 
+---
+
 Set your desired GPU type and autoscaling variables in `modal_serving.py`:
 
 ```python
@@ -459,6 +491,8 @@ def serve():
 
 ### Run serving
 
+---
+
 ```shell
 modal serve modal_serving.py
 ```
@@ -466,6 +500,8 @@ modal serve modal_serving.py
 
 ## Links
 
+---
+
 * __Platform__: [app.thestage.ai](https://app.thestage.ai)
 * __Subscribe for updates__: [TheStageAI X](https://x.com/TheStageAI)
 * __Contact email__: contact@thestage.ai
 
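The README's PSNR and SSIM metrics compare accelerated-model outputs against the original model's outputs for the same prompt. As a reference for what that comparison computes, here is a minimal pure-Python PSNR sketch; this is not TheStage's evaluation code, and the helper name and 8-bit pixel-range assumption are mine:

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak Signal-to-Noise Ratio between two equal-sized images.

    `reference` and `test` are flat sequences of pixel values
    (e.g. 8-bit intensities). Identical images give +inf.
    """
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)

original = [10, 50, 120, 200]
accelerated = [26, 66, 136, 216]   # every pixel off by 16 -> MSE = 256
print(psnr(original, original))    # identical images -> inf
print(round(psnr(original, accelerated), 2))
```

In practice PSNR is computed per channel over full 1024x1024 renders and averaged over the benchmark prompts; the table values in this README are dataset-level averages.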
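The benchmarking methodology in this README (single GPU, batch size 1, 10 iterations, averaged wall-clock latency) can be sketched framework-free. The `generate` stand-in below replaces the actual diffusion call, the warmup pass is my addition to exclude one-time setup cost, and real GPU timing would also need a `torch.cuda.synchronize()` before each timestamp:

```python
import time

def generate(prompt):
    # Stand-in for the real pipeline call; sleeps instead of denoising.
    time.sleep(0.01)
    return f"image for: {prompt}"

def benchmark(fn, prompt, num_runs=10, warmup=1):
    # Warmup iterations are excluded from timing so compilation and
    # cache-allocation costs do not skew the average.
    for _ in range(warmup):
        fn(prompt)
    latencies = []
    for _ in range(num_runs):
        start = time.perf_counter()                    # record start time
        fn(prompt)
        latencies.append(time.perf_counter() - start)  # per-iteration latency
    return sum(latencies) / len(latencies)

average_latency = benchmark(generate, "a photo of a cat", num_runs=10)
print(f"Average Latency over 10 runs: {average_latency:.4f} seconds")
```

`time.perf_counter()` is used rather than `time.time()` because it is monotonic and has the highest available resolution for interval timing.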
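The curl and Python invocation examples in this commit both target the container's `POST /v1/images/generations` route with a Bearer token matching `AUTH_TOKEN`. A stdlib-only sketch of assembling that request follows; the host/port and any body field beyond `prompt` (e.g. `guidance_scale`) are assumptions based on the parameter docs in this README, not a verified client:

```python
import json
import urllib.request

def build_generation_request(base_url, auth_token, prompt, **params):
    """Build a POST request for the container's /v1/images/generations route.

    `auth_token` must match the AUTH_TOKEN the container was started with.
    Extra keyword params are passed through into the JSON body verbatim.
    """
    body = {"prompt": prompt, **params}
    return urllib.request.Request(
        url=f"{base_url}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_generation_request(
    "http://localhost:8000",  # assumed host:port; use your container's mapping
    "my-secret-token",
    "A cat holding a sign that says hello world",
)
# Sending is left out so the sketch stays runnable offline:
# image_bytes = urllib.request.urlopen(req).read()
print(req.full_url, req.get_method())
```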