---
base_model:
  - black-forest-labs/FLUX.1-schnell
---

# Elastic model: Fastest self-serving models. FLUX.1-schnell.

Elastic models are models produced by TheStage AI ANNA (Automated Neural Networks Accelerator). ANNA allows you to control model size, latency and quality with a simple slider movement. For each model, ANNA produces a series of optimized models:
* __S__: The fastest model, with accuracy degradation of less than 2%.

__Goals of Elastic Models:__

* Provide the fastest models and service for self-hosting.
* Provide flexibility in cost vs. quality selection for inference.
* Provide clear quality and latency benchmarks.
* Provide the interface of HF libraries (transformers and diffusers) with a single line of code.
* Provide models supported on a wide range of hardware, pre-compiled and requiring no JIT compilation.

> Note that the specific quality degradation can vary from model to model. With an S model, for instance, you can see as little as 0.5% degradation.

-----

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6487003ecd55eec571d14f96/ouz3FYQzG8C7Fl3XpNe6t.jpeg)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6487003ecd55eec571d14f96/l8xFGy0p5rxsn1-UojolK.png)

## Inference

To run an elastic model, replace the `diffusers` import with `elastic_models.diffusers` and pass the desired `mode`:

```python
import torch
from elastic_models.diffusers import FluxPipeline

mode_name = "black-forest-labs/FLUX.1-schnell"
hf_token = "<YOUR_HF_TOKEN>"  # Hugging Face access token
device = torch.device("cuda")

pipeline = FluxPipeline.from_pretrained(
    mode_name,
    torch_dtype=torch.bfloat16,
    token=hf_token,
    mode='S'
)
pipeline.to(device)

prompts = ["Kitten eating a banana"]
output = pipeline(prompt=prompts)

for prompt, output_image in zip(prompts, output.images):
    output_image.save(f"{prompt}.png")
```
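A quick way to check the speedup on your own hardware is to time the pipeline call. The `time_call` helper below is an illustrative sketch, not part of `elastic_models`:

```python
import time

def time_call(fn, *args, warmup=1, iters=3, **kwargs):
    """Call fn several times; return (best seconds over iters, last result)."""
    result = None
    for _ in range(warmup):  # warm-up runs absorb one-time costs (caches, CUDA init)
        result = fn(*args, **kwargs)
    best = float("inf")
    for _ in range(iters):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result
```

For example, `time_call(pipeline, prompt=prompts)` on an H100 should land near the S-mode latency reported in the benchmarks below.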
 
### Installation

__System requirements:__

* GPUs: H100
* CPU: AMD, Intel
* Python: 3.10-3.12

To work with our models, just run these lines in your terminal:

```shell
pip install thestage
pip install elastic_models
pip install flash_attn==2.7.3 --no-build-isolation
# apex can cause problems in text encoders
pip uninstall apex
echo '{
  "meta-llama/Llama-3.2-1B-Instruct": 6,
  "mistralai/Mistral-7B-Instruct-v0.3": 7,
  "black-forest-labs/FLUX.1-schnell": 1,
  "black-forest-labs/FLUX.1-dev": 5
}' > model_name_id.json
export ELASTIC_MODEL_ID_MAPPING=./model_name_id.json
```
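The JSON written above maps Hugging Face model names to elastic-model IDs, and the file is located through the `ELASTIC_MODEL_ID_MAPPING` environment variable. As an illustration only (the `lookup_model_id` helper below is hypothetical, not an `elastic_models` API), the mapping can be read like this:

```python
import json
import os

def lookup_model_id(model_name: str) -> int:
    """Read the mapping file named by ELASTIC_MODEL_ID_MAPPING and return the ID."""
    path = os.environ.get("ELASTIC_MODEL_ID_MAPPING", "./model_name_id.json")
    with open(path) as f:
        mapping = json.load(f)
    return mapping[model_name]
```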
Then go to [app.thestage.ai](https://app.thestage.ai), log in and generate an API token from your profile page. Set the API token as follows:

```shell
thestage config set --api-token <YOUR_API_TOKEN>
```

Congrats, now you can use accelerated models!

## Benchmarks

Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for models using our algorithms.

### Quality benchmarks

For quality evaluation we used PSNR, SSIM and the CLIP score. PSNR and SSIM were computed against the outputs of the original model.

| Metric/Model | S    | M    | L    | XL   | Original |
|--------------|------|------|------|------|----------|
| PSNR         | 29.9 | 30.2 | 31   | inf  | inf      |
| SSIM         | 0.66 | 0.71 | 0.86 | 1.0  | 1.0      |
| CLIP         | 11.5 | 11.6 | 11.8 | 11.9 | 11.9     |
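The PSNR values above are the standard peak signal-to-noise ratio against the original model's output ("inf" means the outputs are identical). A minimal sketch of the definition over flat 8-bit pixel sequences (illustrative, not the evaluation code used here):

```python
import math

def psnr(img_a, img_b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0.0:
        return math.inf  # identical images, as in the XL and Original columns
    return 10.0 * math.log10(peak ** 2 / mse)
```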
 
 
 
 
 
 
 
 
### Latency benchmarks

Time in seconds to generate one 1024x1024 image:

| GPU/Model | S   | M    | L    | XL   | Original |
|-----------|-----|------|------|------|----------|
| H100      | 0.5 | 0.58 | 0.65 | 0.75 | 1.05     |
| L40s      | 1.4 | 1.6  | 1.9  | 2.1  | 2.5      |
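A practical use of this table is picking the highest-quality mode that fits a latency budget. The helper below is an illustrative sketch with the numbers copied from the table, not an `elastic_models` API:

```python
# Seconds per 1024x1024 image, copied from the latency table.
LATENCY = {
    "H100": {"S": 0.5, "M": 0.58, "L": 0.65, "XL": 0.75, "Original": 1.05},
    "L40s": {"S": 1.4, "M": 1.6, "L": 1.9, "XL": 2.1, "Original": 2.5},
}
QUALITY_ORDER = ["XL", "L", "M", "S"]  # highest quality first

def pick_mode(gpu: str, budget_s: float):
    """Return the highest-quality elastic mode within the latency budget, else None."""
    for mode in QUALITY_ORDER:
        if LATENCY[gpu][mode] <= budget_s:
            return mode
    return None
```

For example, with a 0.7 s budget on an H100 this selects the L model, while on an L40s no elastic mode meets a 1.0 s budget.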
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
## Links

* __Elastic models Github__: [app.thestage.ai](https://app.thestage.ai)
* __Subscribe for updates__: [TheStageAI X](https://x.com/TheStageAI)
* __Contact email__: contact@thestage.ai