krishnateja95 committed on
Commit 3cb1c89 · verified · 1 parent: cc18893

Delete README.md

Files changed (1)
  1. README.md +0 -509
README.md DELETED
@@ -1,509 +0,0 @@
---
license: apache-2.0
pipeline_tag: text-generation
tags:
- fp8
- quantized
- llm-compressor
- compressed-tensors
- red hat
base_model:
- ibm-granite/granite-4.0-h-small
---

# Granite-4.0-h-small

## Model Overview
- **Model Architecture:** GraniteMoeHybridForCausalLM
- **Input:** Text
- **Output:** Text
- **Model Optimizations:**
  - **Weight quantization:** FP8
  - **Activation quantization:** FP8
- **Release Date:**
- **Version:** 1.0
- **Model Developers:** Red Hat

Quantized version of [ibm-granite/granite-4.0-h-small](https://huggingface.co/ibm-granite/granite-4.0-h-small).

### Model Optimizations

This model was obtained by quantizing the weights and activations of [ibm-granite/granite-4.0-h-small](https://huggingface.co/ibm-granite/granite-4.0-h-small) to the FP8 data type.
This optimization reduces the number of bits per parameter from 16 to 8, cutting disk size and GPU memory requirements by approximately 50%.
Only the weights and activations of the linear operators within the transformer blocks of the language model are quantized.

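As a rough sanity check on the ~50% figure, a back-of-envelope sketch (the 32B parameter count used here is an assumption for illustration, not a number taken from this card):

```python
# Back-of-envelope weight footprint at different bit widths.
# Assumption: a hypothetical 32e9-parameter model; ignores quantization
# scales, embeddings, and any layers left unquantized.
def approx_size_gb(n_params: float, bits_per_param: int) -> float:
    """Rough weight footprint in GB."""
    return n_params * bits_per_param / 8 / 1e9

bf16_gb = approx_size_gb(32e9, 16)
fp8_gb = approx_size_gb(32e9, 8)
print(f"BF16 ~ {bf16_gb:.0f} GB, FP8 ~ {fp8_gb:.0f} GB, saving {1 - fp8_gb / bf16_gb:.0%}")
```

The measured sizes reported later in this card (64.41 GB vs 36.43 GB) deviate a little from the ideal 50% because scales and unquantized layers add overhead.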
## Deployment

### Use with vLLM

1. Initialize the vLLM server:
```shell
vllm serve RedHatAI/granite-4.0-h-small-FP8-block --tensor_parallel_size 4
```

2. Send requests to the server:

```python
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://<your-server-host>:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = "RedHatAI/granite-4.0-h-small-FP8-block"

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = client.chat.completions.create(
    model=model,
    messages=messages,
)

generated_text = outputs.choices[0].message.content
print(generated_text)
```

<!-- ## Creation

This model was quantized using the [llm-compressor](https://github.com/vllm-project/llm-compressor) library as shown below.

<details>
<summary>Creation details</summary>

```python
from transformers import AutoModelForCausalLM, AutoProcessor

from llmcompressor import oneshot
from llmcompressor.modeling import replace_modules_for_calibration
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "ibm-granite/granite-4.0-h-small"

# Load model.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = replace_modules_for_calibration(model)

# Configure the quantization algorithm and scheme.
# In this case, we:
# * quantize the weights to fp8 with per-block quantization
# * quantize the activations to fp8 with dynamic per-token quantization
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_BLOCK",
    ignore=["lm_head"],
)

# Apply quantization.
oneshot(model=model, recipe=recipe)

# Save to disk in compressed-tensors format.
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-block"
model.save_pretrained(SAVE_DIR)
processor.save_pretrained(SAVE_DIR)
```
</details> -->

## Evaluation

The model was evaluated on the OpenLLM leaderboard tasks, using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
[vLLM](https://docs.vllm.ai/en/stable/) was used for all evaluations.

<details>
<summary>Evaluation details</summary>

**OpenLLM V1**
```shell
lm_eval \
  --model vllm \
  --model_args pretrained="RedHatAI/granite-4.0-h-small-FP8-block",dtype=auto,add_bos_token=True,max_model_len=16384,tensor_parallel_size=4,gpu_memory_utilization=0.9,enable_chunked_prefill=True,trust_remote_code=True \
  --tasks openllm \
  --write_out \
  --batch_size auto \
  --show_config
```

**OpenLLM V2**
```shell
lm_eval \
  --model vllm \
  --model_args pretrained="RedHatAI/granite-4.0-h-small-FP8-block",dtype=auto,add_bos_token=False,max_model_len=16384,tensor_parallel_size=4,gpu_memory_utilization=0.7,disable_log_stats=True,enable_chunked_prefill=True,trust_remote_code=True \
  --tasks leaderboard \
  --apply_chat_template \
  --fewshot_as_multiturn \
  --write_out \
  --batch_size auto \
  --show_config
```

**Coding Benchmarks**
```shell
evalplus.evaluate --model "RedHatAI/granite-4.0-h-small-FP8-block" \
  --dataset "humaneval" \
  --backend vllm \
  --tp 4 \
  --greedy

evalplus.evaluate --model "RedHatAI/granite-4.0-h-small-FP8-block" \
  --dataset "mbpp" \
  --backend vllm \
  --tp 4 \
  --greedy
```

</details>

<b>*</b> Throughput measured with input length = 2048, output length = 2048, and 1024 requests

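For context, the request throughput reported in the table below can be converted to an approximate output-token rate; a rough sketch, assuming every request generates its full 2048 output tokens:

```python
# Approximate output-token throughput implied by the request throughput
# (assumption: every request generates all 2048 output tokens).
req_per_sec = 2.066          # FP8-block model, requests/sec (from the table)
out_tokens_per_req = 2048
tok_per_sec = req_per_sec * out_tokens_per_req
print(round(tok_per_sec))
```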
### Accuracy

Scores for the quantized models are reported as value (recovery %, relative to the base model).

<table>
<thead>
<tr>
<th>Category</th>
<th>Metric</th>
<th>ibm-granite/granite-4.0-h-small</th>
<th>ibm-granite/granite-4.0-h-small-FP8</th>
<th>RedHatAI/granite-4.0-h-small-FP8-block</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="1"><b>Model Size (GB)</b></td>
<td></td>
<td>64.41</td>
<td>33.48</td>
<td>36.43</td>
</tr>
<tr>
<td rowspan="1"><b>Throughput (Requests/sec)*</b></td>
<td></td>
<td>2.031</td>
<td>2.144</td>
<td>2.066</td>
</tr>
<tr>
<td rowspan="7"><b>OpenLLM V1</b></td>
<td>ARC-Challenge (Acc-Norm, 25-shot)</td>
<td>72.27</td>
<td>71.67 (99.17)</td>
<td>72.27 (100.00)</td>
</tr>
<tr>
<td>GSM8K (Strict-Match, 5-shot)</td>
<td>85.06</td>
<td>85.60 (100.62)</td>
<td>85.60 (100.62)</td>
</tr>
<tr>
<td>HellaSwag (Acc-Norm, 10-shot)</td>
<td>86.07</td>
<td>86.02 (99.94)</td>
<td>85.96 (99.87)</td>
</tr>
<tr>
<td>MMLU (Acc, 5-shot)</td>
<td>77.15</td>
<td>76.94 (99.73)</td>
<td>77.23 (100.10)</td>
</tr>
<tr>
<td>TruthfulQA (MC2, 0-shot)</td>
<td>57.97</td>
<td>57.62 (99.40)</td>
<td>57.85 (99.80)</td>
</tr>
<tr>
<td>Winogrande (Acc, 5-shot)</td>
<td>81.45</td>
<td>81.14 (99.61)</td>
<td>80.82 (99.22)</td>
</tr>
<tr>
<td><b>Average Score</b></td>
<td><b>76.66</b></td>
<td><b>76.50 (99.79)</b></td>
<td><b>76.62 (99.95)</b></td>
</tr>
<!-- OpenLLM Leaderboard V2 -->
<tr>
<td rowspan="7"><b>OpenLLM V2</b></td>
<td>IFEval (Inst Level Strict Acc, 0-shot)</td>
<td>87.41</td>
<td>87.65 (100.27)</td>
<td>87.89 (100.55)</td>
</tr>
<tr>
<td>BBH (Acc-Norm, 3-shot)</td>
<td>61.52</td>
<td>61.31 (99.66)</td>
<td>61.40 (99.80)</td>
</tr>
<tr>
<td>Math-Hard (Exact-Match, 4-shot)</td>
<td>46.60</td>
<td>44.34 (95.14)</td>
<td>44.94 (96.43)</td>
</tr>
<tr>
<td>GPQA (Acc-Norm, 0-shot)</td>
<td>32.55</td>
<td>32.05 (98.45)</td>
<td>34.23 (105.15)</td>
</tr>
<tr>
<td>MUSR (Acc-Norm, 0-shot)</td>
<td>46.43</td>
<td>46.30 (99.72)</td>
<td>45.77 (98.58)</td>
</tr>
<tr>
<td>MMLU-Pro (Acc, 5-shot)</td>
<td>47.96</td>
<td>47.91 (99.88)</td>
<td>47.93 (99.93)</td>
</tr>
<tr>
<td><b>Average Score</b></td>
<td><b>53.75</b></td>
<td><b>53.26 (99.09)</b></td>
<td><b>53.69 (99.89)</b></td>
</tr>
</tbody>
</table>
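The recovery percentages shown in parentheses can be reproduced with a one-line calculation; entries that were computed from unrounded raw scores may differ in the last digit:

```python
def recovery(quantized: float, baseline: float) -> float:
    """Recovery (%) = quantized score / baseline score * 100."""
    return round(quantized / baseline * 100, 2)

# Spot-checks against the table above.
print(recovery(72.27, 72.27))  # ARC-Challenge, FP8-block: 100.0
print(recovery(76.62, 76.66))  # OpenLLM V1 average, FP8-block: 99.95
```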