---
license: apache-2.0
---


# Qwen2.5-7B-Instruct-FP8-dynamic

Quantized version of [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), with the weights of the `Linear` layers quantized to FP8 and activations quantized dynamically to FP8 at inference time.

## Creation

This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) by running the code snippet
below.

```python
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model_stub = "Qwen/Qwen2.5-7B-Instruct"
model_name = model_stub.split("/")[-1]

model = AutoModelForCausalLM.from_pretrained(
    model_stub,
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_stub)

# Configure the quantization algorithm and scheme
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# Apply quantization
oneshot(
    model=model,
    recipe=recipe,
)

# Save to disk in compressed-tensors format
save_path = model_name + "-FP8-dynamic"
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
print(f"Model and tokenizer saved to: {save_path}")
```
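
## Deployment

The compressed-tensors checkpoint produced above can be served with [vLLM](https://github.com/vllm-project/vllm), which supports FP8 dynamic quantization on compatible hardware. The following is a minimal sketch, not part of the original creation script, assuming the model was saved to the local `Qwen2.5-7B-Instruct-FP8-dynamic` path produced by the snippet above (a Hugging Face model ID would work the same way):

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Assumed local path from the creation script above
model_path = "Qwen2.5-7B-Instruct-FP8-dynamic"

tokenizer = AutoTokenizer.from_pretrained(model_path)
llm = LLM(model=model_path)

# Build a chat-formatted prompt with the model's chat template
messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=256))
print(outputs[0].outputs[0].text)
```

The same checkpoint should also work with vLLM's OpenAI-compatible server, e.g. `vllm serve Qwen2.5-7B-Instruct-FP8-dynamic`.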