---
license: mit
---


# Gryphe/MythoMax-L2-13b

FP8 dynamic quantized version of [Gryphe/MythoMax-L2-13b](https://huggingface.co/Gryphe/MythoMax-L2-13b), stored in the compressed-tensors format.

## Creation

This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) by running the code snippet below.

```python
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model_stub = "Gryphe/MythoMax-L2-13b"
model_name = model_stub.split("/")[-1]

model = AutoModelForCausalLM.from_pretrained(
    model_stub,
    torch_dtype="auto",
)

tokenizer = AutoTokenizer.from_pretrained(model_stub)

# Configure the quantization algorithm and scheme
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# Apply quantization
oneshot(
    model=model,
    recipe=recipe,
)

# Save to disk in compressed-tensors format
save_path = model_name + "-FP8-dynamic"
model.generation_config.do_sample = True
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
print(f"Model and tokenizer saved to: {save_path}")
```
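
## Deployment

The resulting compressed-tensors checkpoint can be served with inference engines that support FP8 weights, such as [vLLM](https://github.com/vllm-project/vllm). The snippet below is a minimal sketch, not part of the original creation script: the model path mirrors the `save_path` used above (you can also point it at this Hugging Face repo id), and the prompt and sampling settings are illustrative only.

```python
from vllm import LLM, SamplingParams

# Path produced by the creation script above (assumption: adjust to your save_path or repo id)
llm = LLM(model="MythoMax-L2-13b-FP8-dynamic")

# Illustrative sampling settings
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

outputs = llm.generate(["Write the opening line of a fantasy story."], sampling_params)
print(outputs[0].outputs[0].text)
```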