---
license: mit
---
# Gryphe/MythoMax-L2-13b
FP8 (dynamic) quantized version of [Gryphe/MythoMax-L2-13b](https://huggingface.co/Gryphe/MythoMax-L2-13b).
## Creation
This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) by running the code snippet below.
```python
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model_stub = "Gryphe/MythoMax-L2-13b"
model_name = model_stub.split("/")[-1]

model = AutoModelForCausalLM.from_pretrained(
    model_stub,
    torch_dtype="auto",
)

tokenizer = AutoTokenizer.from_pretrained(model_stub)

# Configure the quantization algorithm and scheme
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# Apply quantization
oneshot(
    model=model,
    recipe=recipe,
)

# Save to disk in compressed-tensors format
save_path = model_name + "-FP8-dynamic"
model.generation_config.do_sample = True
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
print(f"Model and tokenizer saved to: {save_path}")
```
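
For intuition about what the `FP8_DYNAMIC` scheme in the recipe above does to activations, here is a rough pure-Python sketch of dynamic, per-token FP8 (E4M3) quantization: the scale is computed at runtime from the largest magnitude in each token's activation vector, so no calibration data is needed. This is an illustration only, not llm-compressor's implementation; the helper names (`round_to_e4m3`, `quantize_dynamic`, `dequantize`) are hypothetical, and the rounding is a simplified model of E4M3 that ignores NaN encodings.

```python
import math

# Largest finite magnitude representable in FP8 E4M3.
FP8_E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on a simplified E4M3 grid (hypothetical helper)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)  # saturate instead of overflowing
    _, e = math.frexp(mag)           # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)             # clamp to the smallest normal exponent
    step = 2.0 ** (exp - 3)          # 3 mantissa bits -> 8 steps per binade
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_dynamic(values):
    """Quantize one token's activations with a scale derived at runtime."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in quantized]
```

Because the scale tracks each token's own maximum, outlier tokens do not force a coarse grid onto the rest of the batch. In the actual scheme, weights are quantized once ahead of time while activation scales like the one above are recomputed on every forward pass.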