File size: 1,213 Bytes
ea271b3 3a8e1c9 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | ---
license: apache-2.0
---
# UnslopNemo-12B-v1-FP8-dynamic
Quantized version of [UnslopNemo-12B-v1](https://huggingface.co/TheDrummer/UnslopNemo-12B-v1).
## Creation
This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) by running the code snippet
below.
```python
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model
model_stub = "TheDrummer/UnslopNemo-12B-v1"
model_name = model_stub.split("/")[-1]
model = AutoModelForCausalLM.from_pretrained(
model_stub,
torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_stub)
# Configure the quantization algorithm and scheme
recipe = QuantizationModifier(
targets="Linear",
scheme="FP8_DYNAMIC",
ignore=["lm_head"],
)
# Apply quantization
oneshot(
model=model,
recipe=recipe,
)
# Save to disk in compressed-tensors format
save_path = model_name + "-FP8-dynamic"
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
print(f"Model and tokenizer saved to: {save_path}")
```
|