---

license: apache-2.0
---


# UnslopNemo-12B-v1-FP8-dynamic

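FP8-quantized version of [UnslopNemo-12B-v1](https://huggingface.co/TheDrummer/UnslopNemo-12B-v1). The `Linear` layers are quantized with the `FP8_DYNAMIC` scheme; `lm_head` is left unquantized.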

## Creation

This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) by running the code snippet
below.

```python
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model_stub = "TheDrummer/UnslopNemo-12B-v1"
model_name = model_stub.split("/")[-1]

model = AutoModelForCausalLM.from_pretrained(
    model_stub,
    torch_dtype="auto",
)

tokenizer = AutoTokenizer.from_pretrained(model_stub)

# Configure the quantization algorithm and scheme
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# Apply quantization
oneshot(
    model=model,
    recipe=recipe,
)

# Save to disk in compressed-tensors format
save_path = model_name + "-FP8-dynamic"
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
print(f"Model and tokenizer saved to: {save_path}")
```
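
To give a rough idea of what "dynamic" FP8 quantization means numerically, here is a minimal pure-Python sketch. It is not the llm-compressor implementation: the helper names (`round_to_e4m3`, `quantize_dynamic`, `dequantize`) are hypothetical, and it assumes the E4M3 FP8 format (maximum finite value 448) with one scale derived at runtime from the largest absolute value in each row, which is how dynamic activation scaling is commonly described.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 format (assumed here)


def round_to_e4m3(value):
    """Round a float to the nearest value on the FP8 E4M3 grid (illustrative)."""
    if value == 0.0:
        return 0.0
    sign = -1.0 if value < 0 else 1.0
    magnitude = min(abs(value), FP8_E4M3_MAX)  # saturate instead of overflowing
    # frexp gives magnitude = m * 2**e with 0.5 <= m < 1, so floor(log2) is e - 1.
    # Exponents below -6 fall into the subnormal range and share one step size.
    exponent = max(math.frexp(magnitude)[1] - 1, -6)
    step = 2.0 ** (exponent - 3)  # 3 mantissa bits: 8 steps per power of two
    return sign * round(magnitude / step) * step


def quantize_dynamic(row):
    """Quantize one row using a scale computed from that row at runtime."""
    amax = max(abs(x) for x in row)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(x / scale) for x in row], scale


def dequantize(q_row, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in q_row]


activations = [0.02, -1.5, 3.0, 0.7]  # made-up example values
q, scale = quantize_dynamic(activations)
restored = dequantize(q, scale)
for original, approx in zip(activations, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Because the scale is recomputed from each input, no calibration dataset is required, which is consistent with the `oneshot` call above being run without one.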