Made using the following branches of llm-compressor and compressed-tensors:
- https://github.com/vllm-project/llm-compressor/pull/2283
- https://github.com/vllm-project/compressed-tensors/pull/547
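Both can be installed straight from the PR refs; a minimal sketch, assuming the PRs are still open so GitHub exposes them at `refs/pull/<N>/head`:

```bash
# Install the llm-compressor PR branch first (this pulls released deps,
# including compressed-tensors from PyPI)...
pip install "git+https://github.com/vllm-project/llm-compressor.git@refs/pull/2283/head"

# ...then overwrite compressed-tensors with its PR branch.
pip install "git+https://github.com/vllm-project/compressed-tensors.git@refs/pull/547/head"
```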
The model was produced with the following script:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "deepseek-ai/DeepSeek-V2-Lite"

# Load model.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Configure the quantization algorithm and scheme.
# In this case, we:
#   * quantize the weights to fp8 with block-wise scales via PTQ
#   * quantize the activations to fp8 dynamically per token group
#   * ignore the mlp proj layers (the experts are still quantized):
#     layers.x.mlp.down_proj has an incompatible in_features of 10944
#     (not divisible by the block size), and all MoE projections need
#     to share the same quant scheme, so gate/up/down_proj are skipped.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_BLOCK",
    ignore=["lm_head", r"re:.*mlp\.(gate|up|down)_proj$"],
)

# Apply quantization.
oneshot(model=model, recipe=recipe)

# Save to disk in compressed-tensors format.
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-BLOCK"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```
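To sanity-check the compressed checkpoint, it can be loaded with vLLM; a minimal sketch (the prompt and `max_tokens` are illustrative, not from this card):

```python
from vllm import LLM, SamplingParams

# Load the block-quantized FP8 checkpoint saved above; DeepSeek-V2 ships
# custom modeling code, hence trust_remote_code.
llm = LLM(model="DeepSeek-V2-Lite-FP8-BLOCK", trust_remote_code=True)

# Run a short generation to confirm the quantized weights load and decode.
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```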