
Falcon 7B with only the MLP layers quantized using the FP8_BLOCK scheme.

```python
from llmcompressor import model_free_ptq

MODEL_ID = "tiiuae/falcon-7b"
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-BLOCK"

# Apply the FP8_BLOCK scheme to the model. Once quantized, the model
# is saved in compressed-tensors format to SAVE_DIR.
model_free_ptq(
    model_stub=MODEL_ID,
    save_directory=SAVE_DIR,
    scheme="FP8_BLOCK",
    ignore=[
        "re:.*ln_f$",
        "lm_head",
        "re:.*self_attention.*",
        "re:.*word_embeddings$",
        "re:.*q_a_proj$",
        "model.embed_tokens",
    ],
    max_workers=1,
    device="cuda:0",
)
```
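To illustrate what the FP8_BLOCK scheme does conceptually, the sketch below quantizes a weight block with a single shared scale derived from the block's absolute maximum and the FP8 E4M3 dynamic range (finite max 448). This is a simplified illustration, not llm-compressor's implementation: the function names and block handling are assumptions, and actual FP8 mantissa rounding is omitted (values are only rounded and clamped into the representable range).

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_block(block):
    """Quantize one block of weights with a single per-block scale.

    Illustrative only: real FP8 casting also rounds the mantissa,
    which this sketch does not simulate.
    """
    amax = max(abs(v) for v in block) or 1.0
    scale = amax / FP8_E4M3_MAX
    # Scale into the FP8 range, round, and clamp to the finite maximum.
    q = [
        max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, round(v / scale)))
        for v in block
    ]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate original values from quantized block + scale."""
    return [v * scale for v in q]

# The block's largest-magnitude value maps exactly to the FP8 maximum.
q, s = quantize_block([1.0, -2.0, 4.0, 0.5])
```

Because each block carries its own scale, an outlier in one block does not degrade the precision of the rest of the weight matrix. The saved checkpoint can then be loaded with `transformers` in the usual way, provided `compressed-tensors` is installed.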