Falcon 7B with only the MLP layers quantized to the FP8_BLOCK scheme.
```python
from llmcompressor import model_free_ptq

MODEL_ID = "tiiuae/falcon-7b"
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-BLOCK"

# Apply FP8-Block quantization to the model.
# Once quantized, the model is saved using
# compressed-tensors to SAVE_DIR.
model_free_ptq(
    model_stub=MODEL_ID,
    save_directory=SAVE_DIR,
    scheme="FP8_BLOCK",
    ignore=[
        "re:.*ln_f$",
        "lm_head",
        "re:.*self_attention.*",
        "re:.*word_embeddings$",
        "re:.*q_a_proj$",
        "model.embed_tokens",
    ],
    max_workers=1,
    device="cuda:0",
)
```
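For context, a block scheme stores one scale per tile of weight values rather than one per tensor or per output channel. The sketch below is a hypothetical NumPy illustration of that idea, not the llmcompressor internals; it assumes 128x128 blocks and the float8 e4m3 maximum magnitude of 448.

```python
import numpy as np

# Hypothetical illustration of block-wise FP8 quantization; the block
# size (128x128) and e4m3 max (448.0) are assumptions, not read from
# the llmcompressor source.
FP8_E4M3_MAX = 448.0
BLOCK = 128

def quantize_fp8_block(w: np.ndarray):
    """Return (scaled weights, per-block scales) for a 2-D matrix
    whose dimensions are multiples of BLOCK."""
    rows, cols = w.shape
    scales = np.zeros((rows // BLOCK, cols // BLOCK), dtype=np.float32)
    q = np.zeros_like(w, dtype=np.float32)
    for i in range(0, rows, BLOCK):
        for j in range(0, cols, BLOCK):
            block = w[i:i + BLOCK, j:j + BLOCK]
            # One scale per block, chosen so the block fits the FP8 range.
            scale = np.abs(block).max() / FP8_E4M3_MAX
            scales[i // BLOCK, j // BLOCK] = scale
            # Scale into FP8 range and clamp; a real kernel would also
            # round each value to the nearest representable e4m3 number.
            q[i:i + BLOCK, j:j + BLOCK] = np.clip(
                block / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX
            )
    return q, scales

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scales = quantize_fp8_block(w)
# Dequantize by broadcasting each block's scale back over its tile.
deq = np.repeat(np.repeat(scales, BLOCK, axis=0), BLOCK, axis=1) * q
print(np.allclose(deq, w, rtol=1e-4))
```

Because this sketch skips the rounding step, dequantization recovers the original weights up to floating-point error; in the real scheme the rounding to e4m3 is where the (small) quantization error comes from.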