dbw6/Llama-2-7b-AQLM-4Bit-4x8-hf

This repository contains a Hugging Face export of Llama-2-7b-hf quantized with AQLM using the 4-bit 4x8 scheme.

Base model

  • meta-llama/Llama-2-7b-hf

Quantization

  • Method: AQLM
  • Scheme: 4x8 (4 codebooks, 8 bits per codebook)
  • Effective bitwidth: 4 bits per weight
  • In-group size: 8
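
As a sanity check, the "4-bit" label follows directly from the scheme parameters above: each group of 8 weights is encoded by 4 codebook indices of 8 bits each. A quick sketch of that arithmetic:

```python
# Effective bits per weight for an AQLM NxB scheme:
# each group of `in_group_size` weights is encoded by
# `num_codebooks` indices of `nbits_per_codebook` bits each.
num_codebooks = 4       # the "4" in 4x8
nbits_per_codebook = 8  # the "8" in 4x8
in_group_size = 8

bits_per_weight = num_codebooks * nbits_per_codebook / in_group_size
print(bits_per_weight)  # 4.0, matching the 4-bit label
```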

Conversion

This repo was produced with convert_to_hf.py from the AQLM project, exported with the --save_safetensors and --save_tokenizer flags.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# AQLM inference kernels require the `aqlm` package
# (pip install aqlm[gpu,cpu]).
model_id = "dbw6/Llama-2-7b-AQLM-4Bit-4x8-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's stored dtypes
    device_map="auto",    # place layers on available devices
    trust_remote_code=True,
)
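
Continuing from the snippet above, a minimal generation call looks like the following (the prompt and sampling settings are illustrative, not part of this repo):

```python
# Tokenize a prompt and move it to the model's device.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)

# Greedy decoding of up to 32 new tokens.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```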