WaveCut/QClaw-4B-mlx_4bit_DWQ
This model was quantized to 4-bit MLX format from LakoMoor/QClaw-4B.
Quantization details:
- Method: DWQ (Distilled Weight Quantization)
- Teacher: WaveCut/QClaw-4B-mlx_8bit
- Student initialization: WaveCut/QClaw-4B-mlx_4bit
- Bits: 4
- Group size: 64
- Mode: affine
- Accepted DWQ training samples: 2048
- Maximum sequence length: 513
- Batch size: 1
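To make the "Bits: 4", "Group size: 64", and "Mode: affine" settings above concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is an illustration only, not MLX's actual kernel and not the DWQ training procedure: the function names are hypothetical, and the rounding and clamping choices are assumptions. Each group of 64 weights gets its own scale and bias, and each weight is stored as a 4-bit code.

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats into unsigned integer codes.

    Returns (codes, scale, bias) such that w is approximately code * scale + bias.
    """
    levels = (1 << bits) - 1              # 15 distinct steps for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels
    if scale == 0.0:                      # constant group: any non-zero scale works
        scale = 1.0
    bias = w_min
    codes = []
    for w in weights:
        code = round((w - bias) / scale)
        codes.append(max(0, min(levels, code)))  # clamp into [0, levels]
    return codes, scale, bias


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [code * scale + bias for code in codes]


def quantize(weights, bits=4, group_size=64):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


if __name__ == "__main__":
    example = [i / 100 for i in range(-64, 64)]   # 128 weights -> 2 groups
    for codes, scale, bias in quantize(example):
        restored = dequantize_group(codes, scale, bias)
        print(len(codes), round(scale, 5), round(bias, 2), round(restored[0], 2))
```

With a per-group scale and bias, the reconstruction error of any weight is bounded by about half of that group's scale, which is why smaller groups trade extra metadata for lower error.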
The artifact was checked for finite floating-point tensors and smoke-tested with short text and code generation prompts.
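The card does not say how the finiteness check was run. As a rough sketch of what such a check can look like, the snippet below scans a mapping of tensor names to flat lists of floats and reports any tensor containing NaN or infinity. The function name and the dictionary layout are hypothetical; a real check would read the tensors from the saved weight files rather than from Python lists.

```python
import math


def non_finite_entries(tensors):
    """Return {tensor_name: count} for every tensor holding NaN or +/-inf."""
    bad = {}
    for name, values in tensors.items():
        count = sum(1 for v in values if not math.isfinite(v))
        if count:
            bad[name] = count
    return bad


if __name__ == "__main__":
    # Hypothetical tensor names and values, for illustration only.
    sample = {
        "layer0.scales": [0.01, 0.02, 0.03],
        "layer0.biases": [float("nan"), -0.5, float("inf")],
    }
    print(non_finite_entries(sample))
```

An empty result means every scanned value is finite.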
Use with mlx
pip install mlx-lm
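from mlx_lm import load, generate

# Download (or load from cache) the quantized weights and tokenizer.
model, tokenizer = load("WaveCut/QClaw-4B-mlx_4bit_DWQ")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

# verbose=True prints the generated text as it is produced.
response = generate(model, tokenizer, prompt=prompt, verbose=True)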
Model size: 0.7B params
Tensor type: BF16, U32