# Llama-3.3-70B-Instruct-abliterated 8-bit MLX
An 8-bit MLX quantization of huihui-ai/Llama-3.3-70B-Instruct-abliterated, packaged for fast local inference on Apple Silicon.
8-bit was chosen over the more common 4-bit to preserve as much of the base model's quality as possible. It is meant for users who have the unified-memory headroom and care more about output fidelity than minimum footprint.
## Model details
| Field | Value |
|---|---|
| Base model | huihui-ai/Llama-3.3-70B-Instruct-abliterated |
| Quantization | 8-bit affine, group size 64 |
| Format | MLX (safetensors, 15 shards) |
| Architecture | Llama 3.3 — 80 layers, 8192 hidden, 64 attention heads / 8 KV heads, 128k context |
| Disk size | ~75 GB |
| Converted with | mlx-lm 0.31.2 |
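The "8-bit affine, group size 64" scheme in the table means each run of 64 consecutive weights shares its own scale and zero point. A minimal NumPy sketch of the idea (illustrative only; MLX's actual packed quantized kernels differ):

```python
import numpy as np

def affine_quantize(w, bits=8, group_size=64):
    """Quantize a flat weight array in groups: each group of
    `group_size` values gets its own scale and minimum (zero point)."""
    groups = w.reshape(-1, group_size)
    gmin = groups.min(axis=1, keepdims=True)
    gmax = groups.max(axis=1, keepdims=True)
    scale = (gmax - gmin) / (2**bits - 1)
    q = np.round((groups - gmin) / scale).astype(np.uint8)
    return q, scale, gmin

def affine_dequantize(q, scale, gmin):
    """Recover approximate weights from the integer codes."""
    return q * scale + gmin

w = np.random.randn(4096).astype(np.float32)
q, scale, gmin = affine_quantize(w)
err = np.abs(affine_dequantize(q, scale, gmin).ravel() - w).max()
# Round-trip error is bounded by half a quantization step per group.
```

With 8 bits the per-group step is tiny relative to the weight range, which is why this quant tracks the fp16 weights much more closely than 4-bit.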
## Hardware requirements
You need an Apple Silicon Mac with enough unified memory to hold the full model in RAM, plus headroom for the KV cache and your OS:
- Minimum: ~80 GB of free unified memory (so a 96 GB Ultra or a 128 GB Max).
- Comfortable: 128 GB or more, especially if you want long contexts.
- Will not run on 64 GB Macs; use a 4-bit quant instead.
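A back-of-the-envelope check of these numbers (a rough heuristic, not a measurement; the flat overhead term is an assumed stand-in for KV cache, activations, and runtime):

```python
def estimate_memory_gb(params_billion: float, bits_per_weight: int,
                       overhead_gb: float = 5.0) -> float:
    """Rough unified-memory need: weight bytes plus a flat allowance
    for KV cache, activations, and runtime overhead."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

print(estimate_memory_gb(70, 8))  # 75.0 GB -> ~80 GB once OS headroom is added
print(estimate_memory_gb(70, 4))  # 40.0 GB -> why a 4-bit quant fits a 64 GB Mac
```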
## Usage
Install mlx-lm:

```bash
pip install mlx-lm
```

Generate from the command line:

```bash
mlx_lm.generate \
  --model divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx \
  --prompt "Write a short poem about Apple Silicon." \
  --max-tokens 200
```
Or load it from Python:

```python
from mlx_lm import load, generate

model, tokenizer = load("divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx")

messages = [{"role": "user", "content": "Hello, who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
```
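For reference, `apply_chat_template` produces the Llama 3 instruct format under the hood. A simplified hand-rolled version of that format (it omits system prompts and tool use; the tokenizer's bundled template is authoritative):

```python
def llama3_chat_prompt(messages):
    """Simplified sketch of the Llama 3 chat format: each turn is wrapped
    in header tokens and terminated with <|eot_id|>, and the prompt ends
    with an open assistant header so the model continues from there."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = llama3_chat_prompt([{"role": "user", "content": "Hello, who are you?"}])
```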
## Credits
- Base abliteration by huihui-ai, built on Meta's meta-llama/Llama-3.3-70B-Instruct.
- MLX 8-bit conversion by divinetribe using mlx-lm.
- For background on the abliteration technique, see Maxime Labonne's write-up on Hugging Face.
## License
Inherits the Llama 3.3 Community License from the upstream model. Use of this quant is bound by the same terms and Acceptable Use Policy; accept the gated-access prompt on the model page before downloading the weights.
## Notes
"Abliterated" means the model's built-in refusal direction has been suppressed so it doesn't refuse benign-but-edgy requests. It is not a general capability upgrade — please use it responsibly and within the bounds of the upstream license.