Llama-3.3-70B-Instruct-abliterated 8-bit MLX

An 8-bit MLX quantization of huihui-ai/Llama-3.3-70B-Instruct-abliterated, packaged for fast local inference on Apple Silicon.

8-bit was chosen over the more common 4-bit to preserve as much of the base model's quality as possible. It is meant for users who have the unified-memory headroom and care more about output fidelity than about minimum footprint.

Model details

  • Base model: huihui-ai/Llama-3.3-70B-Instruct-abliterated
  • Quantization: 8-bit affine, group size 64
  • Format: MLX (safetensors, 15 shards)
  • Architecture: Llama 3.3 (80 layers, 8192 hidden size, 64 attention heads / 8 KV heads, 128k context)
  • Disk size: ~75 GB
  • Converted with: mlx-lm 0.31.2
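The ~75 GB figure follows from the quantization settings. As a rough sketch (assuming the typical MLX layout of an fp16 scale and fp16 bias per 64-weight group, and ignoring any unquantized embedding/norm tensors):

```python
# Back-of-the-envelope size check for 8-bit affine quantization with
# group size 64. Assumes one fp16 scale and one fp16 bias per group of
# 64 weights; the exact on-disk size also depends on which tensors
# stay unquantized, so treat this as an estimate only.

PARAMS = 70.6e9      # approximate parameter count of Llama 3.3 70B
WEIGHT_BITS = 8      # quantized weight
GROUP_SIZE = 64
SCALE_BITS = 16      # fp16 scale per group
BIAS_BITS = 16       # fp16 bias (zero point) per group

effective_bits = WEIGHT_BITS + (SCALE_BITS + BIAS_BITS) / GROUP_SIZE
size_gb = PARAMS * effective_bits / 8 / 1e9

print(f"{effective_bits:.2f} bits/weight -> ~{size_gb:.0f} GB")
```

At 8.5 effective bits per weight this lands right around the listed ~75 GB, which is a useful sanity check when comparing quants.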

Hardware requirements

You need an Apple Silicon Mac with enough unified memory to hold the full model in RAM, plus headroom for the KV cache and your OS:

  • Minimum: ~80 GB free unified memory (so a 96 GB Ultra or a 128 GB Max).
  • Comfortable: 128 GB or more, especially if you want long contexts.
  • Will not run on 64 GB Macs — use a 4-bit quant instead.
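The long-context caveat comes from the KV cache, which grows linearly with context length. A rough sketch, assuming an fp16 cache (2 bytes per element) and the architecture listed above (80 layers, 8 KV heads, head dimension 8192 / 64 = 128):

```python
# Rough KV-cache sizing for this model. Assumes an fp16 cache with no
# KV quantization; actual usage also includes activation buffers and
# framework overhead, so real headroom needs are somewhat higher.

LAYERS = 80
KV_HEADS = 8
HEAD_DIM = 8192 // 64   # hidden size / attention heads
BYTES_PER_ELEM = 2      # fp16

def kv_cache_gb(context_tokens: int) -> float:
    # Both a K and a V tensor are cached per layer, per token.
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
    return context_tokens * per_token / 1e9

for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_gb(ctx):.1f} GB")
```

At the full 128k context the fp16 cache alone approaches 43 GB on top of the ~75 GB of weights, which is why 128 GB machines are the comfortable choice for long contexts.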

Usage

Install mlx-lm:

pip install mlx-lm

Generate from the command line:

mlx_lm.generate \
    --model divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx \
    --prompt "Write a short poem about Apple Silicon." \
    --max-tokens 200

Or load it from Python:

from mlx_lm import load, generate

model, tokenizer = load("divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx")

messages = [{"role": "user", "content": "Hello, who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)

Credits

Base model and abliteration by huihui-ai, built on Meta's Llama 3.3 70B Instruct. MLX conversion and 8-bit quantization were done with mlx-lm.

License

This quant inherits the Llama 3.3 Community License from the upstream model. Use of these weights is bound by the same terms and Acceptable Use Policy.

Notes

"Abliterated" means the model's built-in refusal direction has been suppressed, so it is far less likely to refuse requests the base model would decline. This is not a general capability upgrade — please use it responsibly and within the bounds of the upstream license.
