AgenticQwen-8B-oQ4

This repository contains an unofficial oMLX oQ4 quantized version of alibaba-pai/AgenticQwen-8B.

Original model card: https://huggingface.co/alibaba-pai/AgenticQwen-8B

What was done

The upstream alibaba-pai/AgenticQwen-8B safetensors model was downloaded from Hugging Face and quantized locally using the oMLX oQ streaming quantizer.

Quantization call used (oMLX Python API):

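from omlx.oq import quantize_oq_streaming

quantize_oq_streaming(
    # Upstream BF16 checkpoint downloaded from Hugging Face
    model_path="/Users/vilius/.omlx/models/alibaba-pai/AgenticQwen-8B",
    # Destination folder for the quantized model
    output_path="/Users/vilius/.omlx/models/alibaba-pai/AgenticQwen-8B-oQ4",
    oq_level=4,       # oQ level 4: 4-bit affine by default, with some tensors at 5/6-bit
    group_size=64,    # each quantization group covers 64 weights
    dtype="bfloat16",
)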
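The `oq_level=4` and `group_size=64` arguments refer to group-wise affine quantization. The pure-Python sketch below shows the textbook min/max version of that idea: every group of 64 weights is mapped to the integers 0..15 and stored with one scale and one bias, so that w ≈ scale · q + bias. It is an illustration under that assumption, not oMLX's code. MLX's affine mode follows this outline, but its exact rounding and oQ's choice of which tensors get 5 or 6 bits are not modeled here, and the function names are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (integer codes, scale, bias)


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Min/max affine quantization of one group: w ~= scale * q + bias."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1  # 15 for 4-bit
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid division by zero on constant groups
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(group: Group) -> List[float]:
    """Reconstruct approximate weights from codes, scale and bias."""
    codes, scale, bias = group
    return [scale * q + bias for q in codes]


def quantize(weights: List[float], bits: int = 4, group_size: int = 64) -> List[Group]:
    """Split a flat weight row into groups and quantize each one independently."""
    return [
        quantize_group(weights[i : i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


# Example: 128 synthetic weights -> 2 groups of 64, each with its own scale and bias.
row = [((i * 37) % 101) / 50.0 - 1.0 for i in range(128)]
groups = quantize(row)
restored = [w for g in groups for w in dequantize_group(g)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), max_err)
```

The rounding error per weight is at most half of its group's scale. Storing a scale and a bias for every 64 weights is also why a "4-bit" model costs slightly more than 4 bits per weight on disk.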

The resulting model uses oMLX quantization metadata in config.json:

  • default quantization: 4-bit affine
  • group size: 64
  • selected quantization-sensitive tensors kept at higher precision (5-bit or 6-bit) by oQ
  • output format: MLX/oMLX-compatible safetensors
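The metadata above can be checked locally by reading config.json. This sketch assumes the mlx-lm style layout, in which a top-level "quantization" object holds the default "bits" and "group_size" and per-tensor overrides appear as nested objects keyed by tensor path. That layout is an assumption about oMLX's output, so inspect your config.json if the keys differ; the function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_quantization(config: dict) -> dict:
    """Summarize default and per-tensor quantization settings.

    Assumed layout (mlx-lm style):
    config["quantization"] = {"bits": 4, "group_size": 64,
                              "<tensor path>": {"bits": 6, "group_size": 64}, ...}
    """
    quant = config.get("quantization", {})
    # Nested dicts are treated as per-tensor overrides of the default settings.
    overrides = {k: v for k, v in quant.items() if isinstance(v, dict)}
    return {
        "default_bits": quant.get("bits"),
        "group_size": quant.get("group_size"),
        "override_bits": Counter(v.get("bits") for v in overrides.values()),
    }


def load_config(model_dir: str) -> dict:
    """Load config.json from a local model folder."""
    with open(Path(model_dir).expanduser() / "config.json") as f:
        return json.load(f)
```

For example, `print(summarize_quantization(load_config("path/to/AgenticQwen-8B-oQ4")))` would show the default 4-bit setting and how many tensors were overridden to 5 or 6 bits, if the assumed layout holds.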

Files

Main generated artifact:

model.safetensors

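The remaining files come from the upstream model. Tokenizer and generation/template files were copied as-is; config.json is based on the upstream file and additionally carries the quantization metadata described above: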

added_tokens.json
chat_template.jinja
config.json
generation_config.json
merges.txt
special_tokens_map.json
tokenizer.json
tokenizer_config.json
vocab.json
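A quick completeness check for a downloaded copy can compare the folder contents against the file list above. This is a minimal sketch; the function names are hypothetical, and extra files in the folder (such as a README) are simply ignored.

```python
from pathlib import Path
from typing import Iterable, List

# File list copied from this model card.
EXPECTED_FILES = [
    "model.safetensors",
    "added_tokens.json",
    "chat_template.jinja",
    "config.json",
    "generation_config.json",
    "merges.txt",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
]


def missing_files(present: Iterable[str]) -> List[str]:
    """Return expected files absent from `present`, in the order listed on the card."""
    present_set = set(present)
    return [name for name in EXPECTED_FILES if name not in present_set]


def check_model_dir(model_dir: str) -> List[str]:
    """List a local model folder and report which expected files are missing."""
    root = Path(model_dir).expanduser()
    return missing_files(p.name for p in root.iterdir() if p.is_file())
```

`check_model_dir("~/.omlx/models/<namespace>/AgenticQwen-8B-oQ4")` returns an empty list when every file is present (replace `<namespace>` with your own folder name).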

Size

Approximate local sizes after conversion:

| Model                     | Size    |
|---------------------------|---------|
| Original BF16 safetensors | ~15 GB  |
| oMLX oQ4 quantized        | ~4.5 GB |
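The two sizes can be sanity-checked with a back-of-the-envelope estimate. The sketch assumes a nominal 8 billion parameters and one 16-bit scale plus one 16-bit bias per group of 64 weights (matching MLX's affine mode with bfloat16 inputs); both are assumptions, not figures from this card. The 32 extra bits per group add 0.5 bits per weight, so uniform 4-bit storage costs about 4.5 bits per weight before the 5/6-bit tensors are counted.

```python
# Assumptions (not from this card): nominal 8e9 parameters; a 16-bit scale and a
# 16-bit bias stored for every group of 64 weights.
N_PARAMS = 8e9


def bits_per_weight(bits: int, group_size: int, scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective storage bits per weight, including per-group scale and bias."""
    return bits + (scale_bits + bias_bits) / group_size


def size_bytes(num_params: float, bpw: float) -> float:
    """Total bytes for num_params weights at bpw bits each."""
    return num_params * bpw / 8


bf16 = size_bytes(N_PARAMS, 16)
q4 = size_bytes(N_PARAMS, bits_per_weight(4, 64))

print(f"BF16: {bf16 / 1e9:.1f} GB ({bf16 / 2**30:.1f} GiB)")  # BF16: 16.0 GB (14.9 GiB)
print(f"uniform 4-bit: {q4 / 1e9:.1f} GB ({q4 / 2**30:.1f} GiB)")  # uniform 4-bit: 4.5 GB (4.2 GiB)
```

Both estimates land in the same range as the measured sizes in the table; the tensors kept at 5 or 6 bits and any unquantized parameters push the real quantized file slightly above the uniform 4-bit figure.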

model.safetensors SHA256:

e5043b2e118c36ee43fa98b95ee155dfb90b0f9776bd43487154e7afdc053e70
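To confirm a download matches the checksum above, the file can be hashed in chunks with Python's standard `hashlib`. This is a minimal sketch; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator

# Checksum published on this model card for model.safetensors.
EXPECTED_SHA256 = "e5043b2e118c36ee43fa98b95ee155dfb90b0f9776bd43487154e7afdc053e70"


def sha256_chunks(chunks: Iterable[bytes]) -> str:
    """Hex SHA-256 digest of a sequence of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file in 1 MiB pieces so a multi-GB file is never held in memory at once."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def verify_model(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file at `path` hashes to the expected SHA-256."""
    return sha256_chunks(read_chunks(Path(path).expanduser())) == expected.lower()
```

`verify_model("model.safetensors")` should return `True` for an intact copy. On macOS, `shasum -a 256 model.safetensors` gives the same digest from the command line.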

Usage with oMLX

Download or place this repository under your oMLX models directory, then restart oMLX.

Example local layout:

~/.omlx/models/<namespace>/AgenticQwen-8B-oQ4
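One way to place the files is to fetch the repository straight into that layout with `huggingface_hub.snapshot_download`. The sketch below assumes the repository id `vystartasv/AgenticQwen-8B-oQ4` as listed on this model's Hugging Face page; the namespace folder is your own choice, and the helper names are hypothetical.

```python
from pathlib import Path

# Repository id as listed on this model's Hugging Face page.
REPO_ID = "vystartasv/AgenticQwen-8B-oQ4"


def target_dir(namespace: str, name: str = "AgenticQwen-8B-oQ4", root: str = "~/.omlx/models") -> Path:
    """Build <root>/<namespace>/<name>, the layout shown above."""
    return Path(root).expanduser() / namespace / name


def download(namespace: str) -> Path:
    """Fetch the repository into the oMLX models directory (requires huggingface_hub)."""
    # Imported lazily so target_dir() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    dest = target_dir(namespace)
    dest.parent.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=dest)
    return dest
```

For example, `download("vystartasv")` writes the files to `~/.omlx/models/vystartasv/AgenticQwen-8B-oQ4`; restart oMLX afterwards so it picks up the new folder.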

After restarting oMLX, the model should be listed by the OpenAI-compatible models endpoint:

curl http://127.0.0.1:8000/v1/models

Expected model id if the folder is named AgenticQwen-8B-oQ4:

AgenticQwen-8B-oQ4
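The same check can be scripted with the standard library. This sketch assumes the server answers at the address used in the curl example and returns the usual OpenAI-style list shape (`{"object": "list", "data": [{"id": ...}, ...]}`); the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://127.0.0.1:8000/v1"  # address used in the curl example above


def model_ids(payload: dict) -> List[str]:
    """Extract model ids from an OpenAI-style list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = BASE_URL) -> List[str]:
    """GET /models from a running server and return the ids it reports."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
        return model_ids(json.load(resp))
```

With oMLX running, `"AgenticQwen-8B-oQ4" in list_models()` should be `True` once the folder has been picked up.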

Example chat request:

curl http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "AgenticQwen-8B-oQ4",
    "messages": [
      {"role": "user", "content": "Write a short Python function to reverse a string."}
    ]
  }'
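The curl request can also be sent from Python without extra dependencies. This sketch assumes the standard OpenAI chat-completions response shape (`choices[0].message.content`) and the model id shown above; the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/v1"
MODEL_ID = "AgenticQwen-8B-oQ4"  # assumes the folder is named as shown above


def build_request(prompt: str, model: str = MODEL_ID) -> dict:
    """Same body as the curl example: one user message, default sampling settings."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """POST a single-turn chat request and return the model's reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout: local generation on a laptop can take a while.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return extract_reply(json.load(resp))
```

`print(chat("Write a short Python function to reverse a string."))` reproduces the curl example.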

License

The upstream model is released under the Apache 2.0 license. This quantized derivative is provided under the same license.

See the original model and its documentation here:

https://huggingface.co/alibaba-pai/AgenticQwen-8B

Notes

This is a quantized redistribution for convenience. It is not an official release from Alibaba PAI. For model details, intended use, training notes, and limitations, refer to the upstream model card.
