AgenticQwen-8B-oQ4
This repository contains an unofficial oMLX oQ4 quantized version of alibaba-pai/AgenticQwen-8B.
Original model card: https://huggingface.co/alibaba-pai/AgenticQwen-8B
What was done
The upstream alibaba-pai/AgenticQwen-8B safetensors model was downloaded from Hugging Face and quantized locally using the oMLX oQ streaming quantizer.
Quantization command used:
from omlx.oq import quantize_oq_streaming

quantize_oq_streaming(
    model_path="/Users/vilius/.omlx/models/alibaba-pai/AgenticQwen-8B",
    output_path="/Users/vilius/.omlx/models/alibaba-pai/AgenticQwen-8B-oQ4",
    oq_level=4,
    group_size=64,
    dtype="bfloat16",
)
The resulting model uses oMLX quantization metadata in config.json:
- default quantization: 4-bit affine
- group size: 64
- selected sensitive tensors retained at 5-bit or 6-bit by oQ
- output format: MLX/oMLX-compatible safetensors
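For readers unfamiliar with the terms above, the following is a minimal sketch of what "4-bit affine quantization with group size 64" generally means. It is an illustration in plain Python, not the oMLX oQ implementation; the function names are hypothetical, and the real quantizer packs codes into tensors and picks higher bit widths for sensitive layers.

```python
# Illustrative sketch only: generic affine (scale + bias) group quantization.
# This is NOT the oMLX oQ code; all names here are hypothetical.

def quantize_group(values, bits=4):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1          # 15 for 4-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo           # lo acts as the per-group bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from codes and the group's scale/bias."""
    return [c * scale + bias for c in codes]


def quantize(weights, bits=4, group_size=64):
    """Split a flat list of weights into groups and quantize each one."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]
```

Each group of 64 weights shares one scale and one bias, so the reconstruction error per weight is bounded by half of that group's scale.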
Files
Main generated artifact:
model.safetensors
Tokenizer and generation/template files were copied from the upstream model (config.json additionally carries the quantization metadata described above):
added_tokens.json
chat_template.jinja
config.json
generation_config.json
merges.txt
special_tokens_map.json
tokenizer.json
tokenizer_config.json
vocab.json
Size
Approximate local sizes after conversion:
| Model | Size |
|---|---|
| Original BF16 safetensors | ~15 GB |
| oMLX oQ4 quantized | ~4.5 GB |
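The sizes in the table can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes roughly 8 billion weights and assumes each group of 64 weights stores one 16-bit scale and one 16-bit bias next to the packed 4-bit codes; both figures are assumptions for illustration, not values read from this repository.

```python
# Rough size estimate under stated assumptions (not measured values).

def bits_per_weight(bits, group_size, meta_bits=16):
    """Effective bits per weight: packed code plus a share of scale and bias."""
    return bits + 2 * meta_bits / group_size


def size_gb(num_params, bpw):
    """Storage in decimal gigabytes for num_params weights at bpw bits each."""
    return num_params * bpw / 8 / 1e9


NUM_PARAMS = 8e9  # assumption: about 8 billion weights

bf16_gb = size_gb(NUM_PARAMS, 16)
oq4_gb = size_gb(NUM_PARAMS, bits_per_weight(4, 64))
print(f"BF16: ~{bf16_gb:.1f} GB, 4-bit with group size 64: ~{oq4_gb:.1f} GB")
```

The estimate lands in the same range as the table. The exact parameter count, the tensors kept at 5-bit or 6-bit, and whether a tool reports GB or GiB all shift the real numbers somewhat.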
model.safetensors SHA256:
e5043b2e118c36ee43fa98b95ee155dfb90b0f9776bd43487154e7afdc053e70
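To check a downloaded copy against the checksum above, something like the following should work. It is a small sketch using only the Python standard library; the helper names are illustrative, and the file path passed in is whatever location you downloaded model.safetensors to.

```python
import hashlib

# Checksum published in this README for model.safetensors.
EXPECTED_SHA256 = "e5043b2e118c36ee43fa98b95ee155dfb90b0f9776bd43487154e7afdc053e70"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify("AgenticQwen-8B-oQ4/model.safetensors")` returns `True` only when the file is byte-identical to the one hashed here.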
Usage with oMLX
Download or place this repository under your oMLX models directory, then restart oMLX.
Example local layout:
~/.omlx/models/<namespace>/AgenticQwen-8B-oQ4
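One possible way to fetch the repository straight into that layout is sketched below. It assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper names and the choice of namespace folder are illustrative, since the layout above leaves `<namespace>` open.

```python
from pathlib import Path


def omlx_model_dir(namespace, name="AgenticQwen-8B-oQ4", root="~/.omlx/models"):
    """Build <root>/<namespace>/<name>, following the example layout above."""
    return Path(root).expanduser() / namespace / name


def download_to_omlx(repo_id, namespace):
    """Download a Hub repository into the oMLX models directory."""
    # Imported lazily so omlx_model_dir works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = omlx_model_dir(namespace, repo_id.split("/")[-1])
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

A call such as `download_to_omlx("vystartasv/AgenticQwen-8B-oQ4", "vystartasv")` would place the files under `~/.omlx/models/vystartasv/AgenticQwen-8B-oQ4`; using the Hub user name as the namespace is only a convention assumed here.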
After restarting oMLX, the model should appear through the OpenAI-compatible models endpoint:
curl http://127.0.0.1:8000/v1/models
Expected model id if the folder is named AgenticQwen-8B-oQ4:
AgenticQwen-8B-oQ4
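The same check can be scripted. The sketch below assumes the endpoint returns the usual OpenAI-style list shape (`{"data": [{"id": ...}, ...]}`); that response shape is an assumption about oMLX based on the endpoint being described as OpenAI-compatible, and the function names are illustrative.

```python
import json
from urllib.request import urlopen


def model_ids(payload):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url="http://127.0.0.1:8000"):
    """Query the local server and return the ids it reports."""
    with urlopen(f"{base_url}/v1/models", timeout=10) as response:
        return model_ids(json.load(response))
```

With the server running, `"AgenticQwen-8B-oQ4" in fetch_model_ids()` is a quick way to confirm the model was picked up after the restart.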
Example chat request:
curl http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "AgenticQwen-8B-oQ4",
    "messages": [
      {"role": "user", "content": "Write a short Python function to reverse a string."}
    ]
  }'
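An equivalent request from Python, using only the standard library, might look like the sketch below. It assumes the response follows the OpenAI chat-completions shape (`choices[0].message.content`), which is an assumption about oMLX's output; the helper names are illustrative.

```python
import json
from urllib.request import Request, urlopen


def build_chat_request(prompt, model="AgenticQwen-8B-oQ4",
                       base_url="http://127.0.0.1:8000"):
    """Build the same POST request as the curl example, without sending it."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urlopen(build_chat_request(prompt, **kwargs), timeout=120) as response:
        reply = json.load(response)
    # Assumed OpenAI-style response layout.
    return reply["choices"][0]["message"]["content"]
```

For example, `print(chat("Write a short Python function to reverse a string."))` sends the same prompt as the curl command.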
License
The upstream model is licensed under Apache 2.0. This quantized derivative is provided under the same license.
See the original model and its documentation here: https://huggingface.co/alibaba-pai/AgenticQwen-8B
Notes
This is a quantized redistribution for convenience. It is not an official release from Alibaba PAI. For model details, intended use, training notes, and limitations, refer to the upstream model card.