Luigi/Qwen3-ASR-0.6B-chatllm-quantized

Qwen3-ASR-0.6B-chatllm-quantized

This repository contains pre-quantized model files for Qwen/Qwen3-ASR-0.6B, intended for use with chatllm.cpp and other GGML-compatible backends.

Available Models

qwen3-asr-0.6b-q4_0.bin: 4-bit quantization (decent accuracy, fastest inference). Recommended for free-tier CPU instances.
qwen3-asr-0.6b-q8_0.bin: 8-bit quantization (high accuracy, slightly slower than the 4-bit file).
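
As a rough sketch, the two files listed above could be fetched directly over HTTPS. The repository id is taken from this page's owner and name, and the `resolve/main` path assumes the files live on the default `main` branch. The helper names `model_file`, `model_url` and `download_model` are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: fetch one of the two files listed in this card over HTTPS.
# REPO is an assumption taken from this page's owner/name.
REPO="Luigi/Qwen3-ASR-0.6B-chatllm-quantized"

# Map a quantization tag (q4_0 or q8_0) to its file name in this repository.
model_file() {
  case "$1" in
    q4_0|q8_0) printf 'qwen3-asr-0.6b-%s.bin\n' "$1" ;;
    *) echo "unknown quantization: $1" >&2; return 1 ;;
  esac
}

# Build the direct-download URL (assumes the default "main" branch).
model_url() {
  local file
  file="$(model_file "$1")" || return 1
  printf 'https://huggingface.co/%s/resolve/main/%s\n' "$REPO" "$file"
}

# Download into the current directory; -L follows redirects, -f fails on HTTP errors.
download_model() {
  local url
  url="$(model_url "$1")" || return 1
  curl -fL -O "$url"
}
```

After sourcing the script, `download_model q4_0` would save qwen3-asr-0.6b-q4_0.bin to the current directory, and `model_url q8_0` prints the URL without downloading anything.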

Usage with chatllm.cpp

Clone the repository or download one of the model files listed above.
Run the model with the following command (requires the chatllm-main executable):

./chatllm-main -m qwen3-asr-0.6b-q4_0.bin -p audio.wav -n 2
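
To run the same command over several recordings, a small wrapper along these lines may help. It is only a sketch: the flags `-m`, `-p` and `-n 2` are copied verbatim from the command above, while the function names, the `CHATLLM`, `MODEL` and `DRY_RUN` variables, and the directory layout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run the command from this card over every .wav file in a directory.
# CHATLLM and MODEL are hypothetical overrides; the defaults match the card.
CHATLLM="${CHATLLM:-./chatllm-main}"
MODEL="${MODEL:-qwen3-asr-0.6b-q4_0.bin}"

# Print the command line for one audio file (for display; paths are not quoted).
asr_cmd() {
  printf '%s -m %s -p %s -n 2\n' "$CHATLLM" "$MODEL" "$1"
}

# Process each .wav in a directory; with DRY_RUN=1 only print the commands.
transcribe_dir() {
  local dir="$1" wav
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue   # no matches: skip the unexpanded pattern
    if [ "${DRY_RUN:-0}" = 1 ]; then
      asr_cmd "$wav"
    else
      "$CHATLLM" -m "$MODEL" -p "$wav" -n 2
    fi
  done
}
```

For example, `DRY_RUN=1 transcribe_dir ./recordings` (a hypothetical directory) lists the commands that would run, and `MODEL=qwen3-asr-0.6b-q8_0.bin transcribe_dir ./recordings` switches to the 8-bit file, provided `MODEL` is set before the script is sourced.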

Credits

Original Model: Qwen Team (Alibaba Cloud)
C++ Backend: foldl/chatllm.cpp
Quantization: Performed with the chatllm.cpp conversion scripts.

License

Please refer to the original Qwen3-ASR License.
