n24q02m/Qwen3-Reranker-0.6B-ONNX
ONNX-optimized version of Qwen/Qwen3-Reranker-0.6B for use with qwen3-embed and fastembed (via PR #605).
Available Variants
| Variant | File | Size | Description |
|---|---|---|---|
| INT8 | onnx/model_quantized.onnx | 573 MB | Dynamic INT8 quantization (default) |
| Q4F16 | onnx/model_q4f16.onnx | 517 MB | INT4 weights + FP16 activations |
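The libraries in the Usage section normally fetch these files for you. If you want to inspect a single variant by hand, a minimal sketch along these lines may help. It assumes the `huggingface_hub` package is installed; `variant_filename` and `download_variant` are hypothetical helper names, not part of any library, and the file paths are copied from the table above.

```python
REPO_ID = "n24q02m/Qwen3-Reranker-0.6B-ONNX"

# File paths copied from the variants table.
VARIANT_FILES = {
    "INT8": "onnx/model_quantized.onnx",
    "Q4F16": "onnx/model_q4f16.onnx",
}


def variant_filename(variant="INT8"):
    """Return the in-repo path of the ONNX file for a variant (hypothetical helper)."""
    key = variant.upper()
    if key not in VARIANT_FILES:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(VARIANT_FILES)}"
        )
    return VARIANT_FILES[key]


def download_variant(variant="INT8"):
    """Download one variant into the local Hugging Face cache and return its path."""
    # Imported lazily so the lookup helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=variant_filename(variant))
```

Calling `download_variant("Q4F16")` would download only the ONNX file itself; tokenizer and config files, if needed, would have to be fetched separately.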
Usage
qwen3-embed
pip install qwen3-embed
from qwen3_embed import TextCrossEncoder
# INT8 (default)
reranker = TextCrossEncoder("n24q02m/Qwen3-Reranker-0.6B-ONNX")
scores = list(reranker.rerank("What is AI?", ["AI is...", "Pizza is..."]))
# Custom instruction
scores = list(reranker.rerank(
"What is AI?",
["doc1", "doc2"],
instruction="Judge document relevance for code search.",
))
# Q4F16 (smaller, slightly less accurate)
reranker_q4 = TextCrossEncoder("n24q02m/Qwen3-Reranker-0.6B-ONNX-Q4F16")
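The calls above return one score per document. To turn those scores into a ranking, a small helper like the following could be used. This is a sketch that assumes the scores come back in the same order as the input documents; `rank_documents` is a hypothetical name, and the scores below are stand-in values rather than real model output.

```python
def rank_documents(documents, scores, top_k=None):
    """Pair each document with its score and sort by descending relevance."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Stand-in scores; in practice: scores = list(reranker.rerank(query, documents))
documents = ["AI is...", "Pizza is..."]
scores = [0.93, 0.02]

for doc, score in rank_documents(documents, scores):
    print(f"{score:.2f}  {doc}")
```

Passing `top_k` keeps only the best matches, which is the usual pattern when the reranker sits behind a first-stage retriever.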
fastembed
pip install fastembed
from fastembed import TextCrossEncoder
# INT8 (default)
reranker = TextCrossEncoder("Qwen/Qwen3-Reranker-0.6B")
scores = list(reranker.rerank("What is AI?", ["AI is...", "Pizza is..."]))
# Q4F16
reranker_q4 = TextCrossEncoder("Qwen/Qwen3-Reranker-0.6B-Q4F16")
Note: fastembed support requires PR #605. Alternatively, install from the fork:
pip install git+https://github.com/n24q02m/fastembed.git@feat/qwen3-support
Conversion Details
- Source: Qwen/Qwen3-Reranker-0.6B
- ONNX opset: 21
- INT8: onnxruntime.quantization.quantize_dynamic(QInt8)
- Q4F16: MatMulNBitsQuantizer(block_size=128, symmetric) + FP16 cast
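To make the `block_size=128, symmetric` setting more concrete, here is a simplified pure-Python sketch of block-wise symmetric 4-bit quantization. It illustrates the general idea only and is not ONNX Runtime's implementation: the function names are hypothetical, and the real quantizer operates on MatMul weight tensors and packs two 4-bit values per byte.

```python
def quantize_blockwise_symmetric(weights, block_size=128, bits=4):
    """Quantize a flat list of floats, storing one scale per block of weights."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Symmetric: no zero point, the largest magnitude maps to qmax.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        quantized = [max(qmin, min(qmax, round(w / scale))) for w in block]
        blocks.append((scale, quantized))
    return blocks


def dequantize_blockwise(blocks):
    """Reconstruct approximate float weights from (scale, values) blocks."""
    return [scale * q for scale, values in blocks for q in values]


blocks = quantize_blockwise_symmetric([7.0, -3.0, 0.0, 2.0], block_size=4)
restored = dequantize_blockwise(blocks)
```

Smaller blocks mean more scales to store but tighter ranges per block, so the per-weight rounding error (at most half a scale step in this sketch) tends to shrink.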
Related
- GGUF variants: n24q02m/Qwen3-Reranker-0.6B-GGUF