---
language:
  - en
license: apache-2.0
library_name: gguf
tags:
  - reranker
  - gguf
  - llama.cpp
base_model: Qwen/Qwen3-Reranker-4B
---

# Qwen3-Reranker-4B-F16-GGUF

This model was converted to GGUF format from Qwen/Qwen3-Reranker-4B using llama.cpp via ggml.ai's GGUF-my-repo space.

Refer to the original model card for more details on the model.

## Model Information

- **Base Model:** Qwen/Qwen3-Reranker-4B
- **Quantization:** F16
- **Format:** GGUF (GPT-Generated Unified Format)
- **Converted with:** llama.cpp

## Quantization Details

This is an F16 conversion of the original model. For comparison, common GGUF precision levels include:

- **F16:** full 16-bit floating point; highest quality, largest size
- **Q8_0:** 8-bit quantization; high quality, good balance
- **Q4_K_M:** 4-bit quantization (medium variant); smaller size, faster inference
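The size trade-off between these levels follows directly from bits per weight. A rough back-of-envelope sketch (the ~4B parameter count and the effective bits-per-weight figures below are approximations; real Q8_0 and Q4_K_M mix block scales and tensor types, so treat the results as ballpark only):

```python
# Rough GGUF file-size estimate from effective bits per weight.
PARAMS = 4_000_000_000  # assumed ~4B parameters

def approx_size_gb(bits_per_weight: float) -> float:
    """Approximate model file size in gigabytes (ignores metadata overhead)."""
    return PARAMS * bits_per_weight / 8 / 1e9

# Nominal effective bits per weight for each level (approximate).
for name, bpw in [("F16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8)]:
    print(f"{name}: ~{approx_size_gb(bpw):.1f} GB")
```

This is why Q4_K_M files are roughly a third the size of F16 while Q8_0 sits in between.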

## Usage

This model can be used with llama.cpp and other GGUF-compatible inference engines.

```sh
# Example using llama.cpp
./llama-rerank -m Qwen3-Reranker-4B-F16.gguf
```
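For server-based use, llama.cpp's `llama-server` exposes a Jina-style rerank endpoint. The sketch below is a minimal client, assuming a server started locally with reranking enabled (e.g. `./llama-server -m Qwen3-Reranker-4B-F16.gguf --reranking`); verify the exact flag, port, and endpoint path against your llama.cpp version:

```python
import json
from urllib import request

# Assumed local llama-server address; adjust to your setup.
URL = "http://localhost:8080/v1/rerank"

def build_payload(query: str, documents: list[str]) -> bytes:
    """Build the JSON body for a rerank request (Jina-compatible shape)."""
    return json.dumps({"query": query, "documents": documents}).encode()

def rerank(query: str, documents: list[str]) -> dict:
    """POST the query/documents pair and return the parsed response."""
    req = request.Request(
        URL,
        data=build_payload(query, documents),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Example call (requires a running server):
# scores = rerank("what is GGUF?", ["GGUF is a file format.", "Cats sleep a lot."])
```

The response typically contains a `results` list with a relevance score per document; consult the llama.cpp server documentation for the exact schema.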

## Model Files

| Quantization | Use Case |
| --- | --- |
| F16 | Maximum quality, largest size |
| Q8_0 | High quality, good balance of size/performance |
| Q4_K_M | Good quality, smallest size, fastest inference |

## Citation

If you use this model, please cite the original model:

```
# See original model card for citation information
```

## License

This conversion inherits the license of the original model (apache-2.0). Refer to the original model card for full license details.

## Acknowledgements