---
language:
  - en
license: apache-2.0
library_name: gguf
tags:
  - reranker
  - gguf
  - llama.cpp
base_model: Qwen/Qwen3-Reranker-4B
---

# Qwen3-Reranker-4B-F16-GGUF

This model was converted to GGUF format from Qwen/Qwen3-Reranker-4B using llama.cpp via ggml.ai's GGUF-my-repo space.

Refer to the original model card for more details on the model.

## Model Information

- **Base Model:** Qwen/Qwen3-Reranker-4B
- **Quantization:** F16
- **Format:** GGUF (GPT-Generated Unified Format)
- **Converted with:** llama.cpp

## Quantization Details

This is an F16 conversion of the original model. For comparison, common GGUF precision levels include:

- **F16:** full 16-bit floating point; highest quality, largest size
- **Q8_0:** 8-bit quantization; high quality, good balance
- **Q4_K_M:** 4-bit quantization (medium variant); smaller size, faster inference
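The size trade-off between these levels follows directly from bits per weight. A rough back-of-envelope sketch (the ~4B parameter count and the effective bits-per-weight figures below are approximations; real Q8_0 and Q4_K_M mix block scales and tensor types, so treat the results as ballpark only):

```python
# Rough GGUF file-size estimate from effective bits per weight.
PARAMS = 4_000_000_000  # assumed ~4B parameters

def approx_size_gb(bits_per_weight: float) -> float:
    """Approximate model file size in gigabytes (ignores metadata overhead)."""
    return PARAMS * bits_per_weight / 8 / 1e9

# Nominal effective bits per weight for each level (approximate).
for name, bpw in [("F16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8)]:
    print(f"{name}: ~{approx_size_gb(bpw):.1f} GB")
```

This is why Q4_K_M files are roughly a third the size of F16 while Q8_0 sits in between.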

## Usage

This model can be used with llama.cpp and other GGUF-compatible inference engines.

```sh
# Example using llama.cpp
./llama-rerank -m Qwen3-Reranker-4B-F16.gguf
```
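For server-based use, llama.cpp's `llama-server` exposes a Jina-style rerank endpoint. The sketch below is a minimal client, assuming a server started locally with reranking enabled (e.g. `./llama-server -m Qwen3-Reranker-4B-F16.gguf --reranking`); verify the exact flag, port, and endpoint path against your llama.cpp version:

```python
import json
from urllib import request

# Assumed local llama-server address; adjust to your setup.
URL = "http://localhost:8080/v1/rerank"

def build_payload(query: str, documents: list[str]) -> bytes:
    """Build the JSON body for a rerank request (Jina-compatible shape)."""
    return json.dumps({"query": query, "documents": documents}).encode()

def rerank(query: str, documents: list[str]) -> dict:
    """POST the query/documents pair and return the parsed response."""
    req = request.Request(
        URL,
        data=build_payload(query, documents),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Example call (requires a running server):
# scores = rerank("what is GGUF?", ["GGUF is a file format.", "Cats sleep a lot."])
```

The response typically contains a `results` list with a relevance score per document; consult the llama.cpp server documentation for the exact schema.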

## Model Files

| Quantization | Use Case |
| --- | --- |
| F16 | Maximum quality, largest size |
| Q8_0 | High quality, good balance of size/performance |
| Q4_K_M | Good quality, smallest size, fastest inference |

## Citation

If you use this model, please cite the original model:

```
# See original model card for citation information
```

## License

This conversion inherits the license of the original model (apache-2.0). Refer to the original model card for full license details.

## Acknowledgements