Qwen3-Reranker-4B-CoreML (ANE-Optimized)

English

This repository provides a pre-converted CoreML bundle derived from Qwen3-Reranker-4B and an OpenAI-style rerank API service for Apple Silicon.

Bundle Specs

Item                    Value
Base model              Qwen/Qwen3-Reranker-4B
Task                    Text reranking
Profiles                b1_s128
Bundle path             bundles/qwen3_reranker_ane_bundle_4b
Default model id        qwen3-reranker-4b-ane
Package size (approx.)  7.5G

Scope

  • This release supports text-only reranking.
  • Endpoints: POST /rerank and POST /v1/rerank.

Quick Start

./setup_venv.sh
./run_server.sh

Health check:

curl -s http://127.0.0.1:8000/health
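
Because the server may need some time to load the bundle after ./run_server.sh starts, a script can poll the health endpoint before sending work. The following is a minimal Python sketch, not part of this repository. It assumes that /health answers with HTTP 200 once the service is ready, which the source does not state, and the helper names http_probe and wait_until_healthy are hypothetical.

```python
import time
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if a GET on url answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Connection refused, timeouts, and HTTP error statuses all land here
        # (urllib.error.URLError and HTTPError are subclasses of OSError).
        return False


def wait_until_healthy(url="http://127.0.0.1:8000/health", attempts=30,
                       delay_s=1.0, probe=http_probe, sleep=time.sleep):
    """Poll url until probe succeeds; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay_s)
    raise TimeoutError(f"{url} not healthy after {attempts} attempts")
```

Calling wait_until_healthy() with the defaults polls the same address as the curl command above, once per second, for up to 30 attempts. The probe and sleep parameters exist only so the loop can be exercised without a running server.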

Rerank request:

curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
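
The same request can be sent from Python using only the standard library. This is a sketch under stated assumptions: the field names are copied from the curl example, the address is the one used above, and the helper names build_rerank_payload and rerank are hypothetical. The response schema is not documented here, so the function returns the decoded JSON as is.

```python
import json
import urllib.request

# Assumption: the server started by run_server.sh listens on this address,
# as in the curl examples.
BASE_URL = "http://127.0.0.1:8000"


def build_rerank_payload(query, documents, top_n=None, return_documents=True):
    """Build a JSON body with the same fields as the curl example."""
    if not documents:
        raise ValueError("documents must not be empty")
    payload = {
        "query": query,
        "documents": list(documents),
        "return_documents": return_documents,
    }
    if top_n is not None:
        payload["top_n"] = top_n
    return payload


def rerank(query, documents, top_n=None, base_url=BASE_URL, timeout=60.0):
    """POST the payload to /v1/rerank and return the decoded JSON response."""
    body = json.dumps(build_rerank_payload(query, documents, top_n)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, rerank("capital of China", ["The capital of China is Beijing.", "Gravity is a force."], top_n=2) issues the same request as the curl command.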

Notes

  • The bundle uses a fixed-shape profile (s128), chosen for low-power deployment.
  • Inputs longer than the profile capacity return an explicit error.
  • The first request incurs warm-up latency.
  • The default compute setting is cpu_and_ne, which prefers the ANE but does not guarantee that every operation runs on it.
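
Two of these notes affect client code: the first request is slow, and over-length inputs fail explicitly. The Python sketch below shows one way a client might absorb the warm-up with a throwaway request at startup and surface server errors with their message. It rests on assumptions the source does not confirm: that the explicit error arrives as an HTTP error status with a readable body, and that a one-document request is enough to trigger warm-up. The names RerankError, post_rerank, and warm_up are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request

# Same address as the curl example.
RERANK_URL = "http://127.0.0.1:8000/v1/rerank"


class RerankError(RuntimeError):
    """Raised when the server answers with an HTTP error status."""


def post_rerank(payload, url=RERANK_URL, timeout=120.0):
    """POST payload as JSON; raise RerankError carrying the server's error body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Assumption: over-length inputs are reported through an HTTP error
        # status whose body explains the problem.
        detail = err.read().decode("utf-8", errors="replace")
        raise RerankError(f"HTTP {err.code}: {detail}") from err


def warm_up(send=post_rerank, clock=time.perf_counter):
    """Send one small throwaway request and return the elapsed seconds."""
    payload = {
        "query": "warm-up",
        "documents": ["warm-up"],
        "top_n": 1,
        "return_documents": False,
    }
    start = clock()
    send(payload)
    return clock() - start
```

Calling warm_up() once after the server is healthy moves the warm-up cost out of the first user-facing request. Comparing its return value with the time of a later request gives a rough measure of that cost on a given machine.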

中文

这个仓库提供基于 Qwen3-Reranker-4B 的预转换 CoreML bundle,以及可直接运行的文本重排服务(/v1/rerank)。

Bundle 规格

项目          值
基础模型      Qwen/Qwen3-Reranker-4B
任务类型      文本重排
Profile       b1_s128
Bundle 路径   bundles/qwen3_reranker_ane_bundle_4b
默认模型名    qwen3-reranker-4b-ane
包体积(约)  7.5G

范围说明

  • 本版本仅支持纯文本重排
  • 接口为 POST /rerank 和 POST /v1/rerank

快速开始

./setup_venv.sh
./run_server.sh

健康检查:

curl -s http://127.0.0.1:8000/health

重排请求:

curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'

说明

  • 固定 shape profile(s128),偏向低功耗部署。
  • 输入超过 profile 上限会明确报错。
  • 首次请求会有预热延迟。
  • 默认 cpu_and_ne,是偏向 ANE 调度,不等于 100% 仅 ANE 执行。

License

Apache-2.0. Please also follow the license and usage terms of the base Qwen model.
