license: apache-2.0
language:
  - en
  - zh
tags:
  - qwen3
  - reranker
  - coreml
  - apple-silicon
  - ane
pipeline_tag: text-ranking
library_name: coremltools
base_model: Qwen/Qwen3-Reranker-4B

Qwen3-Reranker-4B-CoreML (ANE-Optimized)

English

This repository provides a pre-converted CoreML bundle derived from Qwen3-Reranker-4B and an OpenAI-style rerank API service for Apple Silicon.

Bundle Specs

Item                     Value
Base model               Qwen/Qwen3-Reranker-4B
Task                     Text reranking
Profiles                 b1_s128
Bundle path              bundles/qwen3_reranker_ane_bundle_4b
Default model id         qwen3-reranker-4b-ane
Package size (approx.)   7.5G

Scope

  • This release supports text-only reranking.
  • Endpoints: POST /rerank and POST /v1/rerank.

Quick Start

./setup_venv.sh
./run_server.sh

Health check:

curl -s http://127.0.0.1:8000/health
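
When the server is started from a script, it can help to poll the health endpoint until it answers before sending any rerank traffic. The following is a minimal sketch using only the Python standard library. It assumes that /health answers with HTTP 200 once the service is ready, which is not stated above; the names http_ok and wait_until_healthy, and the retry count and delay, are hypothetical choices for illustration.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to `url` answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_healthy(url="http://127.0.0.1:8000/health", attempts=30,
                       delay=1.0, probe=http_ok, sleep=time.sleep):
    """Poll `url` up to `attempts` times.

    Returns the 1-based attempt number that succeeded, or None if the
    endpoint never answered. `probe` and `sleep` are injectable so the
    loop can be exercised without a running server.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

Calling wait_until_healthy() right after ./run_server.sh has been launched in the background returns once the endpoint responds, or None if the attempts are exhausted.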

Rerank request:

curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
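
For callers that prefer Python over curl, the same request can be sketched with the standard library alone. This is an illustrative sketch, not an official client for the service: the function names build_rerank_request and rerank are hypothetical, the 120-second timeout is an arbitrary allowance for the first-request warm-up, and the response is returned as parsed JSON without further interpretation because its schema is not described here.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # address used in the curl examples


def build_rerank_request(query, documents, top_n=None,
                         return_documents=True, base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/rerank."""
    payload = {
        "query": query,
        "documents": list(documents),
        "return_documents": return_documents,
    }
    if top_n is not None:
        payload["top_n"] = top_n
    return urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(query, documents, timeout=120.0, **kwargs):
    """Send the request and return the decoded JSON body unchanged."""
    request = build_rerank_request(query, documents, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, rerank("capital of China", ["The capital of China is Beijing.", "Gravity is a force."], top_n=2) sends the same payload as the curl command.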

Notes

  • The bundle ships a single fixed-shape profile (s128), chosen for low-power deployment.
  • Inputs longer than the profile's capacity return an explicit error.
  • The first request incurs warm-up latency.
  • The default compute setting is cpu_and_ne, which prefers the ANE but does not guarantee that every operation runs on it.
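
Two of these notes affect client code: the first request may be slow, and over-length inputs are rejected. A client can allow for both with a generous timeout and by catching HTTP errors instead of letting them propagate. The sketch below is a hedged illustration using only the Python standard library. It assumes the explicit error arrives as a non-2xx HTTP status, which is not confirmed above, and it returns the error body as raw text because its format is not documented here. The name post_json and the shape of the returned tuple are hypothetical.

```python
import json
import urllib.error
import urllib.request


def post_json(url, payload, timeout=120.0, opener=urllib.request.urlopen):
    """POST `payload` as JSON and return (ok, body).

    On success, body is the parsed JSON response. On an HTTP error
    status, body holds the status code and the raw error text, so the
    caller can decide whether to shorten the input and retry.
    `opener` is injectable so the function can be tested offline.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        # The long default timeout leaves room for first-request warm-up.
        with opener(request, timeout=timeout) as response:
            return True, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        detail = err.read().decode("utf-8", errors="replace")
        return False, {"status": err.code, "detail": detail}
```

A caller would pass the rerank URL and payload from the request example, then inspect the first element of the returned tuple before using the body.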

License

Apache-2.0. Please also follow the license and usage terms of the base Qwen model.