---
license: apache-2.0
language:
  - en
  - zh
tags:
  - qwen3
  - reranker
  - coreml
  - apple-silicon
  - ane
pipeline_tag: text-ranking
library_name: coremltools
base_model: Qwen/Qwen3-Reranker-4B
---

Qwen3-Reranker-4B-CoreML (ANE-Optimized)

English

This repository provides a pre-converted CoreML bundle derived from Qwen3-Reranker-4B and an OpenAI-style rerank API service for Apple Silicon.

Bundle Specs

Item	Value
Base model	`Qwen/Qwen3-Reranker-4B`
Task	Text reranking
Profiles	`b1_s128`
Bundle path	`bundles/qwen3_reranker_ane_bundle_4b`
Default model id	`qwen3-reranker-4b-ane`
Package size (approx.)	`7.5G`

Scope

This release supports text-only reranking.
Endpoints: `POST /rerank` and `POST /v1/rerank`.

Quick Start

./setup_venv.sh
./run_server.sh

Health check:

curl -s http://127.0.0.1:8000/health
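
Because the server may need some time to load the bundle after `./run_server.sh`, a script can poll the health endpoint before sending traffic. The following is a minimal sketch, assuming only that `/health` answers with HTTP 200 once the service is up; the response body is not documented here, so it is ignored. The helper names `http_ok` and `wait_until_healthy` are hypothetical and not part of this repository.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # urllib.error.URLError, connection refusals and socket timeouts
        # are all subclasses of OSError.
        return False


def wait_until_healthy(
    url: str = "http://127.0.0.1:8000/health",
    attempts: int = 30,
    delay_s: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll `url` until `probe` succeeds or `attempts` probes have failed."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)  # wait before the next probe
    return False
```

Calling `wait_until_healthy()` with the defaults probes the address used in the curl command above up to 30 times, one second apart.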

Rerank request:

curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
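
The same request can be issued from Python using only the standard library. This is a sketch that mirrors the curl payload above; the function names `build_rerank_request` and `rerank` are hypothetical, and since the response schema is not documented in this README, the parsed JSON is returned unchanged rather than interpreted.

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Sequence


def build_rerank_request(
    query: str,
    documents: Sequence[str],
    top_n: Optional[int] = None,
    return_documents: bool = True,
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    """Build a POST /v1/rerank request with the fields shown in the curl example."""
    payload: Dict[str, Any] = {
        "query": query,
        "documents": list(documents),
        "return_documents": return_documents,
    }
    if top_n is not None:
        payload["top_n"] = top_n
    return urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(
    query: str,
    documents: Sequence[str],
    top_n: Optional[int] = None,
    timeout: float = 60.0,
) -> Any:
    """Send the request and return the decoded JSON body as-is."""
    req = build_rerank_request(query, documents, top_n=top_n)
    # A generous timeout leaves room for the warm-up latency of the first call.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `rerank("capital of China", ["The capital of China is Beijing.", "Gravity is a force."], top_n=2)` sends the same body as the curl command.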

Notes

Fixed-shape profile (`s128`) for low-power deployment.
Inputs that exceed the profile capacity return an explicit error.
The first request incurs warm-up latency.
The default compute setting is `cpu_and_ne`: the ANE is preferred, but ANE-only execution is not guaranteed.
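
To illustrate what a fixed-shape profile implies, the sketch below pads a token sequence to a fixed length and raises an error when the input does not fit. It is not the service's actual code. It assumes that `s128` means a capacity of 128 tokens, and the padding id, the right-side padding, and the `InputTooLongError` type are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

SEQ_LEN = 128  # assumption: capacity implied by the `s128` profile name
PAD_ID = 0     # hypothetical padding token id


class InputTooLongError(ValueError):
    """Raised when an input does not fit the fixed-shape profile."""


def fit_to_profile(
    token_ids: Sequence[int],
    seq_len: int = SEQ_LEN,
    pad_id: int = PAD_ID,
) -> Tuple[List[int], List[int]]:
    """Pad `token_ids` to exactly `seq_len` and build an attention mask.

    Over-long inputs raise an error instead of being truncated, so the
    caller can shorten the text explicitly.
    """
    if len(token_ids) > seq_len:
        raise InputTooLongError(
            f"input has {len(token_ids)} tokens, profile capacity is {seq_len}"
        )
    pad = seq_len - len(token_ids)
    input_ids = list(token_ids) + [pad_id] * pad
    attention_mask = [1] * len(token_ids) + [0] * pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask
```

A fixed shape lets the compiled model run without reshaping at request time, which is why longer inputs have to be rejected or shortened by the caller.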

License

Apache-2.0. Please also follow the license and usage terms of the base Qwen model.