Qwen3-Reranker-4B-CoreML (ANE-Optimized)
English
This repository provides a pre-converted CoreML bundle derived from Qwen3-Reranker-4B and an OpenAI-style rerank API service for Apple Silicon.
Bundle Specs
| Item | Value |
|---|---|
| Base model | Qwen/Qwen3-Reranker-4B |
| Task | Text reranking |
| Profiles | b1_s128 |
| Bundle path | bundles/qwen3_reranker_ane_bundle_4b |
| Default model id | qwen3-reranker-4b-ane |
| Package size (approx.) | 7.5G |
Scope
- This release supports text-only reranking.
- Endpoints: `POST /rerank` and `POST /v1/rerank`.
Quick Start
./setup_venv.sh
./run_server.sh
Health check:
curl -s http://127.0.0.1:8000/health
Rerank request:
curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
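The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server listens on `127.0.0.1:8000` as in the curl example. The helper names `build_rerank_payload` and `rerank` are hypothetical and not part of this repository, and the response schema is not documented here, so the sketch returns the decoded JSON without interpreting it.

```python
import json
import urllib.error
import urllib.request

# Assumption: the server started by run_server.sh listens here,
# as in the curl examples.
BASE_URL = "http://127.0.0.1:8000"


def build_rerank_payload(query, documents, top_n, return_documents=True):
    """Build the same JSON body that the curl example sends."""
    return {
        "query": query,
        "documents": list(documents),
        "top_n": top_n,
        "return_documents": return_documents,
    }


def rerank(payload, base_url=BASE_URL, timeout=120.0):
    """POST the payload to /v1/rerank and return the decoded JSON response.

    The response is returned as-is because its exact schema is not
    described in this README.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Non-2xx responses (for example, an input that exceeds the profile
        # capacity) are surfaced together with the server's error body.
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"rerank failed with HTTP {err.code}: {detail}") from err
```

With the server running, `rerank(build_rerank_payload("capital of China", ["The capital of China is Beijing.", "Gravity is a force."], top_n=2))` sends the request shown above. The generous default timeout is a guess intended to cover first-request warm-up.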
Notes
- Fixed shape profile (`s128`) for low-power deployment.
- Inputs longer than profile capacity return an explicit error.
- First request has warm-up latency.
- Default compute setting is `cpu_and_ne` (ANE-preferred, not ANE-guaranteed).
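Because the first request has warm-up latency, a client may want to confirm the server is up before sending real traffic. The sketch below polls the `/health` endpoint from the Quick Start. It assumes a healthy server answers with HTTP 200, which this README does not state explicitly, and `wait_until_healthy` is a hypothetical helper name.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(base_url="http://127.0.0.1:8000", timeout_s=60.0, interval_s=1.0):
    """Poll GET /health until it answers HTTP 200 or the deadline passes.

    Returns True once the server responds, False if the deadline is reached.
    Assumption: HTTP 200 from /health means the server is ready.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5.0) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not reachable yet (for example, connection refused).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Once this returns `True`, sending one small throwaway rerank request before real traffic should absorb the warm-up cost described above.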
Chinese (中文)
This repository provides a pre-converted CoreML bundle based on Qwen3-Reranker-4B, along with a ready-to-run text reranking service (/v1/rerank).
Bundle Specs
| Item | Value |
|---|---|
| Base model | Qwen/Qwen3-Reranker-4B |
| Task type | Text reranking |
| Profile | b1_s128 |
| Bundle path | bundles/qwen3_reranker_ane_bundle_4b |
| Default model name | qwen3-reranker-4b-ane |
| Package size (approx.) | 7.5G |
Scope
- This release supports text-only reranking.
- Endpoints: `POST /rerank` and `POST /v1/rerank`.
Quick Start
./setup_venv.sh
./run_server.sh
Health check:
curl -s http://127.0.0.1:8000/health
Rerank request:
curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
Notes
- Fixed shape profile (`s128`), geared toward low-power deployment.
- Inputs that exceed the profile limit return an explicit error.
- The first request has warm-up latency.
- The default is `cpu_and_ne`, which prefers scheduling on the ANE but does not mean 100% ANE-only execution.
License
Apache-2.0. Please also follow the license and usage terms of the base Qwen model.