---
license: apache-2.0
language:
- en
- zh
tags:
- qwen3
- reranker
- coreml
- apple-silicon
- ane
pipeline_tag: text-ranking
library_name: coremltools
base_model: Qwen/Qwen3-Reranker-4B
---
# Qwen3-Reranker-4B-CoreML (ANE-Optimized)
## English
This repository provides a pre-converted CoreML bundle derived from `Qwen3-Reranker-4B` and an OpenAI-style rerank API service for Apple Silicon.
### Bundle Specs
| Item | Value |
| --- | --- |
| Base model | `Qwen/Qwen3-Reranker-4B` |
| Task | Text reranking |
| Profiles | `b1_s128` |
| Bundle path | `bundles/qwen3_reranker_ane_bundle_4b` |
| Default model id | `qwen3-reranker-4b-ane` |
| Package size (approx.) | `7.5G` |
### Scope
- This release is **text-only reranking**.
- Endpoints: `POST /rerank` and `POST /v1/rerank`.
### Quick Start
```bash
./setup_venv.sh
./run_server.sh
```
Health check:
```bash
curl -s http://127.0.0.1:8000/health
```
Rerank request:
```bash
curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```
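The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_rerank_request` and `rerank` and the `BASE_URL` constant are illustrative and not part of this repository. The response schema is not documented here, so the sketch prints the decoded JSON body unchanged rather than assuming field names.

```python
import json
import urllib.request

# Host and port are taken from the curl examples in this README.
BASE_URL = "http://127.0.0.1:8000"


def build_rerank_request(query, documents, top_n=None, return_documents=True,
                         base_url=BASE_URL):
    """Build (but do not send) a POST /v1/rerank request (hypothetical helper)."""
    payload = {
        "query": query,
        "documents": list(documents),
        "return_documents": return_documents,
    }
    if top_n is not None:
        payload["top_n"] = top_n
    return urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(query, documents, **kwargs):
    """Send the request and return the decoded JSON body as-is."""
    request = build_rerank_request(query, documents, **kwargs)
    # Generous timeout because the first request includes warm-up latency.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)


if __name__ == "__main__":
    body = rerank(
        "capital of China",
        ["The capital of China is Beijing.", "Gravity is a force."],
        top_n=2,
    )
    print(json.dumps(body, indent=2, ensure_ascii=False))
```

Keeping request construction separate from sending makes the payload easy to inspect or log without a running server.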
### Notes
- The bundle uses a fixed-shape profile (`s128`), aimed at low-power deployment.
- Inputs that exceed the profile's capacity are rejected with an explicit error.
- The first request incurs warm-up latency.
- The default compute setting is `cpu_and_ne`, which prefers the ANE but does not guarantee ANE-only execution.
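A client can account for the warm-up and error behavior noted above. The sketch below is one possible approach, not the service's own tooling: `wait_until_healthy` and `post_rerank` are hypothetical helpers, and the code assumes that `/health` answers with HTTP 200 once the server is ready. The status code and body format used for over-length inputs are not documented here, so any HTTP error is surfaced with its raw body.

```python
import json
import time
import urllib.error
import urllib.request

# Host and port are taken from the curl examples in this README.
BASE_URL = "http://127.0.0.1:8000"


def wait_until_healthy(base_url=BASE_URL, attempts=30, delay_s=1.0):
    """Poll GET /health until it answers with HTTP 200 or attempts run out."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not listening yet, or it answered with an error status
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False


def post_rerank(payload, base_url=BASE_URL, timeout_s=300):
    """POST to /v1/rerank and return (ok, decoded_body_or_error_text)."""
    request = urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as resp:
            return True, json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: errors such as over-length inputs arrive as HTTP errors.
        # The raw body is returned unchanged because its schema is unknown.
        return False, f"HTTP {err.code}: {err.read().decode('utf-8', 'replace')}"


if __name__ == "__main__":
    if not wait_until_healthy():
        raise SystemExit("server did not become healthy")
    # Throwaway request so that later calls do not pay the warm-up cost.
    started = time.perf_counter()
    ok, body = post_rerank({"query": "warm-up", "documents": ["warm-up"]})
    print(f"warm-up ok={ok} in {time.perf_counter() - started:.1f}s")
```

Running this once after `./run_server.sh` moves the warm-up cost out of the first real request.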
## 中文
这个仓库提供基于 `Qwen3-Reranker-4B` 的预转换 CoreML bundle,以及可直接运行的文本重排服务(`/v1/rerank`)。
### Bundle 规格
| 项目 | 值 |
| --- | --- |
| 基础模型 | `Qwen/Qwen3-Reranker-4B` |
| 任务类型 | 文本重排 |
| Profile | `b1_s128` |
| Bundle 路径 | `bundles/qwen3_reranker_ane_bundle_4b` |
| 默认模型名 | `qwen3-reranker-4b-ane` |
| 包体积(约) | `7.5G` |
### 范围说明
- 本版本仅支持**纯文本重排**。
- 接口为 `POST /rerank` 和 `POST /v1/rerank`。
### 快速开始
```bash
./setup_venv.sh
./run_server.sh
```
健康检查:
```bash
curl -s http://127.0.0.1:8000/health
```
重排请求:
```bash
curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```
### 说明
- 固定 shape profile(`s128`),偏向低功耗部署。
- 输入超过 profile 上限会明确报错。
- 首次请求会有预热延迟。
- 默认 `cpu_and_ne`,是偏向 ANE 调度,不等于 100% 仅 ANE 执行。
## License
Apache-2.0. Please also follow the license and usage terms of the base Qwen model.