---
license: apache-2.0
language:
- en
- zh
tags:
- qwen3
- reranker
- coreml
- apple-silicon
- ane
pipeline_tag: text-ranking
library_name: coremltools
base_model: Qwen/Qwen3-Reranker-4B
---
# Qwen3-Reranker-4B-CoreML (ANE-Optimized)
## English
This repository provides a pre-converted CoreML bundle derived from `Qwen3-Reranker-4B` and an OpenAI-style rerank API service for Apple Silicon.
### Bundle Specs
| Item | Value |
| --- | --- |
| Base model | `Qwen/Qwen3-Reranker-4B` |
| Task | Text reranking |
| Profiles | `b1_s128` |
| Bundle path | `bundles/qwen3_reranker_ane_bundle_4b` |
| Default model id | `qwen3-reranker-4b-ane` |
| Package size (approx.) | `7.5G` |
### Scope
- This release is **text-only reranking**.
- Endpoints: `POST /rerank` and `POST /v1/rerank`.
### Quick Start
```bash
./setup_venv.sh
./run_server.sh
```
Health check:
```bash
curl -s http://127.0.0.1:8000/health
```
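Because the server may need time to load the model after `./run_server.sh` starts, a script can poll the health endpoint before sending work. The following is a minimal sketch using only the Python standard library. It assumes `/health` answers with HTTP 200 once the service is ready, which this README does not state explicitly; the function names `probe_health` and `wait_until_healthy` are illustrative and not part of this repository.

```python
import time
import urllib.error
import urllib.request


def probe_health(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200 (assumed 'ready' signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: treat as not ready yet.
        return False


def wait_until_healthy(url, attempts=30, delay=1.0, probe=probe_health, sleep=time.sleep):
    """Call `probe(url)` up to `attempts` times; return True as soon as it succeeds."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next attempt
    return False


if __name__ == "__main__":
    ok = wait_until_healthy("http://127.0.0.1:8000/health")
    print("server ready" if ok else "server did not become healthy")
```

The `probe` and `sleep` parameters are injectable so the polling logic can be exercised without a running server.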
Rerank request:
```bash
curl -s http://127.0.0.1:8000/v1/rerank \
-H 'Content-Type: application/json' \
-d '{
"query": "capital of China",
"documents": [
"The capital of China is Beijing.",
"Gravity is a force."
],
"top_n": 2,
"return_documents": true
}'
```
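The same request can be issued from Python. The sketch below mirrors the `curl` call above using only the standard library; the helper names `build_rerank_payload` and `rerank` are illustrative, not part of this repository. The response schema is not documented here, so the example simply prints whatever JSON the server returns rather than assuming particular fields.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # address used by the curl examples


def build_rerank_payload(query, documents, top_n, return_documents=True):
    """Build the JSON body with the same four fields as the curl example."""
    return {
        "query": query,
        "documents": list(documents),
        "top_n": top_n,
        "return_documents": return_documents,
    }


def rerank(query, documents, top_n, return_documents=True, base_url=BASE_URL, timeout=60.0):
    """POST to /v1/rerank and return the decoded JSON response as-is."""
    body = json.dumps(
        build_rerank_payload(query, documents, top_n, return_documents)
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # A generous timeout, since the first request includes model warm-up.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    result = rerank(
        "capital of China",
        ["The capital of China is Beijing.", "Gravity is a force."],
        top_n=2,
    )
    print(json.dumps(result, indent=2, ensure_ascii=False))
```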
### Notes
- The bundle uses a fixed-shape profile (`s128`), aimed at low-power deployment.
- Inputs that exceed the profile's capacity are rejected with an explicit error.
- The first request incurs warm-up latency.
- The default compute setting is `cpu_and_ne` (ANE-preferred, not ANE-guaranteed).
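To illustrate what "ANE-preferred, not ANE-guaranteed" means in practice, here is a hedged sketch of how a setting string such as `cpu_and_ne` could be mapped to a `coremltools` compute-unit preference. How this repository's server actually does the mapping is not shown in this README, so the helpers `compute_unit_name` and `load_model`, and the idea that other setting strings follow the same lowercase pattern, are assumptions. `ct.ComputeUnit.CPU_AND_NE` and the `compute_units` argument of `ct.models.MLModel` are real `coremltools` APIs.

```python
# Member names of coremltools.ComputeUnit.
KNOWN_COMPUTE_UNITS = {"ALL", "CPU_ONLY", "CPU_AND_GPU", "CPU_AND_NE"}


def compute_unit_name(setting: str = "cpu_and_ne") -> str:
    """Map a lowercase setting string to a coremltools.ComputeUnit member name."""
    name = setting.strip().upper()
    if name not in KNOWN_COMPUTE_UNITS:
        raise ValueError(f"unknown compute setting: {setting!r}")
    return name


def load_model(model_path: str, setting: str = "cpu_and_ne"):
    """Load a Core ML model with the requested compute-unit preference.

    `model_path` must point at a model file inside the bundle; the exact
    file name is not documented in this README, so the caller supplies it.
    """
    import coremltools as ct  # imported lazily; loading a model requires macOS

    units = ct.ComputeUnit[compute_unit_name(setting)]
    return ct.models.MLModel(model_path, compute_units=units)
```

With `CPU_AND_NE`, Core ML is restricted to the CPU and the Neural Engine, but it still decides placement per operation, so layers the ANE cannot run fall back to the CPU.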
## Chinese
This repository provides a pre-converted CoreML bundle based on `Qwen3-Reranker-4B`, along with a ready-to-run text reranking service (`/v1/rerank`).
### Bundle Specs
| Item | Value |
| --- | --- |
| Base model | `Qwen/Qwen3-Reranker-4B` |
| Task | Text reranking |
| Profile | `b1_s128` |
| Bundle path | `bundles/qwen3_reranker_ane_bundle_4b` |
| Default model name | `qwen3-reranker-4b-ane` |
| Package size (approx.) | `7.5G` |
### Scope
- This release supports **text-only reranking**.
- Endpoints: `POST /rerank` and `POST /v1/rerank`.
### Quick Start
```bash
./setup_venv.sh
./run_server.sh
```
Health check:
```bash
curl -s http://127.0.0.1:8000/health
```
Rerank request:
```bash
curl -s http://127.0.0.1:8000/v1/rerank \
-H 'Content-Type: application/json' \
-d '{
"query": "capital of China",
"documents": [
"The capital of China is Beijing.",
"Gravity is a force."
],
"top_n": 2,
"return_documents": true
}'
```
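### Notes
- Fixed-shape profile (`s128`), geared toward low-power deployment.
- Inputs that exceed the profile's limit produce an explicit error.
- The first request incurs warm-up latency.
- The default is `cpu_and_ne`, which favors ANE scheduling but does not mean execution runs 100% on the ANE.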
## License
Apache-2.0. Please also follow the license and usage terms of the base Qwen model.