---
license: apache-2.0
language:
- en
- zh
tags:
- qwen3
- reranker
- coreml
- apple-silicon
- ane
pipeline_tag: text-ranking
library_name: coremltools
base_model: Qwen/Qwen3-Reranker-4B
---

# Qwen3-Reranker-4B-CoreML (ANE-Optimized)

## English

This repository provides a pre-converted CoreML bundle derived from `Qwen3-Reranker-4B` and an OpenAI-style rerank API service for Apple Silicon.

### Bundle Specs

| Item | Value |
| --- | --- |
| Base model | `Qwen/Qwen3-Reranker-4B` |
| Task | Text reranking |
| Profiles | `b1_s128` |
| Bundle path | `bundles/qwen3_reranker_ane_bundle_4b` |
| Default model id | `qwen3-reranker-4b-ane` |
| Package size (approx.) | `7.5G` |

### Scope

- This release is **text-only reranking**.
- Endpoints: `POST /rerank` and `POST /v1/rerank`.

### Quick Start

```bash
./setup_venv.sh
./run_server.sh
```

Health check:

```bash
curl -s http://127.0.0.1:8000/health
```
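When the server is started from a script, it can help to wait for the health endpoint before sending traffic. The following is a minimal polling sketch using only the Python standard library. It assumes that `/health` answers with HTTP 200 once the service is ready; the README does not document the response body, so the code does not inspect it. The function names and retry defaults are illustrative, not part of this repository.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://127.0.0.1:8000/health"  # same URL as the curl check above


def probe_health(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200 (assumed ready signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: treat as not ready.
        return False


def wait_until_healthy(
    probe: Callable[[], bool] = probe_health,
    attempts: int = 30,
    delay_s: float = 1.0,
) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay_s` between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Calling `wait_until_healthy()` right after `./run_server.sh` has been launched in the background returns `True` once the probe succeeds, or `False` if the server never came up within the retry budget.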

Rerank request:

```bash
curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```
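The same request can be issued from Python. This is a hedged sketch with the standard library only: the payload fields (`query`, `documents`, `top_n`, `return_documents`) mirror the curl example, while the helper names and the timeout value are assumptions. The response schema is not documented here, so the sketch returns the parsed JSON unchanged instead of guessing at field names.

```python
import json
import urllib.request
from typing import Any, List

BASE_URL = "http://127.0.0.1:8000"  # host and port taken from the curl example


def build_rerank_request(
    query: str,
    documents: List[str],
    top_n: int,
    return_documents: bool = True,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a POST request for /v1/rerank with the same JSON body as the curl call."""
    payload = {
        "query": query,
        "documents": documents,
        "top_n": top_n,
        "return_documents": return_documents,
    }
    return urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(query: str, documents: List[str], top_n: int, timeout: float = 60.0) -> Any:
    """Send the request and return the decoded JSON response as-is."""
    req = build_rerank_request(query, documents, top_n)
    # The generous timeout is an assumption to leave room for first-request warm-up.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `rerank("capital of China", ["The capital of China is Beijing.", "Gravity is a force."], top_n=2)` sends the same body as the curl command.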

### Notes

- The bundle uses a fixed-shape profile (`s128`), chosen for low-power deployment.
- Inputs longer than the profile capacity return an explicit error.
- The first request incurs warm-up latency.
- Default compute setting is `cpu_and_ne` (ANE-preferred, not ANE-guaranteed).
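
Because over-capacity inputs are rejected with an explicit error, a client may want to surface that error text instead of a bare HTTP failure. The sketch below shows one way to do this with `urllib`. The exact status code and error body format are not specified in this README, so the code keeps the raw body as a string; `RerankRequestError`, `to_request_error`, and `send` are hypothetical names introduced for illustration.

```python
import urllib.error
import urllib.request


class RerankRequestError(RuntimeError):
    """Raised when the server rejects a request; keeps the status and raw body."""

    def __init__(self, status: int, body: str) -> None:
        super().__init__(f"rerank request failed with HTTP {status}: {body}")
        self.status = status
        self.body = body


def to_request_error(err: urllib.error.HTTPError) -> RerankRequestError:
    """Read the error body from an HTTPError without assuming its format."""
    body = err.read().decode("utf-8", errors="replace")
    return RerankRequestError(err.code, body)


def send(req: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send a prepared request; convert HTTP error responses into RerankRequestError."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        raise to_request_error(err) from err
```

A caller can then catch `RerankRequestError` and decide whether to shorten the query or document and retry, based on the message the server returned.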

## 中文

这个仓库提供基于 `Qwen3-Reranker-4B` 的预转换 CoreML bundle,以及可直接运行的文本重排服务(`/v1/rerank`)。

### Bundle 规格

| 项目 | 值 |
| --- | --- |
| 基础模型 | `Qwen/Qwen3-Reranker-4B` |
| 任务类型 | 文本重排 |
| Profile | `b1_s128` |
| Bundle 路径 | `bundles/qwen3_reranker_ane_bundle_4b` |
| 默认模型名 | `qwen3-reranker-4b-ane` |
| 包体积(约) | `7.5G` |

### 范围说明

- 本版本仅支持**纯文本重排**。
- 接口为 `POST /rerank` 和 `POST /v1/rerank`。

### 快速开始

```bash
./setup_venv.sh
./run_server.sh
```

健康检查:

```bash
curl -s http://127.0.0.1:8000/health
```

重排请求:

```bash
curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```

### 说明

- 固定 shape profile(`s128`),偏向低功耗部署。
- 输入超过 profile 上限会明确报错。
- 首次请求会有预热延迟。
- 默认 `cpu_and_ne`,是偏向 ANE 调度,不等于 100% 仅 ANE 执行。

## License

Apache-2.0. Please also follow the license and usage terms of the base Qwen model.