---
license: apache-2.0
language:
- en
- zh
tags:
- qwen3
- reranker
- coreml
- apple-silicon
- ane
pipeline_tag: text-ranking
library_name: coremltools
base_model: Qwen/Qwen3-Reranker-4B
---

# Qwen3-Reranker-4B-CoreML (ANE-Optimized)

## English

This repository provides a pre-converted CoreML bundle derived from `Qwen3-Reranker-4B` and an OpenAI-style rerank API service for Apple Silicon.

### Bundle Specs

| Item | Value |
| --- | --- |
| Base model | `Qwen/Qwen3-Reranker-4B` |
| Task | Text reranking |
| Profiles | `b1_s128` |
| Bundle path | `bundles/qwen3_reranker_ane_bundle_4b` |
| Default model id | `qwen3-reranker-4b-ane` |
| Package size (approx.) | `7.5G` |

### Scope

- This release supports **text-only reranking**.
- Endpoints: `POST /rerank` and `POST /v1/rerank`.

### Quick Start

```bash
./setup_venv.sh
./run_server.sh
```

Health check:

```bash
curl -s http://127.0.0.1:8000/health
```
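For scripted setups, the same health check can be polled until the server is ready. The following is a minimal sketch using only the Python standard library. The helper name `wait_until_healthy` and its timeout defaults are illustrative assumptions, and the code only inspects the HTTP status code because the body of `/health` is not documented here.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(base_url="http://127.0.0.1:8000", timeout_s=60.0, interval_s=1.0):
    """Poll GET /health until it answers with HTTP 200 or the timeout expires.

    Returns True once the endpoint responds with status 200, False on timeout.
    Only the status code is checked; the response body format is not assumed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet, or it answered with an error status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_healthy()` right after `./run_server.sh` has been started in the background gives a simple readiness gate before the first rerank request is sent.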

Rerank request:

```bash
curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```
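The curl request above can also be issued from Python. This is a hedged sketch built on the standard library: the function names `build_rerank_request` and `rerank` are hypothetical, the JSON fields are exactly those shown in the curl example, and the response is returned as decoded JSON without assuming any particular keys, since the response schema is not described in this card.

```python
import json
import urllib.request


def build_rerank_request(query, documents, top_n, return_documents=True,
                         base_url="http://127.0.0.1:8000"):
    """Build a POST /v1/rerank request with the same JSON fields as the curl example."""
    payload = {
        "query": query,
        "documents": list(documents),
        "top_n": top_n,
        "return_documents": return_documents,
    }
    return urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(query, documents, top_n, **kwargs):
    """Send the request and return the decoded JSON body unchanged.

    Nothing is assumed about the keys of the response.
    """
    request = build_rerank_request(query, documents, top_n, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `rerank("capital of China", ["The capital of China is Beijing.", "Gravity is a force."], top_n=2)` sends the same payload as the curl command. The generous 120-second timeout is an arbitrary choice.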

### Notes

- The bundle uses a fixed-shape profile (`s128`), intended for low-power deployment.
- Inputs longer than the profile capacity return an explicit error.
- The first request incurs warm-up latency.
- The default compute setting is `cpu_and_ne`, which prefers the ANE but does not guarantee that every operation runs on it.
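To illustrate the compute setting, here is a sketch of how a lowercase name such as `cpu_and_ne` could map onto the `ComputeUnit` enum in `coremltools`. It is an assumption that the bundled server resolves its setting this way. The helper names and the `model_path` argument are hypothetical, and the internal layout of the bundle directory is not described in this card.

```python
def compute_unit_member_name(setting):
    """Translate a lowercase setting such as "cpu_and_ne" into an enum member name."""
    name = setting.strip().upper()
    # Member names of coremltools.ComputeUnit.
    allowed = {"ALL", "CPU_ONLY", "CPU_AND_GPU", "CPU_AND_NE"}
    if name not in allowed:
        raise ValueError(f"unknown compute setting: {setting!r}")
    return name


def load_model(model_path, setting="cpu_and_ne"):
    """Load a Core ML model with the requested compute units (macOS only).

    model_path is a hypothetical path to a single Core ML model file.
    """
    import coremltools as ct  # imported lazily so the helper above works without it

    units = getattr(ct.ComputeUnit, compute_unit_member_name(setting))
    return ct.models.MLModel(model_path, compute_units=units)
```

`ComputeUnit.CPU_AND_NE` restricts scheduling to the CPU and the Neural Engine. Core ML still decides per operation where to run it, which is why the setting is a preference and not a guarantee.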

## License

Apache-2.0. Please also follow the license and usage terms of the base Qwen model.