---
license: apache-2.0
language:
- en
- zh
tags:
- qwen3
- reranker
- coreml
- apple-silicon
- ane
pipeline_tag: text-ranking
library_name: coremltools
base_model: Qwen/Qwen3-Reranker-4B
---

# Qwen3-Reranker-4B-CoreML (ANE-Optimized)

## English

This repository provides a pre-converted Core ML bundle derived from `Qwen/Qwen3-Reranker-4B`, together with an OpenAI-style rerank API service that runs on Apple Silicon.

### Bundle Specs

| Item | Value |
| --- | --- |
| Base model | `Qwen/Qwen3-Reranker-4B` |
| Task | Text reranking |
| Profiles | `b1_s128` |
| Bundle path | `bundles/qwen3_reranker_ane_bundle_4b` |
| Default model id | `qwen3-reranker-4b-ane` |
| Package size (approx.) | `7.5G` |

### Scope

- This release supports **text-only reranking**.
- Endpoints: `POST /rerank` and `POST /v1/rerank`.

### Quick Start

```bash
./setup_venv.sh
./run_server.sh
```

Health check:

```bash
curl -s http://127.0.0.1:8000/health
```

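Because the first request has warm-up latency, a client script may want to wait until the server answers the health check before sending work. The following is a minimal sketch using only the Python standard library. It assumes `/health` answers with HTTP 200 once the server is up, which this README does not state; `http_probe` and `wait_until_healthy` are hypothetical helper names, not part of this repository.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_probe(url: str) -> bool:
    """Return True if GET `url` answers with HTTP 200 (assumed to mean 'ready')."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or non-2xx status: treat as not ready.
        return False


def wait_until_healthy(url, timeout_s=60.0, interval_s=1.0, probe=http_probe):
    """Poll `probe(url)` until it returns True or `timeout_s` seconds pass."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A typical call would be `wait_until_healthy("http://127.0.0.1:8000/health")` right after starting `./run_server.sh` in another terminal. The `probe` parameter exists only so the polling loop can be exercised without a running server.
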
Rerank request:

```bash
curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```

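The same request can be sent from Python. This is a sketch with the standard library only; it mirrors the fields of the curl example (`query`, `documents`, `top_n`, `return_documents`) and returns the decoded JSON as-is, since the response schema is not documented here. `build_rerank_request` and `rerank` are hypothetical helper names, and the 120-second timeout is an arbitrary allowance for warm-up.

```python
import json
import urllib.request


def build_rerank_request(query, documents, top_n=None, return_documents=True,
                         base_url="http://127.0.0.1:8000"):
    """Build a POST request for /v1/rerank with the fields used in the curl example."""
    payload = {
        "query": query,
        "documents": list(documents),
        "return_documents": return_documents,
    }
    if top_n is not None:
        payload["top_n"] = top_n
    return urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(query, documents, **kwargs):
    """Send the request and return the decoded JSON response unchanged."""
    req = build_rerank_request(query, documents, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `rerank("capital of China", ["The capital of China is Beijing.", "Gravity is a force."], top_n=2)` should return the same JSON body that the curl command prints.
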
### Notes

- The bundle uses a fixed-shape profile (`s128`) aimed at low-power deployment.
- Inputs longer than the profile's capacity are rejected with an explicit error.
- The first request incurs warm-up latency.
- The default compute setting is `cpu_and_ne`. This prefers the ANE for scheduling but does not guarantee that execution runs exclusively on the ANE.

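To illustrate what a fixed-shape profile implies, the sketch below pads a token sequence to a fixed length and rejects anything longer. It is not the service's actual code. It assumes `s128` means a sequence length of 128 tokens, that token IDs are already available from a tokenizer, and that padding is applied on the left with a placeholder `pad_id` of 0; `fit_to_profile` is a hypothetical name.

```python
def fit_to_profile(token_ids, seq_len=128, pad_id=0, pad_left=True):
    """Pad `token_ids` to exactly `seq_len`, or raise if they do not fit.

    Returns (input_ids, attention_mask), both of length `seq_len`.
    `seq_len`, `pad_id`, and the padding side are assumptions for illustration.
    """
    token_ids = list(token_ids)
    if len(token_ids) > seq_len:
        # A fixed-shape model cannot accept longer inputs, so fail loudly.
        raise ValueError(
            f"input has {len(token_ids)} tokens, profile capacity is {seq_len}"
        )
    n_pad = seq_len - len(token_ids)
    pad, pad_mask = [pad_id] * n_pad, [0] * n_pad
    real_mask = [1] * len(token_ids)
    if pad_left:
        return pad + token_ids, pad_mask + real_mask
    return token_ids + pad, real_mask + pad_mask
```

Rejecting over-length inputs, instead of silently truncating them, matches the explicit-error behavior described in the notes above: truncation could drop part of the document being scored without the caller noticing.
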
## 中文

这个仓库提供基于 `Qwen3-Reranker-4B` 的预转换 CoreML bundle,以及可直接运行的文本重排服务(`/v1/rerank`)。

### Bundle 规格

| 项目 | 值 |
| --- | --- |
| 基础模型 | `Qwen/Qwen3-Reranker-4B` |
| 任务类型 | 文本重排 |
| Profile | `b1_s128` |
| Bundle 路径 | `bundles/qwen3_reranker_ane_bundle_4b` |
| 默认模型名 | `qwen3-reranker-4b-ane` |
| 包体积(约) | `7.5G` |

### 范围说明

- 本版本仅支持**纯文本重排**。
- 接口为 `POST /rerank` 与 `POST /v1/rerank`。

### 快速开始

```bash
./setup_venv.sh
./run_server.sh
```

健康检查:

```bash
curl -s http://127.0.0.1:8000/health
```

重排请求:

```bash
curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```

### 说明

- 固定 shape profile(`s128`),偏向低功耗部署。
- 输入超过 profile 上限会明确报错。
- 首次请求会有预热延迟。
- 默认 `cpu_and_ne`,是偏向 ANE 调度,不等于 100% 仅 ANE 执行。

## License

Apache-2.0. Please also follow the license and usage terms of the base Qwen model.