---
license: apache-2.0
language:
- en
- zh
tags:
- qwen3
- reranker
- coreml
- apple-silicon
- ane
pipeline_tag: text-ranking
library_name: coremltools
base_model: Qwen/Qwen3-Reranker-0.6B
---

# Qwen3-Reranker-0.6B-CoreML (ANE-Optimized)

## English

This repository provides a pre-converted Core ML bundle derived from `Qwen3-Reranker-0.6B`, together with an OpenAI-style rerank API service for Apple Silicon.

### Bundle Specs

| Item | Value |
| --- | --- |
| Base model | `Qwen/Qwen3-Reranker-0.6B` |
| Task | Text reranking |
| Profiles | `b1_s128`, `b4_s128` |
| Bundle path | `bundles/qwen3_reranker_ane_bundle` |
| Default model ID | `qwen3-reranker-0.6b-ane` |
| Package size (approx.) | `1.8G` |

### Scope

- This release supports **text-only reranking**.
- Endpoints: `POST /rerank` and `POST /v1/rerank`.

### Quick Start

```bash
./setup_venv.sh
./run_server.sh
```

Health check:

```bash
curl -s http://127.0.0.1:8000/health
```
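
When the server is started from a script, it can help to wait for the health endpoint before sending traffic. The following is a minimal sketch, not part of this repository: it assumes `/health` answers with HTTP 200 once the service is ready and makes no assumption about the response body. The names `http_probe` and `wait_until_healthy` are hypothetical.

```python
import http.client
import time
import urllib.request

HEALTH_URL = "http://127.0.0.1:8000/health"  # same URL as the curl health check


def http_probe(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if GET <url> answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Assumption: HTTP 200 means "ready"; the body is ignored.
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, non-2xx status, malformed reply.
        return False


def wait_until_healthy(probe=http_probe, attempts: int = 30, interval: float = 1.0) -> bool:
    """Call probe() up to `attempts` times, sleeping `interval` seconds between tries."""
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    print("server ready" if wait_until_healthy() else "server did not become ready")
```

The probe is passed in as a parameter so that the retry loop can be exercised without a running server.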

Rerank request:

```bash
curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```
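
The same request can be issued from Python with only the standard library. This is a sketch under stated assumptions rather than an official client: the request fields mirror the curl example, the response is returned as raw decoded JSON because its schema is not documented here, and rejected requests are assumed to come back with a non-2xx status and a message in the body. The function names `build_payload` and `rerank` are hypothetical.

```python
import json
import urllib.error
import urllib.request

RERANK_URL = "http://127.0.0.1:8000/v1/rerank"  # endpoint from the curl example


def build_payload(query, documents, top_n, return_documents=True):
    """Build the JSON body with the same four fields as the curl example."""
    return {
        "query": query,
        "documents": list(documents),
        "top_n": top_n,
        "return_documents": return_documents,
    }


def rerank(query, documents, top_n, url=RERANK_URL, timeout=60.0):
    """POST a rerank request and return the decoded JSON response unchanged."""
    body = json.dumps(build_payload(query, documents, top_n)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        # Generous timeout: the first request may include model warm-up.
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Assumption: errors arrive as non-2xx responses with details in the body.
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"rerank failed with HTTP {err.code}: {detail}") from err


if __name__ == "__main__":
    result = rerank(
        "capital of China",
        ["The capital of China is Beijing.", "Gravity is a force."],
        top_n=2,
    )
    print(json.dumps(result, indent=2, ensure_ascii=False))
```

Surfacing the server's error body in the exception keeps failures such as over-length inputs visible to the caller instead of hiding them behind a bare status code.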

### Notes

- The profiles use fixed input shapes (`s128`) and are optimized for low power usage.
- Inputs longer than the profile capacity are rejected with an explicit error.
- The first request incurs warm-up latency.
- The default compute setting is `cpu_and_ne`, which prefers the ANE but does not guarantee that every operation runs on it.
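
To illustrate the compute setting mentioned in the notes above, here is a hedged sketch of how a setting string such as `cpu_and_ne` could map onto `coremltools.ComputeUnit` when loading a model package. It is not taken from this repository's server code. Only `cpu_and_ne` appears in this README; the other setting names, the helper names, and any model path passed to `load_model` are assumptions. With `CPU_AND_NE`, Core ML is limited to the CPU and the Neural Engine and decides at runtime where each operation runs, which is why ANE execution is preferred rather than guaranteed.

```python
from pathlib import Path

# Setting strings mapped to coremltools.ComputeUnit member names.
# Only "cpu_and_ne" is documented in this README; the other keys are assumed spellings.
COMPUTE_UNIT_NAMES = {
    "cpu_and_ne": "CPU_AND_NE",
    "cpu_and_gpu": "CPU_AND_GPU",
    "cpu_only": "CPU_ONLY",
    "all": "ALL",
}


def compute_unit_name(setting: str) -> str:
    """Translate a setting string into a ComputeUnit member name."""
    key = setting.strip().lower()
    if key not in COMPUTE_UNIT_NAMES:
        raise ValueError(f"unknown compute setting: {setting!r}")
    return COMPUTE_UNIT_NAMES[key]


def load_model(model_path: str, setting: str = "cpu_and_ne"):
    """Load a Core ML model package restricted to the given compute units (macOS only)."""
    import coremltools as ct  # lazy import so the helpers above work without it

    unit = getattr(ct.ComputeUnit, compute_unit_name(setting))
    # model_path is hypothetical: the bundle's internal file layout is not documented here.
    return ct.models.MLModel(str(Path(model_path)), compute_units=unit)
```

Keeping the string-to-enum mapping separate from the load call means invalid settings fail early with a clear message, before any model file is touched.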

## 中文

这个仓库提供基于 `Qwen3-Reranker-0.6B` 的预转换 CoreML bundle,以及可直接运行的文本重排服务(`/v1/rerank`)。

### Bundle 规格

| 项目 | 值 |
| --- | --- |
| 基础模型 | `Qwen/Qwen3-Reranker-0.6B` |
| 任务类型 | 文本重排 |
| Profile | `b1_s128`、`b4_s128` |
| Bundle 路径 | `bundles/qwen3_reranker_ane_bundle` |
| 默认模型名 | `qwen3-reranker-0.6b-ane` |
| 包体积(约) | `1.8G` |

### 范围说明

- 本版本仅支持**纯文本重排**。
- 接口为 `POST /rerank` 与 `POST /v1/rerank`。

### 快速开始

```bash
./setup_venv.sh
./run_server.sh
```

健康检查:

```bash
curl -s http://127.0.0.1:8000/health
```

重排请求:

```bash
curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```

### 说明

- 固定 shape profile(`s128`),偏向低功耗部署。
- 输入超过 profile 上限会明确报错。
- 首次请求会有预热延迟。
- 默认 `cpu_and_ne`,是偏向 ANE 调度,不等于 100% 仅 ANE 执行。

## License

Apache-2.0. Please also follow the license and usage terms of the base Qwen model.