---
license: apache-2.0
language:
- en
- zh
tags:
- qwen3
- reranker
- coreml
- apple-silicon
- ane
pipeline_tag: text-ranking
library_name: coremltools
base_model: Qwen/Qwen3-Reranker-8B
---

# Qwen3-Reranker-8B-CoreML (ANE-Optimized)

## English

This repository provides a pre-converted CoreML bundle derived from `Qwen3-Reranker-8B` and an OpenAI-style rerank API service for Apple Silicon.

### Bundle Specs

| Item | Value |
| --- | --- |
| Base model | `Qwen/Qwen3-Reranker-8B` |
| Task | Text reranking |
| Profiles | `b1_s128` |
| Bundle path | `bundles/qwen3_reranker_ane_bundle_8b` |
| Default model id | `qwen3-reranker-8b-ane` |
| Package size (approx.) | `14G` |

### Scope

- This release is **text-only reranking**.
- Endpoints: `POST /rerank` and `POST /v1/rerank`.

### Quick Start

```bash
./setup_venv.sh
./run_server.sh
```

Health check:

```bash
curl -s http://127.0.0.1:8000/health
```

Rerank request:

```bash
curl -s http://127.0.0.1:8000/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "capital of China",
    "documents": [
      "The capital of China is Beijing.",
      "Gravity is a force."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```

### Notes

- Fixed shape profile (`s128`) for low-power deployment.
- Inputs longer than the profile capacity return an explicit error.
- The first request has warm-up latency.
- The default compute setting is `cpu_and_ne` (ANE-preferred, not ANE-guaranteed).
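The rerank request can also be sent from Python. What follows is a minimal sketch using only the standard library, not code shipped in this repository. The helper names `build_rerank_payload` and `rerank` are hypothetical, and the 300-second timeout is an arbitrary allowance for the warm-up latency of the first request. The sketch assumes that the explicit error for over-long inputs arrives as a non-2xx HTTP status. The response schema is not documented here, so the parsed JSON is returned unchanged.

```python
import json
import urllib.error
import urllib.request

# Address used by the curl examples in this README.
BASE_URL = "http://127.0.0.1:8000"


def build_rerank_payload(query, documents, top_n, return_documents=True):
    """Build the same JSON body as the curl rerank example."""
    return {
        "query": query,
        "documents": list(documents),
        "top_n": top_n,
        "return_documents": return_documents,
    }


def rerank(query, documents, top_n, return_documents=True,
           base_url=BASE_URL, timeout=300.0):
    """POST to /v1/rerank and return the decoded JSON response as-is.

    The timeout default is an arbitrary, generous value chosen because
    the first request has warm-up latency.
    """
    payload = build_rerank_payload(query, documents, top_n, return_documents)
    request = urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # Assumption: server-side errors (for example, an input longer than
        # the profile capacity) come back as a non-2xx status with a body.
        detail = exc.read().decode("utf-8", errors="replace")
        raise RuntimeError(
            f"rerank request failed with HTTP {exc.code}: {detail}"
        ) from exc
```

With the server from Quick Start running, `rerank("capital of China", ["The capital of China is Beijing.", "Gravity is a force."], top_n=2)` sends the same body as the curl example.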
## License

Apache-2.0. Please also follow the license and usage terms of the base Qwen model.