v0.47.0

See https://github.com/quic/ai-hub-models/releases/v0.47.0 for the changelog.

**README.md** (changed):
```diff
@@ -29,13 +29,7 @@ There are two ways to deploy this model on your device:
 
 ### Option 1: Download Pre-Exported Models
 
-
-
-| Runtime | Precision | Chipset | SDK Versions | Download |
-|---|---|---|---|---|
-| GENIE | w8a16 | qualcomm-qcs8275 | QAIRT 2.37 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/qwen2_5_7b_instruct/releases/v0.46.0/qwen2_5_7b_instruct-genie-w8a16-qualcomm_qcs8275.zip) |
-
-For more device-specific assets and performance metrics, visit **[Qwen2.5-7B-Instruct on Qualcomm® AI Hub](https://aihub.qualcomm.com/models/qwen2_5_7b_instruct)**.
+Download pre-exported model assets from **[Qwen2.5-7B-Instruct on Qualcomm® AI Hub](https://aihub.qualcomm.com/models/qwen2_5_7b_instruct)**.
 
 
 ### Option 2: Export with Custom Configurations
@@ -56,7 +50,7 @@ See our repository for [Qwen2.5-7B-Instruct on GitHub](https://github.com/quic/a
 **Model Stats:**
 - Input sequence length for Prompt Processor: 128
 - Context length: 4096
-
+- Quantization Type: w4a16 + w8a16 (few layers)
 - Supported languages: Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
 - TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
 - Response Rate: Rate of response generation after the first response token.
```
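The TTFT range in the model stats follows from the prompt processor consuming the prompt in 128-token chunks, up to the 4096-token context length. A minimal sketch of that iteration arithmetic (the helper name is hypothetical, not part of this release):

```python
import math

# Values from the model stats above.
PROMPT_CHUNK = 128     # input sequence length of the prompt processor
CONTEXT_LENGTH = 4096  # maximum context length

def prompt_processor_iterations(prompt_tokens: int) -> int:
    """Number of prompt-processor passes before the first response token,
    assuming the prompt is consumed in 128-token chunks."""
    if not 0 < prompt_tokens <= CONTEXT_LENGTH:
        raise ValueError("prompt must fit within the context length")
    return math.ceil(prompt_tokens / PROMPT_CHUNK)

# TTFT lower bound: a short prompt needs one iteration.
assert prompt_processor_iterations(100) == 1
# TTFT upper bound: a full-context prompt needs 4096 / 128 = 32 iterations.
assert prompt_processor_iterations(CONTEXT_LENGTH) == 32
```

This is why TTFT is quoted as a range: the cost before the first token scales with the number of prompt-processor iterations, from one (short prompt) up to 32 (full context).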