v0.47.0
See https://github.com/quic/ai-hub-models/releases/v0.47.0 for changelog.
README.md (CHANGED)

@@ -37,7 +37,7 @@ Download pre-exported model assets from **[Qwen2-7B-Instruct on Qualcomm® AI Hu
 - Input sequence length for Prompt Processor: 128
 - Context length: 4096
 - Number of parameters: 7.07B
--
+- Quantization Type: w4a16 + w8a16 (few layers)
 - Information about the model parts: Prompt Processor and Token Generator are split into 5 parts each. Each corresponding Prompt Processor and Token Generator part share weights.
 - Supported languages: English, Chinese, German, French, Spanish, Portuguese, Italian, Dutch, Russian, Czech, Polish, Arabic, Persian, Hebrew, Turkish, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay, Lao, Burmese, Cebuano, Khmer, Tagalog, Hindi, Bengali, Urdu.
 - TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
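The TTFT note in the diff implies a simple relationship: the prompt processor consumes the prompt in fixed 128-token chunks, so a prompt of N tokens needs ceil(N / 128) passes before the first token can be generated. A minimal sketch of that arithmetic (the function name and structure are illustrative, not part of any AI Hub API):

```python
import math

# Constants taken from the README diff above.
PROMPT_SEQ_LEN = 128   # input sequence length of the Prompt Processor
CONTEXT_LEN = 4096     # maximum context length

def prompt_processor_iterations(prompt_tokens: int) -> int:
    """Number of Prompt Processor passes needed before the first token.

    TTFT scales with this count, which is why it is quoted as a range:
    1 iteration for a short prompt, up to CONTEXT_LEN / PROMPT_SEQ_LEN
    iterations for a prompt that fills the full context.
    """
    if not 0 < prompt_tokens <= CONTEXT_LEN:
        raise ValueError("prompt must be non-empty and fit within the context length")
    return math.ceil(prompt_tokens / PROMPT_SEQ_LEN)

print(prompt_processor_iterations(128))   # short prompt: 1 pass (TTFT lower bound)
print(prompt_processor_iterations(4096))  # full context: 32 passes (TTFT upper bound)
```

This matches the bounds stated in the README: the lower bound corresponds to one iteration, the upper bound to 4096 / 128 = 32 iterations.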