qaihm-bot committed
Commit a14af66 · verified · 1 Parent(s): 7dd6cfa

See https://github.com/quic/ai-hub-models/releases/v0.47.0 for changelog.

Files changed (1):
  1. README.md +1 -1

README.md CHANGED
@@ -37,7 +37,7 @@ Download pre-exported model assets from **[Qwen2-7B-Instruct on Qualcomm® AI Hu
 - Input sequence length for Prompt Processor: 128
 - Context length: 4096
 - Number of parameters: 7.07B
-- Precision: w4a16 + w8a16 (few layers)
+- Quantization Type: w4a16 + w8a16 (few layers)
 - Information about the model parts: Prompt Processor and Token Generator are split into 5 parts each. Each corresponding Prompt Processor and Token Generator part share weights.
 - Supported languages: English, Chinese, German, French, Spanish, Portuguese, Italian, Dutch, Russian, Czech, Polish, Arabic, Persian, Hebrew, Turkish, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay, Lao, Burmese, Cebuano, Khmer, Tagalog, Hindi, Bengali, Urdu.
 - TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
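The TTFT bullet above explains that the range is driven by how many prompt-processor passes a prompt needs: a short prompt fits in one 128-token iteration, while a full-context prompt needs many. A minimal sketch of that iteration count, using the sequence and context lengths from the README (the function name is illustrative, and this counts iterations only, not actual latency):

```python
import math

PROMPT_PROCESSOR_SEQ_LEN = 128  # from the README: input sequence length
CONTEXT_LENGTH = 4096           # from the README: context length

def prompt_processor_iterations(prompt_tokens: int) -> int:
    """Number of 128-token prompt-processor passes needed to consume the prompt."""
    return math.ceil(prompt_tokens / PROMPT_PROCESSOR_SEQ_LEN)

# Lower TTFT bound: a short prompt fits in a single iteration.
print(prompt_processor_iterations(100))            # 1
# Upper TTFT bound: a prompt filling the full context needs 32 iterations.
print(prompt_processor_iterations(CONTEXT_LENGTH)) # 32
```

Under the assumption that TTFT grows roughly with the number of passes, this is why the lower bound corresponds to prompts of up to 128 tokens and the upper bound to 4096-token prompts.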