qaihm-bot committed
Commit 0d002dd · verified · 1 Parent(s): 1e4a2a6

See https://github.com/quic/ai-hub-models/releases/v0.45.0 for changelog.

Files changed (1)
  1. README.md +0 -13
README.md CHANGED
@@ -32,20 +32,7 @@ This model is an implementation of Qwen2-7B-Instruct found [here](https://github
 - Context length: 4096
 - Number of parameters: 7.07B
 - Precision: w4a16 + w8a16 (few layers)
-- Num of key-value heads: 8
 - Information about the model parts: Prompt Processor and Token Generator are split into 5 parts each. Each corresponding Prompt Processor and Token Generator part share weights.
-- Prompt processor model size: 5.16 GB
-- Prompt processor input (part1): 128 tokens
-- Prompt processor output (part1): Embeddings output
-- Prompt processor input (other parts): 128 tokens + KVCache initialized with pad token
-- Prompt processor output (other parts): 128 output tokens + KVCache for token generator
-- Token generator model size: 5.16 GB
-- Token generator input (part1): 128 tokens
-- Token generator output (part1): Embeddings output
-- Token generator input (other parts): 1 input token + past KVCache
-- Token generator output (other parts): 1 output token + KVCache for next iteration
-- Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations.
-- Minimum QNN SDK version required: 2.27.7
 - Supported languages: English, Chinese, German, French, Spanish, Portuguese, Italian, Dutch, Russian, Czech, Polish, Arabic, Persian, Hebrew, Turkish, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay, Lao, Burmese, Cebuano, Khmer, Tagalog, Hindi, Bengali, Urdu.
 - TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
 - Response Rate: Rate of response generation after the first response token.
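
The bullets removed above describe a two-phase inference flow: a Prompt Processor that prefills the KV cache in 128-token chunks, then a Token Generator that emits one token per iteration while threading the cache forward. Below is a minimal sketch of that flow in plain Python, collapsing the 5-part model split into a single function per phase; `prompt_processor`, `token_generator`, `PAD_TOKEN`, and the `KVCache` type are all hypothetical stand-ins for illustration, not the AI Hub Models API.

```python
from typing import List, Tuple

CHUNK = 128          # prompt-processor window, per the bullets above
CONTEXT_LEN = 4096   # maximum context length
PAD_TOKEN = 0        # hypothetical pad-token id

KVCache = List[int]  # stand-in for the real per-layer key/value tensors


def prompt_processor(chunk: List[int], kv: KVCache) -> Tuple[List[int], KVCache]:
    """Hypothetical stand-in: consumes a 128-token chunk plus the cache built
    so far; returns 128 output tokens and the extended KV cache."""
    kv = kv + chunk
    return chunk, kv  # dummy outputs, for shape only


def token_generator(token: int, kv: KVCache) -> Tuple[int, KVCache]:
    """Hypothetical stand-in: consumes 1 input token plus the past KV cache;
    returns 1 output token and the KV cache for the next iteration."""
    kv = kv + [token]
    return (token + 1) % 100, kv  # dummy next token


def generate(prompt: List[int], max_new_tokens: int) -> List[int]:
    # Phase 1, prefill: pad the prompt to a multiple of 128 and run the
    # prompt processor once per chunk, threading the KV cache through.
    padded = prompt + [PAD_TOKEN] * (-len(prompt) % CHUNK)
    kv: KVCache = []
    last_outputs: List[int] = []
    for i in range(0, len(padded), CHUNK):
        last_outputs, kv = prompt_processor(padded[i:i + CHUNK], kv)

    # Phase 2, decode: feed one token at a time to the token generator,
    # reusing the KV cache from the previous iteration.
    out: List[int] = []
    token = last_outputs[-1]
    while len(out) < max_new_tokens and len(kv) < CONTEXT_LEN:
        token, kv = token_generator(token, kv)
        out.append(token)
    return out


print(generate(list(range(200)), max_new_tokens=8))
```

Per the removed bullets, the shared-weights split means each Prompt Processor part and its matching Token Generator part run the same layers, differing only in sequence length (128 tokens vs. 1).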
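
The TTFT range in the remaining bullets follows from the same 128-token window: the prompt processor runs once per chunk, so a short prompt needs a single pass while a full-context prompt needs 4096 / 128 = 32 passes. A back-of-envelope helper (illustrative only; it assumes first-token latency scales with the number of prompt-processor passes, which the range description implies but does not state as a formula):

```python
from math import ceil

CHUNK = 128  # tokens consumed per prompt-processor pass

def prefill_passes(prompt_tokens: int) -> int:
    """Number of prompt-processor iterations before the first token."""
    return ceil(prompt_tokens / CHUNK)

print(prefill_passes(100))   # 1  -> lower bound of the TTFT range
print(prefill_passes(4096))  # 32 -> upper bound (full context length)
```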