v0.47.0
See https://github.com/quic/ai-hub-models/releases/v0.47.0 for the changelog.
README.md (CHANGED):

```diff
@@ -41,7 +41,7 @@ See our repository for [Llama-v3.2-3B-Instruct on GitHub](https://github.com/qui
 **Model Stats:**
 - Input sequence length for Prompt Processor: 128
 - Maximum context length: 4096
-
+- Quantization Type: w4 + w8 (few layers) with fp16 activations and w4a16 + w8a16 (few layers) are supported
 - Supported languages: English.
 - TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
 - Response Rate: Rate of response generation after the first response token.
@@ -52,9 +52,11 @@ See our repository for [Llama-v3.2-3B-Instruct on GitHub](https://github.com/qui
 | Llama-v3.2-3B-Instruct | GENIE | w4 | Snapdragon® 8 Elite Mobile | 4096 | 13.83 | 0.088195 - 2.82225
 | Llama-v3.2-3B-Instruct | GENIE | w4 | Qualcomm® SA8295P | 1024 | 3.523 | 0.373117 - 2.984936
 | Llama-v3.2-3B-Instruct | GENIE | w4 | Snapdragon® 8 Elite Gen 5 Mobile | 4096 | 18.00883 | 0.131546 - 4.209475
-| Llama-v3.2-3B-Instruct | GENIE | w4a16 | Snapdragon® 8 Elite Mobile | 4096 |
-| Llama-v3.2-3B-Instruct | GENIE | w4a16 | Snapdragon® X Elite | 4096 |
-| Llama-v3.2-3B-Instruct | GENIE | w4a16 |
+| Llama-v3.2-3B-Instruct | GENIE | w4a16 | Snapdragon® 8 Elite Mobile | 4096 | 28.03 | 0.082049 - 2.625568
+| Llama-v3.2-3B-Instruct | GENIE | w4a16 | Snapdragon® X Elite | 4096 | 11.87 | 0.116884 - 3.740288
+| Llama-v3.2-3B-Instruct | GENIE | w4a16 | Qualcomm® SA8775P | 4096 | 17.47 | 0.109614 - 3.507648
+| Llama-v3.2-3B-Instruct | GENIE | w4a16 | Snapdragon® 8 Elite Gen 5 Mobile | 4096 | 32.65 | 0.068954 - 2.206528
+| Llama-v3.2-3B-Instruct | GENIE | w4a16 | Snapdragon® X2 Elite | 4096 | 42.77 | 0.075045 - 2.40144

 ## License
 * The license for the original implementation of Llama-v3.2-3B-Instruct can be found
```
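The TTFT definition in the README states that a prompt is consumed in 128-token prompt-processor iterations, with the published range spanning one iteration (lower bound) up to the full 4096-token context (upper bound). As a rough illustration only, one could interpolate an expected TTFT for an arbitrary prompt length between those two measured bounds; the function name and the linear-scaling assumption below are illustrative, not how AI Hub computes its numbers:

```python
import math

def ttft_estimate(prompt_tokens: int,
                  seq_len: int = 128,            # prompt-processor input sequence length
                  context_len: int = 4096,       # maximum context length
                  ttft_short: float = 0.088195,  # published lower bound (one iteration)
                  ttft_full: float = 2.82225):   # published upper bound (full context)
    """Rough TTFT estimate from the number of prompt-processor iterations.

    Assumes TTFT grows linearly with the number of 128-token iterations
    between the two published bounds -- an illustrative assumption.
    """
    if not 1 <= prompt_tokens <= context_len:
        raise ValueError("prompt length must fit within the context window")
    iters = math.ceil(prompt_tokens / seq_len)          # iterations for this prompt
    max_iters = math.ceil(context_len / seq_len)        # iterations for a full-context prompt
    frac = (iters - 1) / (max_iters - 1)                # 0.0 at one iteration, 1.0 at full context
    return ttft_short + frac * (ttft_full - ttft_short)
```

For example, a 128-token prompt yields the lower bound exactly, and a 4096-token prompt yields the upper bound; anything between lands proportionally on the line connecting them.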