v0.47.0
See https://github.com/quic/ai-hub-models/releases/v0.47.0 for changelog.
README.md (CHANGED)

```diff
@@ -50,7 +50,7 @@ See our repository for [Falcon3-7B-Instruct on GitHub](https://github.com/quic/a
 **Model Stats:**
 - Input sequence length for Prompt Processor: 128
 - Context length: 4096
--
+- Quantization Type: w4a16 + w8a16 (few layers)
 - Supported languages: English, French, Spanish, Portuguese.
 - TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
 - Response Rate: Rate of response generation after the first response token.
```
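The TTFT range described in the stats follows from how the prompt processor works: the prompt is prefilled in fixed 128-token iterations, so a short prompt needs one pass while a full-context prompt needs 4096 / 128 = 32 passes. A minimal sketch of that relationship, where `secs_per_iteration` is a hypothetical, device-dependent latency (not a published figure):

```python
import math

SEQ_LEN = 128       # prompt-processor input sequence length (from the stats)
CONTEXT_LEN = 4096  # model context length (from the stats)

def prompt_iterations(prompt_tokens: int) -> int:
    """Number of prompt-processor passes needed to prefill a prompt."""
    if not 0 < prompt_tokens <= CONTEXT_LEN:
        raise ValueError("prompt must fit within the context window")
    return math.ceil(prompt_tokens / SEQ_LEN)

def estimate_ttft(prompt_tokens: int, secs_per_iteration: float) -> float:
    """Rough TTFT estimate: prefill iterations x per-iteration latency.

    secs_per_iteration is an assumed device-dependent constant; real TTFT
    also includes the first decode step and other overheads.
    """
    return prompt_iterations(prompt_tokens) * secs_per_iteration

# Short prompt (<= 128 tokens): 1 iteration -> lower bound of the TTFT range.
# Full-context prompt (4096 tokens): 32 iterations -> upper bound.
```

This is only an illustration of why TTFT is quoted as a range rather than a single number; measured values depend on the device and runtime.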