v0.50.1
See https://github.com/qualcomm/ai-hub-models/releases/v0.50.1 for the changelog.
README.md (changed):

```diff
@@ -16,7 +16,7 @@ pipeline_tag: text-generation
 Llama 3 is a family of LLMs. The model is quantized to w4a16 (4-bit weights and 16-bit activations) and part of the model is quantized to w8a16 (8-bit weights and 16-bit activations) making it suitable for on-device deployment. For Prompt and output length specified below, the time to first token is Llama-PromptProcessor-Quantized's latency and average time per addition token is Llama-TokenGenerator-Quantized's latency.
 
 This is based on the implementation of Llama-v3.1-8B-Instruct found [here](https://github.com/meta-llama/llama3/tree/main).
-This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the [Qualcomm® AI Hub Models](https://github.com/qualcomm/ai-hub-models/blob/main/qai_hub_models/models/llama_v3_1_8b_instruct) library to export with custom configurations. More details on model performance across various devices, can be found [here](#performance-summary).
+This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the [Qualcomm® AI Hub Models](https://github.com/qualcomm/ai-hub-models/blob/main/src/qai_hub_models/models/llama_v3_1_8b_instruct) library to export with custom configurations. More details on model performance across various devices, can be found [here](#performance-summary).
 
 Qualcomm AI Hub Models uses [Qualcomm AI Hub Workbench](https://workbench.aihub.qualcomm.com) to compile, profile, and evaluate this model. [Sign up](https://myaccount.qualcomm.com/signup) to run these models on a hosted Qualcomm® device.
 
@@ -26,12 +26,12 @@ Please follow the [LLM on-device deployment](https://github.com/qualcomm/ai-hub-
 
 ## Getting Started
 Due to licensing restrictions, we cannot distribute pre-exported model assets for this model.
-Use the [Qualcomm® AI Hub Models](https://github.com/qualcomm/ai-hub-models/blob/main/qai_hub_models/models/llama_v3_1_8b_instruct) Python library to compile and export the model with your own:
+Use the [Qualcomm® AI Hub Models](https://github.com/qualcomm/ai-hub-models/blob/main/src/qai_hub_models/models/llama_v3_1_8b_instruct) Python library to compile and export the model with your own:
 - Custom weights (e.g., fine-tuned checkpoints)
 - Custom input shapes
 - Target device and runtime configurations
 
-See our repository for [Llama-v3.1-8B-Instruct on GitHub](https://github.com/qualcomm/ai-hub-models/blob/main/qai_hub_models/models/llama_v3_1_8b_instruct) for usage instructions.
+See our repository for [Llama-v3.1-8B-Instruct on GitHub](https://github.com/qualcomm/ai-hub-models/blob/main/src/qai_hub_models/models/llama_v3_1_8b_instruct) for usage instructions.
 
 
 ## Model Details
```
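The compile-and-export workflow that the README points to is typically driven through each model's `export` entry point in the `qai_hub_models` package. The following is a minimal sketch, assuming the standard AI Hub Models package layout and extras naming; it requires a Qualcomm AI Hub account and a configured API token, and the device name shown is only an illustrative choice:

```shell
# Install the AI Hub Models package with this model's optional
# dependencies (extra name assumed from the usual naming convention).
pip install "qai-hub-models[llama-v3-1-8b-instruct]"

# One-time setup: store your Qualcomm AI Hub API token locally.
qai-hub configure --api_token YOUR_API_TOKEN

# Compile, profile, and export the model for a target device.
# Submits jobs to Qualcomm AI Hub, so it needs network access
# and valid credentials; the --device value is an example.
python -m qai_hub_models.models.llama_v3_1_8b_instruct.export \
    --device "Snapdragon 8 Elite QRD"
```

Per the README in the diff above, this path exports the model from source rather than downloading pre-exported assets, which is why custom weights, input shapes, and runtime configurations can be supplied at this step.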