qaihm-bot committed · verified · Commit 7236d86 · 1 parent: 2a4144f

See https://github.com/qualcomm/ai-hub-models/releases/v0.49.1 for changelog.

Files changed (1): README.md (+3 −3)
@@ -16,7 +16,7 @@ pipeline_tag: text-generation
 Llama 2 is a family of LLMs. The "Chat" suffix indicates that the model is optimized for chatbot-like dialogue. The model is quantized to w4a16 (4-bit weights and 16-bit activations), with part of the model quantized to w8a16 (8-bit weights and 16-bit activations), making it suitable for on-device deployment. For the prompt and output lengths specified below, the time to first token is Llama-PromptProcessor-Quantized's latency, and the average time per additional token is Llama-TokenGenerator-KVCache-Quantized's latency.
 
 This is based on the implementation of Llama-v2-7B-Chat found [here](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
-This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the [Qualcomm® AI Hub Models](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/llama_v2_7b_chat) library to export with custom configurations. More details on model performance across various devices can be found [here](#performance-summary).
+This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the [Qualcomm® AI Hub Models](https://github.com/qualcomm/ai-hub-models/tree/v0.49.1/qai_hub_models/models/llama_v2_7b_chat) library to export with custom configurations. More details on model performance across various devices can be found [here](#performance-summary).
 
 Qualcomm AI Hub Models uses [Qualcomm AI Hub Workbench](https://workbench.aihub.qualcomm.com) to compile, profile, and evaluate this model. [Sign up](https://myaccount.qualcomm.com/signup) to run these models on a hosted Qualcomm® device.
 
@@ -58,12 +58,12 @@ print(fibonacci(5))
 
 ## Getting Started
 Due to licensing restrictions, we cannot distribute pre-exported model assets for this model.
-Use the [Qualcomm® AI Hub Models](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/llama_v2_7b_chat) Python library to compile and export the model with your own:
+Use the [Qualcomm® AI Hub Models](https://github.com/qualcomm/ai-hub-models/tree/v0.49.1/qai_hub_models/models/llama_v2_7b_chat) Python library to compile and export the model with your own:
 - Custom weights (e.g., fine-tuned checkpoints)
 - Custom input shapes
 - Target device and runtime configurations
 
-See our [Llama-v2-7B-Chat on GitHub](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/llama_v2_7b_chat) repository for usage instructions.
+See our [Llama-v2-7B-Chat on GitHub](https://github.com/qualcomm/ai-hub-models/tree/v0.49.1/qai_hub_models/models/llama_v2_7b_chat) repository for usage instructions.
 
 
 ## Model Details
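The README text in this diff describes response latency as the prompt processor's time to first token plus the token generator's per-token latency for each subsequent token. A minimal sketch of that arithmetic, with illustrative placeholder numbers (not measured latencies for any device):

```python
def response_latency_s(ttft_s: float, per_token_s: float, output_tokens: int) -> float:
    """Estimate end-to-end response latency.

    ttft_s: time to first token (prompt processor latency, seconds)
    per_token_s: average latency per additional token (token generator
        with KV cache, seconds)
    output_tokens: total number of generated tokens
    """
    if output_tokens < 1:
        return 0.0
    # First token costs ttft_s; each of the remaining tokens costs per_token_s.
    return ttft_s + (output_tokens - 1) * per_token_s

# Illustrative values only: 1.5 s to first token, 0.1 s per extra token,
# for a 101-token response -> 1.5 + 100 * 0.1 = 11.5 s total.
total = response_latency_s(ttft_s=1.5, per_token_s=0.1, output_tokens=101)
```

This split is why the model ships as two quantized components: the prompt processor dominates time to first token, while the KV-cache token generator determines steady-state tokens per second.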