Commit adefff8 (verified) · Parent: 9043499
fraseque committed: Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED
```diff
@@ -23,8 +23,8 @@ This is an FP8-quantized version of Meta's Llama 3.2 1B model, specifically opti
 
 ### Model Description
 
-This model is a deployment-optimized version of Llama 3.2 1B that has been quantized to FP8 precision and compiled for AWS Neuron devices. AWS Neuron is the SDK used to run deep learning workloads on AWS Inferentia and Trainium chips, which are purpose-built machine learning accelerators.
-For better performance set Tp_degree=8 on Inf2.24xlarge [Total Token Throughput = ~2.5k tokens/sec]
+- This model is a deployment-optimized version of Llama 3.2 1B that has been quantized to FP8 precision and compiled for AWS Neuron devices. AWS Neuron is the SDK used to run deep learning workloads on AWS Inferentia and Trainium chips, which are purpose-built machine learning accelerators.
+- **Note:** For better performance, set tp_degree=8 on an inf2.24xlarge instance [total token throughput ≈ 2.5k tokens/sec].
 
 ### Key Features
 
```
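The tp_degree setting in the note above controls tensor parallelism, i.e. how many NeuronCores the model's weights are sharded across. As a hedged sketch (not taken from this repo's docs): with the transformers-neuronx library, tp_degree is passed when loading the model, before compiling with to_neuron(). The checkpoint path below is illustrative, and running this requires an actual Inferentia2 host such as inf2.24xlarge, so treat it as a configuration sketch rather than a verified recipe.

```python
# Sketch: loading a Llama checkpoint with tensor parallelism across
# 8 NeuronCores (an inf2.24xlarge exposes 12 NeuronCores in total).
# Assumes transformers-neuronx is installed; the path is hypothetical.
from transformers_neuronx.llama.model import LlamaForSampling

model = LlamaForSampling.from_pretrained(
    "./llama-3.2-1b-neuron",  # hypothetical local checkpoint path
    batch_size=1,
    tp_degree=8,              # shard weights across 8 NeuronCores
    amp="f16",                # compute/IO precision for the compiled graph
)
model.to_neuron()             # trigger Neuron compilation for the devices
```

Raising tp_degree trades per-request latency for aggregate throughput up to the number of available NeuronCores, which is presumably why the README recommends 8 on this instance type.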