fraseque committed on
Commit 9043499 · verified · 1 Parent(s): df3c3cc

Update README.md

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -24,6 +24,7 @@ This is an FP8-quantized version of Meta's Llama 3.2 1B model, specifically opti
 ### Model Description
 
 This model is a deployment-optimized version of Llama 3.2 1B that has been quantized to FP8 precision and compiled for AWS Neuron devices. AWS Neuron is the SDK used to run deep learning workloads on AWS Inferentia and Trainium chips, which are purpose-built machine learning accelerators.
+For better performance, set tp_degree=8 on an inf2.24xlarge instance [Total Token Throughput = ~2.5k tokens/sec]
 
 ### Key Features
 
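The `tp_degree=8` advice in the added line corresponds to the tensor-parallelism degree passed when loading a Llama checkpoint with AWS's transformers-neuronx library. A minimal sketch of how that setting might be applied — the model path, `amp` dtype, and generation parameters here are assumptions, and the code only runs on an Inf2/Trn1 instance with the Neuron SDK installed:

```python
# Sketch: load a Neuron-compiled Llama checkpoint sharded across 8 NeuronCores.
# Assumes an inf2.24xlarge with transformers-neuronx installed; the model path
# below is a hypothetical placeholder, not the actual repository layout.
from transformers import AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling

model_path = "./llama-3.2-1b-neuron"  # hypothetical local checkpoint directory

# tp_degree=8 shards the model across 8 NeuronCores, matching the README's advice.
model = LlamaForSampling.from_pretrained(model_path, tp_degree=8, amp="f16")
model.to_neuron()  # compile and move the model onto the Neuron cores

tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
generated = model.sample(input_ids, sequence_length=128)
print(tokenizer.decode(generated[0]))
```

The throughput figure quoted in the commit (~2.5k tokens/sec total) would depend on batch size and sequence length in addition to the tensor-parallel degree.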