carsonhxsu committed on
Commit
5eabfad
·
1 Parent(s): 70750a1

[NewFeature] Support inference of LLaMA (7B/13B) using int8 quantization

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -39,7 +39,7 @@ We use the LLaMA.13B model for measurement, but this optimized inference is appl
 | --- | --- | --- | --- | --- | --- |
 | Torch LLaMA | 24.65| 167.3 | 322.97 | 407.99 | OOM |
 | lyraLLaMA fp16 | 53.67 | 421.38 | 804.31 | 1519.28| 2679.82 |
-| lyraLLaMA int8 | 138.48 | 993.22 | 1741 | 2816.81 | 4146.52 |
+| lyraLLaMA int8 | 79.81 | 603.15 | 1117.27 | 1966.52 | 3200.32 |
 
 ## Docker Environment Recommendation
 
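The commit message advertises int8-quantized inference for LLaMA. As background, a common scheme underlying such int8 paths is symmetric per-row weight quantization. The sketch below is purely illustrative (it is not lyraLLaMA's actual implementation, and the function names are hypothetical), showing how fp32 weights can be mapped to int8 with a per-row scale and recovered approximately on dequantization:

```python
import numpy as np

def quantize_per_row(w):
    """Symmetric per-row int8 quantization: scale = max|w| / 127 for each row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one fp32 scale per row
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

# Toy example: quantize a small random weight matrix and check the error bound.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_per_row(w)
w_hat = dequantize(q, s)
# Per-element reconstruction error is bounded by half a quantization step.
assert np.max(np.abs(w - w_hat)) <= s.max() / 2 + 1e-6
```

Storing weights as int8 roughly halves memory traffic versus fp16, which is consistent with the higher tokens/s figures the updated table reports for the int8 path.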