carsonhxsu committed · Commit 5eabfad · Parent(s): 70750a1
[NewFeature] Support inference of LLaMA (7B/13B) using int8 quantization
README.md CHANGED

@@ -39,7 +39,7 @@ We use the LLaMA.13B model for measurement, but this optimized inference is appl
 | --- | --- | --- | --- | --- | --- |
 | Torch LLaMA | 24.65| 167.3 | 322.97 | 407.99 | OOM |
 | lyraLLaMA fp16 | 53.67 | 421.38 | 804.31 | 1519.28| 2679.82 |
-| lyraLLaMA int8 |
+| lyraLLaMA int8 | 79.81 | 603.15 | 1117.27 | 1966.52 | 3200.32 |
 
 ## Docker Environment Recommendation
 
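The commit fills in the benchmark row for the new int8 inference path, which lands roughly 1.2-1.5x above the fp16 throughput. As an illustration of the general technique the commit message names (not lyraLLaMA's actual CUDA kernels), here is a minimal sketch of symmetric per-tensor int8 weight quantization in NumPy; the function names and the per-tensor (rather than per-channel) granularity are assumptions for the sketch:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: w ~= scale * q, with q in [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for a LLaMA projection layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

# Rounding to the nearest int8 step bounds the error by half a step.
max_err = float(np.max(np.abs(w - w_hat)))
print(q.dtype, max_err <= 0.5 * scale + 1e-6)
```

Storing `q` instead of `w` halves the weight memory relative to fp16 (a quarter of fp32), which is what lets larger batch sizes fit before the OOM point seen in the Torch row of the table.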