carsonhxsu committed · Commit 5eabfad · Parent(s): 70750a1
[NewFeature] Support inference of LLaMA (7B/13B) using int8 quantization
README.md CHANGED

@@ -39,7 +39,7 @@ We use the LLaMA.13B model for measurement, but this optimized inference is appl
 | --- | --- | --- | --- | --- | --- |
 | Torch LLaMA | 24.65| 167.3 | 322.97 | 407.99 | OOM |
 | lyraLLaMA fp16 | 53.67 | 421.38 | 804.31 | 1519.28| 2679.82 |
-| lyraLLaMA int8 |
+| lyraLLaMA int8 | 79.81 | 603.15 | 1117.27 | 1966.52 | 3200.32 |
 
 ## Docker Environment Recommendation
 
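The commit fills in the benchmark row for the new int8 inference path, which lands roughly 1.2-1.5x above the fp16 throughput. As an illustration of the general technique the commit message names (not lyraLLaMA's actual CUDA kernels), here is a minimal sketch of symmetric per-tensor int8 weight quantization in NumPy; the function names and the per-tensor (rather than per-channel) granularity are assumptions for the sketch:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: w ~= scale * q, with q in [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for a LLaMA projection layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

# Rounding to the nearest int8 step bounds the error by half a step.
max_err = float(np.max(np.abs(w - w_hat)))
print(q.dtype, max_err <= 0.5 * scale + 1e-6)
```

Storing `q` instead of `w` halves the weight memory relative to fp16 (a quarter of fp32), which is what lets larger batch sizes fit before the OOM point seen in the Torch row of the table.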