Commit cceea9d · 1 Parent(s): a44ba12
Update README.md
README.md CHANGED
@@ -35,14 +35,14 @@ Below is average latency of generating a token using a prompt of varying size us
 
 | Prompt Length | Batch Size | PyTorch 2.1 torch.compile | ONNX Runtime CUDA |
 |-------------|------------|----------------|-------------------|
-|
-| 256 | 1 |
-| 1024 | 1 |
-| 2048 | 1 |
-|
-| 256 | 4 |
-| 1024 | 4 |
-| 2048 | 4 | N/A |
+| 32 | 1 | 53.64ms | 15.68ms |
+| 256 | 1 | 59.55ms | 26.05ms |
+| 1024 | 1 | 89.82ms | 99.05ms |
+| 2048 | 1 | 208.0ms | 227.0ms |
+| 32 | 4 | 70.8ms | 19.62ms |
+| 256 | 4 | 78.6ms | 81.29ms |
+| 1024 | 4 | 373.7ms | 369.6ms |
+| 2048 | 4 | N/A | 879.2ms |
 
 ## Usage Example
 
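The relative latency implied by the added table rows can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the figures are average per-token latencies in milliseconds as the hunk context says; the `rows` list and the `speedup` and `format_rows` helpers are illustrative names and are not part of the README.

```python
# Latencies in ms copied from the added rows of the diff:
# (prompt_length, batch_size, pytorch_torch_compile_ms, onnxruntime_cuda_ms)
rows = [
    (32, 1, 53.64, 15.68),
    (256, 1, 59.55, 26.05),
    (1024, 1, 89.82, 99.05),
    (2048, 1, 208.0, 227.0),
    (32, 4, 70.8, 19.62),
    (256, 4, 78.6, 81.29),
    (1024, 4, 373.7, 369.6),
    (2048, 4, None, 879.2),  # PyTorch value is N/A in the table
]


def speedup(torch_ms, ort_ms):
    """Return the PyTorch / ONNX Runtime latency ratio, or None if either is N/A."""
    if torch_ms is None or ort_ms is None:
        return None
    return torch_ms / ort_ms


def format_rows(table):
    """Build one summary line per table row (hypothetical helper)."""
    lines = []
    for prompt_len, batch, torch_ms, ort_ms in table:
        ratio = speedup(torch_ms, ort_ms)
        text = "N/A" if ratio is None else f"{ratio:.2f}x"
        lines.append(f"prompt={prompt_len:<5} batch={batch} ratio={text}")
    return lines


if __name__ == "__main__":
    print("\n".join(format_rows(rows)))
```

A ratio above 1 means ONNX Runtime CUDA had the lower latency for that row; a ratio below 1 means PyTorch 2.1 torch.compile did.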