Update README.md
README.md
CHANGED
@@ -69,6 +69,25 @@ docker run \
 bash
 ```
 
+**On Inf2.24xlarge [Optional]**
+```bash
+docker run \
+-it \
+--device=/dev/neuron0 \
+--device=/dev/neuron1 \
+--device=/dev/neuron2 \
+--device=/dev/neuron3 \
+--device=/dev/neuron4 \
+--device=/dev/neuron5 \
+--cap-add SYS_ADMIN \
+--cap-add IPC_LOCK \
+-p 8080:8080 \
+--name llama3-2-1B \
+public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:0.9.1-neuronx-py311-sdk2.26.0-ubuntu22.04 \
+bash
+```
+
+
 2. Install Dependencies
 # Install required dependencies
 ```bash
@@ -142,6 +161,10 @@ python3 benchmark_serving.py --backend vllm --base-url http://127.0.0.1:8080 --d
 
 
 
+Results on Inf2.24xlarge [6 neuron cores]
+
+
+
 
 **Quantization Details**
 Quantization Format FP8 E4M3 (8-bit floating point)