Commit 563d128 (verified) by fraseque · 1 Parent(s): adefff8

Update README.md

Files changed (1)
  1. README.md +23 -0
README.md CHANGED
@@ -69,6 +69,25 @@ docker run \
  bash
  ```
 
+ **On Inf2.24xlarge [Optional]**
+ ```bash
+ docker run \
+ -it \
+ --device=/dev/neuron0 \
+ --device=/dev/neuron1 \
+ --device=/dev/neuron2 \
+ --device=/dev/neuron3 \
+ --device=/dev/neuron4 \
+ --device=/dev/neuron5 \
+ --cap-add SYS_ADMIN \
+ --cap-add IPC_LOCK \
+ -p 8080:8080 \
+ --name llama3-2-1B \
+ public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:0.9.1-neuronx-py311-sdk2.26.0-ubuntu22.04 \
+ bash
+ ```
+
+
  2. Install Dependencies
  # Install required dependencies
  ```bash
@@ -142,6 +161,10 @@ python3 benchmark_serving.py --backend vllm --base-url http://127.0.0.1:8080 --d
 
  ![Screenshot 2025-10-21 at 3.07.47 pm](https://cdn-uploads.huggingface.co/production/uploads/64ccd3db7a4f236357524396/XiJqFZMH995uUHJ_xbbvI.png)
 
+ Results on Inf2.24xlarge [6 neuron cores]
+
+ ![Screenshot 2025-10-23 at 12.52.57 pm](https://cdn-uploads.huggingface.co/production/uploads/64ccd3db7a4f236357524396/CP7xB8TdEF1PLrIpF7gRD.png)
+
 
  **Quantization Details**
  Quantization Format FP8 E4M3 (8-bit floating point)