Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -152,7 +152,7 @@ See the [vLLM recipes page](https://recipes.vllm.ai/poolside/Laguna-XS.2) for ad
  For lower latency, serve Laguna XS.2 with the [Laguna-XS.2 DFlash speculator](https://huggingface.co/poolside/Laguna-XS.2-speculator.dflash) — a 5-layer Llama-style draft model that proposes up to 7 tokens per step at ~70% per-position acceptance on coding tasks.
 
  > [!NOTE]
- > DFlash support landed in vLLM via [vllm-project/vllm#41880](https://github.com/vllm-project/vllm/pull/41880) and is available in vLLM 0.21.0 and later. `VLLM_USE_DEEP_GEMM=0` is required: DeepGEMM is currently incompatible with the DFlash draft path.
+ > DFlash support landed in vLLM via [vllm-project/vllm#41880](https://github.com/vllm-project/vllm/pull/41880) and is available in vLLM 0.21.0 and later.
 
  ```shell
  VLLM_USE_DEEP_GEMM=0 vllm serve poolside/Laguna-XS.2 \