mansaripo committed on
Commit a372444 · verified · 1 Parent(s): 317675d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +4 -0
README.md CHANGED
@@ -159,6 +159,10 @@ accelerate launch eval.py \
  Use `quartet_2_impl=pseudoquant` on non-Blackwell GPUs (uses Triton-based FP4 emulation).
  Attention backend options: `pytorch` (default), `flash2`, `flash3`, `flash4`.

+ ### Serving with vLLM
+
+ CloverLM can be served using [vLLM](https://github.com/vllm-project/vllm) with a custom Quartet II quantization plugin. See [`vllm_plugin/SERVING.md`](vllm_plugin/SERVING.md) for full setup instructions.
+
  ### Dependencies

  - Python ≥ 3.11