Upload README.md with huggingface_hub
README.md
CHANGED
@@ -159,6 +159,10 @@ accelerate launch eval.py \
 Use `quartet_2_impl=pseudoquant` on non-Blackwell GPUs (uses Triton-based FP4 emulation).
 Attention backend options: `pytorch` (default), `flash2`, `flash3`, `flash4`.
 
+### Serving with vLLM
+
+CloverLM can be served using [vLLM](https://github.com/vllm-project/vllm) with a custom Quartet II quantization plugin. See [`vllm_plugin/SERVING.md`](vllm_plugin/SERVING.md) for full setup instructions.
+
 ### Dependencies
 
 - Python ≥ 3.11