daslab-testing/CloverLM

Text Generation

low-precision-training


mansaripo committed on Mar 20

Commit a372444 · verified · 1 Parent(s): 317675d

Upload README.md with huggingface_hub

Files changed (1)

README.md +4 -0

@@ -159,6 +159,10 @@ accelerate launch eval.py \
 Use `quartet_2_impl=pseudoquant` on non-Blackwell GPUs (uses Triton-based FP4 emulation).
 Attention backend options: `pytorch` (default), `flash2`, `flash3`, `flash4`.
+### Serving with vLLM
+CloverLM can be served using [vLLM](https://github.com/vllm-project/vllm) with a custom Quartet II quantization plugin. See [`vllm_plugin/SERVING.md`](vllm_plugin/SERVING.md) for full setup instructions.
 ### Dependencies
 - Python ≥ 3.11
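
The added section points to `vllm_plugin/SERVING.md` for setup but does not show what a client call looks like. As a minimal sketch only: vLLM exposes an OpenAI-compatible HTTP API, so once a server is running, a completion request could be built roughly as follows. The base URL (`http://localhost:8000/v1`, vLLM's usual default) and the served model name are assumptions, not values taken from this commit, and the helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request

# Assumptions (not stated in the README diff): a vLLM OpenAI-compatible server
# is already running locally on port 8000, and the model is served under this name.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "daslab-testing/CloverLM"


def build_completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /completions endpoint."""
    payload = {"model": MODEL_NAME, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 64) -> str:
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Low-precision training is")` would return the generated continuation as a string. The actual launch command and plugin registration are described in `vllm_plugin/SERVING.md` and are not reproduced here.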