How to deploy it

#1
by asanchez75 - opened

On an H200 (141 GB VRAM), after the files have been downloaded with `hf download`, you can run:

```
CUDA_VISIBLE_DEVICES=1 HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1 vllm serve "alpindale/Mistral-Large-Instruct-2407-FP8" --port 8005 --max-model-len 32768 --tokenizer_mode mistral
```
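Once the server is up, it exposes vLLM's OpenAI-compatible API on the chosen port. A minimal sketch of querying it, assuming the port (8005) and model name from the command above:

```python
# Sketch: query the vLLM OpenAI-compatible endpoint started by `vllm serve`.
# Port 8005 and the model name come from the serve command above; adjust if
# you changed them.
import json
import urllib.request

payload = {
    "model": "alpindale/Mistral-Large-Instruct-2407-FP8",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    "http://localhost:8005/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Requires the server above to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at `http://localhost:8005/v1`) works the same way.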
