How to deploy it
#1
by asanchez75 - opened
On an H200 (141 GB VRAM), after downloading the files with `hf download`, you can run:

```shell
CUDA_VISIBLE_DEVICES=1 HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
  vllm serve "alpindale/Mistral-Large-Instruct-2407-FP8" \
  --port 8005 --max-model-len 32768 --tokenizer_mode mistral
```
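Once the server is up, vLLM exposes an OpenAI-compatible API. A minimal sketch of a chat-completions request against it (the endpoint path `/v1/chat/completions` and the payload shape follow vLLM's OpenAI-compatible server; port 8005 matches the command above):

```python
import json
from urllib.request import Request, urlopen

# Assumes the `vllm serve` command above is running locally on port 8005.
URL = "http://localhost:8005/v1/chat/completions"

payload = {
    # Model name must match the one passed to `vllm serve`.
    "model": "alpindale/Mistral-Large-Instruct-2407-FP8",
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "max_tokens": 64,
}

req = Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urlopen(req) as resp:
#     answer = json.loads(resp.read())
#     print(answer["choices"][0]["message"]["content"])
```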