How to deploy it
#1
by asanchez75 - opened
On an H200 (141 GB VRAM), after downloading the files with `hf download`, you can run:

```shell
CUDA_VISIBLE_DEVICES=1 HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
  vllm serve "alpindale/Mistral-Large-Instruct-2407-FP8" \
  --port 8005 --max-model-len 32768 --tokenizer_mode mistral
```
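Once the server is up, vLLM exposes an OpenAI-compatible API. A minimal sketch of a chat-completions request against it (the endpoint path `/v1/chat/completions` and the payload shape follow vLLM's OpenAI-compatible server; port 8005 matches the command above):

```python
import json
from urllib.request import Request, urlopen

# Assumes the `vllm serve` command above is running locally on port 8005.
URL = "http://localhost:8005/v1/chat/completions"

payload = {
    # Model name must match the one passed to `vllm serve`.
    "model": "alpindale/Mistral-Large-Instruct-2407-FP8",
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "max_tokens": 64,
}

req = Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urlopen(req) as resp:
#     answer = json.loads(resp.read())
#     print(answer["choices"][0]["message"]["content"])
```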