A little confused about model card

#2
by pathosethoslogos - opened

Thanks for this model, and the Docker command on the model card is very, very helpful!

In an attempt to narrow down my bugs, I just wanted to ask about what you wrote on the model card. It says:

Note: This model is not yet supported in a released VLLM container. There are some details on a pull request to support it, which could be used to run the model in VLLM: https://github.com/vllm-project/vllm/pull/31575
sudo docker run --runtime nvidia --gpus all -p 8000:8000 --ipc=host vllm/vllm-openai:nightly --model Firworks/IQuest-Coder-V1-40B-Instruct-nvfp4 --dtype auto --max-model-len 32768

Does this mean the above command already includes the fixes from that GitHub pull request, so I don't need to pass any extra flags? (except perhaps --platform "linux/arm64" for the NVIDIA Spark?)
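For concreteness, on the Spark I'd expect to run something like the line below; the --platform flag is my assumption, and I haven't verified that the nightly image even publishes an arm64 variant:

sudo docker run --platform linux/arm64 --runtime nvidia --gpus all -p 8000:8000 --ipc=host vllm/vllm-openai:nightly --model Firworks/IQuest-Coder-V1-40B-Instruct-nvfp4 --dtype auto --max-model-len 32768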

The above command is my standard VLLM run line, assuming nothing special needs to be done. I put the note in there to say that VLLM doesn't currently work with this model, but there are probably enough details in the PR to get it working if you want. Some day, when that PR is merged and released, the standard command should just work and I'll remove the note. Ideally the IQuest team would have published a Docker image with their changes, which I could have pointed to in the command instead, but I don't think they have as of yet.
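If you do want to try the PR route before it's merged, a minimal sketch would look something like the following. The local branch name is arbitrary, and I haven't verified that the PR head builds cleanly:

# Fetch the unmerged PR (GitHub exposes it at refs/pull/31575/head)
git clone https://github.com/vllm-project/vllm.git
cd vllm
git fetch origin pull/31575/head:pr-31575
git checkout pr-31575

# Build and install from source (compiles CUDA kernels, so it can take a while)
pip install -e .

# Serve the model with the same arguments the container would use
vllm serve Firworks/IQuest-Coder-V1-40B-Instruct-nvfp4 --dtype auto --max-model-len 32768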
