continuedev
/

instinct

Text Generation

text-generation-inference

Model card Files Files and versions

Adarsh-Iyer commited on Sep 2, 2025

Commit

63c7746

·

verified ·

1 Parent(s): 1171bdf

Update README.md

Files changed (1) hide show

README.md +3 -2

README.md CHANGED Viewed

@@ -10,9 +10,10 @@ This repo contains the model weights for **Instinct**, [Continue](https://contin
 ## Serving the model
-There are many ways to serve a local model with Continue. If you wish to serve the model using [SGLang](https://github.com/sgl-project/sglang), as we did internally, you can use the following command, optionally adding `--quantization fp8` if desired.
-`python3 -m sglang.launch_server --model-path continuedev/instinct --load-format safetensors`
 ## Learn more

 ## Serving the model
+There are many ways to plug a local model into Continue; we internally used an endpoint served by [SGLang](https://github.com/sgl-project/sglang), which is one of the options below. We observed no significant performance changes with fp8 quantization, so this may be used if desired.
+* SGLang: `python3 -m sglang.launch_server --model-path continuedev/instinct --load-format safetensors`
+* vLLM  : `vllm serve continuedev/instinct --served-model-name instinct --load-format safetensors --enable-prefix-caching --enable-chunked-prefill`
 ## Learn more