Adarsh-Iyer committed · Commit fe52af9 · verified · 1 Parent(s): c9fe452

Update README.md

Files changed (1):
  1. README.md (+5, -1)
README.md CHANGED
@@ -10,7 +10,11 @@ This repo contains the model weights for **Instinct**, [Continue](https://contin
 
 ## Serving the model
 
-There are many ways to plug a local model into Continue; we internally used an endpoint served by [SGLang](https://github.com/sgl-project/sglang), which is one of the options below. We observed no significant performance changes with fp8 quantization, so this may be used if desired.
+We've released a [Q4_K_M GGUF quantization of Instinct](https://huggingface.co/continuedev/instinct-GGUF) for efficient local inference.
+
+[Ollama instructions coming soon]
+
+Besides Ollama, there are many ways to plug a local model into Continue; we internally used an endpoint served by [SGLang](https://github.com/sgl-project/sglang), which is one of the options below. Quantizing for faster inference is also an option that worked well for us.
 
 * SGLang: `python3 -m sglang.launch_server --model-path continuedev/instinct --load-format safetensors`
 * vLLM: `vllm serve continuedev/instinct --served-model-name instinct --load-format safetensors`
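Once either server is running, a quick way to confirm the endpoint works before wiring it into Continue is to call the OpenAI-compatible API that both SGLang and vLLM expose. The sketch below is an assumption-laden connectivity check, not the prompt format Continue sends to the model: it assumes the vLLM command above with its default port 8000 (SGLang typically listens on 30000 instead) and uses a throwaway prompt.

```python
# Minimal smoke test against a locally served Instinct endpoint.
# Assumptions: vLLM running on localhost:8000 with --served-model-name instinct
# (for SGLang, change base_url to http://localhost:30000/v1 and use the model
# name the server reports, e.g. the model path). Requires: pip install openai
from openai import OpenAI

# Local servers usually accept any API key; "EMPTY" is a common placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="instinct",
    prompt="def fibonacci(n):",  # placeholder prompt for the connectivity check
    max_tokens=64,
    temperature=0.2,
)
print(response.choices[0].text)
```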