LLM360/AmberChat

@@ -101,6 +101,38 @@ python3 -m fastchat.serve.cli --model-path LLM360/AmberChat
 | **LLM360/AmberChat** | **5.428125** |
 | [Nous-Hermes-13B](https://huggingface.co/NousResearch/Nous-Hermes-13b) | 5.51 |
+# Using Quantized Models with Ollama
+Please follow these steps to use a quantized version of AmberChat on your personal computer or laptop:
+1. First, install Ollama by following the instructions [here](https://github.com/jmorganca/ollama/tree/main?tab=readme-ov-file#ollama). Next, download a quantized model checkpoint, such as [amberchat.Q8_0.gguf](https://huggingface.co/TheBloke/AmberChat-GGUF/blob/main/amberchat.Q8_0.gguf) for the 8-bit version, from [TheBloke/AmberChat-GGUF](https://huggingface.co/TheBloke/AmberChat-GGUF/tree/main). Then create an Ollama Modelfile locally using the template below:
+```
+FROM amberchat.Q8_0.gguf
+TEMPLATE """{{ .System }}
+USER: {{ .Prompt }}
+ASSISTANT:
+"""
+SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
+"""
+PARAMETER stop "USER:"
+PARAMETER stop "ASSISTANT:"
+PARAMETER repeat_last_n   0
+PARAMETER num_ctx         2048
+PARAMETER seed            0
+PARAMETER num_predict    -1
+```
+Ensure that the FROM directive points to the downloaded checkpoint file.
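The check on the FROM directive can be scripted before building. The following is a minimal sketch, not part of the original instructions: `check_modelfile_from` is a hypothetical helper name, and it assumes that FROM names a local checkpoint file (as in the template above) and that a relative path is resolved against the directory containing the Modelfile.

```shell
# Hypothetical helper: confirm that the FROM line of a Modelfile
# names a checkpoint file that actually exists on disk.
# Assumption: FROM points at a local file, not at a model name.
check_modelfile_from() {
  modelfile="${1:-Modelfile}"

  # Take the second field of the first line whose first field is FROM.
  from_path=$(awk '$1 == "FROM" { print $2; exit }' "$modelfile")
  if [ -z "$from_path" ]; then
    echo "no FROM directive found in $modelfile" >&2
    return 1
  fi

  # Assumption: relative paths are relative to the Modelfile's directory.
  case "$from_path" in
    /*) resolved="$from_path" ;;
    *)  resolved="$(dirname "$modelfile")/$from_path" ;;
  esac

  if [ ! -f "$resolved" ]; then
    echo "checkpoint not found: $resolved" >&2
    return 1
  fi

  # Print the resolved path so the caller can see what was checked.
  echo "$resolved"
}
```

Calling `check_modelfile_from Modelfile` in the directory that holds the Modelfile prints the resolved checkpoint path on success and returns a non-zero status otherwise.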
+2. Build the model by running:
+```bash
+ollama create amberchat -f Modelfile
+```
+3. To run the model from the command line, execute the following:
+```bash
+ollama run amberchat
+```
+You only need to build the model once; afterwards, you can run it directly.
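The build-once, run-afterwards pattern can be wrapped in a small function. This is a sketch under stated assumptions rather than an official workflow: `ensure_amberchat` is a hypothetical helper name, and it assumes that `ollama list` prints a header row followed by one `NAME:TAG` entry per line in the first column.

```shell
# Hypothetical helper: build the model only if Ollama does not list it yet.
ensure_amberchat() {
  name="${1:-amberchat}"
  modelfile="${2:-Modelfile}"

  # Skip the header row, strip the ":tag" suffix from the first column,
  # and succeed only if some entry matches the requested name.
  if ollama list | awk -v n="$name" '
       NR > 1 { split($1, parts, ":"); if (parts[1] == n) found = 1 }
       END    { exit !found }'; then
    echo "$name is already built"
  else
    ollama create "$name" -f "$modelfile"
  fi
}
```

With this in place, `ensure_amberchat && ollama run amberchat` builds the model on the first invocation and goes straight to the prompt on later ones.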
 # Citation