Commit 5425cac · Parent(s): 7cb16bf
Add instructions for Ollama
README.md CHANGED
@@ -101,6 +101,38 @@ python3 -m fastchat.serve.cli --model-path LLM360/AmberChat
| **LLM360/AmberChat** | **5.428125** |
| [Nous-Hermes-13B](https://huggingface.co/NousResearch/Nous-Hermes-13b) | 5.51 |

# Using Quantized Models with Ollama

Follow these steps to run a quantized version of AmberChat on a personal computer or laptop:

1. First, install Ollama by following the instructions provided [here](https://github.com/jmorganca/ollama/tree/main?tab=readme-ov-file#ollama). Next, download a quantized model checkpoint (such as [amberchat.Q8_0.gguf](https://huggingface.co/TheBloke/AmberChat-GGUF/blob/main/amberchat.Q8_0.gguf) for the 8-bit version) from [TheBloke/AmberChat-GGUF](https://huggingface.co/TheBloke/AmberChat-GGUF/tree/main). Then create an Ollama Modelfile locally using the template below:
```
FROM amberchat.Q8_0.gguf

TEMPLATE """{{ .System }}
USER: {{ .Prompt }}
ASSISTANT:
"""
SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
"""
PARAMETER stop "USER:"
PARAMETER stop "ASSISTANT:"
PARAMETER repeat_last_n 0
PARAMETER num_ctx 2048
PARAMETER seed 0
PARAMETER num_predict -1
```
Ensure that the FROM directive points to the downloaded checkpoint file.
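
To make the `TEMPLATE` and `SYSTEM` entries more concrete, the following Python sketch mimics how the final prompt is laid out before it reaches the model. It is an illustration only: Ollama performs this substitution itself with its own template engine, exact whitespace handling may differ, and the helper name `render_prompt` is hypothetical rather than part of Ollama or this README.

```python
# Illustrative only: mirrors the Modelfile TEMPLATE in plain Python.
# Ollama renders the real template internally; render_prompt is a hypothetical helper.

SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
)


def render_prompt(user_prompt: str, system: str = SYSTEM) -> str:
    """Fill the {{ .System }} and {{ .Prompt }} slots of the template."""
    return f"{system}\nUSER: {user_prompt}\nASSISTANT:\n"
```

Calling `render_prompt("What is a quantized model?")` returns the system message followed by the `USER:` turn and an open `ASSISTANT:` turn. The two `stop` parameters appear intended to end generation as soon as the model begins writing a new `USER:` or `ASSISTANT:` turn.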

2. Now, build the model by running:
```bash
ollama create amberchat -f Modelfile
```
3. To run the model from the command line, execute the following:
```bash
ollama run amberchat
```
You only need to build the model once; afterwards, you can run it directly.
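
Once the model has been created, it can also be queried programmatically instead of through the interactive prompt. The sketch below assumes the Ollama server is running locally on its default port (11434) and calls its `/api/generate` endpoint using only the Python standard library. The helper names `build_request` and `generate` are illustrative and not part of this README.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "amberchat") -> urllib.request.Request:
    """Build a non-streaming generate request for the model created above."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "amberchat") -> str:
    """Send the request and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `generate("How do I mount a tent?")` should return the model's reply as a string. Setting `"stream": False` asks for a single JSON object rather than a stream of partial responses, which keeps the parsing simple.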

# Citation