Update README.md
@@ -396,6 +396,15 @@ chatbot = pipeline("text-generation", model="ZeroAgency/Zero-Mistral-24B", max_n
  396   chatbot(messages)
  397   ```
  398   
+ 399   ### llama-server
+ 400   
+ 401   You can run llama-server, an OpenAI-compatible server, to serve the [GGUF version](https://huggingface.co/ZeroAgency/Zero-Mistral-24B-gguf) of the model.
+ 402   
+ 403   Example of running it in a Docker container:
+ 404   
+ 405   ```
+ 406   docker run --gpus all -v `pwd`:/mnt -p8000:8000 ghcr.io/ggml-org/llama.cpp:server-cuda -fa --port 8000 --host 0.0.0.0 --temp 0.0 --jinja -ngl 100 --api-key DUMMY-API-KEY -m /mnt/Zero-Mistral-24B-Q4_K_M_L.gguf
+ 407   ```
  408   
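Once the container is up, requests can be sent to the standard OpenAI chat-completions endpoint. A minimal client sketch, assuming the server is reachable at `localhost:8000` with the `DUMMY-API-KEY` from the command above (the `build_chat_request` helper and the `model` value are illustrative, not part of the README):

```python
import json
import urllib.request

def build_chat_request(messages, base_url="http://localhost:8000", api_key="DUMMY-API-KEY"):
    """Build (but do not send) an OpenAI-compatible /v1/chat/completions request."""
    payload = {
        # llama-server serves the single model it was started with;
        # the name here is informational.
        "model": "Zero-Mistral-24B",
        "messages": messages,
        "temperature": 0.0,  # matches --temp 0.0 in the docker command
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request([{"role": "user", "content": "Hello!"}])
# To actually send the request (requires the running server):
#   resp = json.load(urllib.request.urlopen(req))
#   print(resp["choices"][0]["message"]["content"])
```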
  409   ## Environmental Impact
  410   