bethrezen commited on
Commit
b173635
·
verified ·
1 Parent(s): f2b63f0

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +9 -0
README.md CHANGED
@@ -396,6 +396,15 @@ chatbot = pipeline("text-generation", model="ZeroAgency/Zero-Mistral-24B", max_n
396
  chatbot(messages)
397
  ```
398
 
 
 
 
 
 
 
 
 
 
399
 
400
  ## Environmental Impact
401
 
 
396
  chatbot(messages)
397
  ```
398
 
399
+ ### llama-server
400
+
401
+ You can run llama-server - OpenAI compatible server for serving [GGUF version](https://huggingface.co/ZeroAgency/Zero-Mistral-24B-gguf) of model.
402
+
403
+ Example of running with docker container:
404
+
405
+ ```
406
+ docker run --gpus all -v `pwd`:/mnt -p8000:8000 ghcr.io/ggml-org/llama.cpp:server-cuda -fa --port 8000 --host 0.0.0.0 --temp 0.0 --jinja -ngl 100 --api-key DUMMY-API-KEY -m /mnt/Zero-Mistral-24B-Q4_K_M_L.gguf
407
+ ```
408
 
409
  ## Environmental Impact
410