This repository contains the quantized version of DISC-MedLLM, which uses Baichuan-13b-base as the base model.

The weights are converted to GGML format using [baichuan13b.cpp](https://github.com/ouwei2013/baichuan13b.cpp) (based on [llama.cpp](https://github.com/ggerganov/llama.cpp)).

## How to run inference
1. [Compile baichuan13b](https://github.com/ouwei2013/baichuan13b.cpp#build); this generates a main executable `baichuan13b/build/bin/main` and a server `baichuan13b/build/bin/server`.

2. Download the weights from this repository into `baichuan13b/build/bin/` (a Python download sketch follows this list).

3. For the command line interface, run the following command. You can also read [the doc covering the other command line parameters](https://github.com/ouwei2013/baichuan13b.cpp/tree/master/examples/main#quick-start):

> ```bash
> cd baichuan13b/build/bin/
> ./main -m ggml-model-q4_0.bin --prompt "I feel sick. Nausea and Vomiting."
> ```

4. For the API interface, run the following command. You can also read [the doc about server command line options](https://github.com/ouwei2013/baichuan13b.cpp/tree/master/examples/server#llamacppexampleserver):

> ```bash
> cd baichuan13b/build/bin/
> ./server -m ggml-model-q4_0.bin -c 2048
> ```

5. To test the API interface, you can use `curl`:

> ```bash
> curl --request POST \
>     --url http://localhost:8080/completion \
>     --data '{"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}'
> ```
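As referenced in step 2, here is a minimal download sketch using the `huggingface_hub` client; `your-org/DISC-MedLLM-GGML` is a placeholder for this repository's actual Hugging Face id, and downloading the file manually works just as well.

```python
# Minimal download sketch for step 2 (assumptions: huggingface_hub is
# installed, and the repo_id below is a placeholder for this repository's id).
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="your-org/DISC-MedLLM-GGML",  # placeholder: use this repo's real id
    filename="ggml-model-q4_0.bin",       # the quantized weight used in the examples
    local_dir="baichuan13b/build/bin",    # where main/server look for the model
)
```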
### Use it in Python

To use it in a Python script like [cli_demo.py](https://github.com/FudanDISC/DISC-MedLLM/blob/main/cli_demo.py), all you need to do is replace the `model.chat()` call: `import requests`, POST the prompt to `localhost:8080` as JSON, and decode the HTTP response.
```python
import requests

# POST the prompt to the server's /completion endpoint and decode the JSON reply
llm_output = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "I feel sick. Nausea and Vomiting.",
        "n_predict": 512,
    },
).json()
print(llm_output)
```
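If you want a drop-in replacement for `model.chat()`, here is a minimal sketch; the `chat()` name is an assumption, and it presumes the fork keeps llama.cpp's server response shape, where the generated text sits in the `content` field of the JSON reply.

```python
import requests

# Hypothetical drop-in replacement for model.chat() in cli_demo.py.
# Assumption: the server keeps llama.cpp's /completion response shape,
# where the generated text is returned in the "content" field.
def chat(prompt: str, n_predict: int = 512) -> str:
    response = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": prompt, "n_predict": n_predict},
    )
    response.raise_for_status()  # surface HTTP errors early
    return response.json().get("content", "")

print(chat("I feel sick. Nausea and Vomiting."))
```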