This repository contains the quantized version of DISC-MedLLM, which uses Baichuan-13b-base as the base model.

The weights are converted to GGML format using [baichuan13b.cpp](https://github.com/ouwei2013/baichuan13b.cpp) (based on [llama.cpp](https://github.com/ggerganov/llama.cpp)).

## How to run inference
1. [Compile baichuan13b](https://github.com/ouwei2013/baichuan13b.cpp#build); this generates a main executable `baichuan13b/build/bin/main` and a server `baichuan13b/build/bin/server`.

2. Download the weights from this repository into `baichuan13b/build/bin/` (a Python download sketch follows this list).

3. For the command line interface, run the following command. You can also read [the doc covering the other command line parameters](https://github.com/ouwei2013/baichuan13b.cpp/tree/master/examples/main#quick-start):

> ```bash
> cd baichuan13b/build/bin/
> ./main -m ggml-model-q4_0.bin --prompt "I feel sick. Nausea and Vomiting."
> ```

4. For the API interface, run the following command. You can also read [the doc about server command line options](https://github.com/ouwei2013/baichuan13b.cpp/tree/master/examples/server#llamacppexampleserver):

> ```bash
> cd baichuan13b/build/bin/
> ./server -m ggml-model-q4_0.bin -c 2048
> ```

5. To test the API interface, you can use `curl`:

> ```bash
> curl --request POST \
>     --url http://localhost:8080/completion \
>     --data '{"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}'
> ```
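As referenced in step 2, here is a minimal download sketch using the `huggingface_hub` client; `your-org/DISC-MedLLM-GGML` is a placeholder for this repository's actual Hugging Face id, and downloading the file manually works just as well.

```python
# Minimal download sketch for step 2 (assumptions: huggingface_hub is
# installed, and the repo_id below is a placeholder for this repository's id).
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="your-org/DISC-MedLLM-GGML",  # placeholder: use this repo's real id
    filename="ggml-model-q4_0.bin",       # the quantized weight used in the examples
    local_dir="baichuan13b/build/bin",    # where main/server look for the model
)
```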
### Use it in Python

To use it in a Python script like [cli_demo.py](https://github.com/FudanDISC/DISC-MedLLM/blob/main/cli_demo.py), all you need to do is replace the `model.chat()` call: `import requests`, POST the prompt to `localhost:8080` as JSON, and decode the HTTP response.
```python
import requests

# POST the prompt to the server's /completion endpoint and decode the JSON reply
llm_output = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "I feel sick. Nausea and Vomiting.",
        "n_predict": 512,
    },
).json()
print(llm_output)
```
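If you want a drop-in replacement for `model.chat()`, here is a minimal sketch; the `chat()` name is an assumption, and it presumes the fork keeps llama.cpp's server response shape, where the generated text sits in the `content` field of the JSON reply.

```python
import requests

# Hypothetical drop-in replacement for model.chat() in cli_demo.py.
# Assumption: the server keeps llama.cpp's /completion response shape,
# where the generated text is returned in the "content" field.
def chat(prompt: str, n_predict: int = 512) -> str:
    response = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": prompt, "n_predict": n_predict},
    )
    response.raise_for_status()  # surface HTTP errors early
    return response.json().get("content", "")

print(chat("I feel sick. Nausea and Vomiting."))
```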