Update README.md

README.md CHANGED

@@ -45,7 +45,7 @@ tags:
 - [Model Introduction](#model-introduction)
 - [Model Download](#model-download)
 - [Model Benchmark](#model-benchmark)
-- [Model Inference](#model-inference)
+- [Model Inference](#model-inference) [<img src="./assets/imgs/vllm.png" alt="vllm" height="20"/>](#vllm) [<img src="./assets/imgs/llama_cpp.png" alt="llamacpp" height="20"/>](#llama-cpp)
 - [Declarations & License](#declarations-license)
 - [Company Introduction](#company-introduction)
 

@@ -278,9 +278,38 @@ CUDA_VISIBLE_DEVICES=0 python demo/text_generation.py --model OrionStarAI/Orion-
 
 ```
 
-## 4.4
+## 4.4. Inference by vLLM
 
-
+- Project URL<br>
+https://github.com/vllm-project/vllm
+
+- Pull Request<br>
+https://github.com/vllm-project/vllm/pull/2539
+
+<a name="llama-cpp"></a><br>
+## 4.5. Inference by llama.cpp
+
+- Project URL<br>
+https://github.com/ggerganov/llama.cpp
+
+- Pull Request<br>
+https://github.com/ggerganov/llama.cpp/pull/5118
+
+- How to convert to GGUF model
+
+```shell
+python convert-hf-to-gguf.py path/to/Orion-14B-Chat --outfile chat.gguf
+```
+
+- How to run generation
+
+```shell
+./main --frequency-penalty 0.5 --top-k 5 --top-p 0.9 -m chat.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
+```
+
+## 4.6. Example Output
+
+### 4.6.1. Casual Chat
 
 `````
 User: Hello

@@ -302,7 +331,7 @@ User: Tell me a joke.
 Orion-14B: Sure, here's a classic one-liner: Why don't scientists trust atoms? Because they make up everything.
 `````
 
-### 4.
+### 4.6.2. Japanese & Korean Chat
 
 `````
 User：自己紹介してください (Please introduce yourself.)