Update README.md
README.md
CHANGED
@@ -30,9 +30,7 @@ tags:

  ## Run with LlamaEdge

-
-
- - LlamaEdge version: coming soon
+ - LlamaEdge version: [v0.11.2](https://github.com/LlamaEdge/LlamaEdge/releases/tag/0.11.2) and above

  - Prompt template

@@ -54,13 +52,13 @@ tags:

  - Context size: `128000`

-
+ - Run as LlamaEdge service

  ```bash
  wasmedge --dir .:. --nn-preload default:GGML:AUTO:Phi-3-medium-128k-instruct-Q5_K_M.gguf \
  llama-api-server.wasm \
  --prompt-template phi-3-chat \
- --ctx-size
+ --ctx-size 128000 \
  --model-name phi-3-medium-128k
  ```

@@ -70,9 +68,9 @@ tags:
  wasmedge --dir .:. --nn-preload default:GGML:AUTO:Phi-3-medium-128k-instruct-Q5_K_M.gguf \
  llama-chat.wasm \
  --prompt-template phi-3-chat \
- --ctx-size
+ --ctx-size 128000
  ```
-
+
  ## Quantized GGUF Models

  | Name | Quant method | Bits | Size | Use case |