Update README.md
## How to use PLM

Here we introduce some methods to use PLM models.
### Hugging Face

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("PLM-Team/PLM-1.8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("PLM-Team/PLM-1.8B-Instruct", torch_dtype=torch.bfloat16)

# Input text
input_text = "Tell me something about reinforcement learning."
inputs = tokenizer(input_text, return_tensors="pt")

# Completion
output = model.generate(inputs["input_ids"], max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
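For readers unfamiliar with `generate`: with no sampling flags set, it performs greedy decoding, repeatedly appending the highest-scoring next token until `max_new_tokens` tokens have been produced or an end-of-sequence token appears. A minimal toy sketch of that loop, with `toy_logits` standing in for a real model's next-token scores:

```python
def greedy_decode(next_logits, prompt_ids, max_new_tokens, eos_id):
    """Greedy decoding: append the argmax token until EOS or the budget runs out."""
    tokens = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

# Stand-in "model" over a 5-token vocabulary: always scores
# token (last_token + 1) mod 5 highest.
def toy_logits(tokens):
    scores = [0.0] * 5
    scores[(tokens[-1] + 1) % 5] = 1.0
    return scores

print(greedy_decode(toy_logits, [0], max_new_tokens=3, eos_id=4))  # [0, 1, 2, 3]
```

The real `model.generate` call above does the same thing, only with the transformer producing the logits at each step.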
### llama.cpp

The original contribution to the llama.cpp framework is [Si1w/llama.cpp](https://github.com/Si1w/llama.cpp). Here is the usage:

```bash
git clone https://github.com/Si1w/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt
```
Then, we can build with CPU or GPU (e.g. Orin). The build is based on `cmake`.

- For CPU

```bash
cmake -B build
cmake --build build --config Release
```

- For GPU

```bash
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```
Don't forget to download the GGUF files of PLM. We use the quantization methods in `llama.cpp` to generate the quantized PLM.

```bash
huggingface-cli download --resume-download PLM-Team/PLM-1.8B-Instruct-gguf --local-dir PLM-Team/PLM-1.8B-Instruct-gguf
```
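If you prefer to produce a quantized GGUF yourself rather than downloading one, llama.cpp's stock `llama-quantize` tool does this. A sketch, assuming you have first exported a full-precision GGUF (the filenames here are illustrative):

```bash
# Hypothetical filenames: export an F16 GGUF with llama.cpp's
# convert_hf_to_gguf.py first, then quantize it (here to Q8_0).
./build/bin/llama-quantize PLM-1.8B-Instruct-F16.gguf PLM-1.8B-Instruct-Q8_0.gguf Q8_0
```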
After building `llama.cpp`, we can use the `llama-cli` script to launch PLM.

```bash
./build/bin/llama-cli -m ./PLM-Team/PLM-1.8B-Instruct-gguf/PLM-1.8B-Instruct-Q8_0.gguf -cnv -p "hello!" -n 128
```
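Besides the interactive CLI, the same build also produces `llama-server`, which serves the model over an OpenAI-compatible HTTP API. A sketch (the port is illustrative):

```bash
# Serve the quantized PLM over HTTP (OpenAI-compatible endpoints).
./build/bin/llama-server -m ./PLM-Team/PLM-1.8B-Instruct-gguf/PLM-1.8B-Instruct-Q8_0.gguf --port 8080
```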
## Future works

- [ ] Release vLLM, SGLang, and PowerInfer inference scripts for PLM.