Update README.md
README.md
CHANGED
|
@@ -19,6 +19,7 @@ pipeline_tag: text-generation
|
|
| 19 |
|
| 20 |
LongWriter-glm4-9b is trained based on [glm-4-9b](https://huggingface.co/THUDM/glm-4-9b), and is capable of generating 10,000+ words at once.
|
| 21 |
|
|
|
|
| 22 |
|
| 23 |
A simple demo for deployment of the model:
|
| 24 |
```python
|
|
@@ -31,7 +32,36 @@ query = "Write a 10000-word China travel guide"
|
|
| 31 |
response, history = model.chat(tokenizer, query, history=[], max_new_tokens=32768, temperature=0.5)
|
| 32 |
print(response)
|
| 33 |
```
|
| 34 |
-
|
| 35 |
|
| 36 |
License: [glm-4-9b License](https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/LICENSE)
|
| 37 |
|
|
|
|
| 19 |
|
| 20 |
LongWriter-glm4-9b is trained based on [glm-4-9b](https://huggingface.co/THUDM/glm-4-9b), and is capable of generating 10,000+ words at once.
|
| 21 |
|
| 22 |
+
Environment: Same environment requirements as [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) (`transformers>=4.44.0`).
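
To check the environment requirement before loading the model, a small version check along these lines may help. This is a minimal sketch, not part of the original README: the helper names `parse_release` and `meets_minimum` are hypothetical, and the comparison looks only at the numeric release components, so a pre-release such as `4.44.0.dev0` is treated as `4.44.0`.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum transformers version stated in the README line above.
MIN_TRANSFORMERS = "4.44.0"


def parse_release(v: str) -> tuple:
    """Return the leading numeric components of a version string.

    Parsing stops at the first component that does not start with a digit,
    so '4.44.0.dev0' becomes (4, 44, 0).
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    installed = version("transformers")
except PackageNotFoundError:
    print("transformers is not installed")
else:
    status = "OK" if meets_minimum(installed) else f"older than {MIN_TRANSFORMERS}"
    print(f"transformers {installed}: {status}")
```

Comparing integer tuples rather than raw strings avoids ordering `4.9.0` above `4.44.0`.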
|
| 23 |
|
| 24 |
A simple demo for deployment of the model:
|
| 25 |
```python
|
|
|
|
| 32 |
response, history = model.chat(tokenizer, query, history=[], max_new_tokens=32768, temperature=0.5)
|
| 33 |
print(response)
|
| 34 |
```
|
| 35 |
+
You can also deploy the model with [vllm](https://github.com/vllm-project/vllm), which can generate 10,000+ words within a minute. Here is an example:
```python
from vllm import LLM, SamplingParams

# Load the model; max_model_len caps prompt plus generated tokens.
model = LLM(
    model="THUDM/LongWriter-glm4-9b",
    dtype="auto",
    trust_remote_code=True,
    tensor_parallel_size=1,
    max_model_len=32768,
    gpu_memory_utilization=1,
)
tokenizer = model.get_tokenizer()
# Stop at end-of-sequence or when the model starts a new user/observation turn.
stop_token_ids = [tokenizer.eos_token_id, tokenizer.get_command("<|user|>"), tokenizer.get_command("<|observation|>")]
generation_params = SamplingParams(
    temperature=0.5,
    top_p=0.8,
    top_k=50,
    max_tokens=32768,
    repetition_penalty=1,
    stop_token_ids=stop_token_ids
)
query = "Write a 10000-word China travel guide"
# Build the chat-formatted prompt as a list of token IDs.
input_ids = tokenizer.build_chat_input(query, history=[], role='user').input_ids[0].tolist()
outputs = model.generate(
    sampling_params=generation_params,
    prompt_token_ids=[input_ids],
)
output = outputs[0]
print(output.outputs[0].text)
```
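
To see whether a generation actually reaches the requested length, the returned text can be measured with a rough word count. This is a minimal sketch under stated assumptions: `count_words` is a hypothetical helper, not part of the model or vLLM API, and it splits on whitespace, so it only approximates length for English text and does not apply to languages written without spaces.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens as an approximate word count."""
    return len(text.split())


# Stand-in string; in practice pass the generated text from either demo.
sample = "A short stand-in for the generated travel guide."
print(f"approximate word count: {count_words(sample)}")
```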
|
| 65 |
|
| 66 |
License: [glm-4-9b License](https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/LICENSE)
|
| 67 |
|