zai-org/LongWriter-glm4-9b

Text Generation

feature-extraction

bys0318 committed on Aug 18, 2024

Commit d8cde7a · verified · 1 Parent(s): 6884326

Update README.md

Files changed (1)

README.md +31 -1

@@ -19,6 +19,7 @@ pipeline_tag: text-generation
 LongWriter-glm4-9b is trained based on [glm-4-9b](https://huggingface.co/THUDM/glm-4-9b), and is capable of generating 10,000+ words at once.
 A simple demo for deployment of the model:
 ```python
@@ -31,7 +32,36 @@ query = "Write a 10000-word China travel guide"
 response, history = model.chat(tokenizer, query, history=[], max_new_tokens=32768, temperature=0.5)
 print(response)
 ```
-Environment: Same environment requirement as [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) (`transformers>=4.44.0`).
 License: [glm-4-9b License](https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/LICENSE)

 LongWriter-glm4-9b is trained based on [glm-4-9b](https://huggingface.co/THUDM/glm-4-9b), and is capable of generating 10,000+ words at once.
+Environment: Same environment requirement as [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) (`transformers>=4.44.0`).
 A simple demo for deployment of the model:
 ```python
 response, history = model.chat(tokenizer, query, history=[], max_new_tokens=32768, temperature=0.5)
 print(response)
 ```
+You can also deploy the model with [vllm](https://github.com/vllm-project/vllm), which can generate 10,000+ words within a minute. Here is an example:
+```python
+from vllm import LLM, SamplingParams
+model = LLM(
+    model="THUDM/LongWriter-glm4-9b",
+    dtype="auto",
+    trust_remote_code=True,
+    tensor_parallel_size=1,
+    max_model_len=32768,
+    gpu_memory_utilization=1,
+)
+tokenizer = model.get_tokenizer()
+stop_token_ids = [tokenizer.eos_token_id, tokenizer.get_command("<|user|>"), tokenizer.get_command("<|observation|>")]
+generation_params = SamplingParams(
+    temperature=0.5,
+    top_p=0.8,
+    top_k=50,
+    max_tokens=32768,
+    repetition_penalty=1,
+    stop_token_ids=stop_token_ids
+)
+query = "Write a 10000-word China travel guide"
+input_ids = tokenizer.build_chat_input(query, history=[], role='user').input_ids[0].tolist()
+outputs = model.generate(
+    sampling_params=generation_params,
+    prompt_token_ids=[input_ids],
+)
+output = outputs[0]
+print(output.outputs[0].text)
+```
 License: [glm-4-9b License](https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/LICENSE)
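
The README change states that vLLM deployment produces 10,000+ words within a minute. A minimal sketch of how that could be checked is given here, assuming a caller-supplied function that maps a query string to the generated text. The names `measure_generation` and `generate` are hypothetical and do not appear in the README, and the whitespace-based word count is only a rough proxy suited to English output.

```python
import time
from typing import Callable, Tuple


def measure_generation(
    generate: Callable[[str], str], query: str
) -> Tuple[str, int, float]:
    """Run generate(query) and return (text, word_count, elapsed_seconds).

    `generate` is a hypothetical callable supplied by the caller; it is
    not part of vLLM or transformers.
    """
    start = time.perf_counter()
    text = generate(query)
    elapsed = time.perf_counter() - start
    # Whitespace-delimited count: an approximation, not a tokenizer count.
    word_count = len(text.split())
    return text, word_count, elapsed
```

To apply it to the vLLM example in this diff, `generate` could wrap the `model.generate(...)` call and return `outputs[0].outputs[0].text`. The measured time would then include prompt processing as well as decoding.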