Update README.md
README.md
CHANGED

@@ -23,14 +23,11 @@ Among its main features are:

### test environment

- device: Nvidia A100 40G
-- img size: 512x512
-- precision: fp16
-- steps: 30
-- solver: LMSD
-
-### text2img
+
+|version|speed|
+|:-:|:-:|
+|original|30 tokens/s|
+|lyraChatGLM|310 tokens/s|

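The tokens/s entries are end-to-end generation throughput. The sketch below shows one minimal way such a number can be measured; it reuses the `chat`, `input_ids`, and `MAX_OUT_LEN` objects built in the Uses section further down and is an illustration, not the project's benchmark script.

```python
import time

import torch

# Rough throughput probe (illustrative only, not the benchmark behind the
# table). Assumes `chat`, `input_ids`, and MAX_OUT_LEN from the Uses section,
# and that generate() returns a [batch, seq_len] tensor of token ids.
torch.cuda.synchronize()                      # drain pending GPU work first
start = time.perf_counter()
out = chat.generate(inputs=input_ids, max_length=MAX_OUT_LEN)
torch.cuda.synchronize()                      # wait for generation to finish
elapsed = time.perf_counter() - start

new_tokens = out.numel() - input_ids.numel()  # tokens produced across the batch
print(f"throughput: {new_tokens / elapsed:.1f} tokens/s")
```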
## Model Sources
@@ -40,35 +37,46 @@ Among its main features are:

## Uses

```python
+from transformers import AutoTokenizer
from faster_chat_glm import GLM6B, FasterChatGLM

+tokenizer = AutoTokenizer.from_pretrained(chatglm6b_dir, trust_remote_code=True)
+
+BATCH_SIZE = 8
+MAX_OUT_LEN = 50
+
+# prepare input
+input_str = ["音乐推荐应该考虑哪些因素?帮我写一篇不少于800字的方案。 ", ] * BATCH_SIZE
+inputs = tokenizer(input_str, return_tensors="pt", padding=True)
+input_ids = inputs.input_ids.to('cuda:0')
+
# kernel for chat model.
-kernel = GLM6B(plan_path=
-               batch_size=
+kernel = GLM6B(plan_path=f"./models/glm6b-bs{BATCH_SIZE}.ftm",
+               batch_size=1,
               num_beams=1,
-               use_cache=
+               use_cache=True,
               num_heads=32,
               emb_size_per_heads=128,
               decoder_layers=28,
               vocab_size=150528,
               max_seq_len=MAX_OUT_LEN)

chat = FasterChatGLM(model_dir=chatglm6b_dir, kernel=kernel).half().cuda()

# generate
sample_output = chat.generate(inputs=input_ids, max_length=MAX_OUT_LEN)
+# de-tokenize model output to text
+res = tokenizer.decode(sample_output[0], skip_special_tokens=True)
+print(res)
```
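The snippet feeds `BATCH_SIZE` identical prompts through the engine but decodes only `sample_output[0]`. Assuming `sample_output` is the `[batch, seq_len]` id tensor returned above, the standard Hugging Face `batch_decode` turns the whole batch back into text:

```python
# Decode every sequence in the batch, not just the first one.
texts = tokenizer.batch_decode(sample_output, skip_special_tokens=True)
for i, text in enumerate(texts):
    print(f"--- prompt {i} ---")
    print(text)
```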

## Demo output

-###
-### img2img
-
+### input
+What factors should music recommendation take into account? Write me a proposal of at least 800 characters. (音乐推荐应该考虑哪些因素?帮我写一篇不少于800字的方案。)
+
+### output
+Music recommendation is a problem that music lovers often face. A good music recommendation should be able to suggest music that matches their tastes, based on users' needs and preferences. This article will explore music
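The output breaks off mid-sentence because generation is capped at `MAX_OUT_LEN = 50` tokens, far short of the 800-character essay the prompt requests. A quick sanity check of that reading (again assuming the objects from the Uses snippet):

```python
# If the cap cut the text, the returned sequences are no longer than
# MAX_OUT_LEN (assuming generate() stops or pads at max_length).
assert sample_output.shape[-1] <= MAX_OUT_LEN
print(sample_output.shape[-1], "tokens per sequence")
```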