| 1 |
+
---
|
| 2 |
+
license: creativeml-openrail-m
|
| 3 |
+
language: en
|
| 4 |
+
tags:
|
| 5 |
+
- LLM
|
| 6 |
+
- ChatGLM
|
| 7 |
+
|
| 8 |
+
---
|

## Breaking News!

**We know what you want, and here they are!**

- A newly released lyraChatGLM model, suitable for Ampere (A100/A10) as well as Volta (V100) GPUs.
- lyraChatGLM has been further optimized: it now reaches **9000 tokens/s** on A100 and **3900 tokens/s** on V100, about **5.5x** faster than the original version (2023/6/1).
- Memory usage has been optimized too: batch_size can now be set up to **256** on A100!

**Note that the code has been fully updated as well. You need to use the new API; see `Uses` below.**

## Model Card for lyraChatGLM

lyraChatGLM is currently the **fastest ChatGLM-6B** available. To the best of our knowledge, it is the **first accelerated version of ChatGLM-6B**.

The inference speed of lyraChatGLM has achieved **300x** acceleration over the early original version. We are still working hard to further improve the performance.

Among its main features are:

- weights: original ChatGLM-6B weights released by THUDM.
- device: Nvidia GPUs with Ampere or Volta architecture (A100, A10, V100...).
- batch_size: compiled with dynamic batch size; the maximum depends on the device.

## Speed

- original version (fixed batch infer): commit id 1d240ba

### Test on A100 40G

|version|max_batch_size|max_speed|
|:-:|:-:|:-:|
|original|1|30 tokens/s|
|original (fixed batch infer)|192|1638.52 tokens/s|
|lyraChatGLM (current)|256|9082.60+ tokens/s|

### Test on V100

|version|max_batch_size|max_speed|
|:-:|:-:|:-:|
|original|1|17.83 tokens/s|
|original (fixed batch infer)|128|992.20 tokens/s|
|lyraChatGLM (current)|192|3911.45+ tokens/s|
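
The speedup over the fixed-batch baseline can be sanity-checked directly from the table figures above. This is just arithmetic on the quoted numbers, not part of the released code:

```python
# Throughput figures quoted in the tables above (tokens/s).
a100_lyra, a100_baseline = 9082.60, 1638.52
v100_lyra, v100_baseline = 3911.45, 992.20

# Speedup of lyraChatGLM over the fixed-batch-infer baseline.
print(f"A100 speedup: {a100_lyra / a100_baseline:.2f}x")  # ~5.54x
print(f"V100 speedup: {v100_lyra / v100_baseline:.2f}x")  # ~3.94x
```

Note that the headline **5.5x** figure refers to the A100 comparison; the V100 speedup over its fixed-batch baseline is closer to 4x.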

## Model Sources

- **Repository:** https://huggingface.co/THUDM/chatglm-6b
## Docker Environment

- **Docker image available** at https://hub.docker.com/repository/docker/bigmoyan/lyrallm/general. Pull the image with:

```
docker pull bigmoyan/lyrallm:v0.1
```

## Uses

```python
from lyraChatGLM import LyraChatGLM6B

model_path = "./models/1-gpu-fp16.h5"
tokenizer_path = "./models"
data_type = "fp16"
int8_mode = 0
max_output_length = 150
arch = "Ampere"  # "Ampere" or "Volta", matching your GPU

model = LyraChatGLM6B(model_path, tokenizer_path, data_type, int8_mode, arch)
prompt = "列出3个不同的机器学习算法,并说明它们的适用范围."
test_batch_size = 256  # maximum batch size on A100 (not used in this single-prompt demo)

prompts = [prompt, ]

# If you want different outputs within the same batch, set do_sample to True.
output_texts = model.generate(
    prompts, output_length=max_output_length,
    top_k=30, top_p=0.85, temperature=0.35, repetition_penalty=1.2, do_sample=False
)

print(output_texts)
```
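
To reproduce the tokens/s figures reported above, you can time the generate call yourself. The helper below is a generic sketch (the `measure_throughput` name and the stub generator are ours, not part of the lyraChatGLM API); swap the stub for a call to the real `model.generate`:

```python
import time

def measure_throughput(generate_fn, prompts, output_length):
    """Time one generate call and return tokens per second.

    Assumes every sequence in the batch produces output_length tokens,
    which matches the fixed-output benchmark setup above.
    """
    start = time.perf_counter()
    generate_fn(prompts, output_length)
    elapsed = time.perf_counter() - start
    return len(prompts) * output_length / elapsed

# Stub standing in for model.generate, so this sketch runs without a GPU.
def dummy_generate(prompts, output_length):
    time.sleep(0.01)  # pretend to do some work
    return ["..." for _ in prompts]

tps = measure_throughput(dummy_generate, ["prompt"] * 4, output_length=150)
print(f"{tps:.1f} tokens/s")
```

With the real model, pass a lambda such as `lambda p, n: model.generate(p, output_length=n)` and a full batch of prompts to approximate the table numbers.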

## Demo output

### input
列出3个不同的机器学习算法,并说明它们的适用范围.

### output
以下是三个常见的机器学习算法及其适用范围:

1. 决策树(Decision Tree):决策树是一种基于分类和回归问题的朴素贝叶斯模型。它通过构建一系列逐步分裂的分支来预测结果。适用于那些具有简单特征、大量数据且数据集大小在可接受范围内的情况。

2. 随机森林(Random Forest):随机森林是一种集成学习算法,由多个决策树组成。它的优点是能够处理大规模数据和高维度的特征。适用于需要对多个变量进行建模的场景,例如医疗诊断、金融风险评估等。

3. 支持向量机(Support Vector Machine):支持向量机是一种监督学习方法,通常用于分类问题。它可以处理高维数据,并且具有较高的准确性。适用于需要对高维数据进行分类或回归的问题,例如图像识别、自然语言处理等。
## Citation

```bibtex
@Misc{lyraChatGLM2023,
  author =       {Kangjian Wu and Zhengtao Wang and Yibo Lu and Bin Wu},
  title =        {lyraChatGLM: Accelerating ChatGLM by 5.5x+},
  howpublished = {\url{https://huggingface.co/TMElyralab/lyraChatGLM}},
  year =         {2023}
}
```

## Report bug
- Start a discussion to report any bugs: https://huggingface.co/TMElyralab/lyraChatGLM/discussions
- Mark the title of your report with a `[bug]` tag.