Update README.md
README.md
tags:
- tensorRT
- Belle
---

## Model Card for lyraBelle

lyraBelle is currently the **fastest BELLE model** available. To the best of our knowledge, it is the **first accelerated version of Belle**.

The inference speed of lyraBelle has achieved a **10x** speedup over the original version. We are still working hard to improve the performance further.
Among its main features are:

- device: Nvidia Ampere architecture or newer (e.g. A100)
- batch_size: compiled with dynamic batch size, max batch_size = 8

Note that:

**Some interfaces/code were set up for future use (see the demo below).**

- **int8 mode**: not supported yet, please always set it to 0
- **data type**: only `fp16` is available.
## Speed

### Test environment

- **Repository:** [https://huggingface.co/BelleGroup/BELLE-7B-2M?clone=true]
## Environment

- **Docker image available** at [https://hub.docker.com/repository/docker/bigmoyan/lyrallm/general]; pull the image with:

```
docker pull bigmoyan/lyrallm:v0.1
```
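Once pulled, the image can be started with GPU access; a sketch assuming a standard NVIDIA container setup (the mount path and interactive flags are illustrative, not documented by the image):

```shell
# Run the image interactively with all GPUs visible.
# The volume mount path is an assumption -- point it at your
# local directory containing the converted model files.
docker run --gpus all -it \
    -v "$PWD/model:/workspace/model" \
    bigmoyan/lyrallm:v0.1
```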
## Uses

```python
# Import path assumed; it is not shown in the original snippet --
# adjust it to your installation.
from lyra_belle import LyraBelle

data_type = "fp16"  # only fp16 is supported
prompts = ["Hello, who are you?"]  # example prompt; define your own
model_dir = "./model"
model_name = "1-gpu-fp16.h5"
max_output_length = 512

# int8 mode not supported, data_type only supports fp16
model = LyraBelle(model_dir, model_name, data_type, 0)
output_texts = model.generate(
    prompts,
    output_length=max_output_length,
    top_k=30, top_p=0.85, temperature=0.35,
    repetition_penalty=1.2, do_sample=True,
)
print(output_texts)
```
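Since the engine is compiled with a maximum batch size of 8, longer prompt lists need to be split into chunks before calling `generate`. A minimal sketch (the chunking helper is illustrative, not part of lyraBelle):

```python
# Hypothetical helper: split a prompt list into slices of at most
# MAX_BATCH_SIZE, matching the engine's compiled max batch_size = 8.
MAX_BATCH_SIZE = 8

def chunk_prompts(prompts, batch_size=MAX_BATCH_SIZE):
    """Yield successive slices of `prompts` no larger than `batch_size`."""
    for i in range(0, len(prompts), batch_size):
        yield prompts[i:i + batch_size]

batches = list(chunk_prompts([f"prompt {i}" for i in range(20)]))
print([len(b) for b in batches])  # -> [8, 8, 4]
```

Each chunk can then be passed to `model.generate` in turn and the outputs concatenated.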