TMElyralab/lyraBELLE

bigmoyan committed on May 22, 2023

Commit 194cf5c · 1 Parent(s): bdd576c

Update README.md

Files changed (1)

README.md +15 -9

README.md CHANGED

@@ -4,12 +4,11 @@ language:
 - en
 tags:
 - LLM
-- tensorRT
-- Belle
 ---
-## Model Card for lyraBelle
-lyraBelle is currently the **fastest BELLE model** available. To the best of our knowledge, it is the **first accelerated version of Belle**.
 The inference speed of lyraBelle has achieved **10x** acceleration over the early original version. We are still working hard to further improve the performance.
@@ -17,7 +16,6 @@ Among its main features are:
 - weights: original BELLE-7B-2M weights released by BelleGroup.
 - device: Nvidia Ampere architecture or newer (e.g. A100)
-- batch_size: compiled with dynamic batch size, max batch_size = 8
 Note that:
 **Some interfaces/code were set for future use (see demo below).**
@@ -30,7 +28,15 @@ Note that:
 ### test environment
 - device: Nvidia A100 40G
-- batch size: 8
@@ -77,12 +83,12 @@ print(output_texts)
 ``` bibtex
 @Misc{lyraBelle2023,
   author =       {Kangjian Wu, Zhengtao Wang, Bin Wu},
-  title =        {lyraChatGLM: Accelerating Belle by 10x+},
-  howpublished = {\url{https://huggingface.co/TMElyralab/lyraBelle},
   year =         {2023}
 }
 ```
 ## Report bug
-- start a discussion to report any bugs!--> https://huggingface.co/TMElyralab/lyraBelle/discussions
 - report bug with a `[bug]` mark in the title.

 - en
 tags:
 - LLM
+- BELLE
 ---
+## Model Card for lyraBELLE
+lyraBelle is currently the **fastest BELLE model** available. To the best of our knowledge, it is the **first accelerated version of BELLE**.
 The inference speed of lyraBelle has achieved **10x** acceleration over the early original version. We are still working hard to further improve the performance.
 - weights: original BELLE-7B-2M weights released by BelleGroup.
 - device: Nvidia Ampere architecture or newer (e.g. A100)
 Note that:
 **Some interfaces/code were set for future use (see demo below).**
 ### test environment
 - device: Nvidia A100 40G
+- warmup: 10 rounds
+- precision: fp16
+- batch size for our version: 64 (maximum under A100 40G)
+- batch size for original: xx (maximum under A100 40G)
+|version|batch size|speed|
+|:-:|:-:|:-:|
+|original|xxx|
+|lyraBELLE|80|3030.36/sec|
 ``` bibtex
 @Misc{lyraBelle2023,
   author =       {Kangjian Wu, Zhengtao Wang, Bin Wu},
+  title =        {lyraBELLE: Accelerating BELLE by 10x+},
+  howpublished = {\url{https://huggingface.co/TMElyralab/lyraBELLE}},
   year =         {2023}
 }
 ```
 ## Report bug
+- start a discussion to report any bugs!--> https://huggingface.co/TMElyralab/lyraBELLE/discussions
 - report bug with a `[bug]` mark in the title.