Alsebay committed
Commit 08d7322 · verified · 1 Parent(s): 0efb997

Update README.md

Files changed (1): README.md +3 -3
README.md CHANGED
@@ -6,7 +6,7 @@ language:
 ## Leaderboard
 |Rank|Name|Parameters|Context Length|Tag|Note|
 |:---:|---|:---:|:---:|:---:|---|
-|πŸ’Ž1|[HyouKan Series](https://huggingface.co/Alsebay/HyouKan-3x7B)|3x7B|<span style="color:cyan">8K</span> - <span style="color:red">32K</span>|<span style="color:#40C5F0">Neutral</span>|Inference with 4-bit (Transformers) | All-round roleplay model: understands character cards well and has good logic. The first version has 8K context length.|
+|πŸ’Ž1|[HyouKan Series](https://huggingface.co/Alsebay/HyouKan-3x7B)|3x7B|<span style="color:cyan">8K</span> - <span style="color:red">32K</span>|<span style="color:#40C5F0">Neutral</span>| All-round roleplay model: understands character cards well and has good logic. The first version has 8K context length.|
 |πŸ†2|[SunnyRain](https://huggingface.co/Alsebay/SunnyRain-2x10.7B)|2x10.7B|<span style="color:green">4K</span>|<span style="color:#F53A85">Lewd</span>|Honestly, it performs about the same as HyouKan in roleplay, just with some strange behaviours.|
 |✌3|[RainyMotip](https://huggingface.co/Alsebay/RainyMotip-2x7B)|2x7B|<span style="color:red">32K</span>|<span style="color:#40C5F0">Neutral</span>|A decent model; okay for roleplay.|
 |4|[Nutopia](https://huggingface.co/Alsebay/Nutopia-7B)|7B|<span style="color:red">32K</span>|<span style="color:#F2EC4E">Not for Roleplay</span>|I don't think this works for roleplay, but it's good at problem solving.|
@@ -21,11 +21,11 @@ language:
 - The context length has a big impact on memory use. Say I have a 16GB VRAM card; I can run a model in 2 ways using Text-Generation-WebUI:
 1. Inference: download the original model and apply the args ``--load-in-4bit --use_double_quant``. I can run every model on the leaderboard this way. The more parameters, the slower generation gets (e.g. a 7B model runs at ~15 token/s, while a 3x7B model only manages ~4-5 token/s).
 2. GGUF quantization (the fastest, cheapest way to run): after downloading the GGUF version of these models, you sometimes can't run one even though you can run other models with more parameters. That's because:
-- The context length: a 16GB VRAM GPU can run at most a 2x10.7B (~19.2B) model at 4K context length. HyouKan is 3x7B (~18.5B) parameters but has an 8K (or 32K) context length, which needs a lot of RAM/VRAM to load. (``--auto-devices`` may help you run the model; I'm not sure.) => a 7B model at 32K uses about as much RAM/VRAM as a 13B model at 4K in the GGUF version.
+- The context length: a 16GB VRAM GPU can run at most a 2x10.7B (~19.2B) model at 4K context length. (5 token/s)
 - The model is buggy/broken. 😏
 - A bigger model holds more of the information you need for your character card.
 - Best GGUF versions to run (balancing speed and quality): Q4_K_M, or Q5_K_M (slower than Q4).
 
 # Useful links:
 - https://huggingface.co/spaces/Vokturz/can-it-run-llm
-- https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator (for reference only; if you copy my Alsebay/HyouKan-3x7B repo and paste it in, yes, it says it can load like normal. But the first message is...)
+- https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
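For reference, the ``--load-in-4bit --use_double_quant`` flags in the notes above correspond to the standard Transformers + bitsandbytes 4-bit path that Text-Generation-WebUI wraps. A minimal sketch of loading one of the leaderboard models that way (the generation settings are illustrative, not the README author's exact configuration):

```python
# Minimal sketch: 4-bit inference with Transformers + bitsandbytes,
# roughly what --load-in-4bit --use_double_quant does in Text-Generation-WebUI.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,               # quantize weights to 4-bit on load
    bnb_4bit_use_double_quant=True,  # second quantization pass, saves a bit more VRAM
)

model_id = "Alsebay/HyouKan-3x7B"  # example repo from the leaderboard above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPU/CPU memory
)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```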
 
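As a rough illustration of the removed "7B at 32K ≈ 13B at 4K" rule of thumb: total memory is roughly quantized weights plus the KV cache, and the KV cache grows linearly with context length. A back-of-the-envelope sketch; the layer/head counts are typical Mistral-7B and Llama-2-13B shapes, and the Q4_K_M file sizes are approximations, not measurements:

```python
# Back-of-the-envelope GGUF memory estimate: quantized weights + fp16 KV cache.
def kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # 2x for keys and values, one cache entry per layer per token position
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

# Assumed shapes: Mistral-7B-like (32 layers, 8 KV heads via GQA) and
# Llama-2-13B-like (40 layers, 40 KV heads); Q4_K_M weight sizes are rough.
seven_b_32k  = 4.4 + kv_cache_gb(32, 8, 128, 32768)   # 7B @ 32K context
thirteen_b_4k = 7.9 + kv_cache_gb(40, 40, 128, 4096)  # 13B @ 4K context

print(f"7B @ 32K: ~{seven_b_32k:.1f} GB, 13B @ 4K: ~{thirteen_b_4k:.1f} GB")
```

Under these assumptions both land within a few GB of each other (~9 GB vs ~11 GB), which is consistent with the spirit of the original note: long context can cost as much memory as a whole model-size tier.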