## Leaderboard

|Rank|Name|Parameter|Context Length|Tag|Note|
|:---:|---|:---:|:---:|:---:|---|
|1|[HyouKan Series](https://huggingface.co/Alsebay/HyouKan-3x7B)|3x7B|<span style="color:cyan">8K</span> - <span style="color:red">32K</span>|<span style="color:#40C5F0">Neutral</span>|All-rounded Roleplay model. Understands Character Cards well and has good logic. The first version has an 8K context length.|
|2|[SunnyRain](https://huggingface.co/Alsebay/SunnyRain-2x10.7B)|2x10.7B|<span style="color:green">4K</span>|<span style="color:#F53A85">Lewd</span>|Honestly, it performs about as well as HyouKan in Roleplay, but has some strange behaviors.|
|3|[RainyMotip](https://huggingface.co/Alsebay/RainyMotip-2x7B)|2x7B|<span style="color:red">32K</span>|<span style="color:#40C5F0">Neutral</span>|Good enough model, OK in Roleplay.|
|4|[Nutopia](https://huggingface.co/Alsebay/Nutopia-7B)|7B|<span style="color:red">32K</span>|<span style="color:#F2EC4E">Not for Roleplay</span>|I don't think this works for Roleplay, but it is good at problem solving.|

- The context length has a big impact on your memory use. Say I have a 16GB VRAM card; I can run these models in two ways, using Text-Generation-WebUI:
1. Inference: download the original model and apply the args ``--load-in-4bit --use_double_quant``. This way I can run every model in the leaderboard. The bigger the parameter count, the slower generation gets (e.g. a 7B model can run at ~15 tokens/s, while a 3x7B model only manages ~4-5 tokens/s). A minimal Python equivalent of these flags is sketched after this list.
2. GGUF quantization (the fastest, cheapest way to run): after you download the GGUF version of these models, you sometimes can't run one even though you can run another model with more parameters (see the GGUF loading sketch after this list). That is because:
   - Context length: a 16GB VRAM GPU can run at most a 2x10.7B (~19.2B) model at 4K context length (~5 tokens/s).
   - The model itself may be buggy/broken.
   - A bigger model holds more of the information you need for your Character Card.
- Best GGUF versions to run (balancing speed and performance): Q4_K_M or Q5_K_M (slower than Q4_K_M).
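
For reference, here is a minimal sketch of what those two Text-Generation-WebUI flags do, assuming the `transformers` and `bitsandbytes` packages are installed and a CUDA GPU is available; the model ID is just an example from the leaderboard:

```python
# Minimal sketch: 4-bit NF4 loading with double quantization, roughly what
# --load-in-4bit --use_double_quant does in Text-Generation-WebUI.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Alsebay/HyouKan-3x7B"  # example model from the leaderboard

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,               # ~0.5 bytes per weight instead of 2 (fp16)
    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPU/CPU memory
)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```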
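
Likewise, a rough sketch of the GGUF route, assuming `llama-cpp-python` is installed with GPU support; the file name is illustrative, not an actual release:

```python
# Minimal GGUF sketch with llama-cpp-python. The context window (n_ctx) is
# what drives KV-cache memory: raising it increases the memory footprint.
from llama_cpp import Llama

llm = Llama(
    model_path="./SunnyRain-2x10.7B.Q4_K_M.gguf",  # illustrative file name
    n_ctx=4096,       # 4K context; VRAM use grows as you raise this
    n_gpu_layers=-1,  # offload all layers to the GPU if they fit
)

out = llm("### Instruction:\nSay hi.\n### Response:\n", max_tokens=32)
print(out["choices"][0]["text"])
```
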
# Useful links:
- https://huggingface.co/spaces/Vokturz/can-it-run-llm
- https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
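
If you just want a back-of-envelope number before opening those calculators, the arithmetic is roughly quantized weights plus KV cache. The sketch below is an assumption-laden estimate (layer/head counts for a Solar-style 2x10.7B, ~0.58 bytes per weight for Q4_K_M), not an exact formula:

```python
# Rough VRAM estimate: quantized weights + fp16 KV cache. The architecture
# numbers passed below are assumptions (Solar-style 2x10.7B), not exact.
def estimate_vram_gb(n_params_b, bytes_per_weight, n_layers, n_kv_heads,
                     head_dim, n_ctx, kv_bytes=2, overhead_gb=1.0):
    weights = n_params_b * 1e9 * bytes_per_weight
    # KV cache: 2 tensors (K and V) * layers * tokens * kv_heads * head_dim * bytes
    kv_cache = 2 * n_layers * n_ctx * n_kv_heads * head_dim * kv_bytes
    return (weights + kv_cache) / 1024**3 + overhead_gb

# ~19.2B params at Q4_K_M (~0.58 bytes/weight), 48 layers, 8 KV heads,
# head_dim 128, 4K context -> ~12 GB, which fits the 16GB card above.
print(f"~{estimate_vram_gb(19.2, 0.58, 48, 8, 128, 4096):.1f} GB")
```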