## Leaderboard

|Rank|Name|Parameter|Context Length|Tag|Note|
|:---:|---|:---:|:---:|:---:|---|
|1|[HyouKan Series](https://huggingface.co/Alsebay/HyouKan-3x7B)|3x7B|<span style="color:cyan">8K</span> - <span style="color:red">32K</span>|<span style="color:#40C5F0">Neutral</span>|All-rounded Roleplay model. Understands Character Cards well and has good logic. The first version has an 8K context length.|
|2|[SunnyRain](https://huggingface.co/Alsebay/SunnyRain-2x10.7B)|2x10.7B|<span style="color:green">4K</span>|<span style="color:#F53A85">Lewd</span>|Honestly, it performs about as well as HyouKan in Roleplay, but has some strange behaviors.|
|3|[RainyMotip](https://huggingface.co/Alsebay/RainyMotip-2x7B)|2x7B|<span style="color:red">32K</span>|<span style="color:#40C5F0">Neutral</span>|Good enough model, OK in Roleplay.|
|4|[Nutopia](https://huggingface.co/Alsebay/Nutopia-7B)|7B|<span style="color:red">32K</span>|<span style="color:#F2EC4E">Not for Roleplay</span>|I don't think this works for Roleplay, but it is good at problem solving.|

- The context length has a big impact on your memory use. Say I have a 16GB VRAM card; I can run these models in two ways, using Text-Generation-WebUI:
1. Inference: download the original model and apply the args ``--load-in-4bit --use_double_quant``. This way I can run every model in the leaderboard. The bigger the parameter count, the slower generation gets (e.g. a 7B model can run at ~15 tokens/s, while a 3x7B model only manages ~4-5 tokens/s). A minimal Python equivalent of these flags is sketched after this list.
2. GGUF quantization (the fastest, cheapest way to run): after you download the GGUF version of these models, you sometimes can't run one even though you can run another model with more parameters (see the GGUF loading sketch after this list). That is because:
   - Context length: a 16GB VRAM GPU can run at most a 2x10.7B (~19.2B) model at 4K context length (~5 tokens/s).
   - The model itself may be buggy/broken.
   - A bigger model holds more of the information you need for your Character Card.
- Best GGUF versions to run (balancing speed and performance): Q4_K_M or Q5_K_M (slower than Q4_K_M).
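
For reference, here is a minimal sketch of what those two Text-Generation-WebUI flags do, assuming the `transformers` and `bitsandbytes` packages are installed and a CUDA GPU is available; the model ID is just an example from the leaderboard:

```python
# Minimal sketch: 4-bit NF4 loading with double quantization, roughly what
# --load-in-4bit --use_double_quant does in Text-Generation-WebUI.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Alsebay/HyouKan-3x7B"  # example model from the leaderboard

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,               # ~0.5 bytes per weight instead of 2 (fp16)
    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPU/CPU memory
)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```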
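
Likewise, a rough sketch of the GGUF route, assuming `llama-cpp-python` is installed with GPU support; the file name is illustrative, not an actual release:

```python
# Minimal GGUF sketch with llama-cpp-python. The context window (n_ctx) is
# what drives KV-cache memory: raising it increases the memory footprint.
from llama_cpp import Llama

llm = Llama(
    model_path="./SunnyRain-2x10.7B.Q4_K_M.gguf",  # illustrative file name
    n_ctx=4096,       # 4K context; VRAM use grows as you raise this
    n_gpu_layers=-1,  # offload all layers to the GPU if they fit
)

out = llm("### Instruction:\nSay hi.\n### Response:\n", max_tokens=32)
print(out["choices"][0]["text"])
```
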
# Useful links:
- https://huggingface.co/spaces/Vokturz/can-it-run-llm
- https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
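
If you just want a back-of-envelope number before opening those calculators, the arithmetic is roughly quantized weights plus KV cache. The sketch below is an assumption-laden estimate (layer/head counts for a Solar-style 2x10.7B, ~0.58 bytes per weight for Q4_K_M), not an exact formula:

```python
# Rough VRAM estimate: quantized weights + fp16 KV cache. The architecture
# numbers passed below are assumptions (Solar-style 2x10.7B), not exact.
def estimate_vram_gb(n_params_b, bytes_per_weight, n_layers, n_kv_heads,
                     head_dim, n_ctx, kv_bytes=2, overhead_gb=1.0):
    weights = n_params_b * 1e9 * bytes_per_weight
    # KV cache: 2 tensors (K and V) * layers * tokens * kv_heads * head_dim * bytes
    kv_cache = 2 * n_layers * n_ctx * n_kv_heads * head_dim * kv_bytes
    return (weights + kv_cache) / 1024**3 + overhead_gb

# ~19.2B params at Q4_K_M (~0.58 bytes/weight), 48 layers, 8 KV heads,
# head_dim 128, 4K context -> ~12 GB, which fits the 16GB card above.
print(f"~{estimate_vram_gb(19.2, 0.58, 48, 8, 128, 4096):.1f} GB")
```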