---
language:
- en
---
# A leaderboard ranking my own models :) Plus some (maybe) useful information. The main purpose is Roleplay
## Leaderboard
|Rank|Name|Parameter|Context Length|Tag|Note|
|:---:|---|:---:|:---:|:---:|---|
|💎1|[Narumashi-RT](https://huggingface.co/Alsebay/Narumashi-RT-11B-test)|11B|<span style="color:green">4K</span>|<span style="color:#F53A85">Lewd</span>|Good for Roleplay, although it is based on Llama 2. Thanks, Sao10k :) Can handle some (limited) TSF content.|
|🏆2|[NaruMoE](https://huggingface.co/Alsebay/NaruMOE-v1-3x7B)|3x7B|<span style="color:cyan">8K</span> - <span style="color:red">32K</span>|<span style="color:#40C5F0">Neutral</span>| Average model; it can handle only a limited amount of the extra content I want. |
|✌3|[NarumashiRTS](https://huggingface.co/Alsebay/NarumashiRTS-V2)|7B|<span style="color:cyan">8K</span>|<span style="color:#40C5F0">Neutral</span>| Based on Kunoichi-7B, so it is good enough. Knows the extra content. Not lewd, and will sometimes skip lewd content.|
|4|[HyouKan Series](https://huggingface.co/Alsebay/HyouKan-3x7B)|3x7B|<span style="color:cyan">8K</span> - <span style="color:red">32K</span>|<span style="color:#40C5F0">Neutral</span>|<span style="color:red">ATTENTION: DON'T USE THE GGUF VERSION, SINCE IT HAS SOME BUGS (VARIES BY VERSION)</span> All-round Roleplay model. Understands Character Cards well and has good logic. The first version has an 8K context length.|
|5|[SunnyRain](https://huggingface.co/Alsebay/SunnyRain-2x10.7B)|2x10.7B|<span style="color:green">4K</span>|<span style="color:#F53A85">Lewd</span>| To be honest, it performs about the same as HyouKan in Roleplay, but it shows some strange behaviours.|
|6|[RainyMotip](https://huggingface.co/Alsebay/RainyMotip-2x7B)|2x7B|<span style="color:red">32K</span>|<span style="color:#40C5F0">Neutral</span> |Good enough model, OK in Roleplay.|
|7|[Nutopia](https://huggingface.co/Alsebay/Nutopia-7B)|7B|<span style="color:red">32K</span>|<span style="color:#F2EC4E">Not for Roleplay</span>|I don't think this works for Roleplay, but it is good at problem solving.|
|8|[TripedalChiken](https://huggingface.co/Alsebay/TripedalChiken)|2x7B|<span style="color:red">32K</span>|<span style="color:#F2EC4E">Not for Roleplay</span>|Good at problem solving, but I don't think it works for Roleplay.|

## Note:
- <span style="color:#F53A85">Lewd</span> : performs well on NSFW content. Some lewd words will appear in normal content if your Character Card contains NSFW information.
- <span style="color:#40C5F0">Neutral</span> : performs well on SFW content and can handle NSFW content (maybe to a limited degree). Lewd words appear less often in chat/roleplay than with <span style="color:#F53A85">Lewd</span> models.
- <span style="color:#F2EC4E">Not for Roleplay</span> : models with this tag do not seem to understand Character Cards well, but their logic is very good.
- **RT**: Rough Translation dataset, which could lead to worse performance than the original model.
- **CN**: pretrained on a Chinese dataset, so it may not understand extra content in English. (I can't find a good English version.)
# Some experience:
- Context length has a big impact on memory use. Say I have a card with 16GB of VRAM; I can run the models in two ways using Text-Generation-WebUI:
  1. Inference: download the original model and apply the args ``--load-in-4bit --use_double_quant``. I can run every model on the leaderboard this way. The more parameters a model has, the slower it generates tokens. (E.g. a 7B model runs at about 15 tokens/s, while a 3x7B model only runs at ~4-5 tokens/s.)
  2. GGUF quantization (the fastest, cheapest way to run): after downloading the GGUF version of these models, you sometimes can't run one even though you can run other models with more parameters. That is because of:
       - The context length: a 16GB VRAM GPU can run at most a 2x10.7B (~19.2B) model with a 4K context length (5 tokens/s).
       - The model itself being buggy/broken.
- Bigger models hold more of the information you need for your Character Card.
- Best GGUF versions to run (balancing speed/performance): Q4_K_M, Q5_K_M (slower than Q4).
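
To get a rough feel for why context length and quantization level matter so much, here is a minimal Python sketch that estimates the key/value cache size and the size of the quantized weights. Everything in it is an assumption for illustration: the function names are made up, the layer/head numbers are Mistral-7B-style values (check the `config.json` of the model you actually use), and the bits-per-weight figure for Q4_K_M is only approximate. It ignores activations and runtime overhead, so real usage will be higher.

```python
GIB = 1024 ** 3


def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Estimate the key/value cache size for one sequence, in bytes."""
    # The leading 2 counts one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


def weight_bytes(n_params, bits_per_weight):
    """Estimate the size of the (quantized) weights, in bytes."""
    return n_params * bits_per_weight / 8


# Assumed Mistral-7B-style shape: 32 layers, 8 KV heads, head dimension 128,
# with the cache stored in 16-bit floats (2 bytes per value).
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

for ctx in (4096, 8192, 32768):
    cache = kv_cache_bytes(ctx, LAYERS, KV_HEADS, HEAD_DIM)
    print(f"{ctx:>6} tokens -> {cache / GIB:.2f} GiB of KV cache")

# Assumed: ~7.24e9 parameters and ~4.85 bits per weight for Q4_K_M.
weights = weight_bytes(7.24e9, 4.85)
print(f"weights -> {weights / GIB:.2f} GiB")
```

With these assumed numbers the cache grows linearly, from 0.5 GiB at 4K to 4 GiB at 32K, which is why a model that loads fine at 4K can run out of VRAM at 32K. The two calculators linked below give more careful estimates.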

# Useful links:
- https://huggingface.co/spaces/Vokturz/can-it-run-llm
- https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator