Update README.md
And in Layla, chats at 30K tokens that still worked were normal.
First I tried adapting RoPE from Wingless Imp 8B by Sicarius. However, the model went a bit nuts. That said, it's unclear whether the issue was on LM Studio's side (their standard prompt format is good, but I noticed that Impish Nemo, for example, hates it).
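For anyone who wants to reproduce this kind of experiment, llama.cpp exposes RoPE overrides at load time. A minimal sketch, assuming a local Q8_0 GGUF (the filename and the scaling values here are placeholders, not the actual Wingless Imp settings):

```shell
# Sketch: overriding RoPE when loading a GGUF with llama.cpp.
# Wrong rope-freq-base / rope-freq-scale values are a classic way
# to make a model "go nuts", so they must match the source model.
llama-cli -m ./model-Q8_0.gguf \
  --rope-scaling linear \
  --rope-freq-scale 0.5 \
  -c 32768 \
  -p "Once upon a time"
```

LM Studio has equivalent RoPE fields in its model-load settings, so the same values can be tried there.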
I'm currently testing Q8_0.
If anyone wants to help, please ask and I will upload one of the standard K quants or ARM quants. No imatrix quants though, because I don't have a proper calibration file to generate one.
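For reference, producing one of those standard K quants is a one-liner with llama.cpp's quantize tool. A sketch, assuming an F16 source GGUF (paths are placeholders):

```shell
# Sketch: standard K quant from an F16 GGUF, no imatrix involved.
llama-quantize ./model-f16.gguf ./model-Q4_K_M.gguf Q4_K_M

# An imatrix-based quant would first need a calibration text file:
#   llama-imatrix -m ./model-f16.gguf -f calibration.txt -o imatrix.dat
# which is exactly the file I don't have.
```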
1st Update: 24/25th.02.2026 - like most Llama 3.1 8B models, the issues seem to start around 40-50K tokens. Prompting has to be much more careful. I haven't tried generating super long stories yet, because I don't know how to in the software I use (LM Studio and others often have a limit of 8192 tokens per message).
Maybe I could try using llama.cpp directly? Something tells me the results might be mixed, though. This model has been trained on 16K stories, yes - but that usually means the model will go off script by 24K at the latest.
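Going through llama.cpp directly would at least sidestep the per-message UI limit. A sketch, with placeholder model path and sizes picked to cover the 40-50K range mentioned above:

```shell
# Sketch: one long generation straight from llama.cpp.
# -c sets the context window, -n the number of tokens to generate,
# -f reads the whole prompt from a file instead of the terminal.
llama-cli -m ./model-Q8_0.gguf -c 49152 -n 16384 -f prompt.txt
```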
Maybe some super low temperature and very strict settings would work, but then it won't be "creative" writing. (Don't get me started on calling writing with AI "creative" - even the best models can genuinely abstract only a tiny amount, and an 8B - not really.)
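If someone does want to try the strict route, llama.cpp's sampler flags make it easy to pin things down. The values below are illustrative guesses, not settings I have tested:

```shell
# Sketch: "strict" sampling - low temperature, tight nucleus,
# mild repetition penalty. Expect coherent but flat output.
llama-cli -m ./model-Q8_0.gguf -c 32768 -f prompt.txt \
  --temp 0.3 --top-k 40 --top-p 0.9 --min-p 0.1 --repeat-penalty 1.1
```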
I will measure perplexity today, but I need access to bigger stories. The biggest contiguous one I have is maybe 180K tokens (it's a single story; I could access a bigger one though).
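The measurement itself can be done with llama.cpp's dedicated tool. A sketch, assuming the 180K-token story saved as a plain UTF-8 text file (paths are placeholders):

```shell
# Sketch: perplexity over a long raw-text file, evaluated in
# 8192-token windows; lower is better, and a jump in the per-chunk
# numbers would show roughly where the model starts to degrade.
llama-perplexity -m ./model-Q8_0.gguf -f long-story.txt -c 8192
```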