FunnyPunch committed (verified)
Commit 9354174 · Parent(s): e6eb7f2

Update README.md

Files changed (1): README.md (+6 −1)
README.md CHANGED
@@ -19,4 +19,9 @@ And in Layla, chats with 30K tokens and still working were normal.
  First I tried adapting RoPE from Wingless Imp 8B by Sicarius. However, the model went a bit nuts. That said, it's unknown whether the issue was on LM Studio's side (their standard prompt format is good, but I noticed that Impish Nemo, for example, hates it).
  I'm currently testing Q8_0.
 
- If anyone wants to help, please ask and I will upload one of the standard K quants or ARM quants. No imatrix though, because I don't have a proper file to generate one.
+ If anyone wants to help, please ask and I will upload one of the standard K quants or ARM quants. No imatrix though, because I don't have a proper file to generate one.
+
+ 1st Update: 24/25th.02.2026 - like most Llama 3.1 8B models, the issues seem to start around 40-50K tokens. Prompting has to be much more careful. I haven't tried generating super-long stories yet, because I don't know how in the software I use (LM Studio and others often have a limit of 8192 tokens per message).
+ Maybe I could try using llama.cpp directly? However, something tells me the results might be mixed. This model has been trained on 16K stories, yes - but that usually means the model will go off script at 24K at the latest.
+ Maybe some super-low temperature and very strict settings would help, but then it won't be "creative" writing. (Don't get me started on calling writing with AI "creative" - even the best models can truly abstract only a tiny amount. 8B - not really.)
+ I will measure perplexity today, but I need access to bigger stories. The biggest contiguous one I have is maybe 180K tokens (it's one story; I could access a bigger one though).
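
The update mentions trying llama.cpp directly and measuring perplexity. A possible sketch using llama.cpp's bundled command-line tools - the binary names come from recent llama.cpp builds, and all model/file paths here are placeholder assumptions, not the author's actual setup:

```shell
# Sketch only: model and text file names are placeholders.

# Generate past the usual per-message GUI limits with llama-cli
# (-c = context window in tokens, -n = max tokens to generate,
#  -f = read the prompt from a file):
./llama-cli -m model-Q8_0.gguf -c 49152 -n 4096 --temp 0.7 -f prompt.txt

# Measure perplexity over a long text file with llama-perplexity;
# a sharp rise past some context length is a hint the model stops
# tracking the story around there:
./llama-perplexity -m model-Q8_0.gguf -f long-story.txt -c 16384
```

Running the perplexity pass at a few different `-c` values (e.g. 8192, 16384, 32768) would make it easier to see where the 40-50K degradation actually begins.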