Update README.md
And in Layla, chats at 30K tokens that still worked were normal.
First I tried adapting RoPE from Wingless Imp 8B by Sicarius. However, the model went a bit nuts. That said, it's unclear whether the issue was on LM Studio's side (their standard prompt format is good, but I noticed that Impish Nemo, for example, hates it).
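For anyone who wants to reproduce this kind of experiment, llama.cpp exposes RoPE overrides at load time. A minimal sketch, assuming a local Q8_0 GGUF (the filename and the scaling values here are placeholders, not the actual Wingless Imp settings):

```shell
# Sketch: overriding RoPE when loading a GGUF with llama.cpp.
# Wrong rope-freq-base / rope-freq-scale values are a classic way
# to make a model "go nuts", so they must match the source model.
llama-cli -m ./model-Q8_0.gguf \
  --rope-scaling linear \
  --rope-freq-scale 0.5 \
  -c 32768 \
  -p "Once upon a time"
```

LM Studio has equivalent RoPE fields in its model-load settings, so the same values can be tried there.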
I'm currently testing Q8_0.
If anyone wants to help, please ask and I will upload one of the standard K quants or ARM quants. No imatrix quants though, because I don't have a proper calibration file to generate one.
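For reference, producing one of those standard K quants is a one-liner with llama.cpp's quantize tool. A sketch, assuming an F16 source GGUF (paths are placeholders):

```shell
# Sketch: standard K quant from an F16 GGUF, no imatrix involved.
llama-quantize ./model-f16.gguf ./model-Q4_K_M.gguf Q4_K_M

# An imatrix-based quant would first need a calibration text file:
#   llama-imatrix -m ./model-f16.gguf -f calibration.txt -o imatrix.dat
# which is exactly the file I don't have.
```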
1st Update: 24/25th.02.2026 - like most Llama 3.1 8B models, the issues seem to start around 40-50K tokens. Prompting has to be much more careful. I haven't tried generating super long stories yet, because I don't know how to in the software I use (LM Studio and others often have a limit of 8192 tokens per message).
Maybe I could try using llama.cpp directly? Something tells me the results might be mixed, though. This model has been trained on 16K stories, yes - but that usually means the model will go off script by 24K at the latest.
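Going through llama.cpp directly would at least sidestep the per-message UI limit. A sketch, with placeholder model path and sizes picked to cover the 40-50K range mentioned above:

```shell
# Sketch: one long generation straight from llama.cpp.
# -c sets the context window, -n the number of tokens to generate,
# -f reads the whole prompt from a file instead of the terminal.
llama-cli -m ./model-Q8_0.gguf -c 49152 -n 16384 -f prompt.txt
```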
Maybe some super low temperature and very strict settings would work, but then it won't be "creative" writing. (Don't get me started on calling writing with AI "creative" - even the best models can genuinely abstract only a tiny amount, and an 8B - not really.)
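If someone does want to try the strict route, llama.cpp's sampler flags make it easy to pin things down. The values below are illustrative guesses, not settings I have tested:

```shell
# Sketch: "strict" sampling - low temperature, tight nucleus,
# mild repetition penalty. Expect coherent but flat output.
llama-cli -m ./model-Q8_0.gguf -c 32768 -f prompt.txt \
  --temp 0.3 --top-k 40 --top-p 0.9 --min-p 0.1 --repeat-penalty 1.1
```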
I will measure perplexity today, but I need access to bigger stories. The biggest contiguous one I have is maybe 180K tokens (it's a single story; I could access a bigger one though).
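The measurement itself can be done with llama.cpp's dedicated tool. A sketch, assuming the 180K-token story saved as a plain UTF-8 text file (paths are placeholders):

```shell
# Sketch: perplexity over a long raw-text file, evaluated in
# 8192-token windows; lower is better, and a jump in the per-chunk
# numbers would show roughly where the model starts to degrade.
llama-perplexity -m ./model-Q8_0.gguf -f long-story.txt -c 8192
```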