Go here for original: https://huggingface.co/SicariusSicariiStuff/LLAMA-3_8B_Unaligned_BETA

24.02.2026 - serious tests on Q8_0 started.

I initially had an idea because I was using Q4_0 and similar quants on my phone, where I have a 32768 or 65536 context length set up in Layla; unlike LM Studio, Layla ignores the model's context limit.
But in LM Studio it wasn't viable: despite the model being extremely fast (a few seconds to generate a response from Q8_0, regardless of how far into the chat one was), LM Studio forces you down to the model's limit.

Since the limit in the original Llama_3_8B_Unaligned_BETA is 16384, if you manually entered 32768 in LM Studio, you would end up with a context length of 3276 if you were unlucky, or 12768 if you were lucky.
In Layla, by contrast, chats at 30K tokens that still worked were normal.

First I tried adapting the RoPE settings from Wingless Imp 8B by Sicarius; however, the model went a bit nuts. That said, it's unknown whether the issue was on LM Studio's side (their standard prompt format is good, but I noticed that Impish Nemo, for example, hates it).
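If you want to play with the same idea yourself, linear RoPE scaling is just arithmetic: positions are stretched by the ratio of the target context to the trained context, so tokens beyond 16384 get interpolated back into the range the model was trained on. A minimal sketch of that arithmetic (assuming linear "position interpolation" scaling, as exposed by llama.cpp-style `--rope-scaling linear` / `--rope-scale` options; the function names here are my own, not from any library):

```python
# Linear RoPE scaling sketch. Assumption: the model's trained
# context is 16384, as stated for Llama_3_8B_Unaligned_BETA above.
TRAINED_CTX = 16384

def rope_scale_factor(target_ctx: int, trained_ctx: int = TRAINED_CTX) -> float:
    """Scale factor needed to stretch positions out to target_ctx."""
    return max(1.0, target_ctx / trained_ctx)

def scaled_position(pos: int, scale: float) -> float:
    """Linear interpolation: squeeze position pos back into the trained range."""
    return pos / scale

scale = rope_scale_factor(32768)
print(scale)                          # 2.0
print(scaled_position(32767, scale))  # 16383.5 - back inside the trained range
```

In llama.cpp terms this would correspond to something like `-c 32768 --rope-scaling linear --rope-scale 2` (check the flag names against your build); LM Studio exposes similar RoPE fields in its per-model settings.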
I'm currently testing Q8_0.

If anyone wants to help, please ask and I will upload one of the standard K quants or ARM quants. No imatrix quants though, because I don't have a proper calibration file to generate one.
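For reference, producing one of those standard K quants would look roughly like the following (a hypothetical llama.cpp invocation; the file names are placeholders, not files in this repo, and the tool is called `quantize` in older builds):

```shell
# Hypothetical sketch: make a Q4_K_M quant from an F16 GGUF with llama.cpp.
./llama-quantize model-F16.gguf model-Q4_K_M.gguf Q4_K_M
```

An imatrix quant would additionally need `llama-imatrix` run against a calibration text file first, which is exactly the part I'm missing.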