Cannot summarize an 8000-token input with a ~1200-token output; the output degenerates after ~500 tokens and then starts repeating itself.
Need more info about your machine, setup, and config.
I just noticed that the llama.cpp version is crucial for GGUF Q8... the model works reasonably well, but there is no "thinking process," for example.

So I have been trying different pre-built wheels. When the thinking part does work, it sometimes overthinks endlessly. I also cannot reliably instruct how many tokens/words/characters the summary should be; what I get is essentially a coincidence, around ~10% of my 8000-token input, and if the output runs longer than ~1000 tokens it starts repeating itself.

In most cases the model also appends a second, shorter summary right after the first one, but I see that issue with most models.

I tried with and without a system prompt, with instructions like "create an overview..." and "create a summary...", and with or without explicit limits such as "max words: 1000".

Nothing is consistent.
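For reference, here is the kind of invocation I would expect to help with the repetition: a sketch using the standard `llama-cli` flags (model path and parameter values are placeholders, not a tested config). Capping generation with `-n`, setting the context window with `-c`, and raising `--repeat-penalty` over a longer `--repeat-last-n` window are the usual knobs for loop-y long summaries:

```shell
# Assumptions: llama-cli built from a recent llama.cpp; model path is a placeholder.
./llama-cli \
  -m ./model-Q8_0.gguf \
  -c 10240 \              # context large enough for 8000-token input + summary
  -n 1200 \               # hard cap on generated tokens
  --temp 0.7 \
  --repeat-penalty 1.15 \ # penalize recently generated tokens
  --repeat-last-n 256 \   # look further back when applying the penalty
  -p "Summarize the following text in at most 1000 words:\n\n<text here>"
```

Even with these settings, token-count instructions in the prompt are only loosely followed by most models, so the `-n` cap is the more reliable limit.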
