High CPU Usage / Slow Context Processing
#5
by
PussyHut
- opened
To save your time:
If you encounter the high CPU usage / slow context processing problem, it is unrelated to the quantization; it's a llama.cpp issue:
- https://github.com/ggml-org/llama.cpp/issues/18948
- https://github.com/ggml-org/llama.cpp/issues/18944
A temporary quick fix is to disable flash attention: `--flash-attn off`
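For anyone unsure where the flag goes: it is passed to the llama.cpp server (or CLI) at launch. A minimal sketch, assuming a typical `llama-server` setup — the model path and port below are placeholders, not from the linked issues:

```shell
# Workaround for the high-CPU / slow prompt processing issue:
# launch llama-server with flash attention explicitly disabled.
# Model path and port are placeholders; adjust to your setup.
llama-server \
  -m ./your-model.gguf \
  --port 8080 \
  --flash-attn off
```

Once the upstream llama.cpp issues are fixed, the flag can be switched back to `on` (or left at its default).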
PussyHut
changed discussion title from
High CPU usage when set `--flash-attn on/-fa on`
to High CPU Usage / Slow Context Processing
Thank you, very helpful! We shall put it in our guide in case anyone experiences this!