High CPU Usage / Slow Context Processing

#5
by PussyHut - opened

To save you time:

If you encounter high CPU usage or slow context processing, this is unrelated to quantization; it's a llama.cpp issue:

A temporary quick fix is to disable flash attention with `--flash-attn off`.
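For example, assuming you launch the model with llama.cpp's `llama-server` (the model path below is a placeholder, not a real file from this repo), the flag is passed on the command line like this:

```shell
# Placeholder model path: substitute your own GGUF file.
# Long form: explicitly turn flash attention off.
llama-server -m ./model.gguf --flash-attn off

# Equivalent short form (the same flag as `-fa on` in the original title, set to off).
llama-server -m ./model.gguf -fa off
```

On older llama.cpp builds where `-fa` was a plain switch without a value, simply leaving the flag out should have the same effect.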

PussyHut changed discussion title from High CPU usage when set `--flash-attn on/-fa on` to High CPU Usage / Slow Context Processing

Thank you, very helpful! We'll put it in our guide in case anyone else experiences this.