DMA Paging Error (0xc01e0200)
flm run gpt-oss:20b
[FLM] Configuring NPU Power Mode to performance (flm default)
[Warning] Version check timed out; continuing without update info.
[FLM] Loading model: C:\Users\johndoe\Documents\flm\models\GPT-OSS-20B-NPU2
Error: Failed to submit command to hw queue (0xc01e0200):
Even after the video memory manager split the DMA buffer, the video
memory manager could not page-in all of the required allocations into
video memory at the same time. The device is unable to continue.
Device: HX 370 with 32 GB RAM
NPU driver: 32.0.203.314
Please check this note on memory requirements: https://fastflowlm.com/docs/models/gpt-oss/#:~:text=Copy-,%F0%9F%93%9D%20NOTE,-Memory%20Requirements%0A%E2%9A%A0%EF%B8%8F%20Note%3A%20Running%20gpt%2Doss%3A20b
Also, are you currently using flm v0.9.23?
I have reviewed the official troubleshooting guide at https://fastflowlm.com/docs/instructions/cli/, which suggests that RAM shortage is a primary culprit. However, my system telemetry shows ~20GB of free physical RAM at the moment the flm run command initiates.
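To make the "free RAM at launch" observation easier to reason about, here is a minimal pre-flight sketch (not part of flm). The 14.4 GB weight size is mentioned later in this thread; the headroom figure and the function itself are illustrative assumptions, not flm behavior:

```python
# Hypothetical pre-flight check: does free physical RAM cover the model
# weights plus some headroom? All numbers are illustrative.

GIB = 1024 ** 3

def enough_ram(free_bytes: int,
               model_bytes: int = int(14.4 * GIB),  # gpt-oss:20b weights (per this thread)
               headroom_bytes: int = 4 * GIB):      # assumed margin for KV cache / OS
    """Return True if free RAM covers the model weights plus headroom."""
    return free_bytes >= model_bytes + headroom_bytes

# ~20 GiB free (as reported above) passes this naive check,
# yet the NPU's internal memory cap can still trigger the DMA paging error.
print(enough_ram(int(20 * GIB)))
```

This illustrates why "plenty of free RAM" is not sufficient on its own: the NPU cap discussed below is a separate, lower limit.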
Please check "total mem" in Task Manager -> Performance -> NPU.
There is an internal cap on the amount of memory the NPU can access.
Also, keep an eye on https://github.com/lemonade-sdk/lemonade/issues/688 for updates.
Hope the cap can be lifted soon ...
I solved the problem by adjusting the amount of RAM dedicated to the iGPU in the BIOS.
Really, how did you do that? How much "total mem" do you have in Task Manager -> Perf. -> NPU?
Do you mind sharing it? Many of us have 32 GB systems and could really benefit from it! TY!!
I got the same DMA paging error due to insufficient system RAM, which in my case is 32 GB. I understand I can adjust the RAM split to solve the problem; this is done with the AMD Adrenalin tool (see the picture below). My setup is 32 GB of system RAM + 96 GB of VRAM, for a combined total of 128 GB, which is typical of Strix Halo. I could reduce the 96 GB of VRAM and transfer that amount to system RAM. However, I prefer not to, as I want as much VRAM as possible reserved for running larger models in VRAM.
One thing I would like to point out is that this DMA paging error only occurs when the prompt is relatively long. In my case, I set ctx-len to 16384 and the error occurs when my prompt is around 10k tokens (the response is around 1k tokens, so the total stays under the 16k context setting). Shorter prompts do not trigger a DMA error. With either long or short prompts, the flm server launches successfully with a ctx-len of 16384.
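The long-prompt trigger is consistent with KV-cache growth: that part of the memory footprint scales linearly with the number of tokens in context. A back-of-the-envelope sketch; the layer count, KV heads, head dimension, and fp16 precision here are assumed round numbers, not the actual gpt-oss:20b NPU configuration:

```python
def kv_cache_bytes(tokens, layers=24, kv_heads=8, head_dim=64, bytes_per_val=2):
    # 2x for K and V tensors; all architecture parameters are illustrative.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_val

short = kv_cache_bytes(1_000)    # ~1k-token prompt
long_ = kv_cache_bytes(10_000)   # ~10k-token prompt
print(long_ / short)  # KV memory grows 10x with a 10x longer prompt
```

So a server that launches fine at ctx-len 16384 can still run out of pageable memory only once the prompt actually fills most of that context.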
For gpt-oss:20b, I would love to see a smaller model than the current 14.4 GB; the recent AMD Ryzen AI 1.7 ONNX NPU model release gets it down to around 13.4 GB. That 1 GB difference is extremely valuable for longer-context inference: it gives my NPU a stronger edge serving normal LLM requests while keeping the GPU free for more demanding tasks such as coding.
The NPU shares memory with the operating system. If too much system memory is allocated exclusively to the iGPU, the NPU will not have enough memory for GPT-OSS model inference. In my case, flm launched GPT-OSS inference successfully when there was 24 GB of free system memory.
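The point above can be framed as a simple budget: total system RAM, minus the exclusive iGPU carve-out and OS usage, is what remains for the NPU. The carve-out and OS figures below are made-up examples; check your real numbers in Task Manager -> Performance -> NPU ("total mem") and your BIOS/Adrenalin settings:

```python
def free_for_npu(total_ram_gib=32, igpu_carveout_gib=8, os_and_apps_gib=4):
    # Illustrative budget: what is left of system RAM after the exclusive
    # iGPU carve-out and OS/application usage are subtracted.
    return total_ram_gib - igpu_carveout_gib - os_and_apps_gib

# Shrinking the iGPU carve-out in the BIOS raises what is left for the NPU:
print(free_for_npu(igpu_carveout_gib=8))  # 20 GiB remaining
print(free_for_npu(igpu_carveout_gib=4))  # 24 GiB remaining
```

With these example numbers, dropping the carve-out from 8 GB to 4 GB is what moves a 32 GB system past the ~24 GB-free threshold reported above.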
