Decode Context Parallel not working
#11
by
pratiknarola
- opened
Heyyy
I am trying to set up the model with FP8 on 8x H200 GPUs.
When I enable -dcp / --decode-context-parallel, I get the following error:
7|glm-5 | 2026-02-12 15:21:55 +05:30: (EngineCore_DP0 pid=17530) ERROR 02-12 15:21:55 [core.py:1006] RuntimeError:
Worker failed with error 'DCP requires attention impls to return the softmax lse for decode,
but the impl FlashMLASparseImpl does not return the softmax lse for decode.',
please check the stack trace above for the root cause
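
For context on why the error mentions the softmax lse: with decode context parallel, each rank attends over only its shard of the KV cache, and the per-rank outputs can only be combined into the exact full-attention result if each rank also returns the log-sum-exp (lse) of its attention scores. The sketch below illustrates that merge in plain Python. It is a minimal illustration under assumptions, not vLLM's implementation: the function names `partial_attention` and `merge_partials` are hypothetical, the two list slices stand in for two ranks, and the usual 1/sqrt(d) scaling is omitted.

```python
import math

def partial_attention(q, keys, values):
    """Attention of one query over one shard of the KV cache (hypothetical helper).

    Returns (output, lse), where lse = log(sum(exp(scores))) for this shard.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    lse = m + math.log(denom)
    return out, lse

def merge_partials(partials):
    """Combine per-shard (output, lse) pairs into the full-attention output."""
    lses = [lse for _, lse in partials]
    m = max(lses)
    # Each shard is weighted by its share of the global softmax denominator.
    scales = [math.exp(lse - m) for lse in lses]
    total = sum(scales)
    dim = len(partials[0][0])
    return [sum(s * out[d] for s, (out, _) in zip(scales, partials)) / total
            for d in range(dim)]

# Toy data: one query, four cached tokens split across two simulated ranks.
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

full, _ = partial_attention(q, keys, values)
merged = merge_partials([
    partial_attention(q, keys[:2], values[:2]),
    partial_attention(q, keys[2:], values[2:]),
])
assert all(abs(a - b) < 1e-9 for a, b in zip(full, merged))
```

Without the lse from each shard, the weights in `merge_partials` cannot be computed, which is consistent with the check reported in the error for FlashMLASparseImpl.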