Is there a benefit of this version vs the original MXFP4?

#5
by SuperbEmphasis - opened

I'm currently running gpt-oss-120b on 2x H100 GPUs via vLLM.

But is there a benefit to using this version? I'm wondering whether FP8 would give faster responses on the H100, since the H100 can use its FP8 tensor cores, at the cost of increased VRAM usage.

The precision loss won't be too noticeable, but there will be some minor differences between FP8 and FP4.
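For a rough sense of the VRAM side of that tradeoff, here is a back-of-the-envelope sketch in Python. It assumes about 120B parameters (taken from the model name, not an exact count) and assumes every weight is stored in a single format, which is probably not true of either checkpoint since some layers may stay in higher precision. It also ignores the KV cache and activations. The 4.25 bits per weight for MXFP4 comes from the format's 4-bit elements plus one shared 8-bit scale per 32-element block.

```python
# Rough weight-memory estimate: FP8 vs MXFP4.
# Assumptions (not from the model card): ~120B parameters, and all weights
# stored in one format. Real checkpoints may keep some layers in BF16.

def weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Return the weight footprint in GiB for a given bit width."""
    return num_params * bits_per_weight / 8 / 2**30

NUM_PARAMS = 120e9          # assumed, rounded from the model name
FP8_BITS = 8.0              # one byte per weight
MXFP4_BITS = 4 + 8 / 32     # 4-bit element + shared 8-bit scale per 32 weights

fp8_gib = weight_gib(NUM_PARAMS, FP8_BITS)
mxfp4_gib = weight_gib(NUM_PARAMS, MXFP4_BITS)

print(f"FP8 weights:   ~{fp8_gib:.0f} GiB")
print(f"MXFP4 weights: ~{mxfp4_gib:.0f} GiB")
print(f"FP8 / MXFP4:   ~{fp8_gib / mxfp4_gib:.2f}x")
```

Under those assumptions the FP8 weights take a bit under twice the space of MXFP4, which is the memory that would otherwise be available for KV cache on the two GPUs.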
