Two questions about NVFP4 testing.
I have been testing NVFP4 quantization for a few months now. In my experience, keeping the first/last blocks at high precision only matters during training.
My first question: has this model been quantized as aggressively as possible and tested internally for unstable output, or has that not been tested, given that the last blocks such as model.diffusion_model.transformer_blocks.47 have been kept at high precision?
To build on that: since this is a video model, have there been tests keeping some layers at fp32 instead of bf16 in the NVFP4 mix? Other video models, such as wan2.2, benefit from this under NVFP4. If quantizing the first/last blocks was tested and produced unstable output, an fp32 mix for those layers might make it possible to decrease the overall size while still maintaining stable output.
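To make the second question concrete, here is a minimal sketch of the kind of per-layer precision map I mean. It is plain Python with no specific quantization library assumed; the block count of 48 and the exact module-name prefix are my assumptions based on the name above, not this repo's actual config:

```python
# Sketch only: NVFP4 everywhere except an explicit keep-list of first/last
# blocks, which get fp32 instead of bf16 (the variant I am asking about).

NUM_BLOCKS = 48  # assumption: blocks 0..47, matching transformer_blocks.47

def precision_for(name: str) -> str:
    """Return the target precision for a module, selected by its name."""
    keep_high = {
        f"model.diffusion_model.transformer_blocks.{i}"
        for i in (0, NUM_BLOCKS - 1)  # first/last blocks kept at high precision
    }
    if any(name.startswith(prefix) for prefix in keep_high):
        return "fp32"  # instead of bf16, as reportedly helps wan2.2
    return "nvfp4"

# Example: classify a few module names
for n in [
    "model.diffusion_model.transformer_blocks.0.attn.to_q",
    "model.diffusion_model.transformer_blocks.24.ff.net.0",
    "model.diffusion_model.transformer_blocks.47.ff.net.2",
]:
    print(n, "->", precision_for(n))
```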
I look forward to hearing back about this! Thank you!