NVFP4 quant via modelopt 0.43.0rc3 + TensorRT-Model-Optimizer examples, recipe general/ptq/nvfp4_experts_only-fp8_kv (calib_size=1024, moe_calib_experts_ratio=1.0, cnn_dailymail + Nemotron-Post-Training-Dataset-v2, batch_size=64->32 auto-fallback, source=FP8 M2.7, FP8->bf16 on-forward dequant, 4x H100 SXM)
6c665e7 verified
{
  "bos_token_id": 200019,
  "do_sample": true,
  "eos_token_id": 200020,
  "temperature": 1.0,
  "top_p": 0.95,
  "top_k": 40,
  "transformers_version": "4.46.1"
}
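
To illustrate what the sampling fields in this config control, here is a minimal stdlib-only sketch that applies temperature, top-k, and top-p in that order to a toy logit vector. The helper names `filter_probs` and `sample` are hypothetical, and the filtering logic is a simplified approximation, not the exact implementation in transformers.

```python
import json
import math
import random

# The generation config from this file, embedded so the sketch is self-contained.
GENERATION_CONFIG = json.loads("""
{
  "bos_token_id": 200019,
  "do_sample": true,
  "eos_token_id": 200020,
  "temperature": 1.0,
  "top_p": 0.95,
  "top_k": 40,
  "transformers_version": "4.46.1"
}
""")


def filter_probs(logits, temperature, top_k, top_p):
    """Hypothetical helper: return {token_id: prob} after temperature, top-k, top-p."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable token ids, most probable first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept_mass = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    nucleus, cumulative = [], 0.0
    for i in ranked:
        nucleus.append(i)
        cumulative += probs[i] / kept_mass
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens.
    nucleus_mass = sum(probs[i] for i in nucleus)
    return {i: probs[i] / nucleus_mass for i in nucleus}


def sample(logits, cfg, rng):
    """Hypothetical helper: pick one token id according to the config."""
    if not cfg["do_sample"]:
        # Greedy decoding when sampling is disabled.
        return max(range(len(logits)), key=lambda i: logits[i])
    dist = filter_probs(logits, cfg["temperature"], cfg["top_k"], cfg["top_p"])
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    toy_logits = [rng.gauss(0.0, 1.0) for _ in range(100)]  # toy 100-token vocabulary
    print(sample(toy_logits, GENERATION_CONFIG, rng))
```

With `top_k` set to 40, at most 40 candidate tokens survive, and `top_p` of 0.95 can trim that set further when the distribution is peaked. A `temperature` of 1.0 leaves the logits unscaled.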