PyTorch to ONNX conversion code and opset
Hi, I am looking to take the ONNX file and convert it to a TensorRT engine for use in NVIDIA Triton. When building an FP16 TensorRT engine, I get the following warning messages:
[TRT] [W] Detected layernorm nodes in FP16.
[10/21/2025-21:15:21] [TRT] [W] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.
What opset did you use for the ONNX conversion? Do you have any PyTorch to ONNX conversion code for this model where the opset number can be specified?
Thanks
I set opset=17.
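Something along these lines should work (a minimal sketch, assuming a Hugging Face Transformers encoder-style model; the model id, input/output names, and shapes are placeholders, not necessarily what was used here):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "your-model-id"  # placeholder, not the actual model id
model = AutoModel.from_pretrained(model_id)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("example input", return_tensors="pt")

torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    # opset >= 17 lets LayerNorm export as a native op that TensorRT can
    # map to INormalizationLayer, which is what the warning above suggests
    opset_version=17,
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
)
```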
Thanks! Also, for the ONNX export, what attention implementation was used? I attempted it with flash_attention_2 myself but ran into errors, so I'm curious whether it was that or SDPA.
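For reference, a hedged sketch of how the attention backend is selected in Hugging Face Transformers before export (whether this model's export used it is an assumption). FlashAttention-2 relies on custom CUDA kernels that the ONNX exporter generally cannot trace, which may explain the errors; SDPA usually decomposes into exportable ops:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "your-model-id",              # placeholder model id
    attn_implementation="sdpa",   # alternatives: "eager", "flash_attention_2"
)
```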