PyTorch to ONNX conversion code and opset
Hi, I am looking to take the ONNX file and convert it to a TensorRT engine for use in NVIDIA Triton. When building an FP16 TensorRT engine, I get the following warning messages:
[TRT] [W] Detected layernorm nodes in FP16.
[10/21/2025-21:15:21] [TRT] [W] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.
What opset did you use for the ONNX conversion? Do you have any PyTorch to ONNX conversion code for this model where the opset number can be specified?
Thanks
I set opset=17.
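Something along these lines should work (a minimal sketch, assuming a Hugging Face Transformers encoder-style model; the model id, input/output names, and shapes are placeholders, not necessarily what was used here):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "your-model-id"  # placeholder, not the actual model id
model = AutoModel.from_pretrained(model_id)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("example input", return_tensors="pt")

torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    # opset >= 17 lets LayerNorm export as a native op that TensorRT can
    # map to INormalizationLayer, which is what the warning above suggests
    opset_version=17,
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
)
```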
Thanks! Also, for the ONNX export, what attention implementation was used? I attempted it with flash_attention_2 myself but ran into errors, so I'm curious whether it was that or SDPA.
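For reference, a hedged sketch of how the attention backend is selected in Hugging Face Transformers before export (whether this model's export used it is an assumption). FlashAttention-2 relies on custom CUDA kernels that the ONNX exporter generally cannot trace, which may explain the errors; SDPA usually decomposes into exportable ops:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "your-model-id",              # placeholder model id
    attn_implementation="sdpa",   # alternatives: "eager", "flash_attention_2"
)
```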