polygraphy_bug / error.log
Yuekai Zhang
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[V] Model: ./encoder.onnx
[I] RUNNING | Command: /usr/local/bin/polygraphy run ./encoder.onnx --fp16 --onnxrt --trt --atol 1e-3 --rtol 1e-3 --pool-limit workspace:1000000000 --save-engine=./encoder1_fp16.plan --verbose --onnx-outputs mark all --trt-outputs mark all --trt-min-shapes chunk_xs:[1,67,80] chunk_lens:[1] offset:[1,1] att_cache:[1,12,4,80,128] cnn_cache:[1,12,256,7] cache_mask:[1,1,80] --trt-opt-shapes chunk_xs:[16,67,80] chunk_lens:[16] offset:[16,1] att_cache:[16,12,4,80,128] cnn_cache:[16,12,256,7] cache_mask:[16,1,80] --trt-max-shapes chunk_xs:[32,67,80] chunk_lens:[32] offset:[32,1] att_cache:[32,12,4,80,128] cnn_cache:[32,12,256,7] cache_mask:[32,1,80] --input-shapes chunk_xs:[16,67,80] chunk_lens:[16] offset:[16,1] att_cache:[16,12,4,80,128] cnn_cache:[16,12,256,7] cache_mask:[16,1,80] --validate
[V] Loaded Module: polygraphy | Version: 0.46.2 | Path: ['/usr/local/lib/python3.8/dist-packages/polygraphy']
[V] Loaded extension modules: []
[V] Loaded Module: tensorrt | Version: 8.6.0 | Path: ['/usr/local/lib/python3.8/dist-packages/tensorrt']
[I] Will generate inference input data according to provided TensorMetadata: {chunk_xs [shape=(16, 67, 80)],
chunk_lens [shape=(16,)],
offset [shape=(16, 1)],
att_cache [shape=(16, 12, 4, 80, 128)],
cnn_cache [shape=(16, 12, 256, 7)],
cache_mask [shape=(16, 1, 80)]}
[I] onnxrt-runner-N0-04/18/23-11:19:47 | Activating and starting inference
[I] Loading model: /mnt/samsung-t7/yuekai/benchmark/triton_speech_recognition_benchmark_new/model_repo_stateful_conformer_aishell2_wenet/encoder/1/encoder.onnx
[V] Loaded Module: onnx | Version: 1.13.1 | Path: ['/usr/local/lib/python3.8/dist-packages/onnx']
[V] Marking all ONNX tensors as outputs
[V] Loaded Module: onnxruntime | Version: 1.13.1 | Path: ['/usr/local/lib/python3.8/dist-packages/onnxruntime']
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[V] Loaded Module: numpy | Version: 1.23.5 | Path: ['/usr/local/lib/python3.8/dist-packages/numpy']
[V] Loading inputs from data loader
[V] Generating data using numpy seed: 1
[V] Input tensor: chunk_xs | Generating input data in range: [0.0, 1.0]
[V] Input tensor: chunk_lens | Generating input data in range: [0, 1]
[V] Input tensor: offset | Generating input data in range: [0, 1]
[V] Input tensor: att_cache | Generating input data in range: [0.0, 1.0]
[V] Input tensor: cnn_cache | Generating input data in range: [0.0, 1.0]
[V] Input tensor: cache_mask | Generating input data in range: [0.0, 1.0]
[I] onnxrt-runner-N0-04/18/23-11:19:47
---- Inference Input(s) ----
{chunk_xs [dtype=float16, shape=(16, 67, 80)],
chunk_lens [dtype=int32, shape=(16,)],
offset [dtype=int64, shape=(16, 1)],
att_cache [dtype=float16, shape=(16, 12, 4, 80, 128)],
cnn_cache [dtype=float16, shape=(16, 12, 256, 7)],
cache_mask [dtype=float16, shape=(16, 1, 80)]}
[V] onnxrt-runner-N0-04/18/23-11:19:47 | Input metadata is: {chunk_xs [dtype=float16, shape=('B', 67, 80)],
chunk_lens [dtype=int32, shape=('B',)],
offset [dtype=int64, shape=('B', 1)],
att_cache [dtype=float16, shape=('B', 12, 4, 80, 128)],
cnn_cache [dtype=float16, shape=('B', 12, 256, 7)],
cache_mask [dtype=float16, shape=('B', 1, 80)]}
[I] onnxrt-runner-N0-04/18/23-11:19:47
---- Inference Output(s) ----
{offset.1 [dtype=int64, shape=(16,)],
onnx::Gather_464 [dtype=int64, shape=(3,)],
onnx::Cast_466 [dtype=int64, shape=()],
onnx::Gather_467 [dtype=int64, shape=(1,)],
onnx::Unsqueeze_469 [dtype=int64, shape=()],
onnx::Range_473 [dtype=int64, shape=()],
onnx::Unsqueeze_475 [dtype=int64, shape=(67,)],
onnx::Expand_477 [dtype=int64, shape=(1, 67)],
onnx::Concat_479 [dtype=int64, shape=(1,)],
onnx::Concat_481 [dtype=int64, shape=(1,)],
onnx::Reshape_482 [dtype=int64, shape=(2,)],
onnx::Shape_484 [dtype=int64, shape=(2,)],
onnx::ConstantOfShape_485 [dtype=int64, shape=(1,)],
onnx::Mul_486 [dtype=int64, shape=(2,)],
onnx::Equal_488 [dtype=int64, shape=(2,)],
onnx::Where_489 [dtype=bool, shape=(2,)],
onnx::Expand_490 [dtype=int64, shape=(2,)],
onnx::GreaterOrEqual_491 [dtype=int64, shape=(16, 67)],
onnx::Cast_493 [dtype=int32, shape=(16, 1)],
onnx::GreaterOrEqual_494 [dtype=int64, shape=(16, 1)],
onnx::Unsqueeze_495 [dtype=bool, shape=(16, 67)],
onnx::Not_497 [dtype=bool, shape=(16, 1, 67)],
onnx::Cast_498 [dtype=bool, shape=(16, 1, 67)],
onnx::Slice_499 [dtype=float16, shape=(16, 1, 67)],
onnx::Shape_500 [dtype=float16, shape=(12, 16, 4, 80, 128)],
onnx::Gather_501 [dtype=float16, shape=(12, 16, 256, 7)],
onnx::Mul_502 [dtype=float16, shape=(16, 67, 80)],
onnx::Unsqueeze_503 [dtype=float16, shape=(16, 67, 80)],
input [dtype=float16, shape=(16, 1, 67, 80)],
input.3 [dtype=float16, shape=(16, 256, 33, 39)],
input.7 [dtype=float16, shape=(16, 256, 33, 39)],
input.11 [dtype=float16, shape=(16, 256, 16, 19)],
onnx::Shape_509 [dtype=float16, shape=(16, 256, 16, 19)],
onnx::Gather_510 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_512 [dtype=int64, shape=()],
onnx::Gather_513 [dtype=int64, shape=(4,)],
onnx::Mul_515 [dtype=int64, shape=()],
onnx::Gather_516 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_518 [dtype=int64, shape=()],
onnx::Gather_519 [dtype=int64, shape=(4,)],
onnx::Mul_521 [dtype=int64, shape=()],
onnx::Reshape_522 [dtype=float16, shape=(16, 16, 256, 19)],
onnx::Unsqueeze_523 [dtype=int64, shape=()],
onnx::Concat_525 [dtype=int64, shape=(1,)],
onnx::Concat_527 [dtype=int64, shape=(1,)],
onnx::Concat_529 [dtype=int64, shape=(1,)],
onnx::Reshape_530 [dtype=int64, shape=(3,)],
onnx::MatMul_531 [dtype=float16, shape=(16, 16, 4864)],
onnx::Add_533 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_534 [dtype=float16, shape=(16, 16, 256)],
input.15 [dtype=float16, shape=(16, 16, 256)],
onnx::Slice_541 [dtype=float16, shape=(16, 1, 33)],
onnx::Concat_546 [dtype=float16, shape=(16, 1, 16)],
onnx::Gather_547 [dtype=int64, shape=(5,)],
onnx::Sub_549 [dtype=int64, shape=()],
mask [dtype=float16, shape=(16, 1, 96)],
offset.3 [dtype=int64, shape=(16,)],
onnx::Gather_552 [dtype=int64, shape=(3,)],
onnx::Add_554 [dtype=int64, shape=()],
size [dtype=int64, shape=()],
onnx::Add_557 [dtype=int64, shape=(16, 1)],
onnx::Range_561 [dtype=int64, shape=()],
onnx::Cast_563 [dtype=int64, shape=(96,)],
onnx::Add_564 [dtype=int64, shape=(96,)],
onnx::Greater_565 [dtype=int64, shape=(16, 96)],
onnx::Cast_567 [dtype=bool, shape=(16, 96)],
onnx::Mul_568 [dtype=int64, shape=(16, 96)],
index [dtype=int64, shape=(16, 96)],
input.19 [dtype=float16, shape=(16, 96, 256)],
onnx::Shape_572 [dtype=float16, shape=(16, 96, 256)],
r_cache_mask [dtype=float16, shape=(16, 1, 80)],
tensor [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_581 [dtype=float16, shape=(16, 256, 7)],
onnx::Sub_582 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_583 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_585 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_586 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_588 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_589 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_590 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_591 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_592 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_594 [dtype=float16, shape=(16, 16, 2048)],
input.23 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_596 [dtype=float16, shape=(16, 16, 2048)],
input.27 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_599 [dtype=float16, shape=(16, 16, 256)],
input.31 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_602 [dtype=float16, shape=(16, 16, 256)],
input.35 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_604 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_605 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_607 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_608 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_610 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_611 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_612 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_613 [dtype=float16, shape=(16, 16, 256)],
query [dtype=float16, shape=(16, 16, 256)],
onnx::Gather_615 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_617 [dtype=int64, shape=()],
onnx::Add_619 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_620 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_625 [dtype=int64, shape=(1,)],
onnx::Reshape_632 [dtype=int64, shape=(4,)],
onnx::Add_633 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_635 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_636 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_641 [dtype=int64, shape=(1,)],
onnx::Reshape_648 [dtype=int64, shape=(4,)],
onnx::Transpose_649 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_651 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_652 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_657 [dtype=int64, shape=(1,)],
onnx::Reshape_664 [dtype=int64, shape=(4,)],
onnx::Transpose_665 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_666 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_667 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_669 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_670 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_671 [dtype=float16, shape=(16, 4, 96, 64)],
v [dtype=float16, shape=(16, 4, 96, 64)],
onnx::Slice_673 [dtype=float16, shape=(16, 4, 96, 128)],
onnx::Gather_674 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_676 [dtype=int64, shape=()],
onnx::Reshape_678 [dtype=float16, shape=(16, 96, 256)],
onnx::Concat_683 [dtype=int64, shape=(1,)],
onnx::Reshape_690 [dtype=int64, shape=(4,)],
onnx::Transpose_691 [dtype=float16, shape=(16, 96, 4, 64)],
onnx::Transpose_692 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_693 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Transpose_694 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_695 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::MatMul_696 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_697 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::MatMul_698 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_699 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Div_700 [dtype=float16, shape=(16, 4, 16, 96)],
scores [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Gather_703 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_705 [dtype=int64, shape=()],
onnx::Equal_707 [dtype=float16, shape=(16, 1, 1, 96)],
onnx::Slice_709 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Gather_710 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_712 [dtype=int64, shape=()],
onnx::Slice_718 [dtype=int64, shape=(1,)],
onnx::Cast_722 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Where_723 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Softmax_725 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_726 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_727 [dtype=bool, shape=(16, 1, 1, 96)],
input.39 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Transpose_730 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Reshape_731 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_735 [dtype=int64, shape=(1,)],
onnx::Reshape_740 [dtype=int64, shape=(3,)],
onnx::MatMul_741 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_743 [dtype=float16, shape=(16, 16, 256)],
input.43 [dtype=float16, shape=(16, 16, 256)],
input.47 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_746 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_747 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_749 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_750 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_752 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_753 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_754 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_755 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_756 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_757 [dtype=float16, shape=(16, 256, 16)],
input.51 [dtype=float16, shape=(16, 256, 23)],
onnx::Unsqueeze_763 [dtype=float16, shape=(16, 256, 7)],
x [dtype=float16, shape=(16, 512, 23)],
onnx::Mul_765 [dtype=float16, shape=(16, 256, 23)],
onnx::Sigmoid_766 [dtype=float16, shape=(16, 256, 23)],
onnx::Mul_767 [dtype=float16, shape=(16, 256, 23)],
input.55 [dtype=float16, shape=(16, 256, 23)],
onnx::Transpose_769 [dtype=float16, shape=(16, 256, 16)],
input.59 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_771 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_772 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_774 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_775 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_777 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_778 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_779 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_780 [dtype=float16, shape=(16, 16, 256)],
input.63 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_782 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_783 [dtype=float16, shape=(16, 16, 256)],
input.67 [dtype=float16, shape=(16, 256, 16)],
onnx::Transpose_785 [dtype=float16, shape=(16, 256, 16)],
input.71 [dtype=float16, shape=(16, 16, 256)],
input.75 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_788 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_789 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_791 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_792 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_794 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_795 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_796 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_797 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_798 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_800 [dtype=float16, shape=(16, 16, 2048)],
input.79 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_802 [dtype=float16, shape=(16, 16, 2048)],
input.83 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_805 [dtype=float16, shape=(16, 16, 256)],
input.87 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_808 [dtype=float16, shape=(16, 16, 256)],
input.91 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_810 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_811 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_813 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_814 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_816 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_817 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_818 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_819 [dtype=float16, shape=(16, 16, 256)],
input.95 [dtype=float16, shape=(16, 16, 256)],
onnx::Unsqueeze_825 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_827 [dtype=float16, shape=(16, 1, 4, 80, 128)],
onnx::Concat_829 [dtype=float16, shape=(16, 1, 256, 7)],
tensor.3 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_833 [dtype=float16, shape=(16, 256, 7)],
onnx::Sub_834 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_835 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_837 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_838 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_840 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_841 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_842 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_843 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_844 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_846 [dtype=float16, shape=(16, 16, 2048)],
input.99 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_848 [dtype=float16, shape=(16, 16, 2048)],
input.103 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_851 [dtype=float16, shape=(16, 16, 256)],
input.107 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_854 [dtype=float16, shape=(16, 16, 256)],
input.111 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_856 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_857 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_859 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_860 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_862 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_863 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_864 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_865 [dtype=float16, shape=(16, 16, 256)],
query.3 [dtype=float16, shape=(16, 16, 256)],
onnx::Gather_867 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_869 [dtype=int64, shape=()],
onnx::Add_871 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_872 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_877 [dtype=int64, shape=(1,)],
onnx::Reshape_884 [dtype=int64, shape=(4,)],
onnx::Add_885 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_887 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_888 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_893 [dtype=int64, shape=(1,)],
onnx::Reshape_900 [dtype=int64, shape=(4,)],
onnx::Transpose_901 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_903 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_904 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_909 [dtype=int64, shape=(1,)],
onnx::Reshape_916 [dtype=int64, shape=(4,)],
onnx::Transpose_917 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_918 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_919 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_921 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_922 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_923 [dtype=float16, shape=(16, 4, 96, 64)],
v.3 [dtype=float16, shape=(16, 4, 96, 64)],
onnx::Slice_925 [dtype=float16, shape=(16, 4, 96, 128)],
onnx::Gather_926 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_928 [dtype=int64, shape=()],
onnx::Reshape_930 [dtype=float16, shape=(16, 96, 256)],
onnx::Concat_935 [dtype=int64, shape=(1,)],
onnx::Reshape_942 [dtype=int64, shape=(4,)],
onnx::Transpose_943 [dtype=float16, shape=(16, 96, 4, 64)],
onnx::Transpose_944 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_945 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Transpose_946 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_947 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::MatMul_948 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_949 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::MatMul_950 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_951 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Div_952 [dtype=float16, shape=(16, 4, 16, 96)],
scores.3 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Gather_955 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_957 [dtype=int64, shape=()],
onnx::Equal_959 [dtype=float16, shape=(16, 1, 1, 96)],
onnx::Slice_961 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Gather_962 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_964 [dtype=int64, shape=()],
onnx::Slice_970 [dtype=int64, shape=(1,)],
onnx::Cast_974 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Where_975 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Softmax_977 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_978 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_979 [dtype=bool, shape=(16, 1, 1, 96)],
input.115 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Transpose_982 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Reshape_983 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_987 [dtype=int64, shape=(1,)],
onnx::Reshape_992 [dtype=int64, shape=(3,)],
onnx::MatMul_993 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_995 [dtype=float16, shape=(16, 16, 256)],
input.119 [dtype=float16, shape=(16, 16, 256)],
input.123 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_998 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_999 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1001 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1002 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1004 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1005 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1006 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1007 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_1008 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1009 [dtype=float16, shape=(16, 256, 16)],
input.127 [dtype=float16, shape=(16, 256, 23)],
onnx::Unsqueeze_1015 [dtype=float16, shape=(16, 256, 7)],
x.3 [dtype=float16, shape=(16, 512, 23)],
onnx::Mul_1017 [dtype=float16, shape=(16, 256, 23)],
onnx::Sigmoid_1018 [dtype=float16, shape=(16, 256, 23)],
onnx::Mul_1019 [dtype=float16, shape=(16, 256, 23)],
input.131 [dtype=float16, shape=(16, 256, 23)],
onnx::Transpose_1021 [dtype=float16, shape=(16, 256, 16)],
input.135 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1023 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1024 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1026 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1027 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1029 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1030 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1031 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1032 [dtype=float16, shape=(16, 16, 256)],
input.139 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_1034 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_1035 [dtype=float16, shape=(16, 16, 256)],
input.143 [dtype=float16, shape=(16, 256, 16)],
onnx::Transpose_1037 [dtype=float16, shape=(16, 256, 16)],
input.147 [dtype=float16, shape=(16, 16, 256)],
input.151 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1040 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1041 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1043 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1044 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1046 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1047 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1048 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1049 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_1050 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1052 [dtype=float16, shape=(16, 16, 2048)],
input.155 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_1054 [dtype=float16, shape=(16, 16, 2048)],
input.159 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_1057 [dtype=float16, shape=(16, 16, 256)],
input.163 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1060 [dtype=float16, shape=(16, 16, 256)],
input.167 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1062 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1063 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1065 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1066 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1068 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1069 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1070 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1071 [dtype=float16, shape=(16, 16, 256)],
input.171 [dtype=float16, shape=(16, 16, 256)],
onnx::Unsqueeze_1077 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_1079 [dtype=float16, shape=(16, 1, 4, 80, 128)],
onnx::Concat_1081 [dtype=float16, shape=(16, 1, 256, 7)],
tensor.7 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_1085 [dtype=float16, shape=(16, 256, 7)],
onnx::Sub_1086 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1087 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1089 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1090 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1092 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1093 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1094 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1095 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_1096 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1098 [dtype=float16, shape=(16, 16, 2048)],
input.175 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_1100 [dtype=float16, shape=(16, 16, 2048)],
input.179 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_1103 [dtype=float16, shape=(16, 16, 256)],
input.183 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1106 [dtype=float16, shape=(16, 16, 256)],
input.187 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1108 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1109 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1111 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1112 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1114 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1115 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1116 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1117 [dtype=float16, shape=(16, 16, 256)],
query.7 [dtype=float16, shape=(16, 16, 256)],
onnx::Gather_1119 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_1121 [dtype=int64, shape=()],
onnx::Add_1123 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_1124 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1129 [dtype=int64, shape=(1,)],
onnx::Reshape_1136 [dtype=int64, shape=(4,)],
onnx::Add_1137 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_1139 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_1140 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1145 [dtype=int64, shape=(1,)],
onnx::Reshape_1152 [dtype=int64, shape=(4,)],
onnx::Transpose_1153 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_1155 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_1156 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1161 [dtype=int64, shape=(1,)],
onnx::Reshape_1168 [dtype=int64, shape=(4,)],
onnx::Transpose_1169 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_1170 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_1171 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_1173 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_1174 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_1175 [dtype=float16, shape=(16, 4, 96, 64)],
v.7 [dtype=float16, shape=(16, 4, 96, 64)],
onnx::Slice_1177 [dtype=float16, shape=(16, 4, 96, 128)],
onnx::Gather_1178 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_1180 [dtype=int64, shape=()],
onnx::Reshape_1182 [dtype=float16, shape=(16, 96, 256)],
onnx::Concat_1187 [dtype=int64, shape=(1,)],
onnx::Reshape_1194 [dtype=int64, shape=(4,)],
onnx::Transpose_1195 [dtype=float16, shape=(16, 96, 4, 64)],
onnx::Transpose_1196 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_1197 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Transpose_1198 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_1199 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::MatMul_1200 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_1201 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::MatMul_1202 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_1203 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Div_1204 [dtype=float16, shape=(16, 4, 16, 96)],
scores.7 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Gather_1207 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_1209 [dtype=int64, shape=()],
onnx::Equal_1211 [dtype=float16, shape=(16, 1, 1, 96)],
onnx::Slice_1213 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Gather_1214 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_1216 [dtype=int64, shape=()],
onnx::Slice_1222 [dtype=int64, shape=(1,)],
onnx::Cast_1226 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Where_1227 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Softmax_1229 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_1230 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_1231 [dtype=bool, shape=(16, 1, 1, 96)],
input.191 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Transpose_1234 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Reshape_1235 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_1239 [dtype=int64, shape=(1,)],
onnx::Reshape_1244 [dtype=int64, shape=(3,)],
onnx::MatMul_1245 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1247 [dtype=float16, shape=(16, 16, 256)],
input.195 [dtype=float16, shape=(16, 16, 256)],
input.199 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1250 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1251 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1253 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1254 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1256 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1257 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1258 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1259 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_1260 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1261 [dtype=float16, shape=(16, 256, 16)],
input.203 [dtype=float16, shape=(16, 256, 23)],
onnx::Unsqueeze_1267 [dtype=float16, shape=(16, 256, 7)],
x.7 [dtype=float16, shape=(16, 512, 23)],
onnx::Mul_1269 [dtype=float16, shape=(16, 256, 23)],
onnx::Sigmoid_1270 [dtype=float16, shape=(16, 256, 23)],
onnx::Mul_1271 [dtype=float16, shape=(16, 256, 23)],
input.207 [dtype=float16, shape=(16, 256, 23)],
onnx::Transpose_1273 [dtype=float16, shape=(16, 256, 16)],
input.211 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1275 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1276 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1278 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1279 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1281 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1282 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1283 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1284 [dtype=float16, shape=(16, 16, 256)],
input.215 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_1286 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_1287 [dtype=float16, shape=(16, 16, 256)],
input.219 [dtype=float16, shape=(16, 256, 16)],
onnx::Transpose_1289 [dtype=float16, shape=(16, 256, 16)],
input.223 [dtype=float16, shape=(16, 16, 256)],
input.227 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1292 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1293 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1295 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1296 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1298 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1299 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1300 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1301 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_1302 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1304 [dtype=float16, shape=(16, 16, 2048)],
input.231 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_1306 [dtype=float16, shape=(16, 16, 2048)],
input.235 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_1309 [dtype=float16, shape=(16, 16, 256)],
input.239 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1312 [dtype=float16, shape=(16, 16, 256)],
input.243 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1314 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1315 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1317 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1318 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1320 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1321 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1322 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1323 [dtype=float16, shape=(16, 16, 256)],
input.247 [dtype=float16, shape=(16, 16, 256)],
onnx::Unsqueeze_1329 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_1331 [dtype=float16, shape=(16, 1, 4, 80, 128)],
onnx::Concat_1333 [dtype=float16, shape=(16, 1, 256, 7)],
tensor.11 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_1337 [dtype=float16, shape=(16, 256, 7)],
onnx::Sub_1338 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1339 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1341 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1342 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1344 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1345 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1346 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1347 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_1348 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1350 [dtype=float16, shape=(16, 16, 2048)],
input.251 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_1352 [dtype=float16, shape=(16, 16, 2048)],
input.255 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_1355 [dtype=float16, shape=(16, 16, 256)],
input.259 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1358 [dtype=float16, shape=(16, 16, 256)],
input.263 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1360 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1361 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1363 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1364 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1366 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1367 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1368 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1369 [dtype=float16, shape=(16, 16, 256)],
query.11 [dtype=float16, shape=(16, 16, 256)],
onnx::Gather_1371 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_1373 [dtype=int64, shape=()],
onnx::Add_1375 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_1376 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1381 [dtype=int64, shape=(1,)],
onnx::Reshape_1388 [dtype=int64, shape=(4,)],
onnx::Add_1389 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_1391 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_1392 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1397 [dtype=int64, shape=(1,)],
onnx::Reshape_1404 [dtype=int64, shape=(4,)],
onnx::Transpose_1405 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_1407 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_1408 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1413 [dtype=int64, shape=(1,)],
onnx::Reshape_1420 [dtype=int64, shape=(4,)],
onnx::Transpose_1421 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_1422 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_1423 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_1425 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_1426 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_1427 [dtype=float16, shape=(16, 4, 96, 64)],
v.11 [dtype=float16, shape=(16, 4, 96, 64)],
onnx::Slice_1429 [dtype=float16, shape=(16, 4, 96, 128)],
onnx::Gather_1430 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_1432 [dtype=int64, shape=()],
onnx::Reshape_1434 [dtype=float16, shape=(16, 96, 256)],
onnx::Concat_1439 [dtype=int64, shape=(1,)],
onnx::Reshape_1446 [dtype=int64, shape=(4,)],
onnx::Transpose_1447 [dtype=float16, shape=(16, 96, 4, 64)],
onnx::Transpose_1448 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_1449 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Transpose_1450 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_1451 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::MatMul_1452 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_1453 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::MatMul_1454 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_1455 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Div_1456 [dtype=float16, shape=(16, 4, 16, 96)],
scores.11 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Gather_1459 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_1461 [dtype=int64, shape=()],
onnx::Equal_1463 [dtype=float16, shape=(16, 1, 1, 96)],
onnx::Slice_1465 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Gather_1466 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_1468 [dtype=int64, shape=()],
onnx::Slice_1474 [dtype=int64, shape=(1,)],
onnx::Cast_1478 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Where_1479 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Softmax_1481 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_1482 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_1483 [dtype=bool, shape=(16, 1, 1, 96)],
input.267 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Transpose_1486 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Reshape_1487 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_1491 [dtype=int64, shape=(1,)],
onnx::Reshape_1496 [dtype=int64, shape=(3,)],
onnx::MatMul_1497 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1499 [dtype=float16, shape=(16, 16, 256)],
input.271 [dtype=float16, shape=(16, 16, 256)],
input.275 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1502 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1503 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1505 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1506 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1508 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1509 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1510 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1511 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_1512 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1513 [dtype=float16, shape=(16, 256, 16)],
input.279 [dtype=float16, shape=(16, 256, 23)],
onnx::Unsqueeze_1519 [dtype=float16, shape=(16, 256, 7)],
x.11 [dtype=float16, shape=(16, 512, 23)],
onnx::Mul_1521 [dtype=float16, shape=(16, 256, 23)],
onnx::Sigmoid_1522 [dtype=float16, shape=(16, 256, 23)],
onnx::Mul_1523 [dtype=float16, shape=(16, 256, 23)],
input.283 [dtype=float16, shape=(16, 256, 23)],
onnx::Transpose_1525 [dtype=float16, shape=(16, 256, 16)],
input.287 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1527 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1528 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1530 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1531 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1533 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1534 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1535 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1536 [dtype=float16, shape=(16, 16, 256)],
input.291 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_1538 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_1539 [dtype=float16, shape=(16, 16, 256)],
input.295 [dtype=float16, shape=(16, 256, 16)],
onnx::Transpose_1541 [dtype=float16, shape=(16, 256, 16)],
input.299 [dtype=float16, shape=(16, 16, 256)],
input.303 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1544 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1545 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1547 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1548 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1550 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1551 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1552 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1553 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_1554 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1556 [dtype=float16, shape=(16, 16, 2048)],
input.307 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_1558 [dtype=float16, shape=(16, 16, 2048)],
input.311 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_1561 [dtype=float16, shape=(16, 16, 256)],
input.315 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1564 [dtype=float16, shape=(16, 16, 256)],
input.319 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1566 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1567 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1569 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1570 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1572 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1573 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1574 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1575 [dtype=float16, shape=(16, 16, 256)],
input.323 [dtype=float16, shape=(16, 16, 256)],
onnx::Unsqueeze_1581 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_1583 [dtype=float16, shape=(16, 1, 4, 80, 128)],
onnx::Concat_1585 [dtype=float16, shape=(16, 1, 256, 7)],
tensor.15 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_1589 [dtype=float16, shape=(16, 256, 7)],
onnx::Sub_1590 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1591 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1593 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1594 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1596 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1597 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1598 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1599 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_1600 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1602 [dtype=float16, shape=(16, 16, 2048)],
input.327 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_1604 [dtype=float16, shape=(16, 16, 2048)],
input.331 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_1607 [dtype=float16, shape=(16, 16, 256)],
input.335 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1610 [dtype=float16, shape=(16, 16, 256)],
input.339 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1612 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1613 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1615 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1616 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1618 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1619 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1620 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1621 [dtype=float16, shape=(16, 16, 256)],
query.15 [dtype=float16, shape=(16, 16, 256)],
onnx::Gather_1623 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_1625 [dtype=int64, shape=()],
onnx::Add_1627 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_1628 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1633 [dtype=int64, shape=(1,)],
onnx::Reshape_1640 [dtype=int64, shape=(4,)],
onnx::Add_1641 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_1643 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_1644 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1649 [dtype=int64, shape=(1,)],
onnx::Reshape_1656 [dtype=int64, shape=(4,)],
onnx::Transpose_1657 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_1659 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_1660 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1665 [dtype=int64, shape=(1,)],
onnx::Reshape_1672 [dtype=int64, shape=(4,)],
onnx::Transpose_1673 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_1674 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_1675 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_1677 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_1678 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_1679 [dtype=float16, shape=(16, 4, 96, 64)],
v.15 [dtype=float16, shape=(16, 4, 96, 64)],
onnx::Slice_1681 [dtype=float16, shape=(16, 4, 96, 128)],
onnx::Gather_1682 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_1684 [dtype=int64, shape=()],
onnx::Reshape_1686 [dtype=float16, shape=(16, 96, 256)],
onnx::Concat_1691 [dtype=int64, shape=(1,)],
onnx::Reshape_1698 [dtype=int64, shape=(4,)],
onnx::Transpose_1699 [dtype=float16, shape=(16, 96, 4, 64)],
onnx::Transpose_1700 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_1701 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Transpose_1702 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_1703 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::MatMul_1704 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_1705 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::MatMul_1706 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_1707 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Div_1708 [dtype=float16, shape=(16, 4, 16, 96)],
scores.15 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Gather_1711 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_1713 [dtype=int64, shape=()],
onnx::Equal_1715 [dtype=float16, shape=(16, 1, 1, 96)],
onnx::Slice_1717 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Gather_1718 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_1720 [dtype=int64, shape=()],
onnx::Slice_1726 [dtype=int64, shape=(1,)],
onnx::Cast_1730 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Where_1731 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Softmax_1733 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_1734 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_1735 [dtype=bool, shape=(16, 1, 1, 96)],
input.343 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Transpose_1738 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Reshape_1739 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_1743 [dtype=int64, shape=(1,)],
onnx::Reshape_1748 [dtype=int64, shape=(3,)],
onnx::MatMul_1749 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1751 [dtype=float16, shape=(16, 16, 256)],
input.347 [dtype=float16, shape=(16, 16, 256)],
input.351 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1754 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1755 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1757 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1758 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1760 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1761 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1762 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1763 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_1764 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1765 [dtype=float16, shape=(16, 256, 16)],
input.355 [dtype=float16, shape=(16, 256, 23)],
onnx::Unsqueeze_1771 [dtype=float16, shape=(16, 256, 7)],
x.15 [dtype=float16, shape=(16, 512, 23)],
onnx::Mul_1773 [dtype=float16, shape=(16, 256, 23)],
onnx::Sigmoid_1774 [dtype=float16, shape=(16, 256, 23)],
onnx::Mul_1775 [dtype=float16, shape=(16, 256, 23)],
input.359 [dtype=float16, shape=(16, 256, 23)],
onnx::Transpose_1777 [dtype=float16, shape=(16, 256, 16)],
input.363 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1779 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1780 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1782 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1783 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1785 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1786 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1787 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1788 [dtype=float16, shape=(16, 16, 256)],
input.367 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_1790 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_1791 [dtype=float16, shape=(16, 16, 256)],
input.371 [dtype=float16, shape=(16, 256, 16)],
onnx::Transpose_1793 [dtype=float16, shape=(16, 256, 16)],
input.375 [dtype=float16, shape=(16, 16, 256)],
input.379 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1796 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1797 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1799 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1800 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1802 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1803 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1804 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1805 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_1806 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1808 [dtype=float16, shape=(16, 16, 2048)],
input.383 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_1810 [dtype=float16, shape=(16, 16, 2048)],
input.387 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_1813 [dtype=float16, shape=(16, 16, 256)],
input.391 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1816 [dtype=float16, shape=(16, 16, 256)],
input.395 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1818 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1819 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1821 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1822 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1824 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1825 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1826 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1827 [dtype=float16, shape=(16, 16, 256)],
input.399 [dtype=float16, shape=(16, 16, 256)],
onnx::Unsqueeze_1833 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_1835 [dtype=float16, shape=(16, 1, 4, 80, 128)],
onnx::Concat_1837 [dtype=float16, shape=(16, 1, 256, 7)],
tensor.19 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_1841 [dtype=float16, shape=(16, 256, 7)],
onnx::Sub_1842 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1843 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1845 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1846 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1848 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1849 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1850 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1851 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_1852 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1854 [dtype=float16, shape=(16, 16, 2048)],
input.403 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_1856 [dtype=float16, shape=(16, 16, 2048)],
input.407 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_1859 [dtype=float16, shape=(16, 16, 256)],
input.411 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1862 [dtype=float16, shape=(16, 16, 256)],
input.415 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_1864 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_1865 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_1867 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1868 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_1870 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_1871 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_1872 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_1873 [dtype=float16, shape=(16, 16, 256)],
query.19 [dtype=float16, shape=(16, 16, 256)],
onnx::Gather_1875 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_1877 [dtype=int64, shape=()],
onnx::Add_1879 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_1880 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1885 [dtype=int64, shape=(1,)],
onnx::Reshape_1892 [dtype=int64, shape=(4,)],
onnx::Add_1893 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_1895 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_1896 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1901 [dtype=int64, shape=(1,)],
onnx::Reshape_1908 [dtype=int64, shape=(4,)],
onnx::Transpose_1909 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_1911 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_1912 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_1917 [dtype=int64, shape=(1,)],
onnx::Reshape_1924 [dtype=int64, shape=(4,)],
onnx::Transpose_1925 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_1926 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_1927 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_1929 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_1930 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_1931 [dtype=float16, shape=(16, 4, 96, 64)],
v.19 [dtype=float16, shape=(16, 4, 96, 64)],
onnx::Slice_1933 [dtype=float16, shape=(16, 4, 96, 128)],
onnx::Gather_1934 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_1936 [dtype=int64, shape=()],
onnx::Reshape_1938 [dtype=float16, shape=(16, 96, 256)],
onnx::Concat_1943 [dtype=int64, shape=(1,)],
onnx::Reshape_1950 [dtype=int64, shape=(4,)],
onnx::Transpose_1951 [dtype=float16, shape=(16, 96, 4, 64)],
onnx::Transpose_1952 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_1953 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Transpose_1954 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_1955 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::MatMul_1956 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_1957 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::MatMul_1958 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_1959 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Div_1960 [dtype=float16, shape=(16, 4, 16, 96)],
scores.19 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Gather_1963 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_1965 [dtype=int64, shape=()],
onnx::Equal_1967 [dtype=float16, shape=(16, 1, 1, 96)],
onnx::Slice_1969 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Gather_1970 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_1972 [dtype=int64, shape=()],
onnx::Slice_1978 [dtype=int64, shape=(1,)],
onnx::Cast_1982 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Where_1983 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Softmax_1985 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_1986 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_1987 [dtype=bool, shape=(16, 1, 1, 96)],
input.419 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Transpose_1990 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Reshape_1991 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_1995 [dtype=int64, shape=(1,)],
onnx::Reshape_2000 [dtype=int64, shape=(3,)],
onnx::MatMul_2001 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2003 [dtype=float16, shape=(16, 16, 256)],
input.423 [dtype=float16, shape=(16, 16, 256)],
input.427 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2006 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2007 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2009 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2010 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2012 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2013 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2014 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2015 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_2016 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2017 [dtype=float16, shape=(16, 256, 16)],
input.431 [dtype=float16, shape=(16, 256, 23)],
onnx::Unsqueeze_2023 [dtype=float16, shape=(16, 256, 7)],
x.19 [dtype=float16, shape=(16, 512, 23)],
onnx::Mul_2025 [dtype=float16, shape=(16, 256, 23)],
onnx::Sigmoid_2026 [dtype=float16, shape=(16, 256, 23)],
onnx::Mul_2027 [dtype=float16, shape=(16, 256, 23)],
input.435 [dtype=float16, shape=(16, 256, 23)],
onnx::Transpose_2029 [dtype=float16, shape=(16, 256, 16)],
input.439 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2031 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2032 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2034 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2035 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2037 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2038 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2039 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2040 [dtype=float16, shape=(16, 16, 256)],
input.443 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_2042 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_2043 [dtype=float16, shape=(16, 16, 256)],
input.447 [dtype=float16, shape=(16, 256, 16)],
onnx::Transpose_2045 [dtype=float16, shape=(16, 256, 16)],
input.451 [dtype=float16, shape=(16, 16, 256)],
input.455 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2048 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2049 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2051 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2052 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2054 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2055 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2056 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2057 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_2058 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2060 [dtype=float16, shape=(16, 16, 2048)],
input.459 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_2062 [dtype=float16, shape=(16, 16, 2048)],
input.463 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_2065 [dtype=float16, shape=(16, 16, 256)],
input.467 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2068 [dtype=float16, shape=(16, 16, 256)],
input.471 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2070 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2071 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2073 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2074 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2076 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2077 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2078 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2079 [dtype=float16, shape=(16, 16, 256)],
input.475 [dtype=float16, shape=(16, 16, 256)],
onnx::Unsqueeze_2085 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_2087 [dtype=float16, shape=(16, 1, 4, 80, 128)],
onnx::Concat_2089 [dtype=float16, shape=(16, 1, 256, 7)],
tensor.23 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_2093 [dtype=float16, shape=(16, 256, 7)],
onnx::Sub_2094 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2095 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2097 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2098 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2100 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2101 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2102 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2103 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_2104 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2106 [dtype=float16, shape=(16, 16, 2048)],
input.479 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_2108 [dtype=float16, shape=(16, 16, 2048)],
input.483 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_2111 [dtype=float16, shape=(16, 16, 256)],
input.487 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2114 [dtype=float16, shape=(16, 16, 256)],
input.491 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2116 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2117 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2119 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2120 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2122 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2123 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2124 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2125 [dtype=float16, shape=(16, 16, 256)],
query.23 [dtype=float16, shape=(16, 16, 256)],
onnx::Gather_2127 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_2129 [dtype=int64, shape=()],
onnx::Add_2131 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_2132 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2137 [dtype=int64, shape=(1,)],
onnx::Reshape_2144 [dtype=int64, shape=(4,)],
onnx::Add_2145 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_2147 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_2148 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2153 [dtype=int64, shape=(1,)],
onnx::Reshape_2160 [dtype=int64, shape=(4,)],
onnx::Transpose_2161 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_2163 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_2164 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2169 [dtype=int64, shape=(1,)],
onnx::Reshape_2176 [dtype=int64, shape=(4,)],
onnx::Transpose_2177 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_2178 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_2179 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_2181 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_2182 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_2183 [dtype=float16, shape=(16, 4, 96, 64)],
v.23 [dtype=float16, shape=(16, 4, 96, 64)],
onnx::Slice_2185 [dtype=float16, shape=(16, 4, 96, 128)],
onnx::Gather_2186 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_2188 [dtype=int64, shape=()],
onnx::Reshape_2190 [dtype=float16, shape=(16, 96, 256)],
onnx::Concat_2195 [dtype=int64, shape=(1,)],
onnx::Reshape_2202 [dtype=int64, shape=(4,)],
onnx::Transpose_2203 [dtype=float16, shape=(16, 96, 4, 64)],
onnx::Transpose_2204 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_2205 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Transpose_2206 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_2207 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::MatMul_2208 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_2209 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::MatMul_2210 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_2211 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Div_2212 [dtype=float16, shape=(16, 4, 16, 96)],
scores.23 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Gather_2215 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_2217 [dtype=int64, shape=()],
onnx::Equal_2219 [dtype=float16, shape=(16, 1, 1, 96)],
onnx::Slice_2221 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Gather_2222 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_2224 [dtype=int64, shape=()],
onnx::Slice_2230 [dtype=int64, shape=(1,)],
onnx::Cast_2234 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Where_2235 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Softmax_2237 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_2238 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_2239 [dtype=bool, shape=(16, 1, 1, 96)],
input.495 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Transpose_2242 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Reshape_2243 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_2247 [dtype=int64, shape=(1,)],
onnx::Reshape_2252 [dtype=int64, shape=(3,)],
onnx::MatMul_2253 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2255 [dtype=float16, shape=(16, 16, 256)],
input.499 [dtype=float16, shape=(16, 16, 256)],
input.503 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2258 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2259 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2261 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2262 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2264 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2265 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2266 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2267 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_2268 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2269 [dtype=float16, shape=(16, 256, 16)],
input.507 [dtype=float16, shape=(16, 256, 23)],
onnx::Unsqueeze_2275 [dtype=float16, shape=(16, 256, 7)],
x.23 [dtype=float16, shape=(16, 512, 23)],
onnx::Mul_2277 [dtype=float16, shape=(16, 256, 23)],
onnx::Sigmoid_2278 [dtype=float16, shape=(16, 256, 23)],
onnx::Mul_2279 [dtype=float16, shape=(16, 256, 23)],
input.511 [dtype=float16, shape=(16, 256, 23)],
onnx::Transpose_2281 [dtype=float16, shape=(16, 256, 16)],
input.515 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2283 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2284 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2286 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2287 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2289 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2290 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2291 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2292 [dtype=float16, shape=(16, 16, 256)],
input.519 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_2294 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_2295 [dtype=float16, shape=(16, 16, 256)],
input.523 [dtype=float16, shape=(16, 256, 16)],
onnx::Transpose_2297 [dtype=float16, shape=(16, 256, 16)],
input.527 [dtype=float16, shape=(16, 16, 256)],
input.531 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2300 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2301 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2303 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2304 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2306 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2307 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2308 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2309 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_2310 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2312 [dtype=float16, shape=(16, 16, 2048)],
input.535 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_2314 [dtype=float16, shape=(16, 16, 2048)],
input.539 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_2317 [dtype=float16, shape=(16, 16, 256)],
input.543 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2320 [dtype=float16, shape=(16, 16, 256)],
input.547 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2322 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2323 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2325 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2326 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2328 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2329 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2330 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2331 [dtype=float16, shape=(16, 16, 256)],
input.551 [dtype=float16, shape=(16, 16, 256)],
onnx::Unsqueeze_2337 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_2339 [dtype=float16, shape=(16, 1, 4, 80, 128)],
onnx::Concat_2341 [dtype=float16, shape=(16, 1, 256, 7)],
tensor.27 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_2345 [dtype=float16, shape=(16, 256, 7)],
onnx::Sub_2346 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2347 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2349 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2350 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2352 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2353 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2354 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2355 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_2356 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2358 [dtype=float16, shape=(16, 16, 2048)],
input.555 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_2360 [dtype=float16, shape=(16, 16, 2048)],
input.559 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_2363 [dtype=float16, shape=(16, 16, 256)],
input.563 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2366 [dtype=float16, shape=(16, 16, 256)],
input.567 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2368 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2369 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2371 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2372 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2374 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2375 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2376 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2377 [dtype=float16, shape=(16, 16, 256)],
query.27 [dtype=float16, shape=(16, 16, 256)],
onnx::Gather_2379 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_2381 [dtype=int64, shape=()],
onnx::Add_2383 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_2384 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2389 [dtype=int64, shape=(1,)],
onnx::Reshape_2396 [dtype=int64, shape=(4,)],
onnx::Add_2397 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_2399 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_2400 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2405 [dtype=int64, shape=(1,)],
onnx::Reshape_2412 [dtype=int64, shape=(4,)],
onnx::Transpose_2413 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_2415 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_2416 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2421 [dtype=int64, shape=(1,)],
onnx::Reshape_2428 [dtype=int64, shape=(4,)],
onnx::Transpose_2429 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_2430 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_2431 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_2433 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_2434 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_2435 [dtype=float16, shape=(16, 4, 96, 64)],
v.27 [dtype=float16, shape=(16, 4, 96, 64)],
onnx::Slice_2437 [dtype=float16, shape=(16, 4, 96, 128)],
onnx::Gather_2438 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_2440 [dtype=int64, shape=()],
onnx::Reshape_2442 [dtype=float16, shape=(16, 96, 256)],
onnx::Concat_2447 [dtype=int64, shape=(1,)],
onnx::Reshape_2454 [dtype=int64, shape=(4,)],
onnx::Transpose_2455 [dtype=float16, shape=(16, 96, 4, 64)],
onnx::Transpose_2456 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_2457 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Transpose_2458 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_2459 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::MatMul_2460 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_2461 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::MatMul_2462 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_2463 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Div_2464 [dtype=float16, shape=(16, 4, 16, 96)],
scores.27 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Gather_2467 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_2469 [dtype=int64, shape=()],
onnx::Equal_2471 [dtype=float16, shape=(16, 1, 1, 96)],
onnx::Slice_2473 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Gather_2474 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_2476 [dtype=int64, shape=()],
onnx::Slice_2482 [dtype=int64, shape=(1,)],
onnx::Cast_2486 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Where_2487 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Softmax_2489 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_2490 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_2491 [dtype=bool, shape=(16, 1, 1, 96)],
input.571 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Transpose_2494 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Reshape_2495 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_2499 [dtype=int64, shape=(1,)],
onnx::Reshape_2504 [dtype=int64, shape=(3,)],
onnx::MatMul_2505 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2507 [dtype=float16, shape=(16, 16, 256)],
input.575 [dtype=float16, shape=(16, 16, 256)],
input.579 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2510 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2511 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2513 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2514 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2516 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2517 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2518 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2519 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_2520 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2521 [dtype=float16, shape=(16, 256, 16)],
input.583 [dtype=float16, shape=(16, 256, 23)],
onnx::Unsqueeze_2527 [dtype=float16, shape=(16, 256, 7)],
x.27 [dtype=float16, shape=(16, 512, 23)],
onnx::Mul_2529 [dtype=float16, shape=(16, 256, 23)],
onnx::Sigmoid_2530 [dtype=float16, shape=(16, 256, 23)],
onnx::Mul_2531 [dtype=float16, shape=(16, 256, 23)],
input.587 [dtype=float16, shape=(16, 256, 23)],
onnx::Transpose_2533 [dtype=float16, shape=(16, 256, 16)],
input.591 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2535 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2536 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2538 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2539 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2541 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2542 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2543 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2544 [dtype=float16, shape=(16, 16, 256)],
input.595 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_2546 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_2547 [dtype=float16, shape=(16, 16, 256)],
input.599 [dtype=float16, shape=(16, 256, 16)],
onnx::Transpose_2549 [dtype=float16, shape=(16, 256, 16)],
input.603 [dtype=float16, shape=(16, 16, 256)],
input.607 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2552 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2553 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2555 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2556 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2558 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2559 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2560 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2561 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_2562 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2564 [dtype=float16, shape=(16, 16, 2048)],
input.611 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_2566 [dtype=float16, shape=(16, 16, 2048)],
input.615 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_2569 [dtype=float16, shape=(16, 16, 256)],
input.619 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2572 [dtype=float16, shape=(16, 16, 256)],
input.623 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2574 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2575 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2577 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2578 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2580 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2581 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2582 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2583 [dtype=float16, shape=(16, 16, 256)],
input.627 [dtype=float16, shape=(16, 16, 256)],
onnx::Unsqueeze_2589 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_2591 [dtype=float16, shape=(16, 1, 4, 80, 128)],
onnx::Concat_2593 [dtype=float16, shape=(16, 1, 256, 7)],
tensor.31 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_2597 [dtype=float16, shape=(16, 256, 7)],
onnx::Sub_2598 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2599 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2601 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2602 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2604 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2605 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2606 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2607 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_2608 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2610 [dtype=float16, shape=(16, 16, 2048)],
input.631 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_2612 [dtype=float16, shape=(16, 16, 2048)],
input.635 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_2615 [dtype=float16, shape=(16, 16, 256)],
input.639 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2618 [dtype=float16, shape=(16, 16, 256)],
input.643 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2620 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2621 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2623 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2624 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2626 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2627 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2628 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2629 [dtype=float16, shape=(16, 16, 256)],
query.31 [dtype=float16, shape=(16, 16, 256)],
onnx::Gather_2631 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_2633 [dtype=int64, shape=()],
onnx::Add_2635 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_2636 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2641 [dtype=int64, shape=(1,)],
onnx::Reshape_2648 [dtype=int64, shape=(4,)],
onnx::Add_2649 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_2651 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_2652 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2657 [dtype=int64, shape=(1,)],
onnx::Reshape_2664 [dtype=int64, shape=(4,)],
onnx::Transpose_2665 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_2667 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_2668 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2673 [dtype=int64, shape=(1,)],
onnx::Reshape_2680 [dtype=int64, shape=(4,)],
onnx::Transpose_2681 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_2682 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_2683 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_2685 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_2686 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_2687 [dtype=float16, shape=(16, 4, 96, 64)],
v.31 [dtype=float16, shape=(16, 4, 96, 64)],
onnx::Slice_2689 [dtype=float16, shape=(16, 4, 96, 128)],
onnx::Gather_2690 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_2692 [dtype=int64, shape=()],
onnx::Reshape_2694 [dtype=float16, shape=(16, 96, 256)],
onnx::Concat_2699 [dtype=int64, shape=(1,)],
onnx::Reshape_2706 [dtype=int64, shape=(4,)],
onnx::Transpose_2707 [dtype=float16, shape=(16, 96, 4, 64)],
onnx::Transpose_2708 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_2709 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Transpose_2710 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_2711 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::MatMul_2712 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_2713 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::MatMul_2714 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_2715 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Div_2716 [dtype=float16, shape=(16, 4, 16, 96)],
scores.31 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Gather_2719 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_2721 [dtype=int64, shape=()],
onnx::Equal_2723 [dtype=float16, shape=(16, 1, 1, 96)],
onnx::Slice_2725 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Gather_2726 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_2728 [dtype=int64, shape=()],
onnx::Slice_2734 [dtype=int64, shape=(1,)],
onnx::Cast_2738 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Where_2739 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Softmax_2741 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_2742 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_2743 [dtype=bool, shape=(16, 1, 1, 96)],
input.647 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Transpose_2746 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Reshape_2747 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_2751 [dtype=int64, shape=(1,)],
onnx::Reshape_2756 [dtype=int64, shape=(3,)],
onnx::MatMul_2757 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2759 [dtype=float16, shape=(16, 16, 256)],
input.651 [dtype=float16, shape=(16, 16, 256)],
input.655 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2762 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2763 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2765 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2766 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2768 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2769 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2770 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2771 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_2772 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2773 [dtype=float16, shape=(16, 256, 16)],
input.659 [dtype=float16, shape=(16, 256, 23)],
onnx::Unsqueeze_2779 [dtype=float16, shape=(16, 256, 7)],
x.31 [dtype=float16, shape=(16, 512, 23)],
onnx::Mul_2781 [dtype=float16, shape=(16, 256, 23)],
onnx::Sigmoid_2782 [dtype=float16, shape=(16, 256, 23)],
onnx::Mul_2783 [dtype=float16, shape=(16, 256, 23)],
input.663 [dtype=float16, shape=(16, 256, 23)],
onnx::Transpose_2785 [dtype=float16, shape=(16, 256, 16)],
input.667 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2787 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2788 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2790 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2791 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2793 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2794 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2795 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2796 [dtype=float16, shape=(16, 16, 256)],
input.671 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_2798 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_2799 [dtype=float16, shape=(16, 16, 256)],
input.675 [dtype=float16, shape=(16, 256, 16)],
onnx::Transpose_2801 [dtype=float16, shape=(16, 256, 16)],
input.679 [dtype=float16, shape=(16, 16, 256)],
input.683 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2804 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2805 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2807 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2808 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2810 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2811 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2812 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2813 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_2814 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2816 [dtype=float16, shape=(16, 16, 2048)],
input.687 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_2818 [dtype=float16, shape=(16, 16, 2048)],
input.691 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_2821 [dtype=float16, shape=(16, 16, 256)],
input.695 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2824 [dtype=float16, shape=(16, 16, 256)],
input.699 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2826 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2827 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2829 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2830 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2832 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2833 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2834 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2835 [dtype=float16, shape=(16, 16, 256)],
input.703 [dtype=float16, shape=(16, 16, 256)],
onnx::Unsqueeze_2841 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_2843 [dtype=float16, shape=(16, 1, 4, 80, 128)],
onnx::Concat_2845 [dtype=float16, shape=(16, 1, 256, 7)],
tensor.35 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_2849 [dtype=float16, shape=(16, 256, 7)],
onnx::Sub_2850 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2851 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2853 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2854 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2856 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2857 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2858 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2859 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_2860 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2862 [dtype=float16, shape=(16, 16, 2048)],
input.707 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_2864 [dtype=float16, shape=(16, 16, 2048)],
input.711 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_2867 [dtype=float16, shape=(16, 16, 256)],
input.715 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2870 [dtype=float16, shape=(16, 16, 256)],
input.719 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_2872 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_2873 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_2875 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2876 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_2878 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_2879 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_2880 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_2881 [dtype=float16, shape=(16, 16, 256)],
query.35 [dtype=float16, shape=(16, 16, 256)],
onnx::Gather_2883 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_2885 [dtype=int64, shape=()],
onnx::Add_2887 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_2888 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2893 [dtype=int64, shape=(1,)],
onnx::Reshape_2900 [dtype=int64, shape=(4,)],
onnx::Add_2901 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_2903 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_2904 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2909 [dtype=int64, shape=(1,)],
onnx::Reshape_2916 [dtype=int64, shape=(4,)],
onnx::Transpose_2917 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_2919 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_2920 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_2925 [dtype=int64, shape=(1,)],
onnx::Reshape_2932 [dtype=int64, shape=(4,)],
onnx::Transpose_2933 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_2934 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_2935 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_2937 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_2938 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_2939 [dtype=float16, shape=(16, 4, 96, 64)],
v.35 [dtype=float16, shape=(16, 4, 96, 64)],
onnx::Slice_2941 [dtype=float16, shape=(16, 4, 96, 128)],
onnx::Gather_2942 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_2944 [dtype=int64, shape=()],
onnx::Reshape_2946 [dtype=float16, shape=(16, 96, 256)],
onnx::Concat_2951 [dtype=int64, shape=(1,)],
onnx::Reshape_2958 [dtype=int64, shape=(4,)],
onnx::Transpose_2959 [dtype=float16, shape=(16, 96, 4, 64)],
onnx::Transpose_2960 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_2961 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Transpose_2962 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_2963 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::MatMul_2964 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_2965 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::MatMul_2966 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_2967 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Div_2968 [dtype=float16, shape=(16, 4, 16, 96)],
scores.35 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Gather_2971 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_2973 [dtype=int64, shape=()],
onnx::Equal_2975 [dtype=float16, shape=(16, 1, 1, 96)],
onnx::Slice_2977 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Gather_2978 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_2980 [dtype=int64, shape=()],
onnx::Slice_2986 [dtype=int64, shape=(1,)],
onnx::Cast_2990 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Where_2991 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Softmax_2993 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_2994 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_2995 [dtype=bool, shape=(16, 1, 1, 96)],
input.723 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Transpose_2998 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Reshape_2999 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_3003 [dtype=int64, shape=(1,)],
onnx::Reshape_3008 [dtype=int64, shape=(3,)],
onnx::MatMul_3009 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3011 [dtype=float16, shape=(16, 16, 256)],
input.727 [dtype=float16, shape=(16, 16, 256)],
input.731 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3014 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3015 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3017 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3018 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3020 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3021 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3022 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3023 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_3024 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_3025 [dtype=float16, shape=(16, 256, 16)],
input.735 [dtype=float16, shape=(16, 256, 23)],
onnx::Unsqueeze_3031 [dtype=float16, shape=(16, 256, 7)],
x.35 [dtype=float16, shape=(16, 512, 23)],
onnx::Mul_3033 [dtype=float16, shape=(16, 256, 23)],
onnx::Sigmoid_3034 [dtype=float16, shape=(16, 256, 23)],
onnx::Mul_3035 [dtype=float16, shape=(16, 256, 23)],
input.739 [dtype=float16, shape=(16, 256, 23)],
onnx::Transpose_3037 [dtype=float16, shape=(16, 256, 16)],
input.743 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3039 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3040 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3042 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3043 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3045 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3046 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3047 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3048 [dtype=float16, shape=(16, 16, 256)],
input.747 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_3050 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_3051 [dtype=float16, shape=(16, 16, 256)],
input.751 [dtype=float16, shape=(16, 256, 16)],
onnx::Transpose_3053 [dtype=float16, shape=(16, 256, 16)],
input.755 [dtype=float16, shape=(16, 16, 256)],
input.759 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3056 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3057 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3059 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3060 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3062 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3063 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3064 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3065 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_3066 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3068 [dtype=float16, shape=(16, 16, 2048)],
input.763 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_3070 [dtype=float16, shape=(16, 16, 2048)],
input.767 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_3073 [dtype=float16, shape=(16, 16, 256)],
input.771 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3076 [dtype=float16, shape=(16, 16, 256)],
input.775 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3078 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3079 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3081 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3082 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3084 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3085 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3086 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3087 [dtype=float16, shape=(16, 16, 256)],
input.779 [dtype=float16, shape=(16, 16, 256)],
onnx::Unsqueeze_3093 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_3095 [dtype=float16, shape=(16, 1, 4, 80, 128)],
onnx::Concat_3097 [dtype=float16, shape=(16, 1, 256, 7)],
tensor.39 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_3101 [dtype=float16, shape=(16, 256, 7)],
onnx::Sub_3102 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3103 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3105 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3106 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3108 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3109 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3110 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3111 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_3112 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3114 [dtype=float16, shape=(16, 16, 2048)],
input.783 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_3116 [dtype=float16, shape=(16, 16, 2048)],
input.787 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_3119 [dtype=float16, shape=(16, 16, 256)],
input.791 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3122 [dtype=float16, shape=(16, 16, 256)],
input.795 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3124 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3125 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3127 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3128 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3130 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3131 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3132 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3133 [dtype=float16, shape=(16, 16, 256)],
query.39 [dtype=float16, shape=(16, 16, 256)],
onnx::Gather_3135 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_3137 [dtype=int64, shape=()],
onnx::Add_3139 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_3140 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_3145 [dtype=int64, shape=(1,)],
onnx::Reshape_3152 [dtype=int64, shape=(4,)],
onnx::Add_3153 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_3155 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_3156 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_3161 [dtype=int64, shape=(1,)],
onnx::Reshape_3168 [dtype=int64, shape=(4,)],
onnx::Transpose_3169 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_3171 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_3172 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_3177 [dtype=int64, shape=(1,)],
onnx::Reshape_3184 [dtype=int64, shape=(4,)],
onnx::Transpose_3185 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_3186 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_3187 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_3189 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_3190 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_3191 [dtype=float16, shape=(16, 4, 96, 64)],
v.39 [dtype=float16, shape=(16, 4, 96, 64)],
onnx::Slice_3193 [dtype=float16, shape=(16, 4, 96, 128)],
onnx::Gather_3194 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_3196 [dtype=int64, shape=()],
onnx::Reshape_3198 [dtype=float16, shape=(16, 96, 256)],
onnx::Concat_3203 [dtype=int64, shape=(1,)],
onnx::Reshape_3210 [dtype=int64, shape=(4,)],
onnx::Transpose_3211 [dtype=float16, shape=(16, 96, 4, 64)],
onnx::Transpose_3212 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_3213 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Transpose_3214 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_3215 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::MatMul_3216 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_3217 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::MatMul_3218 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_3219 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Div_3220 [dtype=float16, shape=(16, 4, 16, 96)],
scores.39 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Gather_3223 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_3225 [dtype=int64, shape=()],
onnx::Equal_3227 [dtype=float16, shape=(16, 1, 1, 96)],
onnx::Slice_3229 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Gather_3230 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_3232 [dtype=int64, shape=()],
onnx::Slice_3238 [dtype=int64, shape=(1,)],
onnx::Cast_3242 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Where_3243 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Softmax_3245 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_3246 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_3247 [dtype=bool, shape=(16, 1, 1, 96)],
input.799 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Transpose_3250 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Reshape_3251 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_3255 [dtype=int64, shape=(1,)],
onnx::Reshape_3260 [dtype=int64, shape=(3,)],
onnx::MatMul_3261 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3263 [dtype=float16, shape=(16, 16, 256)],
input.803 [dtype=float16, shape=(16, 16, 256)],
input.807 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3266 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3267 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3269 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3270 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3272 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3273 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3274 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3275 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_3276 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_3277 [dtype=float16, shape=(16, 256, 16)],
input.811 [dtype=float16, shape=(16, 256, 23)],
onnx::Unsqueeze_3283 [dtype=float16, shape=(16, 256, 7)],
x.39 [dtype=float16, shape=(16, 512, 23)],
onnx::Mul_3285 [dtype=float16, shape=(16, 256, 23)],
onnx::Sigmoid_3286 [dtype=float16, shape=(16, 256, 23)],
onnx::Mul_3287 [dtype=float16, shape=(16, 256, 23)],
input.815 [dtype=float16, shape=(16, 256, 23)],
onnx::Transpose_3289 [dtype=float16, shape=(16, 256, 16)],
input.819 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3291 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3292 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3294 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3295 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3297 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3298 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3299 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3300 [dtype=float16, shape=(16, 16, 256)],
input.823 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_3302 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_3303 [dtype=float16, shape=(16, 16, 256)],
input.827 [dtype=float16, shape=(16, 256, 16)],
onnx::Transpose_3305 [dtype=float16, shape=(16, 256, 16)],
input.831 [dtype=float16, shape=(16, 16, 256)],
input.835 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3308 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3309 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3311 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3312 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3314 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3315 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3316 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3317 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_3318 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3320 [dtype=float16, shape=(16, 16, 2048)],
input.839 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_3322 [dtype=float16, shape=(16, 16, 2048)],
input.843 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_3325 [dtype=float16, shape=(16, 16, 256)],
input.847 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3328 [dtype=float16, shape=(16, 16, 256)],
input.851 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3330 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3331 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3333 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3334 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3336 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3337 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3338 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3339 [dtype=float16, shape=(16, 16, 256)],
input.855 [dtype=float16, shape=(16, 16, 256)],
onnx::Unsqueeze_3345 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_3347 [dtype=float16, shape=(16, 1, 4, 80, 128)],
onnx::Concat_3349 [dtype=float16, shape=(16, 1, 256, 7)],
tensor.43 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_3353 [dtype=float16, shape=(16, 256, 7)],
onnx::Sub_3354 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3355 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3357 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3358 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3360 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3361 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3362 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3363 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_3364 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3366 [dtype=float16, shape=(16, 16, 2048)],
input.859 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_3368 [dtype=float16, shape=(16, 16, 2048)],
input.863 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_3371 [dtype=float16, shape=(16, 16, 256)],
input.867 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3374 [dtype=float16, shape=(16, 16, 256)],
input.871 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3376 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3377 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3379 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3380 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3382 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3383 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3384 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3385 [dtype=float16, shape=(16, 16, 256)],
query.43 [dtype=float16, shape=(16, 16, 256)],
onnx::Gather_3387 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_3389 [dtype=int64, shape=()],
onnx::Add_3391 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_3392 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_3397 [dtype=int64, shape=(1,)],
onnx::Reshape_3404 [dtype=int64, shape=(4,)],
onnx::Add_3405 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_3407 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_3408 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_3413 [dtype=int64, shape=(1,)],
onnx::Reshape_3420 [dtype=int64, shape=(4,)],
onnx::Transpose_3421 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Add_3423 [dtype=float16, shape=(16, 16, 256)],
onnx::Reshape_3424 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_3429 [dtype=int64, shape=(1,)],
onnx::Reshape_3436 [dtype=int64, shape=(4,)],
onnx::Transpose_3437 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_3438 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_3439 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Concat_3441 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_3442 [dtype=float16, shape=(16, 4, 80, 64)],
onnx::Concat_3443 [dtype=float16, shape=(16, 4, 96, 64)],
v.43 [dtype=float16, shape=(16, 4, 96, 64)],
onnx::Slice_3445 [dtype=float16, shape=(16, 4, 96, 128)],
onnx::Gather_3446 [dtype=int64, shape=(3,)],
onnx::Unsqueeze_3448 [dtype=int64, shape=()],
onnx::Reshape_3450 [dtype=float16, shape=(16, 96, 256)],
onnx::Concat_3455 [dtype=int64, shape=(1,)],
onnx::Reshape_3462 [dtype=int64, shape=(4,)],
onnx::Transpose_3463 [dtype=float16, shape=(16, 96, 4, 64)],
onnx::Transpose_3464 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_3465 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Transpose_3466 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::MatMul_3467 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::MatMul_3468 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_3469 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::MatMul_3470 [dtype=float16, shape=(16, 4, 64, 96)],
onnx::Add_3471 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Div_3472 [dtype=float16, shape=(16, 4, 16, 96)],
scores.43 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Gather_3475 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_3477 [dtype=int64, shape=()],
onnx::Equal_3479 [dtype=float16, shape=(16, 1, 1, 96)],
onnx::Slice_3481 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Gather_3482 [dtype=int64, shape=(4,)],
onnx::Unsqueeze_3484 [dtype=int64, shape=()],
onnx::Slice_3490 [dtype=int64, shape=(1,)],
onnx::Cast_3494 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Where_3495 [dtype=bool, shape=(16, 1, 1, 96)],
onnx::Softmax_3497 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_3498 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Where_3499 [dtype=bool, shape=(16, 1, 1, 96)],
input.875 [dtype=float16, shape=(16, 4, 16, 96)],
onnx::Transpose_3502 [dtype=float16, shape=(16, 4, 16, 64)],
onnx::Reshape_3503 [dtype=float16, shape=(16, 16, 4, 64)],
onnx::Concat_3507 [dtype=int64, shape=(1,)],
onnx::Reshape_3512 [dtype=int64, shape=(3,)],
onnx::MatMul_3513 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3515 [dtype=float16, shape=(16, 16, 256)],
input.879 [dtype=float16, shape=(16, 16, 256)],
input.883 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3518 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3519 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3521 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3522 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3524 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3525 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3526 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3527 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_3528 [dtype=float16, shape=(16, 16, 256)],
onnx::Concat_3529 [dtype=float16, shape=(16, 256, 16)],
input.887 [dtype=float16, shape=(16, 256, 23)],
onnx::Unsqueeze_3535 [dtype=float16, shape=(16, 256, 7)],
x.43 [dtype=float16, shape=(16, 512, 23)],
onnx::Mul_3537 [dtype=float16, shape=(16, 256, 23)],
onnx::Sigmoid_3538 [dtype=float16, shape=(16, 256, 23)],
onnx::Mul_3539 [dtype=float16, shape=(16, 256, 23)],
input.891 [dtype=float16, shape=(16, 256, 23)],
onnx::Transpose_3541 [dtype=float16, shape=(16, 256, 16)],
input.895 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3543 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3544 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3546 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3547 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3549 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3550 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3551 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3552 [dtype=float16, shape=(16, 16, 256)],
input.899 [dtype=float16, shape=(16, 16, 256)],
onnx::Mul_3554 [dtype=float16, shape=(16, 16, 256)],
onnx::Transpose_3555 [dtype=float16, shape=(16, 16, 256)],
input.903 [dtype=float16, shape=(16, 256, 16)],
onnx::Transpose_3557 [dtype=float16, shape=(16, 256, 16)],
input.907 [dtype=float16, shape=(16, 16, 256)],
input.911 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3560 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3561 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3563 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3564 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3566 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3567 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3568 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3569 [dtype=float16, shape=(16, 16, 256)],
onnx::MatMul_3570 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3572 [dtype=float16, shape=(16, 16, 2048)],
input.915 [dtype=float16, shape=(16, 16, 2048)],
onnx::Mul_3574 [dtype=float16, shape=(16, 16, 2048)],
input.919 [dtype=float16, shape=(16, 16, 2048)],
onnx::Add_3577 [dtype=float16, shape=(16, 16, 256)],
input.923 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3580 [dtype=float16, shape=(16, 16, 256)],
input.927 [dtype=float16, shape=(16, 16, 256)],
onnx::Sub_3582 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3583 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3585 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3586 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3588 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3589 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3590 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3591 [dtype=float16, shape=(16, 16, 256)],
input.931 [dtype=float16, shape=(16, 16, 256)],
onnx::Unsqueeze_3597 [dtype=float16, shape=(16, 4, 80, 128)],
onnx::Concat_3599 [dtype=float16, shape=(16, 1, 4, 80, 128)],
onnx::Concat_3601 [dtype=float16, shape=(16, 1, 256, 7)],
onnx::Sub_3602 [dtype=float16, shape=(16, 16, 1)],
onnx::Pow_3603 [dtype=float16, shape=(16, 16, 256)],
onnx::ReduceMean_3605 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3606 [dtype=float16, shape=(16, 16, 1)],
onnx::Sqrt_3608 [dtype=float16, shape=(16, 16, 1)],
onnx::Div_3609 [dtype=float16, shape=(16, 16, 1)],
onnx::Mul_3610 [dtype=float16, shape=(16, 16, 256)],
onnx::Add_3611 [dtype=float16, shape=(16, 16, 256)],
chunk_out [dtype=float16, shape=(16, 16, 256)],
r_att_cache [dtype=float16, shape=(16, 12, 4, 80, 128)],
r_cnn_cache [dtype=float16, shape=(16, 12, 256, 7)],
onnx::Add_3616 [dtype=float16, shape=(16, 16, 5235)],
onnx::LogSoftmax_3617 [dtype=float16, shape=(16, 16, 5235)],
onnx::TopK_3618 [dtype=float16, shape=(16, 16, 5235)],
TopK_2439_output_cast_0 [dtype=float32, shape=(16, 16, 10)],
log_probs_idx [dtype=int64, shape=(16, 16, 10)],
log_probs [dtype=float16, shape=(16, 16, 10)],
onnx::Gather_3625 [dtype=int64, shape=(3,)],
onnx::Add_3627 [dtype=int64, shape=()],
onnx::Unsqueeze_3628 [dtype=int64, shape=(16,)],
onnx::Cast_3630 [dtype=int32, shape=(16,)],
onnx::Cast_3631 [dtype=int64, shape=(16,)],
chunk_out_lens [dtype=int32, shape=(16,)],
r_offset [dtype=int64, shape=(16, 1)],
TopK_2439_input_cast_0 [dtype=float32, shape=(16, 16, 5235)],
onnx::Cast_3622 [dtype=float16, shape=(16, 16, 10)]}
[I] onnxrt-runner-N0-04/18/23-11:19:47 | Completed 1 iteration(s) in 1194 ms | Average inference time: 1194 ms.
[I] trt-runner-N0-04/18/23-11:19:47 | Activating and starting inference
[V] [MemUsageChange] Init CUDA: CPU +446, GPU +0, now: CPU 919, GPU 1317 (MiB)
[V] [MemUsageChange] Init builder kernel library: CPU +420, GPU +72, now: CPU 1416, GPU 1389 (MiB)
[W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[V] ----------------------------------------------------------------
[V] Input filename: /mnt/samsung-t7/yuekai/benchmark/triton_speech_recognition_benchmark_new/model_repo_stateful_conformer_aishell2_wenet/encoder/1/encoder.onnx
[V] ONNX IR version: 0.0.7
[V] Opset version: 14
[V] Producer name: pytorch
[V] Producer version: 1.11.0
[V] Domain:
[V] Model version: 0
[V] Doc string:
[V] ----------------------------------------------------------------
[W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[V] Executing postprocessing step [ModifyNetworkOutputs]
[V] Marking 3382 tensors as outputs
[V] Setting TensorRT Optimization Profiles
[V] Input tensor: chunk_xs (dtype=DataType.HALF, shape=(-1, 67, 80)) | Setting input tensor shapes to: (min=[1, 67, 80], opt=[16, 67, 80], max=[32, 67, 80])
[V] Input tensor: chunk_lens (dtype=DataType.INT32, shape=(-1,)) | Setting input tensor shapes to: (min=[1], opt=[16], max=[32])
[V] Input tensor: offset (dtype=DataType.INT32, shape=(-1, 1)) | Setting input tensor shapes to: (min=[1, 1], opt=[16, 1], max=[32, 1])
[V] Input tensor: att_cache (dtype=DataType.HALF, shape=(-1, 12, 4, 80, 128)) | Setting input tensor shapes to: (min=[1, 12, 4, 80, 128], opt=[16, 12, 4, 80, 128], max=[32, 12, 4, 80, 128])
[V] Input tensor: cnn_cache (dtype=DataType.HALF, shape=(-1, 12, 256, 7)) | Setting input tensor shapes to: (min=[1, 12, 256, 7], opt=[16, 12, 256, 7], max=[32, 12, 256, 7])
[V] Input tensor: cache_mask (dtype=DataType.HALF, shape=(-1, 1, 80)) | Setting input tensor shapes to: (min=[1, 1, 80], opt=[16, 1, 80], max=[32, 1, 80])
[I] Configuring with profiles: [Profile().add('chunk_xs', min=[1, 67, 80], opt=[16, 67, 80], max=[32, 67, 80]).add('chunk_lens', min=[1], opt=[16], max=[32]).add('offset', min=[1, 1], opt=[16, 1], max=[32, 1]).add('att_cache', min=[1, 12, 4, 80, 128], opt=[16, 12, 4, 80, 128], max=[32, 12, 4, 80, 128]).add('cnn_cache', min=[1, 12, 256, 7], opt=[16, 12, 256, 7], max=[32, 12, 256, 7]).add('cache_mask', min=[1, 1, 80], opt=[16, 1, 80], max=[32, 1, 80])]
[I] Building engine with configuration:
Flags | [FP16]
Engine Capability | EngineCapability.DEFAULT
Memory Pools | [WORKSPACE: 953.67 MiB, TACTIC_DRAM: 32508.19 MiB]
Tactic Sources | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
Preview Features | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[W] Detected layernorm nodes in FP16: Sub_112, Pow_114, ReduceMean_115, Add_117, Sqrt_118, Div_119, Mul_120, Add_121, Sub_132, Pow_134, ReduceMean_135, Add_137, Sqrt_138, Div_139, Mul_140, Add_141, Sub_220, Pow_222, ReduceMean_223, Add_225, Sqrt_226, Div_227, Mul_228, Add_229, Sub_244, Pow_246, ReduceMean_247, Add_249, Sqrt_250, Div_251, Mul_252, Add_253, Sub_261, Pow_263, ReduceMean_264, Add_266, Sqrt_267, Div_268, Mul_269, Add_270, Sub_281, Pow_283, ReduceMean_284, Add_286, Sqrt_287, Div_288, Mul_289, Add_290, Sub_305, Pow_307, ReduceMean_308, Add_310, Sqrt_311, Div_312, Mul_313, Add_314, Sub_325, Pow_327, ReduceMean_328, Add_330, Sqrt_331, Div_332, Mul_333, Add_334, Sub_413, Pow_415, ReduceMean_416, Add_418, Sqrt_419, Div_420, Mul_421, Add_422, Sub_437, Pow_439, ReduceMean_440, Add_442, Sqrt_443, Div_444, Mul_445, Add_446, Sub_454, Pow_456, ReduceMean_457, Add_459, Sqrt_460, Div_461, Mul_462, Add_463, Sub_474, Pow_476, ReduceMean_477, Add_479, Sqrt_480, Div_481, Mul_482, Add_483, Sub_498, Pow_500, ReduceMean_501, Add_503, Sqrt_504, Div_505, Mul_506, Add_507, Sub_518, Pow_520, ReduceMean_521, Add_523, Sqrt_524, Div_525, Mul_526, Add_527, Sub_606, Pow_608, ReduceMean_609, Add_611, Sqrt_612, Div_613, Mul_614, Add_615, Sub_630, Pow_632, ReduceMean_633, Add_635, Sqrt_636, Div_637, Mul_638, Add_639, Sub_647, Pow_649, ReduceMean_650, Add_652, Sqrt_653, Div_654, Mul_655, Add_656, Sub_667, Pow_669, ReduceMean_670, Add_672, Sqrt_673, Div_674, Mul_675, Add_676, Sub_691, Pow_693, ReduceMean_694, Add_696, Sqrt_697, Div_698, Mul_699, Add_700, Sub_711, Pow_713, ReduceMean_714, Add_716, Sqrt_717, Div_718, Mul_719, Add_720, Sub_799, Pow_801, ReduceMean_802, Add_804, Sqrt_805, Div_806, Mul_807, Add_808, Sub_823, Pow_825, ReduceMean_826, Add_828, Sqrt_829, Div_830, Mul_831, Add_832, Sub_840, Pow_842, ReduceMean_843, Add_845, Sqrt_846, Div_847, Mul_848, Add_849, Sub_860, Pow_862, ReduceMean_863, Add_865, Sqrt_866, Div_867, Mul_868, Add_869, Sub_884, Pow_886, ReduceMean_887, 
Add_889, Sqrt_890, Div_891, Mul_892, Add_893, Sub_904, Pow_906, ReduceMean_907, Add_909, Sqrt_910, Div_911, Mul_912, Add_913, Sub_992, Pow_994, ReduceMean_995, Add_997, Sqrt_998, Div_999, Mul_1000, Add_1001, Sub_1016, Pow_1018, ReduceMean_1019, Add_1021, Sqrt_1022, Div_1023, Mul_1024, Add_1025, Sub_1033, Pow_1035, ReduceMean_1036, Add_1038, Sqrt_1039, Div_1040, Mul_1041, Add_1042, Sub_1053, Pow_1055, ReduceMean_1056, Add_1058, Sqrt_1059, Div_1060, Mul_1061, Add_1062, Sub_1077, Pow_1079, ReduceMean_1080, Add_1082, Sqrt_1083, Div_1084, Mul_1085, Add_1086, Sub_1097, Pow_1099, ReduceMean_1100, Add_1102, Sqrt_1103, Div_1104, Mul_1105, Add_1106, Sub_1185, Pow_1187, ReduceMean_1188, Add_1190, Sqrt_1191, Div_1192, Mul_1193, Add_1194, Sub_1209, Pow_1211, ReduceMean_1212, Add_1214, Sqrt_1215, Div_1216, Mul_1217, Add_1218, Sub_1226, Pow_1228, ReduceMean_1229, Add_1231, Sqrt_1232, Div_1233, Mul_1234, Add_1235, Sub_1246, Pow_1248, ReduceMean_1249, Add_1251, Sqrt_1252, Div_1253, Mul_1254, Add_1255, Sub_1270, Pow_1272, ReduceMean_1273, Add_1275, Sqrt_1276, Div_1277, Mul_1278, Add_1279, Sub_1290, Pow_1292, ReduceMean_1293, Add_1295, Sqrt_1296, Div_1297, Mul_1298, Add_1299, Sub_1378, Pow_1380, ReduceMean_1381, Add_1383, Sqrt_1384, Div_1385, Mul_1386, Add_1387, Sub_1402, Pow_1404, ReduceMean_1405, Add_1407, Sqrt_1408, Div_1409, Mul_1410, Add_1411, Sub_1419, Pow_1421, ReduceMean_1422, Add_1424, Sqrt_1425, Div_1426, Mul_1427, Add_1428, Sub_1439, Pow_1441, ReduceMean_1442, Add_1444, Sqrt_1445, Div_1446, Mul_1447, Add_1448, Sub_1463, Pow_1465, ReduceMean_1466, Add_1468, Sqrt_1469, Div_1470, Mul_1471, Add_1472, Sub_1483, Pow_1485, ReduceMean_1486, Add_1488, Sqrt_1489, Div_1490, Mul_1491, Add_1492, Sub_1571, Pow_1573, ReduceMean_1574, Add_1576, Sqrt_1577, Div_1578, Mul_1579, Add_1580, Sub_1595, Pow_1597, ReduceMean_1598, Add_1600, Sqrt_1601, Div_1602, Mul_1603, Add_1604, Sub_1612, Pow_1614, ReduceMean_1615, Add_1617, Sqrt_1618, Div_1619, Mul_1620, Add_1621, Sub_1632, Pow_1634, 
ReduceMean_1635, Add_1637, Sqrt_1638, Div_1639, Mul_1640, Add_1641, Sub_1656, Pow_1658, ReduceMean_1659, Add_1661, Sqrt_1662, Div_1663, Mul_1664, Add_1665, Sub_1676, Pow_1678, ReduceMean_1679, Add_1681, Sqrt_1682, Div_1683, Mul_1684, Add_1685, Sub_1764, Pow_1766, ReduceMean_1767, Add_1769, Sqrt_1770, Div_1771, Mul_1772, Add_1773, Sub_1788, Pow_1790, ReduceMean_1791, Add_1793, Sqrt_1794, Div_1795, Mul_1796, Add_1797, Sub_1805, Pow_1807, ReduceMean_1808, Add_1810, Sqrt_1811, Div_1812, Mul_1813, Add_1814, Sub_1825, Pow_1827, ReduceMean_1828, Add_1830, Sqrt_1831, Div_1832, Mul_1833, Add_1834, Sub_1849, Pow_1851, ReduceMean_1852, Add_1854, Sqrt_1855, Div_1856, Mul_1857, Add_1858, Sub_1869, Pow_1871, ReduceMean_1872, Add_1874, Sqrt_1875, Div_1876, Mul_1877, Add_1878, Sub_1957, Pow_1959, ReduceMean_1960, Add_1962, Sqrt_1963, Div_1964, Mul_1965, Add_1966, Sub_1981, Pow_1983, ReduceMean_1984, Add_1986, Sqrt_1987, Div_1988, Mul_1989, Add_1990, Sub_1998, Pow_2000, ReduceMean_2001, Add_2003, Sqrt_2004, Div_2005, Mul_2006, Add_2007, Sub_2018, Pow_2020, ReduceMean_2021, Add_2023, Sqrt_2024, Div_2025, Mul_2026, Add_2027, Sub_2042, Pow_2044, ReduceMean_2045, Add_2047, Sqrt_2048, Div_2049, Mul_2050, Add_2051, Sub_2062, Pow_2064, ReduceMean_2065, Add_2067, Sqrt_2068, Div_2069, Mul_2070, Add_2071, Sub_2150, Pow_2152, ReduceMean_2153, Add_2155, Sqrt_2156, Div_2157, Mul_2158, Add_2159, Sub_2174, Pow_2176, ReduceMean_2177, Add_2179, Sqrt_2180, Div_2181, Mul_2182, Add_2183, Sub_2191, Pow_2193, ReduceMean_2194, Add_2196, Sqrt_2197, Div_2198, Mul_2199, Add_2200, Sub_2211, Pow_2213, ReduceMean_2214, Add_2216, Sqrt_2217, Div_2218, Mul_2219, Add_2220, Sub_2235, Pow_2237, ReduceMean_2238, Add_2240, Sqrt_2241, Div_2242, Mul_2243, Add_2244, Sub_2255, Pow_2257, ReduceMean_2258, Add_2260, Sqrt_2261, Div_2262, Mul_2263, Add_2264, Sub_2343, Pow_2345, ReduceMean_2346, Add_2348, Sqrt_2349, Div_2350, Mul_2351, Add_2352, Sub_2367, Pow_2369, ReduceMean_2370, Add_2372, Sqrt_2373, Div_2374, Mul_2375, 
Add_2376, Sub_2384, Pow_2386, ReduceMean_2387, Add_2389, Sqrt_2390, Div_2391, Mul_2392, Add_2393, Sub_2404, Pow_2406, ReduceMean_2407, Add_2409, Sqrt_2410, Div_2411, Mul_2412, Add_2413, Sub_2424, Pow_2426, ReduceMean_2427, Add_2429, Sqrt_2430, Div_2431, Mul_2432, Add_2433
[W] Running layernorm after self-attention in FP16 may cause overflow. Forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
[E] 2: [myelinBuilderUtils.cpp::operator()::737] Error Code 2: Internal Error ([ShapeHostToDeviceCopy 0] requires bool or uint8 I/O but node can not be handled by Myelin. Operation is not supported.)
[E] FAILED | Runtime: 13.559s | Command: /usr/local/bin/polygraphy run ./encoder.onnx --fp16 --onnxrt --trt --atol 1e-3 --rtol 1e-3 --pool-limit workspace:1000000000 --save-engine=./encoder1_fp16.plan --verbose --onnx-outputs mark all --trt-outputs mark all --trt-min-shapes chunk_xs:[1,67,80] chunk_lens:[1] offset:[1,1] att_cache:[1,12,4,80,128] cnn_cache:[1,12,256,7] cache_mask:[1,1,80] --trt-opt-shapes chunk_xs:[16,67,80] chunk_lens:[16] offset:[16,1] att_cache:[16,12,4,80,128] cnn_cache:[16,12,256,7] cache_mask:[16,1,80] --trt-max-shapes chunk_xs:[32,67,80] chunk_lens:[32] offset:[32,1] att_cache:[32,12,4,80,128] cnn_cache:[32,12,256,7] cache_mask:[32,1,80] --input-shapes chunk_xs:[16,67,80] chunk_lens:[16] offset:[16,1] att_cache:[16,12,4,80,128] cnn_cache:[16,12,256,7] cache_mask:[16,1,80] --validate