DEIMv2 / deimv2_dinov3_s_coco.log
Uploaded by carpedm20 (commit ca956b1, "Upload folder using huggingface_hub")
&&&& RUNNING TensorRT.trtexec [TensorRT v101401] [b48] # trtexec --onnx=checkpoints/deimv2_dinov3_s_coco.onnx --saveEngine=checkpoints/deimv2_dinov3_s_coco.engine --fp16 --optShapes=images:1x3x640x640,orig_target_sizes:1x2 --memPoolSize=workspace:4096 --builderOptimizationLevel=3
[01/20/2026-06:55:08] [W] optShapes is being broadcasted to minShapes for tensor orig_target_sizes
[01/20/2026-06:55:08] [W] optShapes is being broadcasted to maxShapes for tensor orig_target_sizes
[01/20/2026-06:55:08] [W] optShapes is being broadcasted to minShapes for tensor images
[01/20/2026-06:55:08] [W] optShapes is being broadcasted to maxShapes for tensor images
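Only `--optShapes` was passed, so trtexec copied those shapes into the minimum and maximum of optimization profile 0. The resulting engine accepts exactly one shape per input, which the `Input build shape` lines further down confirm (`images=1x3x640x640+1x3x640x640+1x3x640x640`). The Python sketch below imitates that broadcasting rule. `parse_shape_spec`, `build_profile` and `fmt` are hypothetical helpers written for illustration and are not trtexec internals. A dynamic batch would need different `--minShapes`/`--maxShapes` values, and the ONNX graph would also have to be exported with that axis dynamic, which this log does not show.

```python
def parse_shape_spec(spec: str) -> dict[str, tuple[int, ...]]:
    """Parse a trtexec-style spec such as 'images:1x3x640x640,orig_target_sizes:1x2'."""
    shapes = {}
    for item in spec.split(","):
        name, _, dims = item.rpartition(":")  # split on the last ':' only
        shapes[name] = tuple(int(d) for d in dims.split("x"))
    return shapes


def build_profile(opt, min_shapes=None, max_shapes=None):
    """Return {name: (min, opt, max)}, copying opt into any missing min/max entry."""
    min_shapes = min_shapes or {}
    max_shapes = max_shapes or {}
    return {
        name: (min_shapes.get(name, dims), dims, max_shapes.get(name, dims))
        for name, dims in opt.items()
    }


def fmt(dims):
    """Render a shape tuple the way trtexec prints it, e.g. (1, 2) -> '1x2'."""
    return "x".join(str(d) for d in dims)


# Same spec as the --optShapes flag in the command above.
profile = build_profile(parse_shape_spec("images:1x3x640x640,orig_target_sizes:1x2"))
for name, (lo, opt, hi) in profile.items():
    print(f"{name}={fmt(lo)}+{fmt(opt)}+{fmt(hi)}")
```

Because `min == opt == max` for every input, TensorRT can specialize all kernels for a single shape. This is consistent with, though not proven by, the static 1x3x640x640 bindings used at inference time.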
[01/20/2026-06:55:08] [W] Weakly-typed networks have been deprecated in TensorRT. You can use the AutoCast tool (https://nvidia.github.io/TensorRT-Model-Optimizer/guides/8_autocast.html) to convert the network to be strongly typed.
[01/20/2026-06:55:08] [I] === Model Options ===
[01/20/2026-06:55:08] [I] Format: ONNX
[01/20/2026-06:55:08] [I] Model: checkpoints/deimv2_dinov3_s_coco.onnx
[01/20/2026-06:55:08] [I] Output:
[01/20/2026-06:55:08] [I] === Build Options ===
[01/20/2026-06:55:08] [I] Memory Pools: workspace: 4096 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[01/20/2026-06:55:08] [I] avgTiming: 8
[01/20/2026-06:55:08] [I] Precision: FP32+FP16
[01/20/2026-06:55:08] [I] LayerPrecisions:
[01/20/2026-06:55:08] [I] Layer Device Types:
[01/20/2026-06:55:08] [I] Decomposable Attentions:
[01/20/2026-06:55:08] [I] Calibration:
[01/20/2026-06:55:08] [I] Refit: Disabled
[01/20/2026-06:55:08] [I] Strip weights: Disabled
[01/20/2026-06:55:08] [I] Version Compatible: Disabled
[01/20/2026-06:55:08] [I] ONNX Plugin InstanceNorm: Disabled
[01/20/2026-06:55:08] [I] ONNX kENABLE_UINT8_AND_ASYMMETRIC_QUANTIZATION_DLA flag: Disabled
[01/20/2026-06:55:08] [I] TensorRT runtime: full
[01/20/2026-06:55:08] [I] Lean DLL Path:
[01/20/2026-06:55:08] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/20/2026-06:55:08] [I] Exclude Lean Runtime: Disabled
[01/20/2026-06:55:08] [I] Sparsity: Disabled
[01/20/2026-06:55:08] [I] Safe mode: Disabled
[01/20/2026-06:55:08] [I] Build DLA standalone loadable: Disabled
[01/20/2026-06:55:08] [I] Allow GPU fallback for DLA: Disabled
[01/20/2026-06:55:08] [I] DirectIO mode: Disabled
[01/20/2026-06:55:08] [I] Restricted mode: Disabled
[01/20/2026-06:55:08] [I] Skip inference: Disabled
[01/20/2026-06:55:08] [I] Save engine: checkpoints/deimv2_dinov3_s_coco.engine
[01/20/2026-06:55:08] [I] Load engine:
[01/20/2026-06:55:08] [I] Profiling verbosity: 0
[01/20/2026-06:55:08] [I] Tactic sources: Using default tactic sources
[01/20/2026-06:55:08] [I] timingCacheMode: local
[01/20/2026-06:55:08] [I] timingCacheFile:
[01/20/2026-06:55:08] [I] Enable Compilation Cache: Enabled
[01/20/2026-06:55:08] [I] Enable Monitor Memory: Disabled
[01/20/2026-06:55:08] [I] errorOnTimingCacheMiss: Disabled
[01/20/2026-06:55:08] [I] Preview Features: Use default preview flags.
[01/20/2026-06:55:08] [I] MaxAuxStreams: -1
[01/20/2026-06:55:08] [I] BuilderOptimizationLevel: 3
[01/20/2026-06:55:08] [I] MaxTactics: -1
[01/20/2026-06:55:08] [I] Calibration Profile Index: 0
[01/20/2026-06:55:08] [I] Weight Streaming: Disabled
[01/20/2026-06:55:08] [I] Runtime Platform: Same As Build
[01/20/2026-06:55:08] [I] Debug Tensors:
[01/20/2026-06:55:08] [I] Distributive Independence: Disabled
[01/20/2026-06:55:08] [I] Mark Unfused Tensors As Debug Tensors: Disabled
[01/20/2026-06:55:08] [I] Input(s)s format: fp32:CHW
[01/20/2026-06:55:08] [I] Output(s)s format: fp32:CHW
[01/20/2026-06:55:08] [I] Input build shape (profile 0): images=1x3x640x640+1x3x640x640+1x3x640x640
[01/20/2026-06:55:08] [I] Input build shape (profile 0): orig_target_sizes=1x2+1x2+1x2
[01/20/2026-06:55:08] [I] Input calibration shapes: model
[01/20/2026-06:55:08] [I] === System Options ===
[01/20/2026-06:55:08] [I] Device: 0
[01/20/2026-06:55:08] [I] DLACore:
[01/20/2026-06:55:08] [I] Plugins:
[01/20/2026-06:55:08] [I] setPluginsToSerialize:
[01/20/2026-06:55:08] [I] dynamicPlugins:
[01/20/2026-06:55:08] [I] ignoreParsedPluginLibs: 0
[01/20/2026-06:55:08] [I]
[01/20/2026-06:55:08] [I] === Inference Options ===
[01/20/2026-06:55:08] [I] Batch: Explicit
[01/20/2026-06:55:08] [I] Input inference shape : orig_target_sizes=1x2
[01/20/2026-06:55:08] [I] Input inference shape : images=1x3x640x640
[01/20/2026-06:55:08] [I] Iterations: 10
[01/20/2026-06:55:08] [I] Duration: 3s (+ 200ms warm up)
[01/20/2026-06:55:08] [I] Sleep time: 0ms
[01/20/2026-06:55:08] [I] Idle time: 0ms
[01/20/2026-06:55:08] [I] Inference Streams: 1
[01/20/2026-06:55:08] [I] ExposeDMA: Disabled
[01/20/2026-06:55:08] [I] Data transfers: Enabled
[01/20/2026-06:55:08] [I] Spin-wait: Disabled
[01/20/2026-06:55:08] [I] Multithreading: Disabled
[01/20/2026-06:55:08] [I] CUDA Graph: Disabled
[01/20/2026-06:55:08] [I] Separate profiling: Disabled
[01/20/2026-06:55:08] [I] Time Deserialize: Disabled
[01/20/2026-06:55:08] [I] Time Refit: Disabled
[01/20/2026-06:55:08] [I] NVTX verbosity: 0
[01/20/2026-06:55:08] [I] Persistent Cache Ratio: 0
[01/20/2026-06:55:08] [I] Optimization Profile Index: 0
[01/20/2026-06:55:08] [I] Weight Streaming Budget: 100.000000%
[01/20/2026-06:55:08] [I] Inputs:
[01/20/2026-06:55:08] [I] Debug Tensor Save Destinations:
[01/20/2026-06:55:08] [I] Dump All Debug Tensor in Formats:
[01/20/2026-06:55:08] [I] === Reporting Options ===
[01/20/2026-06:55:08] [I] Verbose: Disabled
[01/20/2026-06:55:08] [I] Averages: 10 inferences
[01/20/2026-06:55:08] [I] Percentiles: 90,95,99
[01/20/2026-06:55:08] [I] Dump refittable layers:Disabled
[01/20/2026-06:55:08] [I] Dump output: Disabled
[01/20/2026-06:55:08] [I] Profile: Disabled
[01/20/2026-06:55:08] [I] Export timing to JSON file:
[01/20/2026-06:55:08] [I] Export output to JSON file:
[01/20/2026-06:55:08] [I] Export profile to JSON file:
[01/20/2026-06:55:08] [I]
[01/20/2026-06:55:08] [I] === Device Information ===
[01/20/2026-06:55:08] [I] Available Devices:
[01/20/2026-06:55:08] [I] Device 0: "NVIDIA GeForce RTX 4090" UUID: GPU-55c23db9-433c-0d6c-46e7-9387266e5ddb
[01/20/2026-06:55:08] [I] Selected Device: NVIDIA GeForce RTX 4090
[01/20/2026-06:55:08] [I] Selected Device ID: 0
[01/20/2026-06:55:08] [I] Selected Device UUID: GPU-55c23db9-433c-0d6c-46e7-9387266e5ddb
[01/20/2026-06:55:08] [I] Compute Capability: 8.9
[01/20/2026-06:55:08] [I] SMs: 128
[01/20/2026-06:55:08] [I] Device Global Memory: 24071 MiB
[01/20/2026-06:55:08] [I] Shared Memory per SM: 100 KiB
[01/20/2026-06:55:08] [I] Memory Bus Width: 384 bits (ECC disabled)
[01/20/2026-06:55:08] [I] Application Compute Clock Rate: 2.52 GHz
[01/20/2026-06:55:08] [I] Application Memory Clock Rate: 10.501 GHz
[01/20/2026-06:55:08] [I]
[01/20/2026-06:55:08] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
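As the note says, these are nominal clocks. They still give a theoretical DRAM bandwidth ceiling: bytes per transfer (bus width / 8) times memory clock times transfers per clock. The factor of 2 follows the double-data-rate convention used in NVIDIA's device-query examples and is an assumption here. `peak_dram_bandwidth_gbs` is a hypothetical helper.

```python
def peak_dram_bandwidth_gbs(bus_width_bits: int, memory_clock_ghz: float,
                            transfers_per_clock: int = 2) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 assumes a double-data-rate interpretation of the
    reported clock; the log itself does not state this.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_ghz * transfers_per_clock


# Values from the "Device Information" block above (384-bit bus, 10.501 GHz).
bw = peak_dram_bandwidth_gbs(384, 10.501)
print(f"{bw:.1f} GB/s")  # a ceiling, not a measured figure
```

This gives about 1008.1 GB/s, in line with the commonly quoted figure for this card. Measured bandwidth will be lower.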
[01/20/2026-06:55:08] [I]
[01/20/2026-06:55:08] [I] TensorRT version: 10.14.1
[01/20/2026-06:55:08] [I] Loading standard plugins
[01/20/2026-06:55:08] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 29, GPU 10549 (MiB)
[01/20/2026-06:55:08] [I] Start parsing network model.
[01/20/2026-06:55:08] [I] [TRT] ----------------------------------------------------------------
[01/20/2026-06:55:08] [I] [TRT] Input filename: checkpoints/deimv2_dinov3_s_coco.onnx
[01/20/2026-06:55:08] [I] [TRT] ONNX IR version: 0.0.8
[01/20/2026-06:55:08] [I] [TRT] Opset version: 17
[01/20/2026-06:55:08] [I] [TRT] Producer name: pytorch
[01/20/2026-06:55:08] [I] [TRT] Producer version: 2.10.0
[01/20/2026-06:55:08] [I] [TRT] Domain:
[01/20/2026-06:55:08] [I] [TRT] Model version: 0
[01/20/2026-06:55:08] [I] [TRT] Doc string:
[01/20/2026-06:55:08] [I] [TRT] ----------------------------------------------------------------
[01/20/2026-06:55:08] [W] [TRT] ModelImporter.cpp:661: Make sure input orig_target_sizes has Int64 binding.
[01/20/2026-06:55:09] [W] [TRT] ModelImporter.cpp:908: Make sure output labels has Int64 binding.
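These two warnings mean the engine takes `orig_target_sizes` as INT64 and writes `labels` as INT64. Host buffers for both must therefore use 8-byte integers. Below is a minimal sketch using the standard `struct` module. The (width, height) order of `orig_target_sizes` is an assumption to check against the exporter, and `pack_orig_target_sizes` and `unpack_labels` are hypothetical names.

```python
import struct


def pack_orig_target_sizes(width: int, height: int) -> bytes:
    """Pack one 1x2 row as little-endian int64; (width, height) order is assumed."""
    return struct.pack("<2q", width, height)


def unpack_labels(buf: bytes, num_queries: int = 300) -> list[int]:
    """Decode the 1x300 `labels` output, 8 bytes (int64) per element."""
    expected = 8 * num_queries
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes for int64 labels, got {len(buf)}")
    return list(struct.unpack(f"<{num_queries}q", buf))


packed = pack_orig_target_sizes(1280, 720)
print(len(packed))                    # 16 bytes: two int64 values
print(struct.unpack("<4i", packed))   # misread as int32: (1280, 0, 720, 0)
```

The last line shows the kind of silent mismatch the warning guards against. Reading INT64 data as 32-bit integers interleaves the real values with zeros.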
[01/20/2026-06:55:09] [I] Finished parsing network model. Parse time: 0.0945442
[01/20/2026-06:55:09] [I] Set shape of input tensor images for optimization profile 0 to: MIN=1x3x640x640 OPT=1x3x640x640 MAX=1x3x640x640
[01/20/2026-06:55:09] [I] Set shape of input tensor orig_target_sizes for optimization profile 0 to: MIN=1x2 OPT=1x2 MAX=1x2
[01/20/2026-06:55:09] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +204, GPU +4, now: CPU 571, GPU 10553 (MiB)
[01/20/2026-06:55:09] [W] [TRT] Detected layernorm nodes in FP16.
[01/20/2026-06:55:09] [W] [TRT] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.
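The overflow risk in this warning comes from FP16's largest finite value, 65504. Any squared deviation above that (|x - mean| greater than roughly 256), or any running sum of squares above it, becomes infinite. The rough check below assumes a naive FP16 LayerNorm that squares with Pow and sums with Reduce before dividing. TensorRT's real kernels may accumulate differently, and `fp16_layernorm_risk` is a hypothetical helper.

```python
FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fp16_layernorm_risk(row: list[float]) -> dict[str, bool]:
    """Flag where a naive FP16 variance computation on `row` would overflow.

    Assumes variance = sum((x - mean)**2) / n, with both the squares (Pow) and
    the running sum (Reduce) held in FP16. Comparisons ignore FP16 rounding.
    """
    mean = sum(row) / len(row)
    squares = [(x - mean) ** 2 for x in row]
    return {
        "pow_overflows": max(squares) > FP16_MAX,
        "sum_overflows": sum(squares) > FP16_MAX,
    }


# 384 moderate activations: each square fits in FP16, but their sum does not.
print(fp16_layernorm_risk([20.0, -20.0] * 192))
# A single large outlier overflows already at the Pow step.
print(fp16_layernorm_risk([0.0] * 383 + [300.0]))
```

This is why the warning suggests keeping the Reduce/Pow layers in FP32 or exporting with INormalizationLayer (ONNX opset 17 or later).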
[01/20/2026-06:55:09] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[01/20/2026-06:55:52] [I] [TRT] Compiler backend is used during engine build.
[01/20/2026-06:57:39] [I] [TRT] Detected 2 inputs and 3 output network tensors.
[01/20/2026-06:57:39] [I] [TRT] Total Host Persistent Memory: 281504 bytes
[01/20/2026-06:57:39] [I] [TRT] Total Device Persistent Memory: 3072 bytes
[01/20/2026-06:57:39] [I] [TRT] Max Scratch Memory: 9665024 bytes
[01/20/2026-06:57:39] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 91 steps to complete.
[01/20/2026-06:57:39] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 2.89015ms to assign 11 blocks to 91 nodes requiring 21496320 bytes.
[01/20/2026-06:57:39] [I] [TRT] Total Activation Memory: 21496320 bytes
[01/20/2026-06:57:39] [I] [TRT] Total Weights Memory: 19740416 bytes
[01/20/2026-06:57:40] [I] [TRT] Compiler backend is used during engine execution.
[01/20/2026-06:57:40] [I] [TRT] Engine generation completed in 150.685 seconds.
[01/20/2026-06:57:40] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 11 MiB, GPU 93 MiB
[01/20/2026-06:57:40] [I] Created engine with size: 25.2802 MiB
[01/20/2026-06:57:40] [I] Engine built in 151.034 sec.
[01/20/2026-06:57:40] [I] [TRT] Loaded engine size: 25 MiB
[01/20/2026-06:57:40] [I] Engine deserialized in 0.0153845 sec.
[01/20/2026-06:57:40] [I] [TRT] [MS] Running engine with multi stream info
[01/20/2026-06:57:40] [I] [TRT] [MS] Number of aux streams is 2
[01/20/2026-06:57:40] [I] [TRT] [MS] Number of total worker streams is 3
[01/20/2026-06:57:40] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[01/20/2026-06:57:40] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +21, now: CPU 0, GPU 39 (MiB)
[01/20/2026-06:57:40] [I] Setting persistentCacheLimit to 0 bytes.
[01/20/2026-06:57:40] [I] Created execution context with device memory size: 20.5005 MiB
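The byte counts in the build summary and the MiB figures in the runtime lines use different units. Converting with 1 MiB = 1024 × 1024 bytes shows that the 21496320-byte activation total and the 20.5005 MiB execution-context size are numerically the same. The snippet below only does that arithmetic. `to_mib` is a hypothetical helper, and the byte counts are copied from the log.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return num_bytes / MIB


# Figures copied from the build summary above.
activation_bytes = 21_496_320   # "Total Activation Memory"
weights_bytes = 19_740_416      # "Total Weights Memory"
scratch_bytes = 9_665_024       # "Max Scratch Memory"

print(f"activations: {to_mib(activation_bytes):.4f} MiB")  # 20.5005, same as the context size
print(f"weights:     {to_mib(weights_bytes):.4f} MiB")     # 18.8259
print(f"scratch:     {to_mib(scratch_bytes):.4f} MiB")     # 9.2173
```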
[01/20/2026-06:57:40] [I] Using random values for input images
[01/20/2026-06:57:40] [I] Input binding for images with dimensions 1x3x640x640 is created.
[01/20/2026-06:57:40] [I] Using random values for input orig_target_sizes
[01/20/2026-06:57:40] [I] Input binding for orig_target_sizes with dimensions 1x2 is created.
[01/20/2026-06:57:40] [I] Output binding for labels with dimensions 1x300 is created.
[01/20/2026-06:57:40] [I] Output binding for boxes with dimensions 1x300x4 is created.
[01/20/2026-06:57:40] [I] Output binding for scores with dimensions 1x300 is created.
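Each of the 300 rows in `labels`, `boxes` and `scores` corresponds to one decoder query. The sketch below keeps rows above a score threshold. It assumes that no NMS step is needed, as is usual for DETR-style detectors, and that `boxes` are `(x1, y1, x2, y2)` in the pixel frame given by `orig_target_sizes`. The log states neither. `Detection` and `filter_detections` are hypothetical names, and 0.5 is an arbitrary threshold.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: int
    score: float
    box: tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels


def filter_detections(labels, boxes, scores, threshold=0.5):
    """Keep queries scoring above `threshold`, highest score first.

    `labels` and `scores` are the 300 per-query values for batch index 0, and
    `boxes` holds the matching 300 four-element boxes.
    """
    kept = [
        Detection(int(lab), float(s), tuple(float(v) for v in b))
        for lab, b, s in zip(labels, boxes, scores)
        if s > threshold
    ]
    kept.sort(key=lambda d: d.score, reverse=True)
    return kept


# Three toy queries instead of 300.
for det in filter_detections([1, 2, 3],
                             [(0, 0, 10, 10), (5, 5, 20, 20), (1, 1, 2, 2)],
                             [0.9, 0.2, 0.7]):
    print(det)
```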
[01/20/2026-06:57:40] [I] Starting inference
[01/20/2026-06:57:43] [I] Warmup completed 146 queries over 200 ms
[01/20/2026-06:57:43] [I] Timing trace has 2199 queries over 3.00392 s
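The timing summary implies a throughput of about 732 queries per second, or roughly 1.366 ms of wall time per query. That is close to the mean GPU latency in the windows below (about 1.36 ms) and well under the ~1.59 ms host latency. One interpretation, which the log does not state, is that host/device copies overlapped with compute on neighboring queries. `throughput_qps` is a hypothetical helper.

```python
def throughput_qps(queries: int, seconds: float) -> float:
    """Queries per second over a timed interval."""
    return queries / seconds


timed = throughput_qps(2199, 3.00392)  # "Timing trace has 2199 queries over 3.00392 s"
warmup = throughput_qps(146, 0.200)    # "Warmup completed 146 queries over 200 ms"
print(f"timed:  {timed:.1f} qps, {1000 / timed:.3f} ms/query")
print(f"warmup: {warmup:.1f} qps")
```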
[01/20/2026-06:57:43] [I]
[01/20/2026-06:57:43] [I] === Trace details ===
[01/20/2026-06:57:43] [I] Trace averages of 10 runs:
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.35834 ms - Host latency: 1.58537 ms (enqueue 0.439809 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36411 ms - Host latency: 1.59102 ms (enqueue 0.443648 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36554 ms - Host latency: 1.59115 ms (enqueue 0.441405 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36572 ms - Host latency: 1.59173 ms (enqueue 0.450684 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36576 ms - Host latency: 1.59286 ms (enqueue 0.442368 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36451 ms - Host latency: 1.58991 ms (enqueue 0.441461 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36351 ms - Host latency: 1.58781 ms (enqueue 0.449054 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36335 ms - Host latency: 1.58888 ms (enqueue 0.447015 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3633 ms - Host latency: 1.59006 ms (enqueue 0.444852 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3643 ms - Host latency: 1.59003 ms (enqueue 0.444052 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36356 ms - Host latency: 1.58947 ms (enqueue 0.443555 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36461 ms - Host latency: 1.59058 ms (enqueue 0.444547 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36305 ms - Host latency: 1.58903 ms (enqueue 0.443799 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36307 ms - Host latency: 1.58965 ms (enqueue 0.44136 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36357 ms - Host latency: 1.58973 ms (enqueue 0.445804 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36357 ms - Host latency: 1.58859 ms (enqueue 0.439398 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36337 ms - Host latency: 1.58787 ms (enqueue 0.458829 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36386 ms - Host latency: 1.58777 ms (enqueue 0.470523 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36315 ms - Host latency: 1.58919 ms (enqueue 0.432932 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36291 ms - Host latency: 1.58851 ms (enqueue 0.44093 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3643 ms - Host latency: 1.59142 ms (enqueue 0.436078 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36438 ms - Host latency: 1.59136 ms (enqueue 0.441199 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36407 ms - Host latency: 1.58911 ms (enqueue 0.437918 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36408 ms - Host latency: 1.59089 ms (enqueue 0.447729 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36367 ms - Host latency: 1.59078 ms (enqueue 0.435645 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36422 ms - Host latency: 1.59025 ms (enqueue 0.434961 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3635 ms - Host latency: 1.591 ms (enqueue 0.440906 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36263 ms - Host latency: 1.58801 ms (enqueue 0.434052 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36351 ms - Host latency: 1.58947 ms (enqueue 0.462219 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36393 ms - Host latency: 1.59056 ms (enqueue 0.443243 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36391 ms - Host latency: 1.59028 ms (enqueue 0.438641 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36307 ms - Host latency: 1.58887 ms (enqueue 0.445636 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36342 ms - Host latency: 1.59011 ms (enqueue 0.444208 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36346 ms - Host latency: 1.59023 ms (enqueue 0.444598 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36316 ms - Host latency: 1.58997 ms (enqueue 0.445331 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36346 ms - Host latency: 1.59006 ms (enqueue 0.439972 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36443 ms - Host latency: 1.58953 ms (enqueue 0.478406 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36407 ms - Host latency: 1.58942 ms (enqueue 0.472168 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36367 ms - Host latency: 1.59003 ms (enqueue 0.450946 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36269 ms - Host latency: 1.58821 ms (enqueue 0.452094 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3634 ms - Host latency: 1.58981 ms (enqueue 0.436835 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36327 ms - Host latency: 1.58932 ms (enqueue 0.447278 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36411 ms - Host latency: 1.58897 ms (enqueue 0.771808 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36356 ms - Host latency: 1.58953 ms (enqueue 0.460181 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36378 ms - Host latency: 1.59033 ms (enqueue 0.450714 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36313 ms - Host latency: 1.58844 ms (enqueue 0.436713 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36343 ms - Host latency: 1.59 ms (enqueue 0.440601 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36338 ms - Host latency: 1.58936 ms (enqueue 0.438214 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36384 ms - Host latency: 1.58855 ms (enqueue 0.452051 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36315 ms - Host latency: 1.58967 ms (enqueue 0.441931 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36335 ms - Host latency: 1.58993 ms (enqueue 0.439587 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.364 ms - Host latency: 1.59079 ms (enqueue 0.441016 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3636 ms - Host latency: 1.59019 ms (enqueue 0.434497 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36348 ms - Host latency: 1.59001 ms (enqueue 0.440436 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36329 ms - Host latency: 1.58978 ms (enqueue 0.456458 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.363 ms - Host latency: 1.58828 ms (enqueue 0.451471 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36321 ms - Host latency: 1.58891 ms (enqueue 0.444556 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36294 ms - Host latency: 1.58796 ms (enqueue 0.443604 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36411 ms - Host latency: 1.59097 ms (enqueue 0.445068 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3638 ms - Host latency: 1.59024 ms (enqueue 0.443127 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36346 ms - Host latency: 1.58997 ms (enqueue 0.440167 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36389 ms - Host latency: 1.5899 ms (enqueue 0.445129 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36367 ms - Host latency: 1.59065 ms (enqueue 0.446313 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36328 ms - Host latency: 1.58978 ms (enqueue 0.444324 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36334 ms - Host latency: 1.59003 ms (enqueue 0.439246 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36407 ms - Host latency: 1.59083 ms (enqueue 0.438879 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36386 ms - Host latency: 1.59032 ms (enqueue 0.441052 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36317 ms - Host latency: 1.58983 ms (enqueue 0.438794 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36306 ms - Host latency: 1.58959 ms (enqueue 0.438867 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36301 ms - Host latency: 1.58944 ms (enqueue 0.438855 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36361 ms - Host latency: 1.59039 ms (enqueue 0.441309 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36296 ms - Host latency: 1.58978 ms (enqueue 0.44148 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3641 ms - Host latency: 1.59025 ms (enqueue 0.442981 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36499 ms - Host latency: 1.59066 ms (enqueue 0.446216 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36328 ms - Host latency: 1.58986 ms (enqueue 0.44093 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3645 ms - Host latency: 1.59019 ms (enqueue 0.435937 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36423 ms - Host latency: 1.59133 ms (enqueue 0.439709 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36431 ms - Host latency: 1.58954 ms (enqueue 0.442932 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36331 ms - Host latency: 1.58868 ms (enqueue 0.445911 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36373 ms - Host latency: 1.58958 ms (enqueue 0.438513 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3636 ms - Host latency: 1.5901 ms (enqueue 0.435034 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36407 ms - Host latency: 1.59045 ms (enqueue 0.4354 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36396 ms - Host latency: 1.59104 ms (enqueue 0.460461 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36338 ms - Host latency: 1.58949 ms (enqueue 0.454236 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36307 ms - Host latency: 1.58958 ms (enqueue 0.442126 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36404 ms - Host latency: 1.58927 ms (enqueue 0.439563 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36359 ms - Host latency: 1.59045 ms (enqueue 0.442273 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36388 ms - Host latency: 1.59047 ms (enqueue 0.442029 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36395 ms - Host latency: 1.58977 ms (enqueue 0.440356 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36344 ms - Host latency: 1.58864 ms (enqueue 0.445386 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36396 ms - Host latency: 1.58995 ms (enqueue 0.444177 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36354 ms - Host latency: 1.58967 ms (enqueue 0.442737 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36423 ms - Host latency: 1.58945 ms (enqueue 0.440112 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3634 ms - Host latency: 1.59 ms (enqueue 0.437964 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36364 ms - Host latency: 1.59041 ms (enqueue 0.438586 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36342 ms - Host latency: 1.58943 ms (enqueue 0.441638 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36395 ms - Host latency: 1.59001 ms (enqueue 0.438611 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36306 ms - Host latency: 1.58876 ms (enqueue 0.437866 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36351 ms - Host latency: 1.58987 ms (enqueue 0.441199 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36368 ms - Host latency: 1.59054 ms (enqueue 0.443579 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36366 ms - Host latency: 1.58575 ms (enqueue 0.514673 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36473 ms - Host latency: 1.58927 ms (enqueue 0.471899 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36323 ms - Host latency: 1.58971 ms (enqueue 0.443347 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36383 ms - Host latency: 1.59017 ms (enqueue 0.43667 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36367 ms - Host latency: 1.59031 ms (enqueue 0.436035 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36333 ms - Host latency: 1.58923 ms (enqueue 0.445215 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36353 ms - Host latency: 1.58886 ms (enqueue 0.43667 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36255 ms - Host latency: 1.58766 ms (enqueue 0.435669 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36307 ms - Host latency: 1.58983 ms (enqueue 0.440649 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36399 ms - Host latency: 1.58915 ms (enqueue 0.43988 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36412 ms - Host latency: 1.5907 ms (enqueue 0.446997 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3639 ms - Host latency: 1.59098 ms (enqueue 0.44856 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36451 ms - Host latency: 1.59078 ms (enqueue 0.437244 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36326 ms - Host latency: 1.58932 ms (enqueue 0.445728 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36398 ms - Host latency: 1.59054 ms (enqueue 0.439539 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3637 ms - Host latency: 1.59021 ms (enqueue 0.442529 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36335 ms - Host latency: 1.58967 ms (enqueue 0.438489 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36267 ms - Host latency: 1.58927 ms (enqueue 0.439697 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36331 ms - Host latency: 1.58962 ms (enqueue 0.440845 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36307 ms - Host latency: 1.58955 ms (enqueue 0.440918 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36354 ms - Host latency: 1.59021 ms (enqueue 0.434119 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36375 ms - Host latency: 1.58995 ms (enqueue 0.448096 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36323 ms - Host latency: 1.59006 ms (enqueue 0.442773 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3635 ms - Host latency: 1.58994 ms (enqueue 0.443115 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36307 ms - Host latency: 1.58976 ms (enqueue 0.442371 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36318 ms - Host latency: 1.58975 ms (enqueue 0.439624 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36328 ms - Host latency: 1.59004 ms (enqueue 0.44209 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36339 ms - Host latency: 1.58943 ms (enqueue 0.447375 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36345 ms - Host latency: 1.5887 ms (enqueue 0.446582 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3639 ms - Host latency: 1.59076 ms (enqueue 0.447876 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36449 ms - Host latency: 1.59119 ms (enqueue 0.446411 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36377 ms - Host latency: 1.59092 ms (enqueue 0.443127 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36384 ms - Host latency: 1.58966 ms (enqueue 0.437134 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36396 ms - Host latency: 1.59044 ms (enqueue 0.439563 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36298 ms - Host latency: 1.58964 ms (enqueue 0.436792 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36285 ms - Host latency: 1.58849 ms (enqueue 0.454407 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3644 ms - Host latency: 1.59087 ms (enqueue 0.446143 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36292 ms - Host latency: 1.58948 ms (enqueue 0.439722 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36367 ms - Host latency: 1.59043 ms (enqueue 0.439331 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36282 ms - Host latency: 1.58804 ms (enqueue 0.463892 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36533 ms - Host latency: 1.59038 ms (enqueue 0.445728 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36367 ms - Host latency: 1.58921 ms (enqueue 0.451514 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36377 ms - Host latency: 1.59084 ms (enqueue 0.444482 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36379 ms - Host latency: 1.58928 ms (enqueue 0.471216 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36372 ms - Host latency: 1.58584 ms (enqueue 1.03264 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36384 ms - Host latency: 1.59001 ms (enqueue 0.439404 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36367 ms - Host latency: 1.58979 ms (enqueue 0.432397 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36406 ms - Host latency: 1.59026 ms (enqueue 0.438867 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36399 ms - Host latency: 1.59099 ms (enqueue 0.441235 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36431 ms - Host latency: 1.58938 ms (enqueue 0.439136 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3637 ms - Host latency: 1.58992 ms (enqueue 0.435034 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36431 ms - Host latency: 1.59045 ms (enqueue 0.446167 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36372 ms - Host latency: 1.5905 ms (enqueue 0.439868 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36387 ms - Host latency: 1.59048 ms (enqueue 0.437817 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36426 ms - Host latency: 1.59072 ms (enqueue 0.437793 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36399 ms - Host latency: 1.5906 ms (enqueue 0.436206 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36384 ms - Host latency: 1.5897 ms (enqueue 0.457056 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36401 ms - Host latency: 1.58594 ms (enqueue 0.492017 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36375 ms - Host latency: 1.58823 ms (enqueue 0.459448 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36404 ms - Host latency: 1.5905 ms (enqueue 0.438794 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36399 ms - Host latency: 1.59077 ms (enqueue 0.441895 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36423 ms - Host latency: 1.59053 ms (enqueue 0.44104 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36379 ms - Host latency: 1.59041 ms (enqueue 0.439673 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36335 ms - Host latency: 1.58899 ms (enqueue 0.459937 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36431 ms - Host latency: 1.5896 ms (enqueue 0.444531 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36477 ms - Host latency: 1.59155 ms (enqueue 0.439697 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3637 ms - Host latency: 1.58909 ms (enqueue 0.438525 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3637 ms - Host latency: 1.58887 ms (enqueue 0.438477 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36343 ms - Host latency: 1.58784 ms (enqueue 0.444067 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36494 ms - Host latency: 1.59221 ms (enqueue 0.439429 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36348 ms - Host latency: 1.58997 ms (enqueue 0.446143 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36438 ms - Host latency: 1.59109 ms (enqueue 0.44978 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36401 ms - Host latency: 1.59106 ms (enqueue 0.446118 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36528 ms - Host latency: 1.59194 ms (enqueue 0.448413 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36492 ms - Host latency: 1.59028 ms (enqueue 0.45022 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36418 ms - Host latency: 1.59033 ms (enqueue 0.449512 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36418 ms - Host latency: 1.58845 ms (enqueue 0.497168 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36526 ms - Host latency: 1.59045 ms (enqueue 0.459277 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36362 ms - Host latency: 1.58826 ms (enqueue 0.460718 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3645 ms - Host latency: 1.59138 ms (enqueue 0.449902 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36389 ms - Host latency: 1.58931 ms (enqueue 0.452368 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3636 ms - Host latency: 1.58979 ms (enqueue 0.448291 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36406 ms - Host latency: 1.59009 ms (enqueue 0.449634 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3637 ms - Host latency: 1.58862 ms (enqueue 0.46543 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36409 ms - Host latency: 1.59097 ms (enqueue 0.450562 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36399 ms - Host latency: 1.5906 ms (enqueue 0.452661 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36404 ms - Host latency: 1.59038 ms (enqueue 0.44873 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36404 ms - Host latency: 1.59102 ms (enqueue 0.445532 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36331 ms - Host latency: 1.59001 ms (enqueue 0.447095 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3644 ms - Host latency: 1.59131 ms (enqueue 0.448535 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3634 ms - Host latency: 1.58972 ms (enqueue 0.448926 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36367 ms - Host latency: 1.58906 ms (enqueue 0.449585 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36404 ms - Host latency: 1.59087 ms (enqueue 0.441382 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36362 ms - Host latency: 1.59041 ms (enqueue 0.437012 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36333 ms - Host latency: 1.59031 ms (enqueue 0.441211 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36487 ms - Host latency: 1.59082 ms (enqueue 0.438501 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36321 ms - Host latency: 1.59004 ms (enqueue 0.435645 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36292 ms - Host latency: 1.58921 ms (enqueue 0.438892 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36433 ms - Host latency: 1.59072 ms (enqueue 0.434082 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36384 ms - Host latency: 1.58962 ms (enqueue 0.438867 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36387 ms - Host latency: 1.59011 ms (enqueue 0.4448 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36426 ms - Host latency: 1.59116 ms (enqueue 0.438916 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36318 ms - Host latency: 1.58914 ms (enqueue 0.440454 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36372 ms - Host latency: 1.59043 ms (enqueue 0.43689 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36462 ms - Host latency: 1.59084 ms (enqueue 0.449536 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36365 ms - Host latency: 1.5905 ms (enqueue 0.443726 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36409 ms - Host latency: 1.59077 ms (enqueue 0.439819 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3635 ms - Host latency: 1.58977 ms (enqueue 0.448389 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3634 ms - Host latency: 1.59011 ms (enqueue 0.440186 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36335 ms - Host latency: 1.58994 ms (enqueue 0.436816 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36379 ms - Host latency: 1.59033 ms (enqueue 0.43562 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36387 ms - Host latency: 1.59082 ms (enqueue 0.438452 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3634 ms - Host latency: 1.58987 ms (enqueue 0.439624 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3637 ms - Host latency: 1.59036 ms (enqueue 0.44209 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36389 ms - Host latency: 1.58984 ms (enqueue 0.443994 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36423 ms - Host latency: 1.59028 ms (enqueue 0.436646 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.3646 ms - Host latency: 1.59121 ms (enqueue 0.435034 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36331 ms - Host latency: 1.58965 ms (enqueue 0.443726 ms)
[01/20/2026-06:57:43] [I] Average on 10 runs - GPU latency: 1.36426 ms - Host latency: 1.58589 ms (enqueue 0.503564 ms)
[01/20/2026-06:57:43] [I]
[01/20/2026-06:57:43] [I] === Performance summary ===
[01/20/2026-06:57:43] [I] Throughput: 732.043 qps
[01/20/2026-06:57:43] [I] Latency: min = 1.57953 ms, max = 1.59692 ms, mean = 1.58984 ms, median = 1.59009 ms, percentile(90%) = 1.59253 ms, percentile(95%) = 1.59326 ms, percentile(99%) = 1.59448 ms
[01/20/2026-06:57:43] [I] Enqueue Time: min = 0.426849 ms, max = 1.68213 ms, mean = 0.449191 ms, median = 0.439941 ms, percentile(90%) = 0.460449 ms, percentile(95%) = 0.486328 ms, percentile(99%) = 0.567993 ms
[01/20/2026-06:57:43] [I] H2D Latency: min = 0.213867 ms, max = 0.227295 ms, mean = 0.22142 ms, median = 0.221924 ms, percentile(90%) = 0.222534 ms, percentile(95%) = 0.222717 ms, percentile(99%) = 0.223145 ms
[01/20/2026-06:57:43] [I] GPU Compute Time: min = 1.3568 ms, max = 1.36914 ms, mean = 1.36374 ms, median = 1.36389 ms, percentile(90%) = 1.36597 ms, percentile(95%) = 1.36621 ms, percentile(99%) = 1.36792 ms
[01/20/2026-06:57:43] [I] D2H Latency: min = 0.00415039 ms, max = 0.00634766 ms, mean = 0.00469234 ms, median = 0.0045166 ms, percentile(90%) = 0.00561523 ms, percentile(95%) = 0.00585938 ms, percentile(99%) = 0.00610352 ms
[01/20/2026-06:57:43] [I] Total Host Walltime: 3.00392 s
[01/20/2026-06:57:43] [I] Total GPU Compute Time: 2.99885 s
[01/20/2026-06:57:43] [I] Explanations of the performance metrics are printed in the verbose logs.
[01/20/2026-06:57:43] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v101401] [b48] # trtexec --onnx=checkpoints/deimv2_dinov3_s_coco.onnx --saveEngine=checkpoints/deimv2_dinov3_s_coco.engine --fp16 --optShapes=images:1x3x640x640,orig_target_sizes:1x2 --memPoolSize=workspace:4096 --builderOptimizationLevel=3