&&&& RUNNING TensorRT.trtexec [TensorRT v101401] [b48] # trtexec --onnx=checkpoints/deimv2_dinov3_x_coco.onnx --saveEngine=checkpoints/deimv2_dinov3_x_coco.engine --fp16 --optShapes=images:1x3x640x640,orig_target_sizes:1x2 --memPoolSize=workspace:4096 --builderOptimizationLevel=3
[01/20/2026-07:02:51] [W] optShapes is being broadcasted to minShapes for tensor orig_target_sizes
[01/20/2026-07:02:51] [W] optShapes is being broadcasted to maxShapes for tensor orig_target_sizes
[01/20/2026-07:02:51] [W] optShapes is being broadcasted to minShapes for tensor images
[01/20/2026-07:02:51] [W] optShapes is being broadcasted to maxShapes for tensor images
[01/20/2026-07:02:51] [W] Weakly-typed networks have been deprecated in TensorRT. You can use the AutoCast tool (https://nvidia.github.io/TensorRT-Model-Optimizer/guides/8_autocast.html) to convert the network to be strongly typed.
[01/20/2026-07:02:51] [I] === Model Options ===
[01/20/2026-07:02:51] [I] Format: ONNX
[01/20/2026-07:02:51] [I] Model: checkpoints/deimv2_dinov3_x_coco.onnx
[01/20/2026-07:02:51] [I] Output:
[01/20/2026-07:02:51] [I] === Build Options ===
[01/20/2026-07:02:51] [I] Memory Pools: workspace: 4096 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[01/20/2026-07:02:51] [I] avgTiming: 8
[01/20/2026-07:02:51] [I] Precision: FP32+FP16
[01/20/2026-07:02:51] [I] LayerPrecisions:
[01/20/2026-07:02:51] [I] Layer Device Types:
[01/20/2026-07:02:51] [I] Decomposable Attentions:
[01/20/2026-07:02:51] [I] Calibration:
[01/20/2026-07:02:51] [I] Refit: Disabled
[01/20/2026-07:02:51] [I] Strip weights: Disabled
[01/20/2026-07:02:51] [I] Version Compatible: Disabled
[01/20/2026-07:02:51] [I] ONNX Plugin InstanceNorm: Disabled
[01/20/2026-07:02:51] [I] ONNX kENABLE_UINT8_AND_ASYMMETRIC_QUANTIZATION_DLA flag: Disabled
[01/20/2026-07:02:51] [I] TensorRT runtime: full
[01/20/2026-07:02:51] [I] Lean DLL Path:
[01/20/2026-07:02:51] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/20/2026-07:02:51] [I] Exclude Lean Runtime: Disabled
[01/20/2026-07:02:51] [I] Sparsity: Disabled
[01/20/2026-07:02:51] [I] Safe mode: Disabled
[01/20/2026-07:02:51] [I] Build DLA standalone loadable: Disabled
[01/20/2026-07:02:51] [I] Allow GPU fallback for DLA: Disabled
[01/20/2026-07:02:51] [I] DirectIO mode: Disabled
[01/20/2026-07:02:51] [I] Restricted mode: Disabled
[01/20/2026-07:02:51] [I] Skip inference: Disabled
[01/20/2026-07:02:51] [I] Save engine: checkpoints/deimv2_dinov3_x_coco.engine
[01/20/2026-07:02:51] [I] Load engine:
[01/20/2026-07:02:51] [I] Profiling verbosity: 0
[01/20/2026-07:02:51] [I] Tactic sources: Using default tactic sources
[01/20/2026-07:02:51] [I] timingCacheMode: local
[01/20/2026-07:02:51] [I] timingCacheFile:
[01/20/2026-07:02:51] [I] Enable Compilation Cache: Enabled
[01/20/2026-07:02:51] [I] Enable Monitor Memory: Disabled
[01/20/2026-07:02:51] [I] errorOnTimingCacheMiss: Disabled
[01/20/2026-07:02:51] [I] Preview Features: Use default preview flags.
[01/20/2026-07:02:51] [I] MaxAuxStreams: -1
[01/20/2026-07:02:51] [I] BuilderOptimizationLevel: 3
[01/20/2026-07:02:51] [I] MaxTactics: -1
[01/20/2026-07:02:51] [I] Calibration Profile Index: 0
[01/20/2026-07:02:51] [I] Weight Streaming: Disabled
[01/20/2026-07:02:51] [I] Runtime Platform: Same As Build
[01/20/2026-07:02:51] [I] Debug Tensors:
[01/20/2026-07:02:51] [I] Distributive Independence: Disabled
[01/20/2026-07:02:51] [I] Mark Unfused Tensors As Debug Tensors: Disabled
[01/20/2026-07:02:51] [I] Input(s)s format: fp32:CHW
[01/20/2026-07:02:51] [I] Output(s)s format: fp32:CHW
[01/20/2026-07:02:51] [I] Input build shape (profile 0): images=1x3x640x640+1x3x640x640+1x3x640x640
[01/20/2026-07:02:51] [I] Input build shape (profile 0): orig_target_sizes=1x2+1x2+1x2
[01/20/2026-07:02:51] [I] Input calibration shapes: model
[01/20/2026-07:02:51] [I] === System Options ===
[01/20/2026-07:02:51] [I] Device: 0
[01/20/2026-07:02:51] [I] DLACore:
[01/20/2026-07:02:51] [I] Plugins:
[01/20/2026-07:02:51] [I] setPluginsToSerialize:
[01/20/2026-07:02:51] [I] dynamicPlugins:
[01/20/2026-07:02:51] [I] ignoreParsedPluginLibs: 0
[01/20/2026-07:02:51] [I]
[01/20/2026-07:02:51] [I] === Inference Options ===
[01/20/2026-07:02:51] [I] Batch: Explicit
[01/20/2026-07:02:51] [I] Input inference shape : orig_target_sizes=1x2
[01/20/2026-07:02:51] [I] Input inference shape : images=1x3x640x640
[01/20/2026-07:02:51] [I] Iterations: 10
[01/20/2026-07:02:51] [I] Duration: 3s (+ 200ms warm up)
[01/20/2026-07:02:51] [I] Sleep time: 0ms
[01/20/2026-07:02:51] [I] Idle time: 0ms
[01/20/2026-07:02:51] [I] Inference Streams: 1
[01/20/2026-07:02:51] [I] ExposeDMA: Disabled
[01/20/2026-07:02:51] [I] Data transfers: Enabled
[01/20/2026-07:02:51] [I] Spin-wait: Disabled
[01/20/2026-07:02:51] [I] Multithreading: Disabled
[01/20/2026-07:02:51] [I] CUDA Graph: Disabled
[01/20/2026-07:02:51] [I] Separate profiling: Disabled
[01/20/2026-07:02:51] [I] Time Deserialize: Disabled
[01/20/2026-07:02:51] [I] Time Refit: Disabled
[01/20/2026-07:02:51] [I] NVTX verbosity: 0
[01/20/2026-07:02:51] [I] Persistent Cache Ratio: 0
[01/20/2026-07:02:51] [I] Optimization Profile Index: 0
[01/20/2026-07:02:51] [I] Weight Streaming Budget: 100.000000%
[01/20/2026-07:02:51] [I] Inputs:
[01/20/2026-07:02:51] [I] Debug Tensor Save Destinations:
[01/20/2026-07:02:51] [I] Dump All Debug Tensor in Formats:
[01/20/2026-07:02:51] [I] === Reporting Options ===
[01/20/2026-07:02:51] [I] Verbose: Disabled
[01/20/2026-07:02:51] [I] Averages: 10 inferences
[01/20/2026-07:02:51] [I] Percentiles: 90,95,99
[01/20/2026-07:02:51] [I] Dump refittable layers:Disabled
[01/20/2026-07:02:51] [I] Dump output: Disabled
[01/20/2026-07:02:51] [I] Profile: Disabled
[01/20/2026-07:02:51] [I] Export timing to JSON file:
[01/20/2026-07:02:51] [I] Export output to JSON file:
[01/20/2026-07:02:51] [I] Export profile to JSON file:
[01/20/2026-07:02:51] [I]
[01/20/2026-07:02:51] [I] === Device Information ===
[01/20/2026-07:02:51] [I] Available Devices:
[01/20/2026-07:02:51] [I] Device 0: "NVIDIA GeForce RTX 4090" UUID: GPU-55c23db9-433c-0d6c-46e7-9387266e5ddb
[01/20/2026-07:02:51] [I] Selected Device: NVIDIA GeForce RTX 4090
[01/20/2026-07:02:51] [I] Selected Device ID: 0
[01/20/2026-07:02:51] [I] Selected Device UUID: GPU-55c23db9-433c-0d6c-46e7-9387266e5ddb
[01/20/2026-07:02:51] [I] Compute Capability: 8.9
[01/20/2026-07:02:51] [I] SMs: 128
[01/20/2026-07:02:51] [I] Device Global Memory: 24071 MiB
[01/20/2026-07:02:51] [I] Shared Memory per SM: 100 KiB
[01/20/2026-07:02:51] [I] Memory Bus Width: 384 bits (ECC disabled)
[01/20/2026-07:02:51] [I] Application Compute Clock Rate: 2.52 GHz
[01/20/2026-07:02:51] [I] Application Memory Clock Rate: 10.501 GHz
[01/20/2026-07:02:51] [I]
[01/20/2026-07:02:51] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/20/2026-07:02:51] [I]
[01/20/2026-07:02:51] [I] TensorRT version: 10.14.1
[01/20/2026-07:02:51] [I] Loading standard plugins
[01/20/2026-07:02:51] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 29, GPU 10549 (MiB)
[01/20/2026-07:02:51] [I] Start parsing network model.
[01/20/2026-07:02:52] [I] [TRT] ----------------------------------------------------------------
[01/20/2026-07:02:52] [I] [TRT] Input filename: checkpoints/deimv2_dinov3_x_coco.onnx
[01/20/2026-07:02:52] [I] [TRT] ONNX IR version: 0.0.8
[01/20/2026-07:02:52] [I] [TRT] Opset version: 17
[01/20/2026-07:02:52] [I] [TRT] Producer name: pytorch
[01/20/2026-07:02:52] [I] [TRT] Producer version: 2.10.0
[01/20/2026-07:02:52] [I] [TRT] Domain:
[01/20/2026-07:02:52] [I] [TRT] Model version: 0
[01/20/2026-07:02:52] [I] [TRT] Doc string:
[01/20/2026-07:02:52] [I] [TRT] ----------------------------------------------------------------
[01/20/2026-07:02:52] [W] [TRT] ModelImporter.cpp:661: Make sure input orig_target_sizes has Int64 binding.
[01/20/2026-07:02:52] [W] [TRT] ModelImporter.cpp:908: Make sure output labels has Int64 binding.
[01/20/2026-07:02:52] [I] Finished parsing network model. Parse time: 0.237921
[01/20/2026-07:02:52] [I] Set shape of input tensor images for optimization profile 0 to: MIN=1x3x640x640 OPT=1x3x640x640 MAX=1x3x640x640
[01/20/2026-07:02:52] [I] Set shape of input tensor orig_target_sizes for optimization profile 0 to: MIN=1x2 OPT=1x2 MAX=1x2
[01/20/2026-07:02:52] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +203, GPU +4, now: CPU 735, GPU 10553 (MiB)
[01/20/2026-07:02:52] [W] [TRT] Detected layernorm nodes in FP16.
[01/20/2026-07:02:52] [W] [TRT] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.
[01/20/2026-07:02:52] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[01/20/2026-07:03:33] [I] [TRT] Compiler backend is used during engine build.
[01/20/2026-07:05:25] [I] [TRT] Detected 2 inputs and 3 output network tensors.
[01/20/2026-07:05:26] [I] [TRT] Total Host Persistent Memory: 376608 bytes
[01/20/2026-07:05:26] [I] [TRT] Total Device Persistent Memory: 1024 bytes
[01/20/2026-07:05:26] [I] [TRT] Max Scratch Memory: 20131840 bytes
[01/20/2026-07:05:26] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 107 steps to complete.
[01/20/2026-07:05:26] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 4.29572ms to assign 11 blocks to 107 nodes requiring 45025792 bytes.
[01/20/2026-07:05:26] [I] [TRT] Total Activation Memory: 45025792 bytes
[01/20/2026-07:05:26] [I] [TRT] Total Weights Memory: 100957696 bytes
[01/20/2026-07:05:26] [I] [TRT] Compiler backend is used during engine execution.
[01/20/2026-07:05:26] [I] [TRT] Engine generation completed in 154.472 seconds.
[01/20/2026-07:05:26] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 54 MiB, GPU 255 MiB
[01/20/2026-07:05:27] [I] Created engine with size: 100.555 MiB
[01/20/2026-07:05:27] [I] Engine built in 154.898 sec.
[01/20/2026-07:05:27] [I] [TRT] Loaded engine size: 100 MiB
[01/20/2026-07:05:27] [I] Engine deserialized in 0.0311196 sec.
[01/20/2026-07:05:27] [I] [TRT] [MS] Running engine with multi stream info
[01/20/2026-07:05:27] [I] [TRT] [MS] Number of aux streams is 3
[01/20/2026-07:05:27] [I] [TRT] [MS] Number of total worker streams is 4
[01/20/2026-07:05:27] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[01/20/2026-07:05:27] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +43, now: CPU 0, GPU 139 (MiB)
[01/20/2026-07:05:27] [I] Setting persistentCacheLimit to 0 bytes.
[01/20/2026-07:05:27] [I] Created execution context with device memory size: 42.9399 MiB
[01/20/2026-07:05:27] [I] Using random values for input images
[01/20/2026-07:05:27] [I] Input binding for images with dimensions 1x3x640x640 is created.
[01/20/2026-07:05:27] [I] Using random values for input orig_target_sizes
[01/20/2026-07:05:27] [I] Input binding for orig_target_sizes with dimensions 1x2 is created.
[01/20/2026-07:05:27] [I] Output binding for labels with dimensions 1x300 is created.
[01/20/2026-07:05:27] [I] Output binding for boxes with dimensions 1x300x4 is created.
[01/20/2026-07:05:27] [I] Output binding for scores with dimensions 1x300 is created.
[01/20/2026-07:05:27] [I] Starting inference
[01/20/2026-07:05:30] [I] Warmup completed 66 queries over 200 ms
[01/20/2026-07:05:30] [I] Timing trace has 1059 queries over 3.00739 s
[01/20/2026-07:05:30] [I]
[01/20/2026-07:05:30] [I] === Trace details ===
[01/20/2026-07:05:30] [I] Trace averages of 10 runs:
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83336 ms - Host latency: 3.07233 ms (enqueue 0.701578 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83168 ms - Host latency: 3.07287 ms (enqueue 0.6114 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83374 ms - Host latency: 3.07555 ms (enqueue 0.580649 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83187 ms - Host latency: 3.07226 ms (enqueue 0.58468 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83537 ms - Host latency: 3.07763 ms (enqueue 0.583014 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83669 ms - Host latency: 3.07737 ms (enqueue 0.57963 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83434 ms - Host latency: 3.07546 ms (enqueue 0.565479 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83679 ms - Host latency: 3.07714 ms (enqueue 0.915753 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83261 ms - Host latency: 3.07397 ms (enqueue 0.606302 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83498 ms - Host latency: 3.076 ms (enqueue 0.579776 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83251 ms - Host latency: 3.07282 ms (enqueue 0.568085 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83575 ms - Host latency: 3.07682 ms (enqueue 0.559 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83212 ms - Host latency: 3.07352 ms (enqueue 0.566913 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83415 ms - Host latency: 3.07585 ms (enqueue 0.554803 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83711 ms - Host latency: 3.07679 ms (enqueue 0.602917 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83541 ms - Host latency: 3.07644 ms (enqueue 0.566254 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83577 ms - Host latency: 3.07631 ms (enqueue 0.573004 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83618 ms - Host latency: 3.0764 ms (enqueue 0.565643 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83678 ms - Host latency: 3.07716 ms (enqueue 0.562018 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83444 ms - Host latency: 3.0758 ms (enqueue 0.561792 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83307 ms - Host latency: 3.07421 ms (enqueue 0.552844 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8335 ms - Host latency: 3.07266 ms (enqueue 0.841644 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83771 ms - Host latency: 3.07906 ms (enqueue 0.563574 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83376 ms - Host latency: 3.0748 ms (enqueue 0.557874 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83472 ms - Host latency: 3.07563 ms (enqueue 0.555933 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83764 ms - Host latency: 3.0792 ms (enqueue 0.560291 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83876 ms - Host latency: 3.08054 ms (enqueue 0.561816 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83506 ms - Host latency: 3.07485 ms (enqueue 0.546808 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83537 ms - Host latency: 3.07655 ms (enqueue 0.562311 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83401 ms - Host latency: 3.0748 ms (enqueue 0.553601 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83458 ms - Host latency: 3.07444 ms (enqueue 0.554346 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83433 ms - Host latency: 3.07482 ms (enqueue 0.549353 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8351 ms - Host latency: 3.07762 ms (enqueue 0.560791 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83547 ms - Host latency: 3.07562 ms (enqueue 0.546594 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8327 ms - Host latency: 3.07433 ms (enqueue 0.547229 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83512 ms - Host latency: 3.07487 ms (enqueue 0.569556 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.835 ms - Host latency: 3.07515 ms (enqueue 0.549707 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83547 ms - Host latency: 3.07701 ms (enqueue 0.586328 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.837 ms - Host latency: 3.07726 ms (enqueue 0.626392 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83579 ms - Host latency: 3.07651 ms (enqueue 0.560889 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83383 ms - Host latency: 3.07456 ms (enqueue 0.550244 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83232 ms - Host latency: 3.07349 ms (enqueue 0.55011 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83391 ms - Host latency: 3.0741 ms (enqueue 0.55332 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83215 ms - Host latency: 3.07223 ms (enqueue 0.570496 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83461 ms - Host latency: 3.07531 ms (enqueue 0.611377 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83903 ms - Host latency: 3.08014 ms (enqueue 0.648914 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83828 ms - Host latency: 3.07887 ms (enqueue 0.61167 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83721 ms - Host latency: 3.07781 ms (enqueue 0.598376 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83433 ms - Host latency: 3.07522 ms (enqueue 0.574182 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83784 ms - Host latency: 3.07988 ms (enqueue 0.57605 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8355 ms - Host latency: 3.07539 ms (enqueue 0.767432 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8395 ms - Host latency: 3.08134 ms (enqueue 0.605664 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83661 ms - Host latency: 3.0766 ms (enqueue 0.590234 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.836 ms - Host latency: 3.07712 ms (enqueue 0.579431 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8364 ms - Host latency: 3.07605 ms (enqueue 0.597498 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83751 ms - Host latency: 3.07806 ms (enqueue 0.824365 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.84104 ms - Host latency: 3.08406 ms (enqueue 0.769312 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83573 ms - Host latency: 3.07745 ms (enqueue 0.704712 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83588 ms - Host latency: 3.07727 ms (enqueue 0.593005 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83518 ms - Host latency: 3.07616 ms (enqueue 0.577429 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83756 ms - Host latency: 3.07815 ms (enqueue 0.736743 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8374 ms - Host latency: 3.07781 ms (enqueue 0.592505 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83972 ms - Host latency: 3.08015 ms (enqueue 0.628516 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.84054 ms - Host latency: 3.08188 ms (enqueue 0.596912 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83979 ms - Host latency: 3.08151 ms (enqueue 0.584668 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83789 ms - Host latency: 3.07964 ms (enqueue 0.565247 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83699 ms - Host latency: 3.07812 ms (enqueue 0.556909 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.84001 ms - Host latency: 3.08103 ms (enqueue 0.646191 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83679 ms - Host latency: 3.07761 ms (enqueue 0.59978 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8342 ms - Host latency: 3.07512 ms (enqueue 0.548999 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83394 ms - Host latency: 3.07529 ms (enqueue 0.549609 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83601 ms - Host latency: 3.07656 ms (enqueue 0.560278 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83389 ms - Host latency: 3.07397 ms (enqueue 0.571289 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83408 ms - Host latency: 3.07483 ms (enqueue 0.553686 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83159 ms - Host latency: 3.07146 ms (enqueue 0.548755 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83374 ms - Host latency: 3.07417 ms (enqueue 0.547095 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83606 ms - Host latency: 3.07732 ms (enqueue 0.540796 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83538 ms - Host latency: 3.07573 ms (enqueue 0.544482 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83579 ms - Host latency: 3.07656 ms (enqueue 0.556934 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83499 ms - Host latency: 3.07476 ms (enqueue 0.566772 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83511 ms - Host latency: 3.0771 ms (enqueue 0.565356 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83394 ms - Host latency: 3.07466 ms (enqueue 0.549658 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83574 ms - Host latency: 3.07695 ms (enqueue 0.558032 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83577 ms - Host latency: 3.0759 ms (enqueue 0.541553 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83303 ms - Host latency: 3.07288 ms (enqueue 0.549805 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8356 ms - Host latency: 3.07651 ms (enqueue 0.564941 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8386 ms - Host latency: 3.07976 ms (enqueue 0.546021 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83457 ms - Host latency: 3.07576 ms (enqueue 0.545605 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83643 ms - Host latency: 3.07664 ms (enqueue 0.535596 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83413 ms - Host latency: 3.07429 ms (enqueue 0.540967 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83479 ms - Host latency: 3.07485 ms (enqueue 0.557007 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83423 ms - Host latency: 3.07478 ms (enqueue 0.538745 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83469 ms - Host latency: 3.07603 ms (enqueue 0.535815 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8321 ms - Host latency: 3.07302 ms (enqueue 0.54021 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83604 ms - Host latency: 3.07712 ms (enqueue 0.540405 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83286 ms - Host latency: 3.07415 ms (enqueue 0.53977 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83679 ms - Host latency: 3.07815 ms (enqueue 0.541943 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83728 ms - Host latency: 3.07744 ms (enqueue 0.585742 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83982 ms - Host latency: 3.08137 ms (enqueue 0.593237 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83772 ms - Host latency: 3.07883 ms (enqueue 0.568213 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83499 ms - Host latency: 3.07654 ms (enqueue 0.556763 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83828 ms - Host latency: 3.07947 ms (enqueue 0.555713 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83884 ms - Host latency: 3.079 ms (enqueue 0.78623 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83235 ms - Host latency: 3.07083 ms (enqueue 1.00188 ms)
[01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83823 ms - Host latency: 3.08057 ms (enqueue 0.596899 ms)
[01/20/2026-07:05:30] [I]
[01/20/2026-07:05:30] [I] === Performance summary ===
[01/20/2026-07:05:30] [I] Throughput: 352.132 qps
[01/20/2026-07:05:30] [I] Latency: min = 2.91431 ms, max = 3.09717 ms, mean = 3.07624 ms, median = 3.07642 ms, percentile(90%) = 3.08423 ms, percentile(95%) = 3.08667 ms, percentile(99%) = 3.09033 ms
[01/20/2026-07:05:30] [I] Enqueue Time: min = 0.532471 ms, max = 1.69238 ms, mean = 0.590915 ms, median = 0.56012 ms, percentile(90%) = 0.625488 ms, percentile(95%) = 0.673096 ms, percentile(99%) = 1.30176 ms
[01/20/2026-07:05:30] [I] H2D Latency: min = 0.226257 ms, max = 0.238525 ms, mean = 0.233766 ms, median = 0.233887 ms, percentile(90%) = 0.235413 ms, percentile(95%) = 0.235901 ms, percentile(99%) = 0.237183 ms
[01/20/2026-07:05:30] [I] GPU Compute Time: min = 2.67676 ms, max = 2.85498 ms, mean = 2.83542 ms, median = 2.83545 ms, percentile(90%) = 2.84277 ms, percentile(95%) = 2.84473 ms, percentile(99%) = 2.8479 ms
[01/20/2026-07:05:30] [I] D2H Latency: min = 0.00415039 ms, max = 0.00939941 ms, mean = 0.00705897 ms, median = 0.00744629 ms, percentile(90%) = 0.00878906 ms, percentile(95%) = 0.0090332 ms, percentile(99%) = 0.00927734 ms
[01/20/2026-07:05:30] [I] Total Host Walltime: 3.00739 s
[01/20/2026-07:05:30] [I] Total GPU Compute Time: 3.00271 s
[01/20/2026-07:05:30] [I] Explanations of the performance metrics are printed in the verbose logs.
[01/20/2026-07:05:30] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v101401] [b48] # trtexec --onnx=checkpoints/deimv2_dinov3_x_coco.onnx --saveEngine=checkpoints/deimv2_dinov3_x_coco.engine --fp16 --optShapes=images:1x3x640x640,orig_target_sizes:1x2 --memPoolSize=workspace:4096 --builderOptimizationLevel=3