| &&&& RUNNING TensorRT.trtexec [TensorRT v101401] [b48] # trtexec --onnx=checkpoints/deimv2_dinov3_x_coco.onnx --saveEngine=checkpoints/deimv2_dinov3_x_coco.engine --fp16 --optShapes=images:1x3x640x640,orig_target_sizes:1x2 --memPoolSize=workspace:4096 --builderOptimizationLevel=3 |
| [01/20/2026-07:02:51] [W] optShapes is being broadcasted to minShapes for tensor orig_target_sizes |
| [01/20/2026-07:02:51] [W] optShapes is being broadcasted to maxShapes for tensor orig_target_sizes |
| [01/20/2026-07:02:51] [W] optShapes is being broadcasted to minShapes for tensor images |
| [01/20/2026-07:02:51] [W] optShapes is being broadcasted to maxShapes for tensor images |
| [01/20/2026-07:02:51] [W] Weakly-typed networks have been deprecated in TensorRT. You can use the AutoCast tool (https: |
| [01/20/2026-07:02:51] [I] === Model Options === |
| [01/20/2026-07:02:51] [I] Format: ONNX |
| [01/20/2026-07:02:51] [I] Model: checkpoints/deimv2_dinov3_x_coco.onnx |
| [01/20/2026-07:02:51] [I] Output: |
| [01/20/2026-07:02:51] [I] === Build Options === |
| [01/20/2026-07:02:51] [I] Memory Pools: workspace: 4096 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default |
| [01/20/2026-07:02:51] [I] avgTiming: 8 |
| [01/20/2026-07:02:51] [I] Precision: FP32+FP16 |
| [01/20/2026-07:02:51] [I] LayerPrecisions: |
| [01/20/2026-07:02:51] [I] Layer Device Types: |
| [01/20/2026-07:02:51] [I] Decomposable Attentions: |
| [01/20/2026-07:02:51] [I] Calibration: |
| [01/20/2026-07:02:51] [I] Refit: Disabled |
| [01/20/2026-07:02:51] [I] Strip weights: Disabled |
| [01/20/2026-07:02:51] [I] Version Compatible: Disabled |
| [01/20/2026-07:02:51] [I] ONNX Plugin InstanceNorm: Disabled |
| [01/20/2026-07:02:51] [I] ONNX kENABLE_UINT8_AND_ASYMMETRIC_QUANTIZATION_DLA flag: Disabled |
| [01/20/2026-07:02:51] [I] TensorRT runtime: full |
| [01/20/2026-07:02:51] [I] Lean DLL Path: |
| [01/20/2026-07:02:51] [I] Tempfile Controls: { in_memory: allow, temporary: allow } |
| [01/20/2026-07:02:51] [I] Exclude Lean Runtime: Disabled |
| [01/20/2026-07:02:51] [I] Sparsity: Disabled |
| [01/20/2026-07:02:51] [I] Safe mode: Disabled |
| [01/20/2026-07:02:51] [I] Build DLA standalone loadable: Disabled |
| [01/20/2026-07:02:51] [I] Allow GPU fallback for DLA: Disabled |
| [01/20/2026-07:02:51] [I] DirectIO mode: Disabled |
| [01/20/2026-07:02:51] [I] Restricted mode: Disabled |
| [01/20/2026-07:02:51] [I] Skip inference: Disabled |
| [01/20/2026-07:02:51] [I] Save engine: checkpoints/deimv2_dinov3_x_coco.engine |
| [01/20/2026-07:02:51] [I] Load engine: |
| [01/20/2026-07:02:51] [I] Profiling verbosity: 0 |
| [01/20/2026-07:02:51] [I] Tactic sources: Using default tactic sources |
| [01/20/2026-07:02:51] [I] timingCacheMode: local |
| [01/20/2026-07:02:51] [I] timingCacheFile: |
| [01/20/2026-07:02:51] [I] Enable Compilation Cache: Enabled |
| [01/20/2026-07:02:51] [I] Enable Monitor Memory: Disabled |
| [01/20/2026-07:02:51] [I] errorOnTimingCacheMiss: Disabled |
| [01/20/2026-07:02:51] [I] Preview Features: Use default preview flags. |
| [01/20/2026-07:02:51] [I] MaxAuxStreams: -1 |
| [01/20/2026-07:02:51] [I] BuilderOptimizationLevel: 3 |
| [01/20/2026-07:02:51] [I] MaxTactics: -1 |
| [01/20/2026-07:02:51] [I] Calibration Profile Index: 0 |
| [01/20/2026-07:02:51] [I] Weight Streaming: Disabled |
| [01/20/2026-07:02:51] [I] Runtime Platform: Same As Build |
| [01/20/2026-07:02:51] [I] Debug Tensors: |
| [01/20/2026-07:02:51] [I] Distributive Independence: Disabled |
| [01/20/2026-07:02:51] [I] Mark Unfused Tensors As Debug Tensors: Disabled |
| [01/20/2026-07:02:51] [I] Input(s)s format: fp32:CHW |
| [01/20/2026-07:02:51] [I] Output(s)s format: fp32:CHW |
| [01/20/2026-07:02:51] [I] Input build shape (profile 0): images=1x3x640x640+1x3x640x640+1x3x640x640 |
| [01/20/2026-07:02:51] [I] Input build shape (profile 0): orig_target_sizes=1x2+1x2+1x2 |
| [01/20/2026-07:02:51] [I] Input calibration shapes: model |
| [01/20/2026-07:02:51] [I] === System Options === |
| [01/20/2026-07:02:51] [I] Device: 0 |
| [01/20/2026-07:02:51] [I] DLACore: |
| [01/20/2026-07:02:51] [I] Plugins: |
| [01/20/2026-07:02:51] [I] setPluginsToSerialize: |
| [01/20/2026-07:02:51] [I] dynamicPlugins: |
| [01/20/2026-07:02:51] [I] ignoreParsedPluginLibs: 0 |
| [01/20/2026-07:02:51] [I] |
| [01/20/2026-07:02:51] [I] === Inference Options === |
| [01/20/2026-07:02:51] [I] Batch: Explicit |
| [01/20/2026-07:02:51] [I] Input inference shape : orig_target_sizes=1x2 |
| [01/20/2026-07:02:51] [I] Input inference shape : images=1x3x640x640 |
| [01/20/2026-07:02:51] [I] Iterations: 10 |
| [01/20/2026-07:02:51] [I] Duration: 3s (+ 200ms warm up) |
| [01/20/2026-07:02:51] [I] Sleep time: 0ms |
| [01/20/2026-07:02:51] [I] Idle time: 0ms |
| [01/20/2026-07:02:51] [I] Inference Streams: 1 |
| [01/20/2026-07:02:51] [I] ExposeDMA: Disabled |
| [01/20/2026-07:02:51] [I] Data transfers: Enabled |
| [01/20/2026-07:02:51] [I] Spin-wait: Disabled |
| [01/20/2026-07:02:51] [I] Multithreading: Disabled |
| [01/20/2026-07:02:51] [I] CUDA Graph: Disabled |
| [01/20/2026-07:02:51] [I] Separate profiling: Disabled |
| [01/20/2026-07:02:51] [I] Time Deserialize: Disabled |
| [01/20/2026-07:02:51] [I] Time Refit: Disabled |
| [01/20/2026-07:02:51] [I] NVTX verbosity: 0 |
| [01/20/2026-07:02:51] [I] Persistent Cache Ratio: 0 |
| [01/20/2026-07:02:51] [I] Optimization Profile Index: 0 |
| [01/20/2026-07:02:51] [I] Weight Streaming Budget: 100.000000% |
| [01/20/2026-07:02:51] [I] Inputs: |
| [01/20/2026-07:02:51] [I] Debug Tensor Save Destinations: |
| [01/20/2026-07:02:51] [I] Dump All Debug Tensor in Formats: |
| [01/20/2026-07:02:51] [I] === Reporting Options === |
| [01/20/2026-07:02:51] [I] Verbose: Disabled |
| [01/20/2026-07:02:51] [I] Averages: 10 inferences |
| [01/20/2026-07:02:51] [I] Percentiles: 90,95,99 |
| [01/20/2026-07:02:51] [I] Dump refittable layers:Disabled |
| [01/20/2026-07:02:51] [I] Dump output: Disabled |
| [01/20/2026-07:02:51] [I] Profile: Disabled |
| [01/20/2026-07:02:51] [I] Export timing to JSON file: |
| [01/20/2026-07:02:51] [I] Export output to JSON file: |
| [01/20/2026-07:02:51] [I] Export profile to JSON file: |
| [01/20/2026-07:02:51] [I] |
| [01/20/2026-07:02:51] [I] === Device Information === |
| [01/20/2026-07:02:51] [I] Available Devices: |
| [01/20/2026-07:02:51] [I] Device 0: "NVIDIA GeForce RTX 4090" UUID: GPU-55c23db9-433c-0d6c-46e7-9387266e5ddb |
| [01/20/2026-07:02:51] [I] Selected Device: NVIDIA GeForce RTX 4090 |
| [01/20/2026-07:02:51] [I] Selected Device ID: 0 |
| [01/20/2026-07:02:51] [I] Selected Device UUID: GPU-55c23db9-433c-0d6c-46e7-9387266e5ddb |
| [01/20/2026-07:02:51] [I] Compute Capability: 8.9 |
| [01/20/2026-07:02:51] [I] SMs: 128 |
| [01/20/2026-07:02:51] [I] Device Global Memory: 24071 MiB |
| [01/20/2026-07:02:51] [I] Shared Memory per SM: 100 KiB |
| [01/20/2026-07:02:51] [I] Memory Bus Width: 384 bits (ECC disabled) |
| [01/20/2026-07:02:51] [I] Application Compute Clock Rate: 2.52 GHz |
| [01/20/2026-07:02:51] [I] Application Memory Clock Rate: 10.501 GHz |
| [01/20/2026-07:02:51] [I] |
| [01/20/2026-07:02:51] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at. |
| [01/20/2026-07:02:51] [I] |
| [01/20/2026-07:02:51] [I] TensorRT version: 10.14.1 |
| [01/20/2026-07:02:51] [I] Loading standard plugins |
| [01/20/2026-07:02:51] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 29, GPU 10549 (MiB) |
| [01/20/2026-07:02:51] [I] Start parsing network model. |
| [01/20/2026-07:02:52] [I] [TRT] ---------------------------------------------------------------- |
| [01/20/2026-07:02:52] [I] [TRT] Input filename: checkpoints/deimv2_dinov3_x_coco.onnx |
| [01/20/2026-07:02:52] [I] [TRT] ONNX IR version: 0.0.8 |
| [01/20/2026-07:02:52] [I] [TRT] Opset version: 17 |
| [01/20/2026-07:02:52] [I] [TRT] Producer name: pytorch |
| [01/20/2026-07:02:52] [I] [TRT] Producer version: 2.10.0 |
| [01/20/2026-07:02:52] [I] [TRT] Domain: |
| [01/20/2026-07:02:52] [I] [TRT] Model version: 0 |
| [01/20/2026-07:02:52] [I] [TRT] Doc string: |
| [01/20/2026-07:02:52] [I] [TRT] ---------------------------------------------------------------- |
| [01/20/2026-07:02:52] [W] [TRT] ModelImporter.cpp:661: Make sure input orig_target_sizes has Int64 binding. |
| [01/20/2026-07:02:52] [W] [TRT] ModelImporter.cpp:908: Make sure output labels has Int64 binding. |
| [01/20/2026-07:02:52] [I] Finished parsing network model. Parse time: 0.237921 |
| [01/20/2026-07:02:52] [I] Set shape of input tensor images for optimization profile 0 to: MIN=1x3x640x640 OPT=1x3x640x640 MAX=1x3x640x640 |
| [01/20/2026-07:02:52] [I] Set shape of input tensor orig_target_sizes for optimization profile 0 to: MIN=1x2 OPT=1x2 MAX=1x2 |
| [01/20/2026-07:02:52] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +203, GPU +4, now: CPU 735, GPU 10553 (MiB) |
| [01/20/2026-07:02:52] [W] [TRT] Detected layernorm nodes in FP16. |
| [01/20/2026-07:02:52] [W] [TRT] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy. |
| [01/20/2026-07:02:52] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored. |
| [01/20/2026-07:03:33] [I] [TRT] Compiler backend is used during engine build. |
| [01/20/2026-07:05:25] [I] [TRT] Detected 2 inputs and 3 output network tensors. |
| [01/20/2026-07:05:26] [I] [TRT] Total Host Persistent Memory: 376608 bytes |
| [01/20/2026-07:05:26] [I] [TRT] Total Device Persistent Memory: 1024 bytes |
| [01/20/2026-07:05:26] [I] [TRT] Max Scratch Memory: 20131840 bytes |
| [01/20/2026-07:05:26] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 107 steps to complete. |
| [01/20/2026-07:05:26] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 4.29572ms to assign 11 blocks to 107 nodes requiring 45025792 bytes. |
| [01/20/2026-07:05:26] [I] [TRT] Total Activation Memory: 45025792 bytes |
| [01/20/2026-07:05:26] [I] [TRT] Total Weights Memory: 100957696 bytes |
| [01/20/2026-07:05:26] [I] [TRT] Compiler backend is used during engine execution. |
| [01/20/2026-07:05:26] [I] [TRT] Engine generation completed in 154.472 seconds. |
| [01/20/2026-07:05:26] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 54 MiB, GPU 255 MiB |
| [01/20/2026-07:05:27] [I] Created engine with size: 100.555 MiB |
| [01/20/2026-07:05:27] [I] Engine built in 154.898 sec. |
| [01/20/2026-07:05:27] [I] [TRT] Loaded engine size: 100 MiB |
| [01/20/2026-07:05:27] [I] Engine deserialized in 0.0311196 sec. |
| [01/20/2026-07:05:27] [I] [TRT] [MS] Running engine with multi stream info |
| [01/20/2026-07:05:27] [I] [TRT] [MS] Number of aux streams is 3 |
| [01/20/2026-07:05:27] [I] [TRT] [MS] Number of total worker streams is 4 |
| [01/20/2026-07:05:27] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream |
| [01/20/2026-07:05:27] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +43, now: CPU 0, GPU 139 (MiB) |
| [01/20/2026-07:05:27] [I] Setting persistentCacheLimit to 0 bytes. |
| [01/20/2026-07:05:27] [I] Created execution context with device memory size: 42.9399 MiB |
| [01/20/2026-07:05:27] [I] Using random values for input images |
| [01/20/2026-07:05:27] [I] Input binding for images with dimensions 1x3x640x640 is created. |
| [01/20/2026-07:05:27] [I] Using random values for input orig_target_sizes |
| [01/20/2026-07:05:27] [I] Input binding for orig_target_sizes with dimensions 1x2 is created. |
| [01/20/2026-07:05:27] [I] Output binding for labels with dimensions 1x300 is created. |
| [01/20/2026-07:05:27] [I] Output binding for boxes with dimensions 1x300x4 is created. |
| [01/20/2026-07:05:27] [I] Output binding for scores with dimensions 1x300 is created. |
| [01/20/2026-07:05:27] [I] Starting inference |
| [01/20/2026-07:05:30] [I] Warmup completed 66 queries over 200 ms |
| [01/20/2026-07:05:30] [I] Timing trace has 1059 queries over 3.00739 s |
| [01/20/2026-07:05:30] [I] |
| [01/20/2026-07:05:30] [I] === Trace details === |
| [01/20/2026-07:05:30] [I] Trace averages of 10 runs: |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83336 ms - Host latency: 3.07233 ms (enqueue 0.701578 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83168 ms - Host latency: 3.07287 ms (enqueue 0.6114 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83374 ms - Host latency: 3.07555 ms (enqueue 0.580649 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83187 ms - Host latency: 3.07226 ms (enqueue 0.58468 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83537 ms - Host latency: 3.07763 ms (enqueue 0.583014 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83669 ms - Host latency: 3.07737 ms (enqueue 0.57963 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83434 ms - Host latency: 3.07546 ms (enqueue 0.565479 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83679 ms - Host latency: 3.07714 ms (enqueue 0.915753 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83261 ms - Host latency: 3.07397 ms (enqueue 0.606302 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83498 ms - Host latency: 3.076 ms (enqueue 0.579776 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83251 ms - Host latency: 3.07282 ms (enqueue 0.568085 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83575 ms - Host latency: 3.07682 ms (enqueue 0.559 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83212 ms - Host latency: 3.07352 ms (enqueue 0.566913 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83415 ms - Host latency: 3.07585 ms (enqueue 0.554803 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83711 ms - Host latency: 3.07679 ms (enqueue 0.602917 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83541 ms - Host latency: 3.07644 ms (enqueue 0.566254 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83577 ms - Host latency: 3.07631 ms (enqueue 0.573004 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83618 ms - Host latency: 3.0764 ms (enqueue 0.565643 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83678 ms - Host latency: 3.07716 ms (enqueue 0.562018 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83444 ms - Host latency: 3.0758 ms (enqueue 0.561792 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83307 ms - Host latency: 3.07421 ms (enqueue 0.552844 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8335 ms - Host latency: 3.07266 ms (enqueue 0.841644 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83771 ms - Host latency: 3.07906 ms (enqueue 0.563574 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83376 ms - Host latency: 3.0748 ms (enqueue 0.557874 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83472 ms - Host latency: 3.07563 ms (enqueue 0.555933 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83764 ms - Host latency: 3.0792 ms (enqueue 0.560291 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83876 ms - Host latency: 3.08054 ms (enqueue 0.561816 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83506 ms - Host latency: 3.07485 ms (enqueue 0.546808 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83537 ms - Host latency: 3.07655 ms (enqueue 0.562311 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83401 ms - Host latency: 3.0748 ms (enqueue 0.553601 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83458 ms - Host latency: 3.07444 ms (enqueue 0.554346 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83433 ms - Host latency: 3.07482 ms (enqueue 0.549353 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8351 ms - Host latency: 3.07762 ms (enqueue 0.560791 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83547 ms - Host latency: 3.07562 ms (enqueue 0.546594 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8327 ms - Host latency: 3.07433 ms (enqueue 0.547229 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83512 ms - Host latency: 3.07487 ms (enqueue 0.569556 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.835 ms - Host latency: 3.07515 ms (enqueue 0.549707 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83547 ms - Host latency: 3.07701 ms (enqueue 0.586328 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.837 ms - Host latency: 3.07726 ms (enqueue 0.626392 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83579 ms - Host latency: 3.07651 ms (enqueue 0.560889 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83383 ms - Host latency: 3.07456 ms (enqueue 0.550244 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83232 ms - Host latency: 3.07349 ms (enqueue 0.55011 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83391 ms - Host latency: 3.0741 ms (enqueue 0.55332 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83215 ms - Host latency: 3.07223 ms (enqueue 0.570496 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83461 ms - Host latency: 3.07531 ms (enqueue 0.611377 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83903 ms - Host latency: 3.08014 ms (enqueue 0.648914 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83828 ms - Host latency: 3.07887 ms (enqueue 0.61167 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83721 ms - Host latency: 3.07781 ms (enqueue 0.598376 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83433 ms - Host latency: 3.07522 ms (enqueue 0.574182 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83784 ms - Host latency: 3.07988 ms (enqueue 0.57605 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8355 ms - Host latency: 3.07539 ms (enqueue 0.767432 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8395 ms - Host latency: 3.08134 ms (enqueue 0.605664 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83661 ms - Host latency: 3.0766 ms (enqueue 0.590234 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.836 ms - Host latency: 3.07712 ms (enqueue 0.579431 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8364 ms - Host latency: 3.07605 ms (enqueue 0.597498 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83751 ms - Host latency: 3.07806 ms (enqueue 0.824365 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.84104 ms - Host latency: 3.08406 ms (enqueue 0.769312 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83573 ms - Host latency: 3.07745 ms (enqueue 0.704712 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83588 ms - Host latency: 3.07727 ms (enqueue 0.593005 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83518 ms - Host latency: 3.07616 ms (enqueue 0.577429 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83756 ms - Host latency: 3.07815 ms (enqueue 0.736743 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8374 ms - Host latency: 3.07781 ms (enqueue 0.592505 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83972 ms - Host latency: 3.08015 ms (enqueue 0.628516 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.84054 ms - Host latency: 3.08188 ms (enqueue 0.596912 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83979 ms - Host latency: 3.08151 ms (enqueue 0.584668 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83789 ms - Host latency: 3.07964 ms (enqueue 0.565247 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83699 ms - Host latency: 3.07812 ms (enqueue 0.556909 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.84001 ms - Host latency: 3.08103 ms (enqueue 0.646191 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83679 ms - Host latency: 3.07761 ms (enqueue 0.59978 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8342 ms - Host latency: 3.07512 ms (enqueue 0.548999 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83394 ms - Host latency: 3.07529 ms (enqueue 0.549609 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83601 ms - Host latency: 3.07656 ms (enqueue 0.560278 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83389 ms - Host latency: 3.07397 ms (enqueue 0.571289 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83408 ms - Host latency: 3.07483 ms (enqueue 0.553686 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83159 ms - Host latency: 3.07146 ms (enqueue 0.548755 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83374 ms - Host latency: 3.07417 ms (enqueue 0.547095 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83606 ms - Host latency: 3.07732 ms (enqueue 0.540796 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83538 ms - Host latency: 3.07573 ms (enqueue 0.544482 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83579 ms - Host latency: 3.07656 ms (enqueue 0.556934 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83499 ms - Host latency: 3.07476 ms (enqueue 0.566772 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83511 ms - Host latency: 3.0771 ms (enqueue 0.565356 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83394 ms - Host latency: 3.07466 ms (enqueue 0.549658 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83574 ms - Host latency: 3.07695 ms (enqueue 0.558032 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83577 ms - Host latency: 3.0759 ms (enqueue 0.541553 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83303 ms - Host latency: 3.07288 ms (enqueue 0.549805 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8356 ms - Host latency: 3.07651 ms (enqueue 0.564941 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8386 ms - Host latency: 3.07976 ms (enqueue 0.546021 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83457 ms - Host latency: 3.07576 ms (enqueue 0.545605 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83643 ms - Host latency: 3.07664 ms (enqueue 0.535596 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83413 ms - Host latency: 3.07429 ms (enqueue 0.540967 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83479 ms - Host latency: 3.07485 ms (enqueue 0.557007 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83423 ms - Host latency: 3.07478 ms (enqueue 0.538745 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83469 ms - Host latency: 3.07603 ms (enqueue 0.535815 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.8321 ms - Host latency: 3.07302 ms (enqueue 0.54021 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83604 ms - Host latency: 3.07712 ms (enqueue 0.540405 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83286 ms - Host latency: 3.07415 ms (enqueue 0.53977 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83679 ms - Host latency: 3.07815 ms (enqueue 0.541943 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83728 ms - Host latency: 3.07744 ms (enqueue 0.585742 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83982 ms - Host latency: 3.08137 ms (enqueue 0.593237 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83772 ms - Host latency: 3.07883 ms (enqueue 0.568213 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83499 ms - Host latency: 3.07654 ms (enqueue 0.556763 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83828 ms - Host latency: 3.07947 ms (enqueue 0.555713 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83884 ms - Host latency: 3.079 ms (enqueue 0.78623 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83235 ms - Host latency: 3.07083 ms (enqueue 1.00188 ms) |
| [01/20/2026-07:05:30] [I] Average on 10 runs - GPU latency: 2.83823 ms - Host latency: 3.08057 ms (enqueue 0.596899 ms) |
| [01/20/2026-07:05:30] [I] |
| [01/20/2026-07:05:30] [I] === Performance summary === |
| [01/20/2026-07:05:30] [I] Throughput: 352.132 qps |
| [01/20/2026-07:05:30] [I] Latency: min = 2.91431 ms, max = 3.09717 ms, mean = 3.07624 ms, median = 3.07642 ms, percentile(90%) = 3.08423 ms, percentile(95%) = 3.08667 ms, percentile(99%) = 3.09033 ms |
| [01/20/2026-07:05:30] [I] Enqueue Time: min = 0.532471 ms, max = 1.69238 ms, mean = 0.590915 ms, median = 0.56012 ms, percentile(90%) = 0.625488 ms, percentile(95%) = 0.673096 ms, percentile(99%) = 1.30176 ms |
| [01/20/2026-07:05:30] [I] H2D Latency: min = 0.226257 ms, max = 0.238525 ms, mean = 0.233766 ms, median = 0.233887 ms, percentile(90%) = 0.235413 ms, percentile(95%) = 0.235901 ms, percentile(99%) = 0.237183 ms |
| [01/20/2026-07:05:30] [I] GPU Compute Time: min = 2.67676 ms, max = 2.85498 ms, mean = 2.83542 ms, median = 2.83545 ms, percentile(90%) = 2.84277 ms, percentile(95%) = 2.84473 ms, percentile(99%) = 2.8479 ms |
| [01/20/2026-07:05:30] [I] D2H Latency: min = 0.00415039 ms, max = 0.00939941 ms, mean = 0.00705897 ms, median = 0.00744629 ms, percentile(90%) = 0.00878906 ms, percentile(95%) = 0.0090332 ms, percentile(99%) = 0.00927734 ms |
| [01/20/2026-07:05:30] [I] Total Host Walltime: 3.00739 s |
| [01/20/2026-07:05:30] [I] Total GPU Compute Time: 3.00271 s |
| [01/20/2026-07:05:30] [I] Explanations of the performance metrics are printed in the verbose logs. |
| [01/20/2026-07:05:30] [I] |
| &&&& PASSED TensorRT.trtexec [TensorRT v101401] [b48] # trtexec --onnx=checkpoints/deimv2_dinov3_x_coco.onnx --saveEngine=checkpoints/deimv2_dinov3_x_coco.engine --fp16 --optShapes=images:1x3x640x640,orig_target_sizes:1x2 --memPoolSize=workspace:4096 --builderOptimizationLevel=3 |