Environment

Platform: Google Colab
GPU: NVIDIA Tesla T4
Input size: 640×640
Batch size: 1
Warm-up runs: 30
Measured runs: 200
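
The warm-up and measured-run counts above imply a timing loop along the following lines. This is a minimal sketch, not the harness actually used for these results: `benchmark` and `infer` are hypothetical names, and `infer` stands in for one batch-size-1 forward pass on a 640×640 input. On a GPU the callable has to block until the device has finished (in PyTorch, for example, by calling `torch.cuda.synchronize()`), otherwise the timer captures little more than the kernel launch.

```python
import time
from typing import Callable, List


def benchmark(infer: Callable[[], object],
              warmup_runs: int = 30,
              measured_runs: int = 200) -> List[float]:
    """Return per-run latencies in milliseconds.

    `infer` is a hypothetical zero-argument callable that runs one
    inference and returns only once the result is ready.
    """
    # Warm-up: run untimed so one-time costs (lazy initialization,
    # caching, clock ramp-up) do not skew the measured runs.
    for _ in range(warmup_runs):
        infer()

    latencies_ms = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        infer()  # assumed to be synchronous
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


if __name__ == "__main__":
    # Stand-in CPU workload; a real run would call the model here.
    samples = benchmark(lambda: sum(range(10_000)))
    print(f"collected {len(samples)} latency samples")
```

Keeping the warm-up loop separate from the measured loop means the returned list contains only the runs that feed the reported statistics.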

Results

Artifact	Mean Latency (ms)	Median Latency (ms)	P95 Latency (ms)	FPS (1000 / Median Latency)
ONNX INT8	733.704	634.253	1196.094	1.58
TorchScript FP16	15.526	15.174	17.666	65.90
TensorRT INT8	12.956	12.774	14.836	78.28
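
The columns in the table can be derived from raw per-run latencies roughly as sketched below. The FPS column is consistent with 1000 divided by the median latency in milliseconds (for example, 1000 / 12.774 ≈ 78.28). The percentile method behind the P95 column is not stated in this report, so the sketch assumes the nearest-rank definition; `summarize` is a hypothetical helper name.

```python
import statistics
from typing import Dict, List


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Compute mean, median, P95 and median-based FPS from latencies in ms."""
    ordered = sorted(latencies_ms)
    # Nearest-rank P95 (an assumption): smallest value with at least 95%
    # of samples at or below it. Integer ceiling avoids float rounding.
    rank = (95 * len(ordered) + 99) // 100
    median = statistics.median(ordered)
    return {
        "mean_ms": statistics.fmean(ordered),
        "median_ms": median,
        "p95_ms": ordered[rank - 1],
        "fps_median": 1000.0 / median,
    }


# Consistency check of the FPS column against the reported medians.
reported_medians_ms = {
    "ONNX INT8": 634.253,
    "TorchScript FP16": 15.174,
    "TensorRT INT8": 12.774,
}
for name, median_ms in reported_medians_ms.items():
    print(f"{name}: {1000.0 / median_ms:.2f} FPS")
```

Deriving FPS from the median rather than the mean keeps the throughput figure from being dragged down by a few slow outlier runs, which matters most for the ONNX INT8 row, where the mean and median differ by about 100 ms.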

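TensorRT INT8 achieved the lowest latency and highest throughput on the NVIDIA Tesla T4 (median 12.774 ms, 78.28 FPS). TorchScript FP16 was close behind, with a median of 15.174 ms (65.90 FPS), about 19% higher median latency. The ONNX INT8 artifact was roughly 50 times slower than TensorRT INT8 at the median (634.253 ms, 1.58 FPS) and far less stable in this environment: its mean (733.704 ms) sits well above its median, and its P95 latency reaches 1196.094 ms.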