## Environment

- Platform: Google Colab
- GPU: NVIDIA Tesla T4
- Input size: 640×640
- Batch size: 1
- Warm-up runs: 30
- Measured runs: 200
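
The warm-up and measured-run settings above could be applied with a timing loop along these lines. This is a minimal sketch, not the harness used to produce the results: `benchmark`, `summarize`, and `run_once` are hypothetical names, and the nearest-rank P95 is an assumption because the percentile method is not stated here. For GPU inference, `run_once` is assumed to block until the device has finished (for example by calling `torch.cuda.synchronize()` in PyTorch), otherwise the timer only measures kernel launch.

```python
import math
import statistics
import time


def summarize(latencies_ms):
    """Reduce per-run latencies (ms) to the statistics reported in the table."""
    ordered = sorted(latencies_ms)
    # Nearest-rank 95th percentile (an assumed method, not confirmed by the report).
    p95_index = math.ceil(95 * len(ordered) / 100) - 1
    median = statistics.median(ordered)
    return {
        "mean_ms": statistics.mean(ordered),
        "median_ms": median,
        "p95_ms": ordered[p95_index],
        # Throughput derived from the median latency at batch size 1.
        "fps_median": 1000.0 / median if median > 0 else float("inf"),
    }


def benchmark(run_once, warmup=30, runs=200):
    """Time a zero-argument callable: discard warm-up runs, then measure."""
    for _ in range(warmup):
        run_once()  # warm-up runs are executed but not timed

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()  # must block until inference has actually completed
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize(latencies_ms)


if __name__ == "__main__":
    # Stand-in workload; replace with a real single-image inference call.
    print(benchmark(lambda: time.sleep(0.001), warmup=3, runs=20))
```

Reporting the median alongside the mean keeps a few slow outliers from dominating the summary, which matters when the spread between median and P95 is large.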

## Results

| Artifact         | Mean Latency (ms) | Median Latency (ms) | P95 Latency (ms) | FPS (from median) |
|------------------|------------------:|--------------------:|-----------------:|------------------:|
| ONNX INT8        |           733.704 |             634.253 |         1196.094 |              1.58 |
| TorchScript FP16 |            15.526 |              15.174 |           17.666 |             65.90 |
| TensorRT INT8    |            12.956 |              12.774 |           14.836 |             78.28 |

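TensorRT INT8 achieved the lowest latency and highest throughput on the NVIDIA Tesla T4, with a median of 12.774 ms (78.28 FPS). TorchScript FP16 was close behind at 15.174 ms, about 19% higher median latency. The ONNX INT8 artifact was roughly 50× slower by median (634.253 ms) in this environment, and its P95 latency (1196.094 ms) was nearly double its median, indicating high run-to-run variance.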
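
The relative figures follow directly from the median latencies in the table. The short sketch below reproduces the arithmetic; the helper names `fps_from_latency` and `relative_latency` are illustrative only.

```python
# Median latencies (ms) copied from the results table.
MEDIAN_MS = {
    "ONNX INT8": 634.253,
    "TorchScript FP16": 15.174,
    "TensorRT INT8": 12.774,
}


def fps_from_latency(latency_ms):
    """Frames per second at batch size 1 for a given per-frame latency."""
    return 1000.0 / latency_ms


def relative_latency(medians, baseline):
    """Each artifact's median latency as a multiple of the baseline's."""
    base = medians[baseline]
    return {name: ms / base for name, ms in medians.items()}


if __name__ == "__main__":
    ratios = relative_latency(MEDIAN_MS, "TensorRT INT8")
    for name, ms in MEDIAN_MS.items():
        print(f"{name}: {fps_from_latency(ms):.2f} FPS, {ratios[name]:.2f}x baseline")
```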