update the README file with the ONNX-TensorRT conversion

Files changed:
- README.md (+7 -6)
- configs/metadata.json (+2 -1)
- docs/README.md (+7 -6)
README.md
CHANGED
````diff
@@ -52,13 +52,12 @@ Dice score is used for evaluating the performance of the model. This model achie
 
 
 #### TensorRT speedup
-The `spleen_ct_segmentation` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+The `spleen_ct_segmentation` bundle supports acceleration with TensorRT through the ONNX-TensorRT method. The table below displays the speedup ratios observed on an A100 80G GPU.
 
 | method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
 | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
-| model computation | 6.
-
-| end2end | 3900.73 | 3823.89 | 3887.37 | 3883.01 | 1.02 | 1.00 | 1.00 | 0.98 |
+| model computation | 6.46 | 4.48 | 2.52 | 1.96 | 1.44 | 2.56 | 3.30 | 2.29 |
+| end2end | 1268.03 | 1152.40 | 1137.40 | 1114.25 | 1.10 | 1.11 | 1.14 | 1.03 |
 
 Where:
 - `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
@@ -68,13 +67,15 @@ Where:
 - `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
 - `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
 
+Currently, the only available method to accelerate this model is through ONNX-TensorRT. However, the Torch-TensorRT method is under development and will be available in the near future.
+
 This result is benchmarked under:
 - TensorRT: 8.5.3+cuda11.8
 - Torch-TensorRT Version: 1.4.0
 - CPU Architecture: x86-64
 - OS: ubuntu 20.04
 - Python version:3.8.10
-- CUDA version: 12.
+- CUDA version: 12.1
 - GPU models and configuration: A100 80G
 
 ## MONAI Bundle Commands
@@ -117,7 +118,7 @@ python -m monai.bundle run --config_file configs/inference.json
 #### Export checkpoint to TensorRT based models with fp32 or fp16 precision:
 
 ```
-python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --dynamic_batchsize "[1, 4, 8]"
+python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --dynamic_batchsize "[1, 4, 8]" --use_onnx "True" --use_trace "True"
 ```
 
 #### Execute inference with the TensorRT model:
````
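A note on reading the new speedup table: the `speedup` columns are plain latency ratios against the PyTorch float32 baseline, e.g. `speedup fp16` for model computation is 6.46 / 1.96 ≈ 3.30, and `amp vs fp16` divides the amp latency by the TensorRT float16 latency, 4.48 / 1.96 ≈ 2.29.

Since `trt_export` writes an ordinary TorchScript file, the exported `models/model_trt.ts` can be smoke-tested outside the bundle's config machinery. The sketch below is illustrative only and not part of this change set: it assumes a CUDA GPU, a completed export, and a single-channel 96x96x96 input patch (this bundle's usual sliding-window ROI size).

```python
# Hypothetical smoke test for the exported ONNX-TensorRT TorchScript model.
# Assumptions: CUDA is available, the export above succeeded, and the network
# takes 1-channel 96x96x96 patches (the bundle's sliding-window ROI size).
import torch

model = torch.jit.load("models/model_trt.ts").eval().cuda()

with torch.no_grad():
    x = torch.randn(1, 1, 96, 96, 96, device="cuda")  # stand-in for a CT patch
    y = model(x)

print(tuple(y.shape))  # expect (1, 2, 96, 96, 96) for a background/spleen head
```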
configs/metadata.json
CHANGED
````diff
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
-    "version": "0.4.9",
+    "version": "0.5.0",
     "changelog": {
+        "0.5.0": "update the README file with the ONNX-TensorRT conversion",
         "0.4.9": "update TensorRT descriptions",
         "0.4.8": "update deterministic training results",
         "0.4.7": "update the TensorRT part in the README file",
````
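Since `monai.bundle` tooling surfaces whatever `configs/metadata.json` declares, a quick way to confirm the version bump is to read the metadata back with plain `json` (the path assumes you run from the bundle root):

```python
# Read the bundle metadata to confirm the version bump and changelog entry.
# Assumes the working directory is the bundle root.
import json

with open("configs/metadata.json") as f:
    meta = json.load(f)

print(meta["version"])                     # 0.5.0 after this change
print(meta["changelog"][meta["version"]])  # the matching changelog entry
```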
docs/README.md
CHANGED
````diff
@@ -45,13 +45,12 @@ Dice score is used for evaluating the performance of the model. This model achie
 
 
 #### TensorRT speedup
-The `spleen_ct_segmentation` bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU.
+The `spleen_ct_segmentation` bundle supports acceleration with TensorRT through the ONNX-TensorRT method. The table below displays the speedup ratios observed on an A100 80G GPU.
 
 | method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16|
 | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
-| model computation | 6.
-
-| end2end | 3900.73 | 3823.89 | 3887.37 | 3883.01 | 1.02 | 1.00 | 1.00 | 0.98 |
+| model computation | 6.46 | 4.48 | 2.52 | 1.96 | 1.44 | 2.56 | 3.30 | 2.29 |
+| end2end | 1268.03 | 1152.40 | 1137.40 | 1114.25 | 1.10 | 1.11 | 1.14 | 1.03 |
 
 Where:
 - `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
@@ -61,13 +60,15 @@ Where:
 - `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
 - `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
 
+Currently, the only available method to accelerate this model is through ONNX-TensorRT. However, the Torch-TensorRT method is under development and will be available in the near future.
+
 This result is benchmarked under:
 - TensorRT: 8.5.3+cuda11.8
 - Torch-TensorRT Version: 1.4.0
 - CPU Architecture: x86-64
 - OS: ubuntu 20.04
 - Python version:3.8.10
-- CUDA version: 12.
+- CUDA version: 12.1
 - GPU models and configuration: A100 80G
 
 ## MONAI Bundle Commands
@@ -110,7 +111,7 @@ python -m monai.bundle run --config_file configs/inference.json
 #### Export checkpoint to TensorRT based models with fp32 or fp16 precision:
 
 ```
-python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --dynamic_batchsize "[1, 4, 8]"
+python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --dynamic_batchsize "[1, 4, 8]" --use_onnx "True" --use_trace "True"
 ```
 
 #### Execute inference with the TensorRT model:
````
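For context on the `model computation` rows in the tables above: per the bundle's own definition, they time a bare forward pass on a random input, with no preprocessing or postprocessing. The snippet below is a generic CUDA-event timing sketch of that kind of measurement, not the benchmark script behind the published numbers; the warm-up and iteration counts are arbitrary and the 96x96x96 input shape is assumed.

```python
# Generic GPU latency measurement with CUDA events: a sketch of the kind of
# "model computation" timing described above, not the actual benchmark script.
import torch

model = torch.jit.load("models/model_trt.ts").eval().cuda()
x = torch.randn(1, 1, 96, 96, 96, device="cuda")  # random input, no pre/post-processing

with torch.no_grad():
    for _ in range(10):  # warm-up: exclude lazy initialization and autotuning
        model(x)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    iters = 100
    start.record()
    for _ in range(iters):
        model(x)
    end.record()
    torch.cuda.synchronize()  # wait for all kernels before reading the timer

print(f"mean forward latency: {start.elapsed_time(end) / iters:.2f} ms")
```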