# Digit Recognition With Dynamic Shapes In TensorRT

**Table Of Contents**
- [Description](#description)
- [How does this sample work?](#how-does-this-sample-work)
    * [Creating the preprocessing network](#creating-the-preprocessing-network)
    * [Parsing the ONNX MNIST model](#parsing-the-onnx-mnist-model)
    * [Building engines](#building-engines)
    * [Running inference](#running-inference)
    * [TensorRT API layers and ops](#tensorrt-api-layers-and-ops)
- [Preparing sample data](#preparing-sample-data)
- [Running the sample](#running-the-sample)
    * [Sample `--help` options](#sample-help-options)
- [Additional resources](#additional-resources)
- [License](#license)
- [Changelog](#changelog)
- [Known issues](#known-issues)

## Description

This sample, sampleDynamicReshape, demonstrates how to use dynamic input dimensions in TensorRT. It creates an engine that takes a dynamically shaped input and resizes it to be consumed by an ONNX MNIST model that expects a fixed-size input. For more information, see [Working With Dynamic Shapes](https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#work_dynamic_shapes) in the TensorRT Developer Guide.

## How does this sample work?

This sample creates an engine for resizing an input with dynamic dimensions to a size that an ONNX MNIST model can consume. Specifically, this sample:
- Creates a network with dynamic input dimensions to act as a preprocessor for the model
- Parses an ONNX MNIST model to create a second network
- Builds engines for both networks, performing calibration if running in INT8 mode
- Runs inference using both engines

### Creating the preprocessing network

First, create a network with full-dims support:
`auto preprocessorNetwork = makeUnique(builder->createNetworkV2(0));`

Next, add an input layer that accepts an input with a dynamic shape, followed by a resize layer that will reshape the input to the shape the model expects:
```
auto input = preprocessorNetwork->addInput("input", nvinfer1::DataType::kFLOAT, Dims4{-1, 1, -1, -1});
auto resizeLayer = preprocessorNetwork->addResize(*input);
resizeLayer->setOutputDimensions(mPredictionInputDims);
preprocessorNetwork->markOutput(*resizeLayer->getOutput(0));
```

The -1 dimensions denote dimensions that will be supplied at runtime.
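To make concrete what "supplied at runtime" means, here is a minimal sketch (not part of the sample; `context` is assumed to be the preprocessor engine's `IExecutionContext`, and `imageH`/`imageW` are hypothetical dimensions of the image just loaded) showing how each -1 is resolved to a real value before execution:
```
#include "NvInfer.h"

// Hypothetical helper, not sample code: resolve the wildcard (-1) batch,
// height, and width declared in Dims4{-1, 1, -1, -1} above to concrete
// values. Inference cannot run until every dynamic dimension is set.
bool setRuntimeShape(nvinfer1::IExecutionContext& context, int imageH, int imageW)
{
    // Single image, single channel, actual spatial size of the loaded image.
    return context.setInputShape("input", nvinfer1::Dims4{1, 1, imageH, imageW});
}
```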
### Parsing the ONNX MNIST model

First, create an empty full-dims network and a parser:
```
auto network = makeUnique(builder->createNetworkV2(0));
auto parser = nvonnxparser::createParser(*network, sample::gLogger.getTRTLogger());
```

Next, parse the model file to populate the network:
```
parser->parseFromFile(locateFile(mParams.onnxFileName, mParams.dataDirs).c_str(),
    static_cast<int>(sample::gLogger.getReportableSeverity()));
```

### Building engines

When building the preprocessor engine, also provide an optimization profile so that TensorRT knows which input shapes to optimize for:
```
auto preprocessorConfig = makeUnique(builder->createBuilderConfig());
auto profile = builder->createOptimizationProfile();
```

`OptProfileSelector::kOPT` specifies the dimensions that the profile will be optimized for, whereas `OptProfileSelector::kMIN` and `OptProfileSelector::kMAX` specify the minimum and maximum dimensions for which the profile will be valid:
```
profile->setDimensions(input->getName(), OptProfileSelector::kMIN, Dims4{1, 1, 1, 1});
profile->setDimensions(input->getName(), OptProfileSelector::kOPT, Dims4{1, 1, 28, 28});
profile->setDimensions(input->getName(), OptProfileSelector::kMAX, Dims4{1, 1, 56, 56});
preprocessorConfig->addOptimizationProfile(profile);
```

Create an optimization profile for calibration:
```
auto profileCalib = builder->createOptimizationProfile();
const int calibBatchSize{256};
profileCalib->setDimensions(input->getName(), OptProfileSelector::kMIN, Dims4{calibBatchSize, 1, 28, 28});
profileCalib->setDimensions(input->getName(), OptProfileSelector::kOPT, Dims4{calibBatchSize, 1, 28, 28});
profileCalib->setDimensions(input->getName(), OptProfileSelector::kMAX, Dims4{calibBatchSize, 1, 28, 28});
preprocessorConfig->setCalibrationProfile(profileCalib);
```

Prepare and set the INT8 calibrator if running in INT8 mode:
```
std::unique_ptr<IInt8Calibrator> calibrator;
if (mParams.int8)
{
    preprocessorConfig->setFlag(BuilderFlag::kINT8);
    const int nCalibBatches{10};
    MNISTBatchStream calibrationStream(calibBatchSize, nCalibBatches, "train-images-idx3-ubyte",
        "train-labels-idx1-ubyte", mParams.dataDirs);
    calibrator.reset(new Int8EntropyCalibrator2<MNISTBatchStream>(
        calibrationStream, 0, "MNISTPreprocessor", "input"));
    preprocessorConfig->setInt8Calibrator(calibrator.get());
}
```

Run the engine build with the config:
```
SampleUniquePtr<nvinfer1::IHostMemory> preprocessorPlan = makeUnique(
    builder->buildSerializedNetwork(*preprocessorNetwork, *preprocessorConfig));
if (!preprocessorPlan)
{
    sample::gLogError << "Preprocessor serialized engine build failed." << std::endl;
    return false;
}
mPreprocessorEngine = makeUnique(
    runtime->deserializeCudaEngine(preprocessorPlan->data(), preprocessorPlan->size()));
if (!mPreprocessorEngine)
{
    sample::gLogError << "Preprocessor engine deserialization failed." << std::endl;
    return false;
}
```

For the MNIST model, attach a Softmax layer to the end of the network, set the softmax axis to 1 (since the network output has shape [1, 10] in full-dims mode), and replace the existing network output with the Softmax:
```
auto softmax = network->addSoftMax(*network->getOutput(0));
softmax->setAxes(1 << 1);
network->unmarkOutput(*network->getOutput(0));
network->markOutput(*softmax->getOutput(0));
```

A calibrator and a calibration profile are set for the prediction engine config in the same way as for the preprocessor engine above, except that `calibBatchSize` is set to 1 because the ONNX model has an explicit batch dimension; see the sketch below.
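As a rough sketch of that step (assuming `config` is the prediction engine's builder config and `network` is the parsed MNIST network; this mirrors the preprocessor code above rather than quoting the sample verbatim):
```
// Sketch: calibration profile for the prediction engine. The batch size is
// fixed at 1 because the parsed ONNX model has an explicit batch dimension.
auto profileCalib = builder->createOptimizationProfile();
const int calibBatchSize{1};
auto* input = network->getInput(0);
profileCalib->setDimensions(input->getName(), OptProfileSelector::kMIN, Dims4{calibBatchSize, 1, 28, 28});
profileCalib->setDimensions(input->getName(), OptProfileSelector::kOPT, Dims4{calibBatchSize, 1, 28, 28});
profileCalib->setDimensions(input->getName(), OptProfileSelector::kMAX, Dims4{calibBatchSize, 1, 28, 28});
config->setCalibrationProfile(profileCalib);
```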
Finally, build as normal:
```
SampleUniquePtr<nvinfer1::IHostMemory> predictionPlan = makeUnique(
    builder->buildSerializedNetwork(*network, *config));
if (!predictionPlan)
{
    sample::gLogError << "Prediction serialized engine build failed." << std::endl;
    return false;
}
mPredictionEngine = makeUnique(
    runtime->deserializeCudaEngine(predictionPlan->data(), predictionPlan->size()));
if (!mPredictionEngine)
{
    sample::gLogError << "Prediction engine deserialization failed." << std::endl;
    return false;
}
```

### Running inference

During inference, first copy the input buffer to the device:
```
CHECK(cudaMemcpy(mInput.deviceBuffer.data(), mInput.hostBuffer.data(), mInput.hostBuffer.nbBytes(),
    cudaMemcpyHostToDevice));
```

Since the preprocessor engine accepts dynamic shapes, specify the actual shape of the current input to the execution context: `mPreprocessorContext->setInputShape(inputTensorName, inputDims);`, where `inputTensorName` is the name of the input tensor at binding index 0.

Next, run the preprocessor using the `executeV2` function. The sample writes the output of the preprocessor engine directly to the input device buffer of the MNIST engine:
```
std::vector<void*> preprocessorBindings = {mInput.deviceBuffer.data(), mPredictionInput.data()};
bool status = mPreprocessorContext->executeV2(preprocessorBindings.data());
```

Then, run the MNIST engine:
```
std::vector<void*> predictionBindings = {mPredictionInput.data(), mOutput.deviceBuffer.data()};
status = mPredictionContext->executeV2(predictionBindings.data());
```

Finally, copy the output back to the host:
```
CHECK(cudaMemcpy(mOutput.hostBuffer.data(), mOutput.deviceBuffer.data(), mOutput.deviceBuffer.nbBytes(),
    cudaMemcpyDeviceToHost));
```

### TensorRT API layers and ops

In this sample, the following layers are used. For more information about these layers, see the [TensorRT Developer Guide: Layers](https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#layers) documentation.

[Resize layer](https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#resize-layer)
The IResizeLayer implements the resize operation on an input tensor.

## Preparing sample data

1. Download the sample data from the [TensorRT release tarball](https://developer.nvidia.com/nvidia-tensorrt-download#), if it is not already mounted under `/usr/src/tensorrt/data` (NVIDIA NGC containers), and point `$TRT_DATADIR` to it.
    ```bash
    export TRT_DATADIR=/usr/src/tensorrt/data
    pushd $TRT_DATADIR/mnist
    pip3 install Pillow
    popd
    ```

## Running the sample

1. Compile the sample by following the build instructions in the [TensorRT README](https://github.com/NVIDIA/TensorRT/).

2. Run the sample.
    ```bash
    ./sample_dynamic_reshape [-h or --help] [-d or --datadir=<path to data directory>] [--useDLACore=<int>] [--int8 or --fp16]
    ```

    For example:
    ```bash
    ./sample_dynamic_reshape --datadir $TRT_DATADIR/mnist --fp16
    ```

3. Verify that the sample ran successfully. If the sample runs successfully you should see output similar to the following:
    ```
    &&&& RUNNING TensorRT.sample_dynamic_reshape # ./sample_dynamic_reshape
    ----------------------------------------------------------------
    Input filename: ../../../../../data/samples/mnist/mnist.onnx
    ONNX IR version: 0.0.3
    Opset version: 8
    Producer name: CNTK
    Producer version: 2.5.1
    Domain: ai.cntk
    Model version: 1
    Doc string:
    ----------------------------------------------------------------
    [W] [TRT] onnx2trt_utils.cpp:214: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
    [W] [TRT] onnx2trt_utils.cpp:214: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
    [I] [TRT] Detected 1 inputs and 1 output network tensors.
    [I] [TRT] Detected 1 inputs and 1 output network tensors.
    [I] Profile dimensions in preprocessor engine:
    [I]     Minimum = (1, 1, 1, 1)
    [I]     Optimum = (1, 1, 28, 28)
    [I]     Maximum = (1, 1, 56, 56)
    [I] Input:
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @@@@@@@@@@@*.  .*@@@@@@@@@@@
    @@@@@@@@@@*.     +@@@@@@@@@@
    @@@@@@@@@@.  :#+  %@@@@@@@@@
    @@@@@@@@@@.:@@@+  +@@@@@@@@@
    @@@@@@@@@@.:@@@@: +@@@@@@@@@
    @@@@@@@@@@=%@@@@: +@@@@@@@@@
    @@@@@@@@@@@@@@@@# +@@@@@@@@@
    @@@@@@@@@@@@@@@@* +@@@@@@@@@
    @@@@@@@@@@@@@@@@: +@@@@@@@@@
    @@@@@@@@@@@@@@@@: +@@@@@@@@@
    @@@@@@@@@@@@@@@* .@@@@@@@@@@
    @@@@@@@@@@%**%@. *@@@@@@@@@@
    @@@@@@@@%+.  .: .@@@@@@@@@@@
    @@@@@@@@=   ..  :@@@@@@@@@@@
    @@@@@@@@:  *@@: :@@@@@@@@@@@
    @@@@@@@%   %@*   *@@@@@@@@@@
    @@@@@@@%  ++  ++ .%@@@@@@@@@
    @@@@@@@@-  +@@-   +@@@@@@@@@
    @@@@@@@@=  :*@@@# .%@@@@@@@@
    @@@@@@@@@+*@@@@@%.  %@@@@@@@
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@
    [I] Output:
    [I] Prob 0 0.0000 Class 0:
    [I] Prob 1 0.0000 Class 1:
    [I] Prob 2 1.0000 Class 2: **********
    [I] Prob 3 0.0000 Class 3:
    [I] Prob 4 0.0000 Class 4:
    [I] Prob 5 0.0000 Class 5:
    [I] Prob 6 0.0000 Class 6:
    [I] Prob 7 0.0000 Class 7:
    [I] Prob 8 0.0000 Class 8:
    [I] Prob 9 0.0000 Class 9:
    &&&& PASSED TensorRT.sample_dynamic_reshape # ./sample_dynamic_reshape
    ```

    This output shows that the sample ran successfully; `PASSED`.

### Sample `--help` options

To see the full list of available options and their descriptions, use the `-h` or `--help` command line option.

## Additional resources

The following resources provide a deeper understanding of dynamic shapes.

**ONNX**
- [GitHub: ONNX](https://github.com/onnx/onnx)
- [GitHub: ONNX-TensorRT open source parser](https://github.com/onnx/onnx-tensorrt)

**Models**
- [MNIST - Handwritten Digit Recognition](https://github.com/onnx/models/tree/master/mnist)
- [GitHub: ONNX Models](https://github.com/onnx/models)

**Documentation**
- [Introduction To NVIDIA’s TensorRT Samples](https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#samples)
- [Working With TensorRT Using The Python API](https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#python_topics)
- [NVIDIA’s TensorRT Documentation Library](https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/index.html)

## License

For terms and conditions for use, reproduction, and distribution, see the [TensorRT Software License Agreement](https://docs.nvidia.com/deeplearning/sdk/tensorrt-sla/index.html) documentation.

## Changelog

February 2020
This is the second release of the `README.md` file and sample.

## Known issues

There are no known issues in this sample.