| Exporting NeMo Models | |
| ===================== | |
| Exporting Models | |
| ---------------- | |
| Most of the NeMo models can be exported to ONNX or TorchScript to be deployed for inference in optimized execution environments, such as Riva or Triton Inference Server. | |
| Export interface is provided by the :class:`~nemo.core.classes.exportable.Exportable` mix-in class. If a model extends :class:`~nemo.core.classes.exportable.Exportable`, it can be exported by: | |
| .. code-block:: Python | |
| from nemo.core.classes import ModelPT, Exportable | |
| # deriving from Exportable | |
| class MyExportableModel(ModelPT, Exportable): | |
| ... | |
| mymodel = MyExportableModel.from_pretrained(model_name="MyModelName") | |
| model.eval() | |
| model.to('cuda') # or to('cpu') if you don't have GPU | |
| # exporting pre-trained model to ONNX file for deployment. | |
| mymodel.export('mymodel.onnx', [options]) | |
| How to Use Model Export | |
| ----------------------- | |
| The following arguments are for :meth:`~nemo.core.classes.exportable.Exportable.export`. In most cases, you should only supply the name of the output file and use all defaults: | |
| .. code-block:: Python | |
| def export( | |
| self, | |
| output: str, | |
| input_example=None, | |
| verbose=False, | |
| do_constant_folding=True, | |
| onnx_opset_version=None, | |
| check_trace: Union[bool, List[torch.Tensor]] = False, | |
| dynamic_axes=None, | |
| check_tolerance=0.01, | |
| export_modules_as_functions=False, | |
| keep_initializers_as_inputs=None, | |
| ): | |
| The ``output``, ``input_example``, ``verbose``, ``do_constant_folding``, ``onnx_opset_version`` options have the same semantics as in Pytorch ``onnx.export()`` and ``jit.trace()`` functions and are passed through. For more information about Pytorch's``onnx.export()``, refer to the `torch.onnx functions documentation | |
| <https://pytorch.org/docs/stable/onnx.html#functions>`_. Note that if ``input_example`` is None, ``Exportable.input_example()`` is called. | |
| The file extension of the ``output`` parameter determines export format: | |
| * ``.onnx->ONNX`` | |
| * ``.pt`` or ``.ts`` -> ``TorchScript``. | |
| **TorchScript-specific**: By default, the module will undergo ``jit.trace()``. You may require to explicitly pass some modules under ``jit.script()`` so that they are correctly traced.The ``check_trace`` arg is passed through to ``jit.trace()``. | |
| **ONNX-specific**: If ``use_dynamic_axes`` is True, ``onnx.export()`` is called with dynamic axes. If ``dynamic_axes`` is ``None``, they are inferred from the model's ``input_types`` definition (batch dimension is dynamic, and so is duration etc). | |
| If ``check_trace`` is ``True``, the resulting ONNX also runs on ``input_example`` and the results compared to the exported model's output, using the ``check_tolerance`` argument. Note the higher tolerance default. | |
| How to Make Model Exportable | |
| ---------------------------- | |
| If you are simply using NeMo models, the previous example is all you need to know. | |
| If you write your own models, this section highlights the things you need to be aware of after extending ``Exportable``. | |
| Exportable Hooks and Overrides | |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
| You should not normally need to override ``Exportable`` default methods. However, ``Exportable.export()`` relies on the assumptions that certain methods are available in your class. | |
| .. code-block:: Python | |
| @property | |
| def input_example(self) # => Tuple(input, [(input, ...], [Dict]) | |
| """ | |
| Generates input examples for tracing etc. | |
| Returns: | |
| A tuple of input examples. | |
| """ | |
| This function should return a tuple of (normally) Tensors - one per each of model inputs (args to ``forward()``). The last element may be a ``Dict`` to specify non-positional arguments by name, as per Torch ``export()`` convention. For more information, refer to the `Using dictionaries to handle Named Arguments as model inputs | |
| <https://pytorch.org/docs/stable/onnx.html#using-dictionaries-to-handle-named-arguments-as-model-inputs>`_. | |
| .. Note: ``Dict`` currently does not work with Torchscript ``trace()``. | |
| .. code-block:: Python | |
| @property | |
| def input_types(self): | |
| @property | |
| def output_types(self): | |
| Those are needed for inferring in/out names and dynamic axes. If your model derives from ``ModulePT``, those are already there. Another common scenario is that your model contains one or more modules that processes input and generates output. Then, you should override ``Exportable`` methods ``input_module()`` and ``output_module()`` to point to them, like in this example: | |
| .. code-block:: Python | |
| @property | |
| def input_module(self): | |
| return self.fastpitch | |
| @property | |
| def output_module(self): | |
| return self.fastpitch | |
| Your model should also have an export-friendly ``forward()`` method - that can mean different things for ONNX ant TorchScript. For ONNX, you can't have forced named parameters without default, like ``forward(self, *, text)``. For TorchScript, you should avoid ``None`` and use ``Optional`` instead. The criteria are highly volatile and may change with every PyTorch version, so it's a trial-and-error process. There is also the general issue that in many cases, ``forward()`` for inference can be simplified and even use less inputs/outputs. To address this, ``Exportable`` looks for ``forward_for_export()`` method in your model and uses that instead of ``forward()`` to export: | |
| .. code-block:: Python | |
| # Uses forced named args, many default parameters. | |
| def forward( | |
| self, | |
| *, | |
| text, | |
| durs=None, | |
| pitch=None, | |
| speaker=0, | |
| pace=1.0, | |
| spec=None, | |
| attn_prior=None, | |
| mel_lens=None, | |
| input_lens=None, | |
| ): | |
| # Passes through all self.fastpitch outputs | |
| return self.fastpitch( | |
| text=text, | |
| durs=durs, | |
| pitch=pitch, | |
| speaker=speaker, | |
| pace=pace, | |
| spec=spec, | |
| attn_prior=attn_prior, | |
| mel_lens=mel_lens, | |
| input_lens=input_lens, | |
| ) | |
| # Uses less inputs, no '*', returns less outputs: | |
| def forward_for_export(self, text): | |
| ( | |
| spect, | |
| durs_predicted, | |
| log_durs_predicted, | |
| pitch_predicted, | |
| attn_soft, | |
| attn_logprob, | |
| attn_hard, | |
| attn_hard_dur, | |
| pitch, | |
| ) = self.fastpitch(text=text) | |
| return spect, durs_predicted, log_durs_predicted, pitch_predicted | |
| To stay consistent with input_types()/output_types(), there are also those hooks in ``Exportable`` that let you exclude particular inputs/outputs from the export process: | |
| .. code-block:: Python | |
| @property | |
| def disabled_deployment_input_names(self): | |
| """Implement this method to return a set of input names disabled for export""" | |
| return set(["durs", "pitch", "speaker", "pace", "spec", "attn_prior", "mel_lens", "input_lens"]) | |
| @property | |
| def disabled_deployment_output_names(self): | |
| Another common requirement for models that are being exported is to run certain net modifications for inference efficiency before exporting - like disabling masks in some convolutions or removing batch normalizations. A better style is to make those happen on ``ModelPT.eval()`` (and reversed on ``.train()``), but it's not always feasible so the following hook is provided in ``Exportable`` to run those: | |
| .. code-block:: Python | |
| def _prepare_for_export(self, **kwargs): | |
| """ | |
| Override this method to prepare module for export. This is in-place operation. | |
| Base version does common necessary module replacements (Apex etc) | |
| """ | |
| # do graph modifications specific for this model | |
| replace_1D_2D = kwargs.get('replace_1D_2D', False) | |
| replace_for_export(self, replace_1D_2D) | |
| # call base method for common set of modifications | |
| Exportable._prepare_for_export(self, **kwargs) | |
| Exportable Model Code | |
| ~~~~~~~~~~~~~~~~~~~~~ | |
| Most importantly, the actual Torch code in your model should be ONNX or TorchScript - compatible (ideally, both). | |
| #. Ensure the code is written in Torch - avoid bare `Numpy or Python operands <https://pytorch.org/docs/stable/onnx.html#write-pytorch-model-in-torch-way>`_. | |
| #. Create your model ``Exportable`` and add an export unit test, to catch any operation/construct not supported in ONNX/TorchScript, immediately. | |
| For more information, refer to the PyTorch documentation: | |
| - `List of supported operators <https://pytorch.org/docs/stable/onnx.html#supported-operators>`_ | |
| - `Tracing vs. scripting <https://pytorch.org/docs/stable/onnx.html#tracing-vs-scripting>`_ | |
| - `AlexNet example <https://pytorch.org/docs/stable/onnx.html#example-end-to-end-alexnet-from-pytorch-to-onnx>`_ | |