|
|
.. _export_conv_emformer_transducer_models_to_ncnn: |
|
|
|
|
|
Export ConvEmformer transducer models to ncnn |
|
|
============================================= |
|
|
|
|
|
We use the pre-trained model from the following repository as an example: |
|
|
|
|
|
- `<https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05>`_ |
|
|
|
|
|
We will show you step by step how to export it to `ncnn`_ and run it with `sherpa-ncnn`_. |
|
|
|
|
|
.. hint:: |
|
|
|
|
|
We use ``Ubuntu 18.04``, ``torch 1.13``, and ``Python 3.8`` for testing. |
|
|
|
|
|
.. caution:: |
|
|
|
|
|
``torch > 2.0`` may not work. If you get errors while building pnnx, please switch |
|
|
to ``torch < 2.0``. |
|
|
|
|
|
1. Download the pre-trained model |
|
|
--------------------------------- |
|
|
|
|
|
.. hint:: |
|
|
|
|
|
You can also refer to `<https://k2-fsa.github.io/sherpa/cpp/pretrained_models/online_transducer.html#icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05>`_ to download the pre-trained model. |
|
|
|
|
|
You have to install `git-lfs`_ before you continue. |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
cd egs/librispeech/ASR |
|
|
|
|
|
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05 |
|
|
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05 |
|
|
|
|
|
git lfs pull --include "exp/pretrained-epoch-30-avg-10-averaged.pt" |
|
|
git lfs pull --include "data/lang_bpe_500/bpe.model" |
|
|
|
|
|
cd .. |
|
|
|
|
|
.. note:: |
|
|
|
|
|
We downloaded ``exp/pretrained-xxx.pt``, not ``exp/cpu-jit_xxx.pt``. |
|
|
|
|
|
|
|
|
In the above code, we downloaded the pre-trained model into the directory |
|
|
``egs/librispeech/ASR/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05``. |
|
|
|
|
|
.. _export_for_ncnn_install_ncnn_and_pnnx: |
|
|
|
|
|
2. Install ncnn and pnnx |
|
|
------------------------ |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
cd $HOME |
|
|
mkdir -p open-source |
|
|
cd open-source |
|
|
|
|
|
git clone https://github.com/csukuangfj/ncnn |
|
|
cd ncnn |
|
|
git submodule update --recursive --init |
|
|
|
|
|
|
|
|
|
|
|
mkdir -p build-wheel |
|
|
cd build-wheel |
|
|
|
|
|
cmake \ |
|
|
-DCMAKE_BUILD_TYPE=Release \ |
|
|
-DNCNN_PYTHON=ON \ |
|
|
-DNCNN_BUILD_BENCHMARK=OFF \ |
|
|
-DNCNN_BUILD_EXAMPLES=OFF \ |
|
|
-DNCNN_BUILD_TOOLS=ON \ |
|
|
.. |
|
|
|
|
|
make -j4 |
|
|
|
|
|
cd .. |
|
|
|
|
|
|
|
|
|
|
|
export PYTHONPATH=$PWD/python:$PYTHONPATH |
|
|
export PATH=$PWD/tools/pnnx/build/src:$PATH |
|
|
export PATH=$PWD/build-wheel/tools/quantize:$PATH |
|
|
|
|
|
|
|
|
cd tools/pnnx |
|
|
mkdir build |
|
|
cd build |
|
|
cmake .. |
|
|
make -j4 |
|
|
|
|
|
./src/pnnx |
|
|
|
|
|
Congratulations! You have successfully installed the following components: |
|
|
|
|
|
- ``pnnx``, which is an executable located in |
|
|
``$HOME/open-source/ncnn/tools/pnnx/build/src``. We will use |
|
|
it to convert models exported by ``torch.jit.trace()``. |
|
|
- ``ncnn2int8``, which is an executable located in |
|
|
``$HOME/open-source/ncnn/build-wheel/tools/quantize``. We will use |
|
|
it to quantize our models to ``int8``. |
|
|
- ``ncnn.cpython-38-x86_64-linux-gnu.so``, which is a Python module located |
|
|
in ``$HOME/open-source/ncnn/python/ncnn``. |
|
|
|
|
|
.. note:: |
|
|
|
|
|
We are using ``Python 3.8``, so the file is named
``ncnn.cpython-38-x86_64-linux-gnu.so``. If you use a different
version, say, ``Python 3.9``, the name would be
``ncnn.cpython-39-x86_64-linux-gnu.so``.

Also, if you are not using Linux, the file name will be different,
but that does not matter: as long as you can compile it, it will work.
|
|
|
|
|
We have set up ``PYTHONPATH`` so that you can use ``import ncnn`` in your |
|
|
Python code. We have also set up ``PATH`` so that you can use |
|
|
``pnnx`` and ``ncnn2int8`` later in your terminal. |
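To verify that everything is in place, you can run the following quick checks
in the same shell in which you exported the environment variables above:

.. code-block:: bash

   # Both commands should print a path under $HOME/open-source/ncnn
   which pnnx
   which ncnn2int8

   # This should print the location of the ncnn Python module
   python3 -c "import ncnn; print(ncnn.__file__)"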
|
|
|
|
|
.. caution:: |
|
|
|
|
|
Please don't use `<https://github.com/tencent/ncnn>`_. |
|
|
We have made some modifications to the official `ncnn`_. |
|
|
|
|
|
We will synchronize `<https://github.com/csukuangfj/ncnn>`_ periodically |
|
|
with the official one. |
|
|
|
|
|
3. Export the model via torch.jit.trace() |
|
|
----------------------------------------- |
|
|
|
|
|
First, let us create a symlink to the pre-trained model so that its name follows
the ``epoch-xx.pt`` convention expected by the export script:
|
|
|
|
|
.. code-block:: |
|
|
|
|
|
cd egs/librispeech/ASR |
|
|
|
|
|
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp |
|
|
|
|
|
ln -s pretrained-epoch-30-avg-10-averaged.pt epoch-30.pt |
|
|
|
|
|
cd ../.. |
|
|
|
|
|
Next, we use the following code to export our model: |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
dir=./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/ |
|
|
|
|
|
./conv_emformer_transducer_stateless2/export-for-ncnn.py \ |
|
|
--exp-dir $dir/exp \ |
|
|
--tokens $dir/data/lang_bpe_500/tokens.txt \ |
|
|
--epoch 30 \ |
|
|
--avg 1 \ |
|
|
--use-averaged-model 0 \ |
|
|
--num-encoder-layers 12 \ |
|
|
--chunk-length 32 \ |
|
|
--cnn-module-kernel 31 \ |
|
|
--left-context-length 32 \ |
|
|
--right-context-length 8 \ |
|
|
--memory-size 32 \ |
|
|
--encoder-dim 512 |
|
|
|
|
|
.. caution:: |
|
|
|
|
|
If your model has different configuration parameters, please change them accordingly. |
|
|
|
|
|
.. hint:: |
|
|
|
|
|
We have created the symlink ``epoch-30.pt`` so that we can use ``--epoch 30``.
|
|
There is only one pre-trained model, so we use ``--avg 1 --use-averaged-model 0``. |
|
|
|
|
|
If you have trained a model by yourself and have all the checkpoints
available, please first use ``decode.py`` to tune ``--epoch`` and ``--avg``,
and select the best combination with ``--use-averaged-model 1``. A sketch
of such a tuning run is given after this hint.
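For example, a tuning loop could look like the following. This is only a sketch,
not part of the official recipe: it assumes you have prepared the LibriSpeech data
and uses the common ``decode.py`` options shown in the hint above (``--epoch``,
``--avg``, ``--use-averaged-model``) plus ``--exp-dir``; adjust the values to your
own training run.

.. code-block:: bash

   # Hypothetical sketch: evaluate a few --avg values and keep the one with
   # the best WER before exporting the model.
   for avg in 5 10 15; do
     ./conv_emformer_transducer_stateless2/decode.py \
       --epoch 30 \
       --avg $avg \
       --use-averaged-model 1 \
       --exp-dir conv_emformer_transducer_stateless2/exp
   done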
|
|
|
|
|
.. note:: |
|
|
|
|
|
You will see the following log output: |
|
|
|
|
|
.. literalinclude:: ./code/export-conv-emformer-transducer-for-ncnn-output.txt |
|
|
|
|
|
The log shows the model has ``75490012`` parameters, i.e., ``~75 M``. |
|
|
|
|
|
.. code-block:: |
|
|
|
|
|
ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/pretrained-epoch-30-avg-10-averaged.pt |
|
|
|
|
|
-rw-r--r-- 1 kuangfangjun root 289M Jan 11 12:05 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/pretrained-epoch-30-avg-10-averaged.pt |
|
|
|
|
|
You can see that the file size of the pre-trained model is ``289 MB``, which |
|
|
is roughly equal to ``75490012*4/1024/1024 = 287.97 MB``. |
|
|
|
|
|
After running ``conv_emformer_transducer_stateless2/export-for-ncnn.py``, |
|
|
we will get the following files: |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/*pnnx* |
|
|
|
|
|
-rw-r--r-- 1 kuangfangjun root 1010K Jan 11 12:15 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.pt |
|
|
-rw-r--r-- 1 kuangfangjun root 283M Jan 11 12:15 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.pt |
|
|
-rw-r--r-- 1 kuangfangjun root 3.0M Jan 11 12:15 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.pt |
|
|
|
|
|
|
|
|
.. _conv-emformer-step-4-export-torchscript-model-via-pnnx: |
|
|
|
|
|
4. Export torchscript model via pnnx |
|
|
------------------------------------ |
|
|
|
|
|
.. hint:: |
|
|
|
|
|
Make sure you have set up the ``PATH`` environment variable. Otherwise, |
|
|
it will throw an error saying that ``pnnx`` could not be found. |
|
|
|
|
|
Now, it's time to export our models to `ncnn`_ via ``pnnx``. |
|
|
|
|
|
.. code-block:: |
|
|
|
|
|
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/ |
|
|
|
|
|
pnnx ./encoder_jit_trace-pnnx.pt |
|
|
pnnx ./decoder_jit_trace-pnnx.pt |
|
|
pnnx ./joiner_jit_trace-pnnx.pt |
|
|
|
|
|
It will generate the following files: |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/*ncnn*{bin,param} |
|
|
|
|
|
-rw-r--r-- 1 kuangfangjun root 503K Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin |
|
|
-rw-r--r-- 1 kuangfangjun root 437 Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param |
|
|
-rw-r--r-- 1 kuangfangjun root 142M Jan 11 12:36 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin |
|
|
-rw-r--r-- 1 kuangfangjun root 79K Jan 11 12:36 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param |
|
|
-rw-r--r-- 1 kuangfangjun root 1.5M Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin |
|
|
-rw-r--r-- 1 kuangfangjun root 488 Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param |
|
|
|
|
|
There are two types of files: |
|
|
|
|
|
- ``param``: It is a text file containing the model architectures. You can |
|
|
use a text editor to view its content. |
|
|
- ``bin``: It is a binary file containing the model parameters. |
|
|
|
|
|
Below we compare the file sizes of the models before and after conversion via ``pnnx``:
|
|
|
|
|
.. see https://tableconvert.com/restructuredtext-generator |
|
|
|
|
|
+----------------------------------+------------+ |
|
|
| File name | File size | |
|
|
+==================================+============+ |
|
|
| encoder_jit_trace-pnnx.pt | 283 MB | |
|
|
+----------------------------------+------------+ |
|
|
| decoder_jit_trace-pnnx.pt | 1010 KB | |
|
|
+----------------------------------+------------+ |
|
|
| joiner_jit_trace-pnnx.pt | 3.0 MB | |
|
|
+----------------------------------+------------+ |
|
|
| encoder_jit_trace-pnnx.ncnn.bin | 142 MB | |
|
|
+----------------------------------+------------+ |
|
|
| decoder_jit_trace-pnnx.ncnn.bin | 503 KB | |
|
|
+----------------------------------+------------+ |
|
|
| joiner_jit_trace-pnnx.ncnn.bin | 1.5 MB | |
|
|
+----------------------------------+------------+ |
|
|
|
|
|
You can see that the models after conversion are about half the size of
the models before conversion:
|
|
|
|
|
- encoder: 283 MB vs 142 MB |
|
|
- decoder: 1010 KB vs 503 KB |
|
|
- joiner: 3.0 MB vs 1.5 MB |
|
|
|
|
|
The reason is that by default ``pnnx`` converts ``float32`` parameters
to ``float16``. A ``float32`` parameter occupies 4 bytes, while a ``float16``
parameter occupies only 2 bytes. Thus, the converted files are about half
the size of the original ones.
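As a quick sanity check, you can redo this arithmetic with the parameter count
reported earlier (``75490012`` parameters in total, most of which belong to the
encoder):

.. code-block:: bash

   # ~288 MB in float32 (4 bytes/parameter), ~144 MB in float16 (2 bytes/parameter)
   python3 -c "n = 75490012; print(n * 4 / 1024 / 1024, n * 2 / 1024 / 1024)"

These estimates are close to the 283 MB and 142 MB encoder files listed above,
since the encoder accounts for most of the parameters.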
|
|
|
|
|
.. hint:: |
|
|
|
|
|
If you use ``pnnx ./encoder_jit_trace-pnnx.pt fp16=0``, then ``pnnx`` |
|
|
won't convert ``float32`` to ``float16``. |
|
|
|
|
|
5. Test the exported models in icefall |
|
|
-------------------------------------- |
|
|
|
|
|
.. note:: |
|
|
|
|
|
We assume you have set up the environment variable ``PYTHONPATH`` when |
|
|
building `ncnn`_. |
|
|
|
|
|
Now we have successfully converted our pre-trained model to `ncnn`_ format.
The 6 generated files are all we need. You can use the following command to
test the converted models:
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
./conv_emformer_transducer_stateless2/streaming-ncnn-decode.py \ |
|
|
--tokens ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/data/lang_bpe_500/tokens.txt \ |
|
|
--encoder-param-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param \ |
|
|
--encoder-bin-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin \ |
|
|
--decoder-param-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param \ |
|
|
--decoder-bin-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin \ |
|
|
--joiner-param-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param \ |
|
|
--joiner-bin-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin \ |
|
|
./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/test_wavs/1089-134686-0001.wav |
|
|
|
|
|
.. hint:: |
|
|
|
|
|
`ncnn`_ supports only ``batch size == 1``, so ``streaming-ncnn-decode.py`` accepts |
|
|
only 1 wave file as input. |
|
|
|
|
|
The output is given below: |
|
|
|
|
|
.. literalinclude:: ./code/test-streaming-ncnn-decode-conv-emformer-transducer-libri.txt |
|
|
|
|
|
Congratulations! You have successfully exported a model from PyTorch to `ncnn`_! |
|
|
|
|
|
|
|
|
.. _conv-emformer-modify-the-exported-encoder-for-sherpa-ncnn: |
|
|
|
|
|
6. Modify the exported encoder for sherpa-ncnn |
|
|
---------------------------------------------- |
|
|
|
|
|
In order to use the exported models in `sherpa-ncnn`_, we have to modify |
|
|
``encoder_jit_trace-pnnx.ncnn.param``. |
|
|
|
|
|
Let us have a look at the first few lines of ``encoder_jit_trace-pnnx.ncnn.param``: |
|
|
|
|
|
.. code-block:: |
|
|
|
|
|
7767517 |
|
|
1060 1342 |
|
|
Input in0 0 1 in0 |
|
|
|
|
|
**Explanation** of the above three lines: |
|
|
|
|
|
1. ``7767517``, it is a magic number and should not be changed. |
|
|
2. ``1060 1342``, the first number ``1060`` specifies the number of layers
   in this file, while ``1342`` specifies the number of intermediate outputs
   of this file.
|
|
3. ``Input in0 0 1 in0``, ``Input`` is the layer type of this layer; ``in0`` |
|
|
is the layer name of this layer; ``0`` means this layer has no input; |
|
|
``1`` means this layer has one output; ``in0`` is the output name of |
|
|
this layer. |
|
|
|
|
|
We need to add one extra line and also increment the number of layers.
The result looks like the following:
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
7767517 |
|
|
1061 1342 |
|
|
SherpaMetaData sherpa_meta_data1 0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512 |
|
|
Input in0 0 1 in0 |
|
|
|
|
|
**Explanation** |
|
|
|
|
|
1. ``7767517``, it is still the same.
|
|
2. ``1061 1342``, we have added an extra layer, so we need to update ``1060`` to ``1061``. |
|
|
We don't need to change ``1342`` since the newly added layer has no inputs or outputs. |
|
|
3. ``SherpaMetaData sherpa_meta_data1 0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512`` |
|
|
This line is newly added. Its explanation is given below: |
|
|
|
|
|
- ``SherpaMetaData`` is the type of this layer. Must be ``SherpaMetaData``. |
|
|
- ``sherpa_meta_data1`` is the name of this layer. Must be ``sherpa_meta_data1``. |
|
|
- ``0 0`` means this layer has no inputs and no outputs. Must be ``0 0``.
- ``0=1``, 0 is the key and 1 is the value. Must be ``0=1``.
|
|
- ``1=12``, 1 is the key and 12 is the value of the |
|
|
parameter ``--num-encoder-layers`` that you provided when running |
|
|
``conv_emformer_transducer_stateless2/export-for-ncnn.py``. |
|
|
- ``2=32``, 2 is the key and 32 is the value of the |
|
|
parameter ``--memory-size`` that you provided when running |
|
|
``conv_emformer_transducer_stateless2/export-for-ncnn.py``. |
|
|
- ``3=31``, 3 is the key and 31 is the value of the |
|
|
parameter ``--cnn-module-kernel`` that you provided when running |
|
|
``conv_emformer_transducer_stateless2/export-for-ncnn.py``. |
|
|
- ``4=8``, 4 is the key and 8 is the value of the |
|
|
parameter ``--left-context-length`` that you provided when running |
|
|
``conv_emformer_transducer_stateless2/export-for-ncnn.py``. |
|
|
- ``5=32``, 5 is the key and 32 is the value of the |
|
|
parameter ``--chunk-length`` that you provided when running |
|
|
``conv_emformer_transducer_stateless2/export-for-ncnn.py``. |
|
|
- ``6=8``, 6 is the key and 8 is the value of the |
|
|
parameter ``--right-context-length`` that you provided when running |
|
|
``conv_emformer_transducer_stateless2/export-for-ncnn.py``. |
|
|
- ``7=512``, 7 is the key and 512 is the value of the |
|
|
parameter ``--encoder-dim`` that you provided when running |
|
|
``conv_emformer_transducer_stateless2/export-for-ncnn.py``. |
|
|
|
|
|
For ease of reference, we list the key-value pairs that you need to add |
|
|
in the following table. If your model has a different setting, please |
|
|
change the values for ``SherpaMetaData`` accordingly. Otherwise, you |
|
|
will be ``SAD``. |
|
|
|
|
|
+------+-----------------------------+ |
|
|
| key | value | |
|
|
+======+=============================+ |
|
|
| 0 | 1 (fixed) | |
|
|
+------+-----------------------------+ |
|
|
| 1 | ``--num-encoder-layers`` | |
|
|
+------+-----------------------------+ |
|
|
| 2 | ``--memory-size`` | |
|
|
+------+-----------------------------+ |
|
|
| 3 | ``--cnn-module-kernel`` | |
|
|
+------+-----------------------------+ |
|
|
| 4 | ``--left-context-length`` | |
|
|
+------+-----------------------------+ |
|
|
| 5 | ``--chunk-length`` | |
|
|
+------+-----------------------------+ |
|
|
| 6 | ``--right-context-length`` | |
|
|
+------+-----------------------------+ |
|
|
| 7 | ``--encoder-dim`` | |
|
|
+------+-----------------------------+ |
|
|
|
|
|
4. ``Input in0 0 1 in0``. No need to change it. |
|
|
|
|
|
.. caution:: |
|
|
|
|
|
When you add a new layer ``SherpaMetaData``, please remember to update the |
|
|
number of layers. In our case, update ``1060`` to ``1061``. Otherwise, |
|
|
you will be SAD later. |
|
|
|
|
|
.. hint:: |
|
|
|
|
|
After adding the new layer ``SherpaMetaData``, you cannot use this model |
|
|
with ``streaming-ncnn-decode.py`` anymore since ``SherpaMetaData`` is |
|
|
supported only in `sherpa-ncnn`_. |
|
|
|
|
|
.. hint:: |
|
|
|
|
|
`ncnn`_ is very flexible. You can add new layers to it just by text-editing |
|
|
the ``param`` file! You don't need to change the ``bin`` file. |
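If you prefer to script this edit instead of using a text editor, the following
is a minimal sketch. It assumes the header values from this tutorial (the
layer-count line ``1060 1342`` is the second line of the file) and uses the
``SherpaMetaData`` values listed above; adjust both to your own model.

.. code-block:: bash

   cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/

   # Increment the layer count on line 2 and insert the SherpaMetaData line
   # right after it. The key-value pairs below are the ones used in this
   # tutorial; change them if your model uses a different configuration.
   awk 'NR == 2 { split($0, a, " ");
                  print a[1] + 1, a[2];
                  print "SherpaMetaData sherpa_meta_data1 0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512";
                  next }
        { print }' encoder_jit_trace-pnnx.ncnn.param > patched.param

   mv patched.param encoder_jit_trace-pnnx.ncnn.param

Always double-check the result with a text editor afterwards.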
|
|
|
|
|
Now you can use this model in `sherpa-ncnn`_. |
|
|
Please refer to the following documentation: |
|
|
|
|
|
- Linux/macOS/Windows/arm/aarch64: `<https://k2-fsa.github.io/sherpa/ncnn/install/index.html>`_ |
|
|
- ``Android``: `<https://k2-fsa.github.io/sherpa/ncnn/android/index.html>`_ |
|
|
- ``iOS``: `<https://k2-fsa.github.io/sherpa/ncnn/ios/index.html>`_ |
|
|
- Python: `<https://k2-fsa.github.io/sherpa/ncnn/python/index.html>`_ |
|
|
|
|
|
We have a list of pre-trained models that have been exported for `sherpa-ncnn`_: |
|
|
|
|
|
- `<https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html>`_ |
|
|
|
|
|
You can find more usage examples there.
|
|
|
|
|
7. (Optional) int8 quantization with sherpa-ncnn |
|
|
------------------------------------------------ |
|
|
|
|
|
This step is optional. |
|
|
|
|
|
In this step, we describe how to quantize our model with ``int8``. |
|
|
|
|
|
Re-run :ref:`conv-emformer-step-4-export-torchscript-model-via-pnnx`, this time
disabling ``fp16`` when using ``pnnx``:
|
|
|
|
|
.. code-block:: |
|
|
|
|
|
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/ |
|
|
|
|
|
pnnx ./encoder_jit_trace-pnnx.pt fp16=0 |
|
|
pnnx ./decoder_jit_trace-pnnx.pt |
|
|
pnnx ./joiner_jit_trace-pnnx.pt fp16=0 |
|
|
|
|
|
.. note:: |
|
|
|
|
|
We add ``fp16=0`` only when exporting the encoder and joiner. `ncnn`_ does not
support quantizing the decoder model yet, so the decoder is kept in ``fp16``.
We will update this documentation once `ncnn`_ supports it (perhaps later in 2023).
|
|
|
|
|
It will generate the following files:
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/*_jit_trace-pnnx.ncnn.{param,bin} |
|
|
|
|
|
-rw-r--r-- 1 kuangfangjun root 503K Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin |
|
|
-rw-r--r-- 1 kuangfangjun root 437 Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param |
|
|
-rw-r--r-- 1 kuangfangjun root 283M Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin |
|
|
-rw-r--r-- 1 kuangfangjun root 79K Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param |
|
|
-rw-r--r-- 1 kuangfangjun root 3.0M Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin |
|
|
-rw-r--r-- 1 kuangfangjun root 488 Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param |
|
|
|
|
|
Let us compare the file sizes again:
|
|
|
|
|
+----------------------------------------+------------+ |
|
|
| File name | File size | |
|
|
+----------------------------------------+------------+ |
|
|
| encoder_jit_trace-pnnx.pt | 283 MB | |
|
|
+----------------------------------------+------------+ |
|
|
| decoder_jit_trace-pnnx.pt | 1010 KB | |
|
|
+----------------------------------------+------------+ |
|
|
| joiner_jit_trace-pnnx.pt | 3.0 MB | |
|
|
+----------------------------------------+------------+ |
|
|
| encoder_jit_trace-pnnx.ncnn.bin (fp16) | 142 MB | |
|
|
+----------------------------------------+------------+ |
|
|
| decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB | |
|
|
+----------------------------------------+------------+ |
|
|
| joiner_jit_trace-pnnx.ncnn.bin (fp16) | 1.5 MB | |
|
|
+----------------------------------------+------------+ |
|
|
| encoder_jit_trace-pnnx.ncnn.bin (fp32) | 283 MB | |
|
|
+----------------------------------------+------------+ |
|
|
| joiner_jit_trace-pnnx.ncnn.bin (fp32) | 3.0 MB | |
|
|
+----------------------------------------+------------+ |
|
|
|
|
|
You can see that the file sizes are doubled when we disable ``fp16``. |
|
|
|
|
|
.. note:: |
|
|
|
|
|
You can again use ``streaming-ncnn-decode.py`` to test the exported models. |
|
|
|
|
|
Next, follow :ref:`conv-emformer-modify-the-exported-encoder-for-sherpa-ncnn` |
|
|
to modify ``encoder_jit_trace-pnnx.ncnn.param``. |
|
|
|
|
|
Change |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
7767517 |
|
|
1060 1342 |
|
|
Input in0 0 1 in0 |
|
|
|
|
|
to |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
7767517 |
|
|
1061 1342 |
|
|
SherpaMetaData sherpa_meta_data1 0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512 |
|
|
Input in0 0 1 in0 |
|
|
|
|
|
.. caution:: |
|
|
|
|
|
Please follow :ref:`conv-emformer-modify-the-exported-encoder-for-sherpa-ncnn` |
|
|
to change the values for ``SherpaMetaData`` if your model uses a different setting. |
|
|
|
|
|
|
|
|
Next, let us compile `sherpa-ncnn`_ since we will quantize our models within |
|
|
`sherpa-ncnn`_. |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
# We will download sherpa-ncnn to $HOME/open-source/ |
|
|
# You can change it to anywhere you like. |
|
|
cd $HOME |
|
|
mkdir -p open-source |
|
|
|
|
|
cd open-source |
|
|
git clone https://github.com/k2-fsa/sherpa-ncnn |
|
|
cd sherpa-ncnn |
|
|
mkdir build |
|
|
cd build |
|
|
cmake .. |
|
|
make -j 4 |
|
|
|
|
|
./bin/generate-int8-scale-table |
|
|
|
|
|
export PATH=$HOME/open-source/sherpa-ncnn/build/bin:$PATH |
|
|
|
|
|
The output of the above commands is:
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
(py38) kuangfangjun:build$ generate-int8-scale-table |
|
|
Please provide 10 arg. Currently given: 1 |
|
|
Usage: |
|
|
generate-int8-scale-table encoder.param encoder.bin decoder.param decoder.bin joiner.param joiner.bin encoder-scale-table.txt joiner-scale-table.txt wave_filenames.txt |
|
|
|
|
|
Each line in wave_filenames.txt is a path to some 16k Hz mono wave file. |
|
|
|
|
|
We need to create a file ``wave_filenames.txt``, in which we put the paths of
some calibration wave files. For testing purposes, we use the wave files from
``test_wavs`` in the pre-trained model repository
`<https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05>`_:
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
cd egs/librispeech/ASR |
|
|
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/ |
|
|
|
|
|
cat <<EOF > wave_filenames.txt |
|
|
../test_wavs/1089-134686-0001.wav |
|
|
../test_wavs/1221-135766-0001.wav |
|
|
../test_wavs/1221-135766-0002.wav |
|
|
EOF |
|
|
|
|
|
Now we can calculate the scales needed for quantization with the calibration data: |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
cd egs/librispeech/ASR |
|
|
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/ |
|
|
|
|
|
generate-int8-scale-table \ |
|
|
./encoder_jit_trace-pnnx.ncnn.param \ |
|
|
./encoder_jit_trace-pnnx.ncnn.bin \ |
|
|
./decoder_jit_trace-pnnx.ncnn.param \ |
|
|
./decoder_jit_trace-pnnx.ncnn.bin \ |
|
|
./joiner_jit_trace-pnnx.ncnn.param \ |
|
|
./joiner_jit_trace-pnnx.ncnn.bin \ |
|
|
./encoder-scale-table.txt \ |
|
|
./joiner-scale-table.txt \ |
|
|
./wave_filenames.txt |
|
|
|
|
|
The output logs are given below:
|
|
|
|
|
.. literalinclude:: ./code/generate-int-8-scale-table-for-conv-emformer.txt |
|
|
|
|
|
It generates the following two files: |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
$ ls -lh encoder-scale-table.txt joiner-scale-table.txt |
|
|
-rw-r--r-- 1 kuangfangjun root 955K Jan 11 17:28 encoder-scale-table.txt |
|
|
-rw-r--r-- 1 kuangfangjun root 18K Jan 11 17:28 joiner-scale-table.txt |
|
|
|
|
|
.. caution:: |
|
|
|
|
|
In practice, you will need more calibration data to compute an accurate scale table;
see the example below.
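For example, you can point ``wave_filenames.txt`` at a larger collection of your
own recordings. The directory below is only a placeholder; the files must be
16 kHz mono wave files:

.. code-block:: bash

   # Placeholder path: replace it with a directory containing your own
   # 16 kHz mono calibration wave files.
   find /path/to/your/calibration/wavs -name "*.wav" > wave_filenames.txt

   # Check how many files will be used for calibration
   wc -l wave_filenames.txt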
|
|
|
|
|
Finally, let us use the scale table to quantize our models into ``int8``. |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
ncnn2int8 |
|
|
|
|
|
usage: ncnn2int8 [inparam] [inbin] [outparam] [outbin] [calibration table] |
|
|
|
|
|
First, we quantize the encoder model: |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
cd egs/librispeech/ASR |
|
|
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/ |
|
|
|
|
|
ncnn2int8 \ |
|
|
./encoder_jit_trace-pnnx.ncnn.param \ |
|
|
./encoder_jit_trace-pnnx.ncnn.bin \ |
|
|
./encoder_jit_trace-pnnx.ncnn.int8.param \ |
|
|
./encoder_jit_trace-pnnx.ncnn.int8.bin \ |
|
|
./encoder-scale-table.txt |
|
|
|
|
|
Next, we quantize the joiner model: |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
ncnn2int8 \ |
|
|
./joiner_jit_trace-pnnx.ncnn.param \ |
|
|
./joiner_jit_trace-pnnx.ncnn.bin \ |
|
|
./joiner_jit_trace-pnnx.ncnn.int8.param \ |
|
|
./joiner_jit_trace-pnnx.ncnn.int8.bin \ |
|
|
./joiner-scale-table.txt |
|
|
|
|
|
The above two commands generate the following 4 files: |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
-rw-r--r-- 1 kuangfangjun root 99M Jan 11 17:34 encoder_jit_trace-pnnx.ncnn.int8.bin |
|
|
-rw-r--r-- 1 kuangfangjun root 78K Jan 11 17:34 encoder_jit_trace-pnnx.ncnn.int8.param |
|
|
-rw-r--r-- 1 kuangfangjun root 774K Jan 11 17:35 joiner_jit_trace-pnnx.ncnn.int8.bin |
|
|
-rw-r--r-- 1 kuangfangjun root 496 Jan 11 17:35 joiner_jit_trace-pnnx.ncnn.int8.param |
|
|
|
|
|
Congratulations! You have successfully quantized your model from ``float32`` to ``int8``. |
|
|
|
|
|
.. caution:: |
|
|
|
|
|
``ncnn.int8.param`` and ``ncnn.int8.bin`` must be used in pairs. |
|
|
|
|
|
You can replace ``ncnn.param`` and ``ncnn.bin`` with ``ncnn.int8.param`` |
|
|
and ``ncnn.int8.bin`` in `sherpa-ncnn`_ if you like. |
|
|
|
|
|
For instance, to use only the ``int8`` encoder in ``sherpa-ncnn``, you can |
|
|
replace the following invocation: |
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
cd egs/librispeech/ASR |
|
|
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/ |
|
|
|
|
|
sherpa-ncnn \ |
|
|
../data/lang_bpe_500/tokens.txt \ |
|
|
./encoder_jit_trace-pnnx.ncnn.param \ |
|
|
./encoder_jit_trace-pnnx.ncnn.bin \ |
|
|
./decoder_jit_trace-pnnx.ncnn.param \ |
|
|
./decoder_jit_trace-pnnx.ncnn.bin \ |
|
|
./joiner_jit_trace-pnnx.ncnn.param \ |
|
|
./joiner_jit_trace-pnnx.ncnn.bin \ |
|
|
../test_wavs/1089-134686-0001.wav |
|
|
|
|
|
with |
|
|
|
|
|
.. code-block:: |
|
|
|
|
|
cd egs/librispeech/ASR |
|
|
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/ |
|
|
|
|
|
sherpa-ncnn \ |
|
|
../data/lang_bpe_500/tokens.txt \ |
|
|
./encoder_jit_trace-pnnx.ncnn.int8.param \ |
|
|
./encoder_jit_trace-pnnx.ncnn.int8.bin \ |
|
|
./decoder_jit_trace-pnnx.ncnn.param \ |
|
|
./decoder_jit_trace-pnnx.ncnn.bin \ |
|
|
./joiner_jit_trace-pnnx.ncnn.param \ |
|
|
./joiner_jit_trace-pnnx.ncnn.bin \ |
|
|
../test_wavs/1089-134686-0001.wav |
|
|
|
|
|
|
|
|
The following table compares the file sizes again:
|
|
|
|
|
|
|
|
+----------------------------------------+------------+ |
|
|
| File name | File size | |
|
|
+----------------------------------------+------------+ |
|
|
| encoder_jit_trace-pnnx.pt | 283 MB | |
|
|
+----------------------------------------+------------+ |
|
|
| decoder_jit_trace-pnnx.pt | 1010 KB | |
|
|
+----------------------------------------+------------+ |
|
|
| joiner_jit_trace-pnnx.pt | 3.0 MB | |
|
|
+----------------------------------------+------------+ |
|
|
| encoder_jit_trace-pnnx.ncnn.bin (fp16) | 142 MB | |
|
|
+----------------------------------------+------------+ |
|
|
| decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB | |
|
|
+----------------------------------------+------------+ |
|
|
| joiner_jit_trace-pnnx.ncnn.bin (fp16) | 1.5 MB | |
|
|
+----------------------------------------+------------+ |
|
|
| encoder_jit_trace-pnnx.ncnn.bin (fp32) | 283 MB | |
|
|
+----------------------------------------+------------+ |
|
|
| joiner_jit_trace-pnnx.ncnn.bin (fp32) | 3.0 MB | |
|
|
+----------------------------------------+------------+ |
|
|
| encoder_jit_trace-pnnx.ncnn.int8.bin | 99 MB | |
|
|
+----------------------------------------+------------+ |
|
|
| joiner_jit_trace-pnnx.ncnn.int8.bin | 774 KB | |
|
|
+----------------------------------------+------------+ |
|
|
|
|
|
You can see that the file sizes of the models after ``int8`` quantization
are much smaller.
|
|
|
|
|
.. hint:: |
|
|
|
|
|
Currently, only linear layers and convolutional layers are quantized |
|
|
with ``int8``, so you don't see an exact ``4x`` reduction in file sizes. |
|
|
|
|
|
.. note:: |
|
|
|
|
|
You need to test the recognition accuracy after ``int8`` quantization. |
|
|
|
|
|
You can find the speed comparison at `<https://github.com/k2-fsa/sherpa-ncnn/issues/44>`_. |
|
|
|
|
|
|
|
|
That's it! Have fun with `sherpa-ncnn`_! |
|
|
|