This model was trained using the [icefall](https://github.com/k2-fsa/icefall) framework. All training was done on 2 NVIDIA RTX 4090 GPUs. All of the training scripts can be found in the [Files and versions](https://huggingface.co/bookbot/zipformer-streaming-robust-es-v0/tree/main) tab, as well as the [Training metrics](https://huggingface.co/bookbot/zipformer-streaming-robust-es-v0/tensorboard) logged via TensorBoard.

## Setup
To set up all the necessary packages, please follow the installation instructions in the official icefall [documentation](https://icefall.readthedocs.io/en/latest/installation/index.html). When cloning the icefall repo, make sure to clone our fork instead of the original: `git clone https://github.com/bookbot-hive/icefall`.

### Download Pre-trained Model

Once you've installed all the necessary packages, follow the steps below.

```sh
cd egs/bookbot_es/ASR
mkdir tmp
# ...
cd ..
# ...
for m in greedy_search fast_beam_search modified_beam_search; do
  ./zipformer/streaming_decode.py \
    --epoch 80 \
    --avg 5 \
    --causal 1 \
    --num-encoder-layers 2,2,2,2,2,2 \
    --feedforward-dim 512,768,768,768,768,768 \
# ...
export CUDA_VISIBLE_DEVICES="0,1"
# ...
```
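Flags like `--num-encoder-layers 2,2,2,2,2,2` carry one value per Zipformer encoder stack as a comma-separated string. A minimal sketch of how such a flag value maps to a per-stack list (`per_stack` is our own illustrative helper, not icefall's actual code):

```python
# Illustrative helper (our own, mirroring how icefall's zipformer scripts
# interpret per-stack flags): split a comma-separated value into one int
# per encoder stack.
def per_stack(value: str) -> list[int]:
    return [int(v) for v in value.split(",")]

num_encoder_layers = per_stack("2,2,2,2,2,2")
feedforward_dim = per_stack("512,768,768,768,768,768")
```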

### Exporting to ONNX

To export the trained model to ONNX, run:

```
./zipformer/export-onnx-streaming.py \
  --tokens data/lang_phone/tokens.txt \
  --avg 5 \
  --causal 1 \
  --exp-dir tmp/zipformer-streaming-robust-es-v0 \
  --num-encoder-layers 2,2,2,2,2,2 \
  # ...
  --use-transducer True \
  --epoch 80
```

It will store the ONNX files inside the specified `exp-dir`.

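The exported files follow the `<component>-epoch-<E>-avg-<A>-chunk-<C>-left-<L>.onnx` naming scheme, encoding the checkpoint and streaming settings used above. A small sketch of that naming for this model's configuration (`onnx_filenames` is our own illustrative helper, not part of icefall):

```python
# Illustrative helper (not part of icefall): build the expected export
# filenames for this model's checkpoint and streaming settings.
def onnx_filenames(epoch=80, avg=5, chunk=16, left=128):
    suffix = f"epoch-{epoch}-avg-{avg}-chunk-{chunk}-left-{left}.onnx"
    return [f"{part}-{suffix}" for part in ("encoder", "decoder", "joiner")]

for name in onnx_filenames():
    print(name)
```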
### Converting ONNX to ORT

```
cd tmp/zipformer-streaming-robust-es-v0
python -m onnxruntime.tools.convert_onnx_models_to_ort --optimization_style=Fixed .
```

Running the command above converts the ONNX files to the ORT format and also produces efficient int8-quantized versions. The following files will be generated:

**Standard ORT files:**

- `encoder-epoch-80-avg-5-chunk-16-left-128.ort`
- `decoder-epoch-80-avg-5-chunk-16-left-128.ort`
- `joiner-epoch-80-avg-5-chunk-16-left-128.ort`

**INT8 Quantized ORT files:**

- `encoder-epoch-80-avg-5-chunk-16-left-128.int8.ort`
- `decoder-epoch-80-avg-5-chunk-16-left-128.int8.ort`
- `joiner-epoch-80-avg-5-chunk-16-left-128.int8.ort`
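Since the full-precision and int8 variants end up side by side in the experiment directory, a loader can prefer the smaller int8 files when present and fall back otherwise. A minimal sketch, assuming the file names listed above (`pick_ort_file` is our own helper, not an onnxruntime API):

```python
import os

# Illustrative helper (not an onnxruntime API): prefer the int8-quantized
# ORT file for a model component when it exists, else use full precision.
def pick_ort_file(exp_dir, part, suffix="epoch-80-avg-5-chunk-16-left-128"):
    int8 = os.path.join(exp_dir, f"{part}-{suffix}.int8.ort")
    full = os.path.join(exp_dir, f"{part}-{suffix}.ort")
    return int8 if os.path.exists(int8) else full
```

The int8 files typically trade a small amount of accuracy for a much smaller footprint, which is usually the right default on mobile and edge deployments.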

## Frameworks