| # Vietnamese Telephony ASR Portable Bundle |
|
|
| This directory is a self-contained inference bundle for long-form Vietnamese telephony speech recognition. |
| It includes the runtime code, the exported ASR checkpoint, the punctuation/casing restoration checkpoint, and the Ocany domain hotwords file that the default beam-search decoding loads. |
|
|
| The bundle is intended for deployment or handoff to another machine with minimal setup. |
|
|
| ## What Is Included |
|
|
| - Vietnamese telephony ASR model checkpoint: `outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final` |
| - Tourmii punctuation and casing restoration checkpoint: `tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final` |
| - Minimal inference scripts: `run_infer.sh`, `scripts/run_longform_infer.py`, `scripts/infer_ctc.py` |
| - Supporting runtime package: `src/asr_vi_wav2vec2_telephony/` |
| - Domain hotwords: `configs/ocany_hotwords.txt` |
|
|
| ## Directory Layout |
|
|
| ```text |
| asr_vi_wav2vec2_telephony_portable/ |
| ├── configs/ |
| │   └── ocany_hotwords.txt |
| ├── outputs/ |
| │   └── results64k_wav2vec2_cleaned_v2_telaug30_longer/ |
| │       └── final/ |
| ├── requirements.infer.txt |
| ├── run_infer.sh |
| ├── scripts/ |
| │   ├── infer_ctc.py |
| │   └── run_longform_infer.py |
| ├── src/ |
| │   └── asr_vi_wav2vec2_telephony/ |
| └── tourmii_pnc_finetune/ |
|     └── outputs/ |
|         └── tourmii_pnc_domain_v1/ |
|             └── final/ |
| ``` |
|
|
| ## System Requirements |
|
|
| - Python 3.10 or newer |
| - `ffmpeg` available in the system `PATH` |
| - Linux or another environment that can run `bash` |
| - Optional GPU support: install a `torch` and `torchaudio` build compatible with the target machine's CUDA runtime |
|
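The requirements above can be checked before any dependencies are installed. The following is a minimal sketch that uses only the standard library; the function name `check_prerequisites` is illustrative and is not part of the bundle.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10), tools=("ffmpeg", "bash")):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"`{tool}` not found in PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"MISSING: {problem}")
```

No output means the interpreter and both command-line tools were found. The sketch does not check GPU drivers.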
|
| ## Setup |
|
|
| Create a fresh virtual environment inside the bundle and install the inference dependencies: |
|
|
| ```bash |
| cd /path/to/asr_vi_wav2vec2_telephony_portable |
| python3 -m venv .venv |
| source .venv/bin/activate |
| pip install --upgrade pip |
| pip install -r requirements.infer.txt |
| ``` |
|
|
| If you plan to run on GPU, make sure the installed `torch` and `torchaudio` packages match the target machine's CUDA version. Do not assume a virtual environment created on one machine will work correctly on another. |
|
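To see which `torch` build ended up in the environment, a short check like the one below can help. It is a sketch rather than part of the bundle: `torch_report` is a hypothetical helper name, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def torch_report():
    """Summarize the installed torch build without failing when torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None, "cuda_build": None, "gpu_visible": False}
    import torch

    return {
        "torch": torch.__version__,
        # None for CPU-only builds, otherwise the CUDA version torch was built against.
        "cuda_build": torch.version.cuda,
        "gpu_visible": bool(torch.cuda.is_available()),
    }


report = torch_report()
print(report)
# Suggest explicit device flags that match what this machine can actually run.
device = "cuda" if report["gpu_visible"] else "cpu"
print(f"--device {device} --pnc-device {device}")
```

If `cuda_build` is set but `gpu_visible` is `False`, the installed wheel and the machine's driver or CUDA runtime probably do not match.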
|
| ## Quick Start |
|
|
| The shortest command is: |
|
|
| ```bash |
| cd /path/to/asr_vi_wav2vec2_telephony_portable |
| bash run_infer.sh /path/to/audio.wav |
| ``` |
|
|
| You can also explicitly choose the Python interpreter: |
|
|
| ```bash |
| cd /path/to/asr_vi_wav2vec2_telephony_portable |
| PYTHON_BIN=/usr/bin/python3 bash run_infer.sh /path/to/audio.wav |
| ``` |
|
|
| ## Default Runtime Behavior |
|
|
| By default, the bundle will: |
|
|
| - load the ASR model from `outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final` |
| - enable beam-search decoding |
| - load hotwords from `configs/ocany_hotwords.txt` |
| - use `--hotword-weight 4` |
| - enable punctuation and casing restoration |
| - load the local Tourmii PnC model from `tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final` |
| - write outputs to `tmp/<audio_stem>_guard_test` |
|
|
| For an input file named `call_2332.wav`, the output directory will be: |
|
|
| ```text |
| tmp/call_2332_guard_test |
| ``` |
|
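The naming rule can be reproduced in a few lines, which is handy when a downstream job needs to locate results. This is a sketch of the documented pattern, not the bundle's own code; `default_output_dir` is a hypothetical name, and whether `tmp/` is resolved against the bundle root or the current working directory is not covered here.

```python
from pathlib import Path


def default_output_dir(audio_path, base="tmp", suffix="_guard_test"):
    """Mirror the documented pattern tmp/<audio_stem>_guard_test."""
    # Path.stem drops the directory and the final extension only.
    return Path(base) / f"{Path(audio_path).stem}{suffix}"


print(default_output_dir("/data/call_2332.wav").as_posix())
```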
|
| The output directory typically contains: |
|
|
| - `result.json`: structured inference output and metadata |
| - `result.txt`: best final text output |
| - `result.png`: waveform visualization with transcript context |
| - `<audio_stem>.prepared.wav`: converted and normalized working audio |
|
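The two text outputs can be read back as follows. This sketch assumes only the file names listed above; the keys inside `result.json` are not documented here, so the example treats the parsed JSON as opaque. `load_outputs` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_outputs(output_dir):
    """Return (transcript, metadata) read from result.txt and result.json."""
    out = Path(output_dir)
    # Read as UTF-8 explicitly so Vietnamese diacritics survive on any platform.
    transcript = (out / "result.txt").read_text(encoding="utf-8").strip()
    with open(out / "result.json", encoding="utf-8") as handle:
        metadata = json.load(handle)
    return transcript, metadata
```

For example, `load_outputs("tmp/call_2332_guard_test")` returns the final transcript and the parsed metadata for the run shown earlier.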
|
| ## Advanced Usage |
|
|
| Use the Python entry point if you need to override defaults: |
|
|
| ```bash |
| cd /path/to/asr_vi_wav2vec2_telephony_portable |
| python3 scripts/run_longform_infer.py \ |
|     --audio /data/call_2332.wav \ |
|     --output-dir ./tmp/custom_out \ |
|     --hotword-weight 2 \ |
|     --pnc-device cpu |
| ``` |
|
|
| Useful options include: |
|
|
| - `--device {auto,cpu,cuda}`: ASR inference device |
| - `--pnc-device {auto,cpu,cuda}`: punctuation model device |
| - `--disable-beam-search`: use greedy decoding only |
| - `--disable-vad`: disable VAD-based chunking |
| - `--hotwords-file /path/to/file.txt`: override the hotwords list |
| - `--model-path /path/to/final`: override the ASR checkpoint |
| - `--pnc-model-id /path/to/final`: override the punctuation checkpoint |
| - `--output-dir /path/to/out`: write outputs to a custom directory |
|
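For batch jobs, the same options can be assembled programmatically. The sketch below uses only the flags listed above; the helper names `build_command` and `transcribe_folder` are hypothetical, and the bundle itself may offer a better batch path.

```python
import subprocess
import sys
from pathlib import Path


def build_command(audio, output_dir, device="auto", pnc_device="auto",
                  hotword_weight=None, beam_search=True):
    """Build an argument list for scripts/run_longform_infer.py."""
    cmd = [
        sys.executable, "scripts/run_longform_infer.py",
        "--audio", str(audio),
        "--output-dir", str(output_dir),
        "--device", device,
        "--pnc-device", pnc_device,
    ]
    if hotword_weight is not None:
        cmd += ["--hotword-weight", str(hotword_weight)]
    if not beam_search:
        cmd.append("--disable-beam-search")
    return cmd


def transcribe_folder(audio_dir, out_root, bundle_root="."):
    """Run inference on every .wav file in audio_dir, one output directory per file."""
    out_root = Path(out_root).resolve()
    for wav in sorted(Path(audio_dir).resolve().glob("*.wav")):
        cmd = build_command(wav, out_root / wav.stem, device="cpu", pnc_device="cpu")
        # cwd is the bundle root so the relative script path resolves.
        subprocess.run(cmd, cwd=bundle_root, check=True)
```

Paths are resolved to absolute form before the call because the subprocess runs with the bundle root as its working directory.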
|
| ## Running on Another Machine |
|
|
| This bundle is portable because the default model paths are resolved relative to the bundle root. If you copy the entire directory, the default commands continue to work without editing any paths. |
|
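Root-relative resolution of this kind is commonly done by deriving the bundle root from the script's own location. The following is a sketch of that pattern under the assumption that the script sits in `<bundle>/scripts/`; it is not a copy of the bundle's code, and the helper names are hypothetical.

```python
from pathlib import Path

# Relative locations taken from the directory layout of this bundle.
ASR_REL = Path("outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final")
PNC_REL = Path("tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final")
HOTWORDS_REL = Path("configs/ocany_hotwords.txt")


def bundle_root_from(script_path):
    """Return the bundle root for a script stored in <bundle>/scripts/."""
    # parents[0] is scripts/, parents[1] is the bundle root.
    return Path(script_path).resolve().parents[1]


def default_paths(script_path):
    """Resolve the default model and hotwords paths against the bundle root."""
    root = bundle_root_from(script_path)
    return {
        "asr": root / ASR_REL,
        "pnc": root / PNC_REL,
        "hotwords": root / HOTWORDS_REL,
    }
```

Inside a script, `default_paths(__file__)` would give absolute paths that stay valid wherever the directory is copied.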
|
| Recommended handoff workflow: |
|
|
| 1. Copy the whole `asr_vi_wav2vec2_telephony_portable/` directory to the target machine. |
| 2. Do not rely on an existing `.venv` copied from the source machine. |
| 3. Install system-level prerequisites on the target machine: Python, `ffmpeg`, and GPU drivers if needed. |
| 4. Create a new virtual environment on the target machine. |
| 5. Install dependencies from `requirements.infer.txt`. |
| 6. Run inference with `bash run_infer.sh /path/to/audio.wav`. |
|
|
| Example: |
|
|
| ```bash |
| scp -r asr_vi_wav2vec2_telephony_portable user@target:/opt/ |
| |
| ssh user@target |
| cd /opt/asr_vi_wav2vec2_telephony_portable |
| python3 -m venv .venv |
| source .venv/bin/activate |
| pip install --upgrade pip |
| pip install -r requirements.infer.txt |
| bash run_infer.sh /data/sample_call.wav |
| ``` |
|
|
| Important portability notes: |
|
|
| - Copy the entire bundle, not just `model.safetensors`. |
| - Hugging Face checkpoints require the full exported directory, including files such as `config.json`, tokenizer files, and preprocessing metadata. |
| - Rebuild the Python environment on the target machine instead of copying one from another host. |
| - If the target machine has no GPU, inference runs on CPU; to force this explicitly, pass `--device cpu --pnc-device cpu`. |
| - If the target machine has restricted internet access, prepare the Python wheels in advance; the model files themselves are already included locally in this bundle. |
|
|
| ## Minimum Files Required |
|
|
| A single checkpoint file such as `model.safetensors` is not enough. |
|
|
| At minimum, you need: |
|
|
| - the full ASR exported model directory under `final/` |
| - the full PnC exported model directory under `final/` |
| - the inference scripts |
| - the `src/` runtime package |
| - the hotwords file if beam-search decoding should preserve domain terms |
|
|
| If the destination machine already has the exact same inference code and directory conventions, you may copy only the ASR and PnC model directories. In practice, shipping the full bundle is safer and reduces setup errors. |
|
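A copied bundle can be checked for completeness with a short script. The list below is drawn from the directory layout in this README plus the `config.json` file mentioned above; it is a sketch, it does not verify tokenizer or preprocessing files (their exact names are not listed here), and `missing_bundle_items` is a hypothetical helper name.

```python
from pathlib import Path

ASR_DIR = "outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final"
PNC_DIR = "tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final"

REQUIRED = [
    ASR_DIR,
    f"{ASR_DIR}/config.json",
    PNC_DIR,
    f"{PNC_DIR}/config.json",
    "run_infer.sh",
    "scripts/run_longform_infer.py",
    "scripts/infer_ctc.py",
    "src/asr_vi_wav2vec2_telephony",
    "configs/ocany_hotwords.txt",
    "requirements.infer.txt",
]


def missing_bundle_items(bundle_root, required=REQUIRED):
    """Return the required relative paths that do not exist under bundle_root."""
    root = Path(bundle_root)
    return [rel for rel in required if not (root / rel).exists()]
```

Running `missing_bundle_items("/opt/asr_vi_wav2vec2_telephony_portable")` after a copy should return an empty list.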
|
| ## Troubleshooting |
|
|
| `ffmpeg: command not found` |
|
|
| - Install `ffmpeg` and ensure it is available in the shell `PATH`. |
|
|
| `CUDA requested but no GPU is available` |
|
|
| - Run with `--device cpu --pnc-device cpu`, or install the correct GPU stack. |
|
|
| Import or wheel compatibility issues after copying from another machine |
|
|
| - Delete the copied virtual environment and create a new one on the target machine. |
|
|