# Vietnamese Telephony ASR Portable Bundle
This directory is a self-contained inference bundle for long-form Vietnamese telephony speech recognition.
It includes the runtime code, the exported ASR checkpoint, the punctuation/casing restoration checkpoint, and the Ocany domain hotwords file required for decoding.
The bundle is intended for deployment or handoff to another machine with minimal setup.
## What Is Included
- Vietnamese telephony ASR model checkpoint:
  `outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final`
- Tourmii punctuation and casing restoration checkpoint:
  `tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final`
- Minimal inference scripts:
  `run_infer.sh`, `scripts/run_longform_infer.py`, `scripts/infer_ctc.py`
- Supporting runtime package:
  `src/asr_vi_wav2vec2_telephony/`
- Domain hotwords:
  `configs/ocany_hotwords.txt`
## Directory Layout
```text
asr_vi_wav2vec2_telephony_portable/
├── configs/
│   └── ocany_hotwords.txt
├── outputs/
│   └── results64k_wav2vec2_cleaned_v2_telaug30_longer/
│       └── final/
├── requirements.infer.txt
├── run_infer.sh
├── scripts/
│   ├── infer_ctc.py
│   └── run_longform_infer.py
├── src/
│   └── asr_vi_wav2vec2_telephony/
└── tourmii_pnc_finetune/
    └── outputs/
        └── tourmii_pnc_domain_v1/
            └── final/
```
## System Requirements
- Python 3.10 or newer
- `ffmpeg` available in the system `PATH`
- Linux or another environment that can run `bash`
- Optional GPU support:
  install a `torch` and `torchaudio` build compatible with the target machine's CUDA runtime
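The requirements above can be checked before installing anything. The following is a minimal sketch using only the Python standard library; the function name `check_prerequisites` is hypothetical and is not part of the bundle.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    # shutil.which searches PATH, so it mirrors what the shell would find
    for tool in ("ffmpeg", "bash"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found in PATH")
    return problems


for problem in check_prerequisites():
    print("MISSING:", problem)
```

An empty result only confirms that the tools are discoverable; it does not verify GPU drivers or the `ffmpeg` build.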
## Setup
Create a fresh virtual environment inside the bundle and install the inference dependencies:
```bash
cd /path/to/asr_vi_wav2vec2_telephony_portable
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.infer.txt
```
If you plan to run on GPU, make sure the installed `torch` and `torchaudio` packages match the target machine's CUDA version. Do not assume a virtual environment created on one machine will work correctly on another.
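One way to confirm the installed build after `pip install` is to ask `torch` directly. This is a hedged sketch: the helper name `describe_torch_runtime` is hypothetical, and it returns defaults instead of failing when `torch` is not installed.

```python
import importlib.util


def describe_torch_runtime():
    """Summarize the installed torch build without assuming torch is present."""
    info = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is None:
        return info
    import torch

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds
    info["cuda_build"] = torch.version.cuda
    # True only if a usable GPU and driver are visible to this process
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


print(describe_torch_runtime())
```

If `cuda_build` is set but `cuda_available` is `False`, the wheel was built for CUDA but no usable GPU or driver was found on this machine.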
## Quick Start
The shortest command is:
```bash
cd /path/to/asr_vi_wav2vec2_telephony_portable
bash run_infer.sh /path/to/audio.wav
```
You can also explicitly choose the Python interpreter:
```bash
cd /path/to/asr_vi_wav2vec2_telephony_portable
PYTHON_BIN=/usr/bin/python3 bash run_infer.sh /path/to/audio.wav
```
## Default Runtime Behavior
By default, the bundle will:
- load the ASR model from
  `outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final`
- enable beam-search decoding
- load hotwords from `configs/ocany_hotwords.txt`
- use `--hotword-weight 4`
- enable punctuation and casing restoration
- load the local Tourmii PnC model from
  `tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final`
- write outputs to `tmp/<audio_stem>_guard_test`
For an input file named `call_2332.wav`, the output directory will be:
```text
tmp/call_2332_guard_test
```
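The naming rule can be reproduced in a few lines, which is handy when a downstream job needs to locate the results. This sketch assumes the stem is the file name without its final extension, as `pathlib.Path.stem` computes it; the function name `default_output_dir` is hypothetical, and whether `tmp/` is relative to the bundle root or the working directory is not stated here.

```python
from pathlib import Path


def default_output_dir(audio_path, root="tmp"):
    """Mirror the documented pattern tmp/<audio_stem>_guard_test."""
    stem = Path(audio_path).stem  # "call_2332.wav" -> "call_2332"
    return Path(root) / f"{stem}_guard_test"


print(default_output_dir("/data/call_2332.wav"))
```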
The output directory typically contains:
- `result.json`: structured inference output and metadata
- `result.txt`: best final text output
- `result.png`: waveform visualization with transcript context
- `<audio_stem>.prepared.wav`: converted and normalized working audio
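A downstream script could pick up these files as follows. This is a minimal sketch: the helper name `load_result` is hypothetical, and because the schema of `result.json` is not documented here, the JSON is returned as-is without assuming any particular keys.

```python
import json
from pathlib import Path


def load_result(output_dir):
    """Return (final_text, metadata) read from an inference output directory."""
    out = Path(output_dir)
    # result.txt holds the best final text output
    text = (out / "result.txt").read_text(encoding="utf-8").strip()
    # result.json holds structured output and metadata; keys are not assumed
    with open(out / "result.json", encoding="utf-8") as handle:
        metadata = json.load(handle)
    return text, metadata
```

For example, `text, metadata = load_result("tmp/call_2332_guard_test")` reads the outputs for `call_2332.wav` under the default naming.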
## Advanced Usage
Use the Python entry point if you need to override defaults:
```bash
cd /path/to/asr_vi_wav2vec2_telephony_portable
python3 scripts/run_longform_infer.py \
--audio /data/call_2332.wav \
--output-dir ./tmp/custom_out \
--hotword-weight 2 \
--pnc-device cpu
```
Useful options include:
- `--device {auto,cpu,cuda}`: ASR inference device
- `--pnc-device {auto,cpu,cuda}`: punctuation model device
- `--disable-beam-search`: use greedy decoding only
- `--disable-vad`: disable VAD-based chunking
- `--hotwords-file /path/to/file.txt`: override the hotwords list
- `--model-path /path/to/final`: override the ASR checkpoint
- `--pnc-model-id /path/to/final`: override the punctuation checkpoint
- `--output-dir /path/to/out`: write outputs to a custom directory
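When processing many recordings, it can help to assemble the command line programmatically. The sketch below uses only flags listed above; the function names `build_infer_command` and `run_batch` are hypothetical, and the script is assumed to be invoked from the bundle root.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(audio, output_dir, device="auto", pnc_device="auto",
                        hotword_weight=None, beam_search=True,
                        script="scripts/run_longform_infer.py"):
    """Build an argument list for the Python entry point."""
    cmd = [
        sys.executable, script,
        "--audio", str(audio),
        "--output-dir", str(output_dir),
        "--device", device,
        "--pnc-device", pnc_device,
    ]
    if hotword_weight is not None:
        cmd += ["--hotword-weight", str(hotword_weight)]
    if not beam_search:
        cmd.append("--disable-beam-search")
    return cmd


def run_batch(audio_dir, output_root, **options):
    """Run inference on every .wav file in audio_dir, one output directory per file."""
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        cmd = build_infer_command(wav, Path(output_root) / wav.stem, **options)
        # check=True raises CalledProcessError if the script exits non-zero
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.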
## Running on Another Machine
This bundle is portable because the default model paths are resolved relative to the bundle root. If you copy the entire directory, the default commands continue to work without editing any paths.
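A common way to get this behavior is to derive the bundle root from the location of the running script rather than from the current working directory. The sketch below illustrates that pattern only; it is not taken from the bundle's source, and both function names are hypothetical.

```python
from pathlib import Path


def bundle_root_from_script(script_path):
    """Assume the script lives one level below the root, e.g. scripts/run_longform_infer.py."""
    return Path(script_path).resolve().parent.parent


def resolve_in_bundle(script_path, relative):
    """Turn a bundle-relative path into an absolute one."""
    return bundle_root_from_script(script_path) / relative


print(resolve_in_bundle("/opt/asr_vi_wav2vec2_telephony_portable/scripts/run_longform_infer.py",
                        "configs/ocany_hotwords.txt"))
```

Inside a real script, `__file__` would be passed as `script_path`, so the result stays correct wherever the directory is copied.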
Recommended handoff workflow:
1. Copy the whole `asr_vi_wav2vec2_telephony_portable/` directory to the target machine.
2. Do not rely on an existing `.venv` copied from the source machine.
3. Install system-level prerequisites on the target machine: Python, `ffmpeg`, and GPU drivers if needed.
4. Create a new virtual environment on the target machine.
5. Install dependencies from `requirements.infer.txt`.
6. Run inference with `bash run_infer.sh /path/to/audio.wav`.
Example:
```bash
scp -r asr_vi_wav2vec2_telephony_portable user@target:/opt/
ssh user@target
cd /opt/asr_vi_wav2vec2_telephony_portable
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.infer.txt
bash run_infer.sh /data/sample_call.wav
```
Important portability notes:
- Copy the entire bundle, not just `model.safetensors`.
- Hugging Face checkpoints require the full exported directory, including files such as `config.json`, tokenizer files, and preprocessing metadata.
- Rebuild the Python environment on the target machine instead of copying one from another host.
- If the target machine has no GPU, run on CPU: leave the device options at their defaults or pass `--device cpu --pnc-device cpu` explicitly.
- If the target machine has restricted internet access, prepare the Python wheels in advance; the model files themselves are already included locally in this bundle.
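A quick completeness check on an exported checkpoint directory might look like the sketch below. Only `config.json` and `model.safetensors` are named in this README; the other file names are typical of Hugging Face Wav2Vec2 CTC exports and are assumptions, so pass a different `expected` list for the PnC model or for another export layout. The function name is hypothetical.

```python
from pathlib import Path

# config.json and model.safetensors come from this README; the remaining
# names are ASSUMED from typical Hugging Face Wav2Vec2 CTC exports.
ASSUMED_ASR_FILES = (
    "config.json",
    "model.safetensors",
    "preprocessor_config.json",
    "tokenizer_config.json",
    "vocab.json",
)


def missing_checkpoint_files(model_dir, expected=ASSUMED_ASR_FILES):
    """Return the expected file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]
```

An empty list means every expected file is present; it does not validate file contents.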
## Minimum Files Required
A single checkpoint file such as `model.safetensors` is not enough to run inference.
At minimum, you need:
- the full exported ASR model directory, `outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final/`
- the full exported PnC model directory, `tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final/`
- the inference scripts
- the `src/` runtime package
- the hotwords file if beam-search decoding should preserve domain terms
If the destination machine already has the exact same inference code and directory conventions, you may copy only the ASR and PnC model directories. In practice, shipping the full bundle is safer and reduces setup errors.
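Before handing off a copy, the presence of these pieces can be verified with a short script. This is a sketch under the directory layout documented in this README; the function name `missing_bundle_parts` is hypothetical, and the check only tests existence, not content.

```python
from pathlib import Path

# Paths relative to the bundle root, taken from the layout in this README
REQUIRED_PARTS = (
    "outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final",
    "tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final",
    "run_infer.sh",
    "scripts/run_longform_infer.py",
    "scripts/infer_ctc.py",
    "src/asr_vi_wav2vec2_telephony",
    "configs/ocany_hotwords.txt",
)


def missing_bundle_parts(bundle_root):
    """Return the required relative paths that do not exist under bundle_root."""
    root = Path(bundle_root)
    return [part for part in REQUIRED_PARTS if not (root / part).exists()]
```

Running `missing_bundle_parts("/opt/asr_vi_wav2vec2_telephony_portable")` on the target machine should return an empty list if the copy is complete.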
## Troubleshooting
`ffmpeg: command not found`
- Install `ffmpeg` and ensure it is available in the shell `PATH`.
`CUDA requested but no GPU is available`
- Run with `--device cpu --pnc-device cpu`, or install the correct GPU stack.
Import or wheel compatibility issues after copying from another machine
- Delete the copied virtual environment and create a new one on the target machine.