# NeMo Forced Aligner (NFA)

Try it out: HuggingFace Space 🎤 | Tutorial: "How to use NFA?" 🚀 | Blog post: "How does forced alignment work?" 📚

NFA is a tool for generating token-, word-, and segment-level timestamps of speech in audio using NeMo's CTC-based Automatic Speech Recognition (ASR) models. You can provide your own reference text or use the ASR-generated transcription. You can use NeMo's ASR model checkpoints out of the box in [14+ languages](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/results.html#speech-recognition-languages), or train your own model. NFA can be used on long audio files of 1+ hours in duration (subject to your hardware and the ASR model used).

## Quickstart

1. Install [NeMo](https://github.com/NVIDIA/NeMo#installation).
2. Prepare a NeMo-style manifest containing the paths of the audio files you would like to process and, optionally, their text.
3. Run NFA's `align.py` script with the desired config, e.g.:

   ```bash
   python /tools/nemo_forced_aligner/align.py \
       pretrained_name="stt_en_fastconformer_hybrid_large_pc" \
       manifest_filepath= \
       output_dir=
   ```
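Step 2 of the Quickstart asks for a NeMo-style manifest. NeMo manifests are JSON Lines files: one JSON object per line, with an `audio_filepath` key and, optionally, a `text` key holding the reference transcript. The sketch below shows one way to produce such a file with the Python standard library. The audio paths, transcripts, and the helper names `manifest_lines` and `write_manifest` are hypothetical placeholders, not part of NFA; check the NFA documentation for any further fields or options your setup needs.

```python
import json
from pathlib import Path

# Hypothetical example data: the paths and text below are placeholders.
# "text" is optional according to this README; "audio_filepath" is always needed.
utterances = [
    {"audio_filepath": "/data/audio/utt1.wav", "text": "hello world"},
    {"audio_filepath": "/data/audio/utt2.wav", "text": "forced alignment example"},
]


def manifest_lines(entries):
    """Serialize each entry as a single-line JSON object (JSON Lines layout)."""
    lines = []
    for entry in entries:
        if "audio_filepath" not in entry:
            raise ValueError("every manifest entry needs an 'audio_filepath'")
        # json.dumps emits no newlines by default, so each entry stays on one line.
        lines.append(json.dumps(entry, ensure_ascii=False))
    return lines


def write_manifest(entries, manifest_path):
    """Write the entries to manifest_path, one JSON object per line."""
    path = Path(manifest_path)
    path.write_text("\n".join(manifest_lines(entries)) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    out = write_manifest(utterances, "manifest.json")
    print(out.read_text(encoding="utf-8"), end="")
```

The resulting file path is what you would pass as `manifest_filepath` when running `align.py`.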

## Documentation

More documentation is available [here](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/tools/nemo_forced_aligner.html).