Introduction
============

.. # define a hard line break for html
.. |br| raw:: html

   <br />

.. _dummy_header:
`NVIDIA NeMo <https://github.com/NVIDIA/NeMo>`_, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art
conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR),
Natural Language Processing (NLP), and Text-to-Speech (TTS) synthesis models. Each collection consists of
prebuilt modules that include everything needed to train on your data.
Every module can easily be customized, extended, and composed to create new conversational AI
model architectures.

Conversational AI architectures are typically large and require a lot of data and compute
for training. NeMo uses `PyTorch Lightning <https://www.pytorchlightning.ai/>`_ for easy and performant multi-GPU/multi-node
mixed-precision training.
`Pre-trained NeMo models <https://catalog.ngc.nvidia.com/models?query=nemo&orderBy=weightPopularDESC>`_ are available on NVIDIA NGC.
.. raw:: html

    <div style="position: relative; padding-bottom: 3%; height: 0; overflow: hidden; max-width: 100%; height: auto;">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/wBgpMf_KQVw" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
    </div>

For more information and questions, visit the `NVIDIA NeMo Discussion Board <https://github.com/NVIDIA/NeMo/discussions>`_.
Prerequisites
-------------

Before you begin using NeMo, it's assumed that you meet the following prerequisites:

#. You have Python version 3.6, 3.7, or 3.8.
#. You have PyTorch version 1.8.1.
#. You have access to an NVIDIA GPU for training.
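The prerequisites above can be sanity-checked programmatically before installing. A minimal sketch (the helper names are illustrative and not part of NeMo):

```python
import sys

def python_version_supported(version=None):
    """Return True if the (major, minor) pair falls in NeMo's
    supported range of Python 3.6-3.8."""
    if version is None:
        version = sys.version_info[:2]
    return (3, 6) <= tuple(version) <= (3, 8)

def cuda_available():
    """Probe for a CUDA-capable GPU without assuming PyTorch is installed."""
    try:
        import torch
        return torch.cuda.is_available()
    except ImportError:
        return False
```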
.. _quick_start_guide:

Quick Start Guide
-----------------

This NeMo Quick Start Guide is a starting point for users who want to try out NeMo; specifically, it walks you through an example audio translator and voice swap application so that you can quickly get started with the NeMo fundamentals.

If you're new to NeMo, the best way to get started is to take a look at the following tutorials:
* `Text Classification (Sentiment Analysis) <https://github.com/NVIDIA/NeMo/blob/stable/tutorials/Text_Classification_Sentiment_Analysis>`__ - demonstrates the Text Classification model using the NeMo NLP collection.
* `NeMo Primer <https://github.com/NVIDIA/NeMo/blob/stable/tutorials/00_NeMo_Primer.ipynb>`__ - introduces NeMo, PyTorch Lightning, and OmegaConf, and shows how to use, modify, save, and restore NeMo models.
* `NeMo Models <https://github.com/NVIDIA/NeMo/blob/stable/tutorials/01_NeMo_Models.ipynb>`__ - explains the fundamental concepts of the NeMo model.
* `NeMo voice swap demo <https://github.com/NVIDIA/NeMo/blob/stable/tutorials/NeMo_voice_swap_app.ipynb>`__ - demonstrates how to swap a voice in an audio fragment with a computer-generated one using NeMo.
Below is a code snippet of the audio translator application.
.. code-block:: python

    # Import NeMo and its ASR, NLP and TTS collections
    import nemo
    # Import Speech Recognition collection
    import nemo.collections.asr as nemo_asr
    # Import Natural Language Processing collection
    import nemo.collections.nlp as nemo_nlp
    # Import Speech Synthesis collection
    import nemo.collections.tts as nemo_tts

    # Next, we instantiate all the necessary models directly from NVIDIA NGC
    # Speech Recognition model - QuartzNet trained on Russian part of MCV 6.0
    quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="stt_ru_quartznet15x5").cuda()
    # Neural Machine Translation model
    nmt_model = nemo_nlp.models.MTEncDecModel.from_pretrained(model_name='nmt_ru_en_transformer6x6').cuda()
    # Spectrogram generator which takes text as an input and produces a spectrogram
    spectrogram_generator = nemo_tts.models.FastPitchModel.from_pretrained(model_name="tts_en_fastpitch").cuda()
    # Vocoder model which takes a spectrogram and produces the actual audio
    vocoder = nemo_tts.models.HifiGanModel.from_pretrained(model_name="tts_en_hifigan").cuda()

    # Transcribe an audio file
    # IMPORTANT: The audio must be mono with a 16 kHz sampling rate
    # Get an example from: https://nemo-public.s3.us-east-2.amazonaws.com/mcv-samples-ru/common_voice_ru_19034087.wav
    russian_text = quartznet.transcribe(['Path_to_audio_file'])
    print(russian_text)

    # You should see Russian text here. Let's translate it to English
    english_text = nmt_model.translate(russian_text)
    print(english_text)

    # After this you should see the English translation
    # Let's convert it into audio
    # A helper function which combines FastPitch and HiFi-GAN to go directly from
    # text to audio
    def text_to_audio(text):
        parsed = spectrogram_generator.parse(text)
        spectrogram = spectrogram_generator.generate_spectrogram(tokens=parsed)
        audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
        return audio.to('cpu').numpy()

    audio = text_to_audio(english_text[0])
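To listen to the result, the NumPy waveform returned by ``text_to_audio`` can be written to a WAV file with the Python standard library. A minimal sketch (the helper name and the 22050 Hz default sample rate are assumptions for illustration, not part of NeMo's API):

```python
import wave

import numpy as np

def save_wav(audio, path, sample_rate=22050):
    """Write a mono float waveform in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    audio = np.asarray(audio, dtype=np.float32).reshape(-1)
    # Scale floats to signed 16-bit integers, clipping any overshoot
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
```

Calling ``save_wav(audio, "translation.wav")`` after the snippet above would produce a playable file; third-party libraries such as ``soundfile`` offer the same in a single call.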
Installation
------------

Pip
~~~

Use this installation mode if you want the latest released version.

.. code-block:: bash

    apt-get update && apt-get install -y libsndfile1 ffmpeg
    pip install Cython
    pip install nemo_toolkit[all]
Pip from source
~~~~~~~~~~~~~~~

Use this installation mode if you want the version from a particular GitHub branch (for example, ``main``).

.. code-block:: bash

    apt-get update && apt-get install -y libsndfile1 ffmpeg
    pip install Cython
    python -m pip install git+https://github.com/NVIDIA/NeMo.git@{BRANCH}#egg=nemo_toolkit[all]
    # For v1.0.2, replace {BRANCH} with v1.0.2 like so:
    # python -m pip install git+https://github.com/NVIDIA/NeMo.git@v1.0.2#egg=nemo_toolkit[all]
From source
~~~~~~~~~~~

Use this installation mode if you are contributing to NeMo.

.. code-block:: bash

    apt-get update && apt-get install -y libsndfile1 ffmpeg
    git clone https://github.com/NVIDIA/NeMo
    cd NeMo
    ./reinstall.sh
Docker containers
~~~~~~~~~~~~~~~~~

To build a NeMo container with the Dockerfile from a branch, please run:

.. code-block:: bash

    DOCKER_BUILDKIT=1 docker build -f Dockerfile -t nemo:latest .

If you choose to work with the ``main`` branch, we recommend using `NVIDIA's PyTorch container version 21.05-py3 <https://ngc.nvidia.com/containers/nvidia:pytorch/tags>`_, then install from GitHub.

.. code-block:: bash

    docker run --gpus all -it --rm -v <nemo_github_folder>:/NeMo --shm-size=8g \
        -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit \
        stack=67108864 --device=/dev/snd nvcr.io/nvidia/pytorch:21.05-py3
`FAQ <https://github.com/NVIDIA/NeMo/discussions>`_
---------------------------------------------------

Have a look at our `discussions board <https://github.com/NVIDIA/NeMo/discussions>`_ and feel free to post a question or start a discussion.
Contributing
------------

We welcome community contributions! Refer to the `CONTRIBUTING.md <https://github.com/NVIDIA/NeMo/blob/stable/CONTRIBUTING.md>`_ file for the process.

License
-------

NeMo is released under the `Apache 2.0 license <https://github.com/NVIDIA/NeMo/blob/stable/LICENSE>`_.