| Checkpoints |
| =========== |
|
|
| There are two main ways to load pretrained checkpoints in NeMo: |
|
|
| * Using the :code:`restore_from()` method to load a local checkpoint file (`.nemo`), or |
| * Using the :code:`from_pretrained()` method to download and set up a checkpoint from NGC. |
|
|
| See the following sections for instructions and examples for each. |
|
|
| Note that these instructions are for loading fully trained checkpoints for evaluation or fine-tuning. |
| For resuming an unfinished training experiment, please use the experiment manager to do so by setting the |
| ``resume_if_exists`` flag to True. |
|
|
| Loading Local Checkpoints |
| ------------------------- |
|
|
| NeMo will automatically save checkpoints of a model you are training in a `.nemo` format. |
| You can also manually save your models at any point using :code:`model.save_to(<checkpoint_path>.nemo)`. |
|
|
| If you have a local ``.nemo`` checkpoint that you'd like to load, simply use the :code:`restore_from()` method: |
|
|
| .. code-block:: python |
|
|
| import nemo.collections.asr as nemo_asr |
| model = nemo_asr.models.<MODEL_BASE_CLASS>.restore_from(restore_path="<path/to/checkpoint/file.nemo>") |
|
|
| Where the model base class is the ASR model class of the original checkpoint, or the general `ASRModel` class. |
|
|
| Speaker Label Inference |
| ------------------------ |
|
|
| The goal of speaker label inference is to infer speaker labels using a speaker model with known speaker labels from enrollment set. We provide `speaker_identification_infer.py` script for this purpose under `<NeMo_root>/examples/speaker_tasks/recognition` folder. |
| Currently supported backends are cosine_similarity and neural classifier. |
|
|
| The audio files should be 16KHz mono channel wav files. |
|
|
| The script takes two manifest files: |
|
|
| * enrollment_manifest : This manifest contains enrollment data with known speaker labels. |
| * test_manifest: This manifest contains test data for which we map speaker labels captured from enrollment manifest using one of provided backend |
|
|
| sample format for each of these manifests is provided in `<NeMo_root>/examples/speaker_tasks/recognition/conf/speaker_identification_infer.yaml` config file. |
|
|
| To infer speaker labels using cosine_similarity backend |
|
|
| .. code-block:: bash |
| |
| python speaker_identification_infer.py data.enrollment_manifest=<path/to/enrollment_manifest> data.test_manifest=<path/to/test_manifest> backend.backend_model=cosine_similarity |
|
|
| |
| Speaker Embedding Extraction |
| ----------------------------- |
| Speaker Embedding Extraction, is to extract speaker embeddings for any wav file (from known or unknown speakers). We provide two ways to do this: |
|
|
| * single Python liner for extracting embeddings from a single file |
| * Python script for extracting embeddings from a bunch of files provided through manifest file |
|
|
| For extracting embeddings from a single file: |
|
|
| .. code-block:: python |
|
|
| speaker_model = EncDecSpeakerLabelModel.from_pretrained(model_name="<pretrained_model_name or path/to/nemo/file>") |
| embs = speaker_model.get_embedding('<audio_path>') |
|
|
| For extracting embeddings from a bunch of files: |
|
|
| The audio files should be 16KHz mono channel wav files. |
|
|
| Write audio files to a ``manifest.json`` file with lines as in format: |
|
|
| .. code-block:: json |
| |
| {"audio_filepath": "<absolute path to dataset>/audio_file.wav", "duration": "duration of file in sec", "label": "speaker_id"} |
| |
| This python call will download best pretrained model from NGC and writes embeddings pickle file to current working directory |
|
|
| .. code-block:: bash |
| |
| python examples/speaker_tasks/recognition/extract_speaker_embeddings.py --manifest=manifest.json |
| |
| or you can run `batch_inference()` to perform inference on the manifest with seleted batch_size to get embeddings |
|
|
| .. code-block:: python |
|
|
| speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(model_name="<pretrained_model_name or path/to/nemo/file>") |
| embs, logits, gt_labels, trained_labels = speaker_model.batch_inference(manifest, batch_size=32) |
|
|
| Speaker Verification Inference |
| ------------------------------ |
|
|
| Speaker Verification is a task of verifying if two utterances are from the same speaker or not. |
|
|
| We provide a helper function to verify the audio files and return True if two provided audio files are from the same speaker, False otherwise. |
|
|
| The audio files should be 16KHz mono channel wav files. |
|
|
| .. code-block:: python |
|
|
| speaker_model = EncDecSpeakerLabelModel.from_pretrained(model_name="titanet_large") |
| decision = speaker_model.verify_speakers('path/to/one/audio_file','path/to/other/audio_file') |
|
|
|
|
| NGC Pretrained Checkpoints |
| -------------------------- |
|
|
| The SpeakerNet-ASR collection has checkpoints of several models trained on various datasets for a variety of tasks. |
| `TitaNet <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/titanet_large>`_ , `ECAPA_TDNN <https://ngc.nvidia.com/catalog/models/nvidia:nemo:ecapa_tdnn>`_ and `Speaker_Verification <https://ngc.nvidia.com/catalog/models/nvidia:nemo:speakerverification_speakernet>`_ model cards on NGC contain more information about each of the checkpoints available. |
|
|
| The tables below list the speaker embedding extractor models available from NGC, and the models can be accessed via the |
| :code:`from_pretrained()` method inside the EncDecSpeakerLabelModel Model class. |
|
|
| In general, you can load any of these models with code in the following format: |
|
|
| .. code-block:: python |
|
|
| import nemo.collections.asr as nemo_asr |
| model = nemo_asr.models.<MODEL_CLASS_NAME>.from_pretrained(model_name="<MODEL_NAME>") |
|
|
| where the model name is the value under "Model Name" entry in the tables below. |
|
|
| If you would like to programatically list the models available for a particular base class, you can use the |
| :code:`list_available_models()` method. |
|
|
| .. code-block:: python |
|
|
| nemo_asr.models.<MODEL_BASE_CLASS>.list_available_models() |
|
|
|
|
| Speaker Recognition Models |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
| .. csv-table:: |
| :file: data/speaker_results.csv |
| :align: left |
| :widths: 30, 30, 40 |
| :header-rows: 1 |
|
|
|
|