---
library_name: nemo
---

# CHiME8 DASR NeMo Baseline Models

## 1. Voice Activity Detection (VAD) Model
### MarbleNet_frame_VAD_chime7_Acrobat.nemo
- This model is based on the [NeMo MarbleNet VAD model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speech_classification/models.html#marblenet-vad).
- For validation, we use a dataset comprising the CHiME-6 development subset and 50 hours of simulated audio data.
- The simulated data is generated using the [NeMo multi-speaker data simulator](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tools/Multispeaker_Simulator.ipynb) on the [VoxCeleb1&2 datasets](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html).
- The multi-speaker data simulation results in a total of 2,000 hours of audio, of which approximately 30% is silence.
- Model training incorporates [SpecAugment](https://arxiv.org/abs/1904.08779) and noise augmentation through the [MUSAN noise dataset](https://arxiv.org/abs/1510.08484).
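A frame-level VAD model emits one speech probability per frame, which downstream diarization consumes as speech segments. As a minimal sketch of one common post-processing step (the helper name `frames_to_segments` and the threshold value are illustrative, not this checkpoint's tuned configuration):

```python
# Minimal sketch (not the baseline's exact post-processing): convert a
# frame-VAD model's per-frame speech probabilities into speech segments
# by thresholding. The threshold value is an illustrative placeholder.

def frames_to_segments(probs, threshold=0.5):
    """Return [start_frame, end_frame) index pairs where prob >= threshold."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i                      # speech onset
        elif p < threshold and start is not None:
            segments.append((start, i))    # speech offset
            start = None
    if start is not None:                  # speech runs to the last frame
        segments.append((start, len(probs)))
    return segments

# Frame indices convert to seconds by multiplying with the frame shift.
print(frames_to_segments([0.1, 0.2, 0.9, 0.95, 0.8, 0.3, 0.1, 0.7, 0.9, 0.2]))
# → [(2, 5), (7, 9)]
```

Real systems usually add onset/offset hysteresis and minimum-duration smoothing on top of the plain threshold shown here.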

## 2. Speaker Diarization Model: Multi-scale Diarization Decoder (MSDD-v2)
### MSDD_v2_PALO_100ms_intrpl_3scales.nemo
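As the checkpoint name suggests, this decoder operates on three segmentation scales. To illustrate the multi-scale idea behind MSDD — the same audio is windowed at several (window, hop) scales and each finest-scale segment is matched to the closest segment at every coarser scale — here is a sketch; the helper names and the scale values are examples, not this checkpoint's actual configuration:

```python
# Illustrative sketch of multi-scale segmentation: window the audio at
# several (window, hop) scales, then map each finest-scale segment to the
# center-closest segment at every coarser scale. Scale values are examples.

def segment(duration, window, hop):
    """Overlapping (start, end) windows covering [0, duration)."""
    segs, start = [], 0.0
    while start + window <= duration:
        segs.append((start, start + window))
        start += hop
    return segs

def map_scales(scales, duration):
    """For each finest-scale segment, the index of the center-closest
    segment at every coarser scale."""
    all_segs = [segment(duration, w, h) for w, h in scales]
    finest = all_segs[-1]
    mapping = []
    for s, e in finest:
        center = (s + e) / 2
        mapping.append([
            min(range(len(segs)),
                key=lambda i: abs((segs[i][0] + segs[i][1]) / 2 - center))
            for segs in all_segs[:-1]
        ])
    return finest, mapping

# Three hypothetical scales, coarsest to finest, on 4 s of audio:
scales = [(1.5, 0.75), (1.0, 0.5), (0.5, 0.25)]
finest, mapping = map_scales(scales, duration=4.0)
```

The per-scale groupings let the decoder weigh long windows (stable speaker embeddings) against short windows (fine temporal resolution) when assigning speaker labels.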

## 3. Automatic Speech Recognition (ASR) Model
### FastConformerXL-RNNT-chime7-GSS-finetuned.nemo

## 4. Language Model for ASR Decoding: KenLM Model
### ASR_LM_chime7_only.kenlm
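During beam-search decoding, an n-gram LM score is typically combined with the ASR model's score via shallow fusion. A sketch of the scoring arithmetic (the helper name `fused_score` and the `alpha`/`beta` weights are illustrative placeholders, not the baseline's tuned values):

```python
# Sketch of shallow fusion: each beam hypothesis' ASR log-probability is
# combined with the n-gram LM log-probability plus a word-insertion term.
# alpha (LM weight) and beta (word-insertion weight) are illustrative.

def fused_score(asr_logprob, lm_logprob, num_words, alpha=0.5, beta=1.0):
    """Combined hypothesis score used to rank beams."""
    return asr_logprob + alpha * lm_logprob + beta * num_words

# A hypothesis the LM favors can overtake one with a better acoustic score:
score_a = fused_score(asr_logprob=-4.0, lm_logprob=-2.0, num_words=3)  # -2.0
score_b = fused_score(asr_logprob=-3.5, lm_logprob=-6.0, num_words=3)  # -3.5
assert score_a > score_b
```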