Self-Supervised Learning
========================
Self-Supervised Learning (SSL) refers to the problem of learning without explicit labels. As
any learning process requires feedback, SSL, in the absence of explicit labels, derives
supervisory signals from the data itself. The general idea of SSL is to predict a hidden part
(or property) of the input from the observed parts of the input (e.g., filling in the blanks
in a sentence or predicting whether an image is upright or inverted).
SSL approaches for speech/audio understanding broadly fall into contrastive and
reconstruction based methods. In contrastive methods, models learn by distinguishing true
tokens (or latents) from distractors; Contrastive Predictive Coding (CPC) and Masked Language
Modeling (MLM) are examples of contrastive approaches. In reconstruction methods, models
learn by directly estimating the missing (intentionally left out) portions of the input;
Masked Reconstruction and Autoregressive Predictive Coding (APC) are a few examples.
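To make the two families concrete, the following is a minimal sketch (assuming PyTorch; an
illustration, not the library's actual implementation) of a CPC-style contrastive (InfoNCE)
loss next to a masked-reconstruction loss. All function names and tensor shapes below are
assumptions chosen for clarity.

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def info_nce_loss(pred, target, distractors, temperature=0.1):
        """Contrastive loss in the spirit of CPC: rank the true latent above distractors.

        pred:        (B, D) model prediction for a masked/future step
        target:      (B, D) true latent at that step (the positive)
        distractors: (B, K, D) latents sampled from other steps (the negatives)
        """
        pos = F.cosine_similarity(pred, target, dim=-1)                    # (B,)
        neg = F.cosine_similarity(pred.unsqueeze(1), distractors, dim=-1)  # (B, K)
        logits = torch.cat([pos.unsqueeze(1), neg], dim=1) / temperature   # (B, 1 + K)
        # The positive is always at index 0; the model learns to score it highest.
        labels = torch.zeros(logits.size(0), dtype=torch.long)
        return F.cross_entropy(logits, labels)

    def masked_reconstruction_loss(reconstruction, features, mask):
        """Reconstruction loss: directly estimate the intentionally masked frames.

        reconstruction: (B, T, D) model output
        features:       (B, T, D) original (unmasked) input features
        mask:           (B, T) boolean tensor, True where the input was masked
        """
        # Only the masked positions contribute to the loss.
        return F.l1_loss(reconstruction[mask], features[mask])

Note that the contrastive loss needs a strategy for sampling distractors (e.g., latents from
other time steps of the same utterance), whereas the reconstruction loss only needs the mask.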
In the recent past, SSL has contributed significantly to improving Acoustic Modeling (AM),
i.e., the encoder module of neural ASR models, and here too the majority of the SSL effort is
focused on improving the AM. While the AM is the most common target of SSL in ASR, SSL can
also be used to improve other parts of ASR models (e.g., the predictor module in transducer
based ASR models).
The full documentation tree is as follows:

.. toctree::
   :maxdepth: 8

   models
   datasets
   results
   configs
   api
   resources

.. include:: resources.rst