Self-Supervised Learning
========================
Self-Supervised Learning (SSL) refers to the problem of learning without explicit labels. As
any learning process requires feedback, SSL, in the absence of explicit labels, derives
supervisory signals from the data itself. The general idea of SSL is to predict a hidden part
(or property) of the input from the observed parts of the input (e.g., filling in the blanks
in a sentence or predicting whether an image is upright or inverted).
SSL approaches for speech/audio understanding broadly fall into contrastive and
reconstruction based methods. In contrastive methods, models learn by distinguishing true
tokens (or latents) from distractors; Contrastive Predictive Coding (CPC) and Masked Language
Modeling (MLM) are examples of contrastive approaches. In reconstruction methods, models
learn by directly estimating the missing (intentionally left out) portions of the input;
Masked Reconstruction and Autoregressive Predictive Coding (APC) are a few examples.
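To make the two families concrete, the following is a minimal sketch (assuming PyTorch; an
illustration, not the library's actual implementation) of a CPC-style contrastive (InfoNCE)
loss next to a masked-reconstruction loss. All function names and tensor shapes below are
assumptions chosen for clarity.

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def info_nce_loss(pred, target, distractors, temperature=0.1):
        """Contrastive loss in the spirit of CPC: rank the true latent above distractors.

        pred:        (B, D) model prediction for a masked/future step
        target:      (B, D) true latent at that step (the positive)
        distractors: (B, K, D) latents sampled from other steps (the negatives)
        """
        pos = F.cosine_similarity(pred, target, dim=-1)                    # (B,)
        neg = F.cosine_similarity(pred.unsqueeze(1), distractors, dim=-1)  # (B, K)
        logits = torch.cat([pos.unsqueeze(1), neg], dim=1) / temperature   # (B, 1 + K)
        # The positive is always at index 0; the model learns to score it highest.
        labels = torch.zeros(logits.size(0), dtype=torch.long)
        return F.cross_entropy(logits, labels)

    def masked_reconstruction_loss(reconstruction, features, mask):
        """Reconstruction loss: directly estimate the intentionally masked frames.

        reconstruction: (B, T, D) model output
        features:       (B, T, D) original (unmasked) input features
        mask:           (B, T) boolean tensor, True where the input was masked
        """
        # Only the masked positions contribute to the loss.
        return F.l1_loss(reconstruction[mask], features[mask])

Note that the contrastive loss needs a strategy for sampling distractors (e.g., latents from
other time steps of the same utterance), whereas the reconstruction loss only needs the mask.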
In the recent past, SSL has contributed significantly to improving Acoustic Modeling (AM),
i.e., the encoder module of neural ASR models, and here too the majority of the SSL effort is
focused on improving the AM. While the AM is the most common target of SSL in ASR, SSL can
also be used to improve other parts of ASR models (e.g., the predictor module in transducer
based ASR models).
The full documentation tree is as follows:

.. toctree::
   :maxdepth: 8

   models
   datasets
   results
   configs
   api
   resources

.. include:: resources.rst