| Speech Classification | |
| ================================== | |
| Speech Classification refers to a set of tasks or problems of getting a program to automatically classify input utterance or audio segment into categories, | |
| such as Speech Command Recognition (multi-class), Voice Activity Detection (binary or multi-class), and Audio Sentiment Classification (typically multi-class), etc. | |
| **Speech Command Recognition** is the task of classifying an input audio pattern into a discrete set of classes. | |
| It is a subset of Automatic Speech Recognition (ASR), sometimes referred to as Key Word Spotting, in which a model is constantly analyzing speech patterns to detect certain "command" classes. | |
| Upon detection of these commands, a specific action can be taken by the system. | |
| It is often the objective of command recognition models to be small and efficient so that they can be deployed onto low-power sensors and remain active for long durations of time. | |
| **Voice Activity Detection (VAD)** also known as speech activity detection or speech detection, is the task of predicting which parts of input audio contain speech versus background noise. | |
| It is an essential first step for a variety of speech-based applications including Automatic Speech Recognition. | |
| It serves to determine which samples to be sent to the model and when to close the microphone. | |
| **Spoken Language Identification (Lang ID)** also known as spoken language recognition, is the task of recognizing the language of the spoken utterance automatically. | |
| It typically serves as the prepossessing of ASR, determining which ASR model would be activate based on the language. | |
| The full documentation tree is as follows: | |
| .. toctree:: | |
| :maxdepth: 8 | |
| models | |
| datasets | |
| results | |
| configs | |
| resources.rst | |
| .. include:: resources.rst | |