<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;What you’ll learn and what you’ll build&quot;,&quot;local&quot;:&quot;what-youll-learn-and-what-youll-build&quot;,&quot;sections&quot;:[],&quot;depth&quot;:1}">
<link href="/docs/audio-course/pr_201/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/entry/start.367c4d78.js">
<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/scheduler.f7e1785c.js">
<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/singletons.0d70d4cc.js">
<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/index.279db187.js">
<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/paths.274f629d.js">
<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/entry/app.4c54ebf9.js">
<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/index.9f8f0838.js">
<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/nodes/0.e329f606.js">
<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/nodes/34.11f4eea9.js">
<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/EditOnGithub.5a9bb8c5.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;What you’ll learn and what you’ll build&quot;,&quot;local&quot;:&quot;what-youll-learn-and-what-youll-build&quot;,&quot;sections&quot;:[],&quot;depth&quot;:1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="what-youll-learn-and-what-youll-build" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#what-youll-learn-and-what-youll-build"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>What you’ll learn and what you’ll build</span></h1> <p data-svelte-h="svelte-gm4czj">In this section, we’ll take a look at how Transformers can be used to convert speech into text, a task known as <em>speech recognition</em>.</p> <div class="flex justify-center" data-svelte-h="svelte-gouu61"><img src="https://huggingface.co/datasets/huggingface-course/audio-course-images/resolve/main/asr_diagram.png" alt="Diagram of speech to text"></div> <p data-svelte-h="svelte-xmgt0n">Speech recognition, also known as automatic 
speech recognition (ASR) or speech-to-text (STT), is one of the most popular
and exciting spoken language processing tasks. It’s used in a wide range of applications, including dictation, voice assistants,
video captioning and meeting transcription.</p> <p data-svelte-h="svelte-e3ezub">You’ve probably used a speech recognition system many times before without realising it! Consider the digital
assistant on your smartphone (Siri, Google Assistant, Alexa). When you use these assistants, the first thing
they do is transcribe your speech to written text, ready to be used for any downstream task (such as finding you
the weather 🌤️).</p> <p data-svelte-h="svelte-28ph8z">Have a play with the speech recognition demo below. You can either record yourself using your microphone, or drag and
drop an audio sample for transcription:</p> <iframe src="https://course-demos-whisper-small.hf.space" frameborder="0" width="850" height="450" data-svelte-h="svelte-aw0ubw"></iframe> <p data-svelte-h="svelte-9savi7">Speech recognition is a challenging task as it requires joint knowledge of audio and text. The input audio might contain
lots of background noise and come from speakers with different accents, making it difficult to pick out what was
said. The written text might contain characters that have no acoustic counterpart, such as punctuation, which are difficult
to infer from audio alone. These are all hurdles we have to tackle when building effective speech recognition systems!</p> <p data-svelte-h="svelte-1klaps2">Now that we’ve defined our task, we can begin looking into speech recognition in more detail. By the end of this Unit,
you’ll have a good fundamental understanding of the different pre-trained speech recognition models available and how to
use them with the 🤗 Transformers library. You’ll also know the procedure for fine-tuning an ASR model on a domain or
language of choice, enabling you to build a performant system for whatever task you encounter. You’ll be able to showcase
your model to your friends and family by building a live demo, one that takes any spoken speech and converts it to text!</p> <p data-svelte-h="svelte-96ho0">Specifically, we’ll cover:</p> <ul data-svelte-h="svelte-t4qu3r"><li><a href="asr_models">Pre-trained models for speech recognition</a></li> <li><a href="choosing_dataset">Choosing a dataset</a></li> <li><a href="evaluation">Evaluation and metrics for speech recognition</a></li> <li><a href="fine-tuning">How to fine-tune an ASR system with the Trainer API</a></li> <li><a href="demo">Building a demo</a></li> <li><a href="hands_on">Hands-on exercise</a></li></ul> <p></p>
<script>
{
__sveltekit_yq3w38 = {
assets: "/docs/audio-course/pr_201/en",
base: "/docs/audio-course/pr_201/en",
env: {}
};
const element = document.currentScript.parentElement;
const data = [null,null];
Promise.all([
import("/docs/audio-course/pr_201/en/_app/immutable/entry/start.367c4d78.js"),
import("/docs/audio-course/pr_201/en/_app/immutable/entry/app.4c54ebf9.js")
]).then(([kit, app]) => {
kit.start(app, element, {
node_ids: [0, 34],
data,
form: null,
error: null
});
});
}
</script>
