<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Hands-on exercise&quot;,&quot;local&quot;:&quot;hands-on-exercise&quot;,&quot;sections&quot;:[],&quot;depth&quot;:1}">
<link href="/docs/audio-course/pr_239/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/entry/start.1658692c.js">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/chunks/scheduler.cd324960.js">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/chunks/singletons.b42fc23b.js">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/chunks/index.a0c12d66.js">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/chunks/paths.cd0b54b2.js">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/entry/app.83f02103.js">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/chunks/preload-helper.7a3e7823.js">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/chunks/index.d5c3adcc.js">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/nodes/0.33fdfcd8.js">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/nodes/43.6fdbfaf6.js">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/chunks/Tip.889bec11.js">
<link rel="modulepreload" href="/docs/audio-course/pr_239/en/_app/immutable/chunks/MermaidChart.svelte_svelte_type_style_lang.f42929ed.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Hands-on exercise&quot;,&quot;local&quot;:&quot;hands-on-exercise&quot;,&quot;sections&quot;:[],&quot;depth&quot;:1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="hands-on-exercise" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#hands-on-exercise"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Hands-on exercise</span></h1> <p data-svelte-h="svelte-5fgwkf">In this Unit, we consolidated the material covered in the previous six units of the course to build three integrated
audio applications. As you’ve experienced, building more involved audio tools is fully within reach by using the
foundational skills you’ve acquired in this course.</p> <p data-svelte-h="svelte-1vndh8y">The hands-on exercise takes one of the applications covered in this Unit, and extends it with a few multilingual
tweaks 🌍. Your objective is to take the <a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation" rel="nofollow">cascaded speech-to-speech translation Gradio demo</a>
from the first section in this Unit, and update it to translate to any <strong>non-English</strong> language. That is to say, the
demo should take speech in language X, and translate it to speech in language Y, where the target language Y is not
English. You should start by <a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation?duplicate=true" rel="nofollow">duplicating</a>
the template under your Hugging Face namespace. You don’t need a GPU accelerator device: the free CPU
tier works just fine 🤗. However, you must set the visibility of your demo to <strong>public</strong>, so that
we can access it and check it for correctness.</p> <p data-svelte-h="svelte-1xc515h">Tips for updating the speech translation function to perform multilingual speech translation are provided in the
section on <a href="speech-to-speech">speech-to-speech translation</a>. By following these instructions, you should be able
to update the demo to translate from speech in language X to text in language Y, which is half of the task!</p> <p data-svelte-h="svelte-e3qygk">To synthesise speech in language Y from text in language Y, where Y is a non-English language, you will need
to use a TTS checkpoint that supports language Y. For this, you can either use the SpeechT5 TTS checkpoint that you fine-tuned
in the previous hands-on exercise, or a pre-trained multilingual TTS checkpoint. There are two options for pre-trained
checkpoints: the checkpoint <a href="https://huggingface.co/sanchit-gandhi/speecht5_tts_vox_nl" rel="nofollow">sanchit-gandhi/speecht5_tts_vox_nl</a>,
which is a SpeechT5 checkpoint fine-tuned on the Dutch split of the <a href="https://huggingface.co/datasets/facebook/voxpopuli" rel="nofollow">VoxPopuli</a>
dataset, or an MMS TTS checkpoint (see the section on <a href="../chapter6/pre-trained_models">pretrained models for TTS</a>).</p> <blockquote class="tip">In our experiments with Dutch, an MMS TTS checkpoint performs better than a
fine-tuned SpeechT5 one, but you might find that your fine-tuned TTS checkpoint is preferable in your language.
If you decide to use an MMS TTS checkpoint, you will need to update the <a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation/blob/a03175878f522df7445290d5508bfb5c5178f787/requirements.txt#L2" data-svelte-h="svelte-l0vtp8">requirements.txt</a>
file of your demo to install <code data-svelte-h="svelte-olzpwg">transformers</code> from the PR branch:
<p data-svelte-h="svelte-1oo2kle"><code>git+https://github.com/hollance/transformers.git@6900e8ba6532162a8613d2270ec2286c3f58f57b</code></p></blockquote> <p data-svelte-h="svelte-r6bgbz">Your demo should take as input an audio file, and return as output another audio file, matching the signature of the
<a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation/blob/3946ba6705a6632a63de8672ac52a482ab74b3fc/app.py#L35" rel="nofollow"><code>speech_to_speech_translation</code></a>
function in the template demo. Therefore, we recommend that you leave the main function <code>speech_to_speech_translation</code>
as is, and only update the <a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation/blob/a03175878f522df7445290d5508bfb5c5178f787/app.py#L24" rel="nofollow"><code>translate</code></a>
and <a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation/blob/a03175878f522df7445290d5508bfb5c5178f787/app.py#L29" rel="nofollow"><code>synthesise</code></a>
functions as required.</p> <p data-svelte-h="svelte-xkt7xj">Once you have built your Gradio demo on the Hugging Face Hub, you can submit it for assessment. Head to the
Space <a href="https://huggingface.co/spaces/huggingface-course/audio-course-u7-assessment" rel="nofollow">audio-course-u7-assessment</a> and
provide the repository id of your demo when prompted. This Space will check that your demo has been built correctly by
sending a sample audio file to your demo and checking that the returned audio file is indeed non-English. If your demo
works correctly, you’ll get a green tick next to your name on the overall <a href="https://huggingface.co/spaces/MariaK/Check-my-progress-Audio-Course" rel="nofollow">progress space</a>.</p> <p></p>
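<p>As a rough illustration of the two functions to adapt, here is a minimal sketch, not the reference solution. It assumes Dutch as the target language, the <code>openai/whisper-base</code> and <code>facebook/mms-tts-nld</code> checkpoints, and a <code>transformers</code> installation that provides <code>VitsModel</code> and <code>VitsTokenizer</code>. The helper names <code>make_translate</code>, <code>make_synthesise</code> and <code>load_functions</code> are hypothetical and do not appear in the template.</p>

```python
# Hedged sketch: checkpoint names and helper names are illustrative assumptions.
TARGET_LANGUAGE = "nl"                   # assumption: Dutch as the target language Y
ASR_CHECKPOINT = "openai/whisper-base"   # assumption: a multilingual Whisper checkpoint
TTS_CHECKPOINT = "facebook/mms-tts-nld"  # assumption: the MMS TTS checkpoint for Dutch


def make_translate(asr_pipe, language=TARGET_LANGUAGE):
    """Build translate(audio) around an automatic-speech-recognition pipeline."""

    def translate(audio):
        # Whisper trick: "transcribe" with the language forced to Y, so the
        # model writes its output in Y rather than in the spoken language X.
        outputs = asr_pipe(
            audio,
            generate_kwargs={
                "task": "transcribe",
                "language": language,
                "max_new_tokens": 256,
            },
        )
        return outputs["text"]

    return translate


def make_synthesise(tokenizer, model):
    """Build synthesise(text) around an MMS TTS (VITS) tokenizer and model."""

    def synthesise(text):
        import torch  # imported lazily so the sketch loads without torch

        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)
        # First (and only) item of the batch: a 1-D float waveform tensor.
        return outputs.waveform[0]

    return synthesise


def load_functions():
    """Download the checkpoints and return (translate, synthesise)."""
    from transformers import VitsModel, VitsTokenizer, pipeline

    asr_pipe = pipeline("automatic-speech-recognition", model=ASR_CHECKPOINT)
    tokenizer = VitsTokenizer.from_pretrained(TTS_CHECKPOINT)
    model = VitsModel.from_pretrained(TTS_CHECKPOINT)
    return make_translate(asr_pipe), make_synthesise(tokenizer, model)
```

<p>In <code>app.py</code>, something like <code>translate, synthesise = load_functions()</code> could then replace the existing definitions, leaving <code>speech_to_speech_translation</code> untouched. It is worth checking that the sampling rate reported by the TTS model's config matches the rate your demo returns.</p>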
<script>
{
__sveltekit_1pbp10e = {
assets: "/docs/audio-course/pr_239/en",
base: "/docs/audio-course/pr_239/en",
env: {}
};
const element = document.currentScript.parentElement;
const data = [null,null];
Promise.all([
import("/docs/audio-course/pr_239/en/_app/immutable/entry/start.1658692c.js"),
import("/docs/audio-course/pr_239/en/_app/immutable/entry/app.83f02103.js")
]).then(([kit, app]) => {
kit.start(app, element, {
node_ids: [0, 43],
data,
form: null,
error: null
});
});
}
</script>
