	<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Hands-on exercise&quot;,&quot;local&quot;:&quot;hands-on-exercise&quot;,&quot;sections&quot;:[],&quot;depth&quot;:1}">
	<link href="/docs/audio-course/pr_201/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
	<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/entry/start.367c4d78.js">
	<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/scheduler.f7e1785c.js">
	<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/singletons.0d70d4cc.js">
	<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/index.279db187.js">
	<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/paths.274f629d.js">
	<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/entry/app.4c54ebf9.js">
	<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/index.9f8f0838.js">
	<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/nodes/0.e329f606.js">
	<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/each.e59479a4.js">
	<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/nodes/43.fdbeaa26.js">
	<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/Tip.4575d9cf.js">
	<link rel="modulepreload" href="/docs/audio-course/pr_201/en/_app/immutable/chunks/EditOnGithub.5a9bb8c5.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Hands-on exercise&quot;,&quot;local&quot;:&quot;hands-on-exercise&quot;,&quot;sections&quot;:[],&quot;depth&quot;:1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="hands-on-exercise" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#hands-on-exercise"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Hands-on exercise</span></h1> <p data-svelte-h="svelte-5fgwkf">In this Unit, we consolidated the material covered in the previous six units of the course to build three integrated
	audio applications. As you’ve experienced, building more involved audio tools is fully within reach by using the
	foundational skills you’ve acquired in this course.</p> <p data-svelte-h="svelte-1vndh8y">The hands-on exercise takes one of the applications covered in this Unit, and extends it with a few multilingual
	tweaks 🌍 Your objective is to take the <a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation" rel="nofollow">cascaded speech-to-speech translation Gradio demo</a>
	from the first section in this Unit, and update it to translate to any <strong>non-English</strong> language. That is to say, the
	demo should take speech in language X, and translate it to speech in language Y, where the target language Y is not
	English. You should start by <a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation?duplicate=true" rel="nofollow">duplicating</a>
	the template under your Hugging Face namespace. There’s no requirement to use a GPU accelerator device - the free CPU
	tier works just fine 🤗 However, you should ensure that the visibility of your demo is set to <strong>public</strong>, so that we can
	access your demo and check it for correctness.</p> <p data-svelte-h="svelte-1xc515h">Tips for updating the speech translation function to perform multilingual speech translation are provided in the
	section on <a href="speech-to-speech">speech-to-speech translation</a>. By following these instructions, you should be able
	to update the demo to translate from speech in language X to text in language Y, which is half of the task!</p> <p data-svelte-h="svelte-e3qygk">To synthesise from text in language Y to speech in language Y, where Y is a non-English language, you will need
	to use a multilingual TTS checkpoint. For this, you can either use the SpeechT5 TTS checkpoint that you fine-tuned
	in the previous hands-on exercise, or a pre-trained multilingual TTS checkpoint. There are two options for pre-trained
	checkpoints: either the checkpoint <a href="https://huggingface.co/sanchit-gandhi/speecht5_tts_vox_nl" rel="nofollow">sanchit-gandhi/speecht5_tts_vox_nl</a>,
	which is a SpeechT5 checkpoint fine-tuned on the Dutch split of the <a href="https://huggingface.co/datasets/facebook/voxpopuli" rel="nofollow">VoxPopuli</a>
	dataset, or an MMS TTS checkpoint (see section on <a href="../chapter6/pre-trained_models">pretrained models for TTS</a>).</p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400">In our experience experimenting with the Dutch language, using an MMS TTS checkpoint results in better performance than a
	fine-tuned SpeechT5 one, but you might find that your fine-tuned TTS checkpoint is preferable in your language.
	If you decide to use an MMS TTS checkpoint, you will need to update the <a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation/blob/a03175878f522df7445290d5508bfb5c5178f787/requirements.txt#L2" data-svelte-h="svelte-l0vtp8">requirements.txt</a>
	file of your demo to install <code data-svelte-h="svelte-olzpwg">transformers</code> from the PR branch:
	<p data-svelte-h="svelte-1oo2kle"><code>git+https://github.com/hollance/transformers.git@6900e8ba6532162a8613d2270ec2286c3f58f57b</code></p></div> <p data-svelte-h="svelte-r6bgbz">Your demo should take as input an audio file, and return as output another audio file, matching the signature of the
	<a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation/blob/3946ba6705a6632a63de8672ac52a482ab74b3fc/app.py#L35" rel="nofollow"><code>speech_to_speech_translation</code></a>
	function in the template demo. Therefore, we recommend that you leave the main function <code>speech_to_speech_translation</code>
	as is, and only update the <a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation/blob/a03175878f522df7445290d5508bfb5c5178f787/app.py#L24" rel="nofollow"><code>translate</code></a>
	and <a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation/blob/a03175878f522df7445290d5508bfb5c5178f787/app.py#L29" rel="nofollow"><code>synthesise</code></a>
	functions as required.</p> <p data-svelte-h="svelte-xkt7xj">Once you have built your demo as a Gradio demo on the Hugging Face Hub, you can submit it for assessment. Head to the
	Space <a href="https://huggingface.co/spaces/huggingface-course/audio-course-u7-assessment" rel="nofollow">audio-course-u7-assessment</a> and
	provide the repository id of your demo when prompted. This Space will check that your demo has been built correctly by
	sending a sample audio file to your demo and checking that the returned audio file is indeed non-English. If your demo
	works correctly, you’ll get a green tick next to your name on the overall <a href="https://huggingface.co/spaces/MariaK/Check-my-progress-Audio-Course" rel="nofollow">progress space</a> ✅</p> <p></p>
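<p>As a rough illustration of the two functions discussed above, the sketch below shows one way <code>translate</code> and <code>synthesise</code> might be adapted for a Dutch target. Several details are assumptions rather than part of the template: the checkpoint names <code>openai/whisper-base</code> and <code>facebook/mms-tts-nld</code>, the helper names <code>whisper_generate_kwargs</code> and <code>load_models</code>, the optional keyword arguments used to inject models, and the <code>waveform</code> output attribute, which is what released versions of <code>transformers</code> with <code>VitsModel</code> expose and may differ on the PR branch mentioned earlier. Treat it as a starting point, not the reference solution.</p>

```python
from functools import lru_cache

TARGET_LANGUAGE = "nl"                   # assumption: Dutch as target language Y
ASR_CHECKPOINT = "openai/whisper-base"   # assumption: any multilingual Whisper checkpoint
TTS_CHECKPOINT = "facebook/mms-tts-nld"  # assumption: MMS TTS checkpoint for Dutch


def whisper_generate_kwargs(language=TARGET_LANGUAGE):
    """Generation arguments asking Whisper to emit text in `language`.

    Using task="transcribe" with a non-source language is the trick described
    in the speech-to-speech translation section; it is not an official
    X-to-Y translation mode, so quality varies by language pair.
    """
    return {"task": "transcribe", "language": language}


@lru_cache(maxsize=1)
def load_models():
    """Load the ASR pipeline and TTS model once (hypothetical helper)."""
    # Imported lazily so the lightweight helpers work without the heavy deps.
    from transformers import AutoTokenizer, VitsModel, pipeline

    asr_pipe = pipeline("automatic-speech-recognition", model=ASR_CHECKPOINT)
    tts_model = VitsModel.from_pretrained(TTS_CHECKPOINT)
    tts_tokenizer = AutoTokenizer.from_pretrained(TTS_CHECKPOINT)
    return asr_pipe, tts_model, tts_tokenizer


def translate(audio, asr_pipe=None):
    """Speech in language X -> text in language Y."""
    if asr_pipe is None:
        asr_pipe = load_models()[0]
    outputs = asr_pipe(
        audio, max_new_tokens=256, generate_kwargs=whisper_generate_kwargs()
    )
    return outputs["text"]


def synthesise(text, tts_model=None, tts_tokenizer=None):
    """Text in language Y -> 1-D waveform tensor in language Y."""
    import torch

    if tts_model is None or tts_tokenizer is None:
        _, tts_model, tts_tokenizer = load_models()
    inputs = tts_tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = tts_model(**inputs)
    # Assumption: the model output exposes `waveform` with shape (batch, samples).
    return outputs.waveform[0].cpu()
```

<p>Both functions can still be called with a single positional argument, so a main function that calls <code>translate(audio)</code> followed by <code>synthesise(translated_text)</code> would not need to change. The optional keyword arguments exist only so that the functions can be exercised with stand-in models during local testing.</p>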

	<script>
	{
	__sveltekit_yq3w38 = {
	assets: "/docs/audio-course/pr_201/en",
	base: "/docs/audio-course/pr_201/en",
	env: {}
	};

	const element = document.currentScript.parentElement;

	const data = [null,null];

	Promise.all([
	import("/docs/audio-course/pr_201/en/_app/immutable/entry/start.367c4d78.js"),
	import("/docs/audio-course/pr_201/en/_app/immutable/entry/app.4c54ebf9.js")
	]).then(([kit, app]) => {
	kit.start(app, element, {
	node_ids: [0, 43],
	data,
	form: null,
	error: null
	});
	});
	}
	</script>
