Buckets:
| # Understanding the Interface class[[understanding-the-interface-class]] | |
| <CourseFloatingBanner chapter={9} | |
| classNames="absolute z-10 right-0 top-0" | |
| notebooks={[ | |
| {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter9/section3.ipynb"}, | |
| {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter9/section3.ipynb"}, | |
| ]} /> | |
| In this section, we will take a closer look at the `Interface` class, and understand the | |
| main parameters used to create one. | |
| ## How to create an Interface[[how-to-create-an-interface]] | |
| You'll notice that the `Interface` class has 3 required parameters: | |
| `Interface(fn, inputs, outputs, ...)` | |
| These parameters are: | |
| - `fn`: the prediction function that is wrapped by the Gradio interface. This function can take one or more parameters and return one or more values | |
| - `inputs`: the input component type(s). Gradio provides many pre-built components such as`"image"` or `"mic"`. | |
| - `outputs`: the output component type(s). Again, Gradio provides many pre-built components e.g. `"image"` or `"label"`. | |
| For a complete list of components, [see the Gradio docs ](https://gradio.app/docs). Each pre-built component can be customized by instantiating the class corresponding to the component. | |
| For example, as we saw in the [previous section](/course/chapter9/2), | |
| instead of passing in `"textbox"` to the `inputs` parameter, you can pass in a `Textbox(lines=7, label="Prompt")` component to create a textbox with 7 lines and a label. | |
| Let's take a look at another example, this time with an `Audio` component. | |
| ## A simple example with audio[[a-simple-example-with-audio]] | |
| As mentioned earlier, Gradio provides many different inputs and outputs. | |
| So let's build an `Interface` that works with audio. | |
| In this example, we'll build an audio-to-audio function that takes an | |
| audio file and simply reverses it. | |
| We will use for the input the `Audio` component. When using the `Audio` component, | |
| you can specify whether you want the `source` of the audio to be a file that the user | |
| uploads or a microphone that the user records their voice with. In this case, let's | |
| set it to a `"microphone"`. Just for fun, we'll add a label to our `Audio` that says | |
| "Speak here...". | |
| In addition, we'd like to receive the audio as a numpy array so that we can easily | |
| "reverse" it. So we'll set the `"type"` to be `"numpy"`, which passes the input | |
| data as a tuple of (`sample_rate`, `data`) into our function. | |
| We will also use the `Audio` output component which can automatically | |
| render a tuple with a sample rate and numpy array of data as a playable audio file. | |
| In this case, we do not need to do any customization, so we will use the string | |
| shortcut `"audio"`. | |
| ```py | |
| import numpy as np | |
| import gradio as gr | |
| def reverse_audio(audio): | |
| sr, data = audio | |
| reversed_audio = (sr, np.flipud(data)) | |
| return reversed_audio | |
| mic = gr.Audio(source="microphone", type="numpy", label="Speak here...") | |
| gr.Interface(reverse_audio, mic, "audio").launch() | |
| ``` | |
| The code above will produce an interface like the one below (if your browser doesn't | |
| ask you for microphone permissions, <a href="https://huggingface.co/spaces/course-demos/audio-reverse" target="_blank">open the demo in a separate tab</a>.) | |
| <iframe src="https://course-demos-audio-reverse.hf.space" frameBorder="0" height="250" title="Gradio app" class="container p-0 flex-grow space-iframe" allow="accelerometer; ambient-light-sensor; autoplay; battery; camera; document-domain; encrypted-media; fullscreen; geolocation; gyroscope; layout-animations; legacy-image-formats; magnetometer; microphone; midi; oversized-images; payment; picture-in-picture; publickey-credentials-get; sync-xhr; usb; vr ; wake-lock; xr-spatial-tracking" sandbox="allow-forms allow-modals allow-popups allow-popups-to-escape-sandbox allow-same-origin allow-scripts allow-downloads"></iframe> | |
| You should now be able to record your voice and hear yourself speaking in reverse - spooky 👻! | |
| ## Handling multiple inputs and outputs[[handling-multiple-inputs-and-outputs]] | |
| Let's say we had a more complicated function, with multiple inputs and outputs. | |
| In the example below, we have a function that takes a dropdown index, a slider value, and number, | |
| and returns an audio sample of a musical tone. | |
| Take a look how we pass a list of input and output components, | |
| and see if you can follow along what's happening. | |
| The key here is that when you pass: | |
| * a list of input components, each component corresponds to a parameter in order. | |
| * a list of output coponents, each component corresponds to a returned value. | |
| The code snippet below shows how three input components line up with the three arguments of the `generate_tone()` function: | |
| ```py | |
| import numpy as np | |
| import gradio as gr | |
| notes = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"] | |
| def generate_tone(note, octave, duration): | |
| sr = 48000 | |
| a4_freq, tones_from_a4 = 440, 12 * (octave - 4) + (note - 9) | |
| frequency = a4_freq * 2 ** (tones_from_a4 / 12) | |
| duration = int(duration) | |
| audio = np.linspace(0, duration, duration * sr) | |
| audio = (20000 * np.sin(audio * (2 * np.pi * frequency))).astype(np.int16) | |
| return (sr, audio) | |
| gr.Interface( | |
| generate_tone, | |
| [ | |
| gr.Dropdown(notes, type="index"), | |
| gr.Slider(minimum=4, maximum=6, step=1), | |
| gr.Number(value=1, label="Duration in seconds"), | |
| ], | |
| "audio", | |
| ).launch() | |
| ``` | |
| <iframe src="https://course-demos-generate-tone.hf.space" frameBorder="0" height="450" title="Gradio app" class="container p-0 flex-grow space-iframe" allow="accelerometer; ambient-light-sensor; autoplay; battery; camera; document-domain; encrypted-media; fullscreen; geolocation; gyroscope; layout-animations; legacy-image-formats; magnetometer; microphone; midi; oversized-images; payment; picture-in-picture; publickey-credentials-get; sync-xhr; usb; vr ; wake-lock; xr-spatial-tracking" sandbox="allow-forms allow-modals allow-popups allow-popups-to-escape-sandbox allow-same-origin allow-scripts allow-downloads"></iframe> | |
| ### The `launch()` method[[the-launch-method]] | |
| So far, we have used the `launch()` method to launch the interface, but we | |
| haven't really discussed what it does. | |
| By default, the `launch()` method will launch the demo in a web server that | |
| is running locally. If you are running your code in a Jupyter or Colab notebook, then | |
| Gradio will embed the demo GUI in the notebook so you can easily use it. | |
| You can customize the behavior of `launch()` through different parameters: | |
| - `inline` - whether to display the interface inline on Python notebooks. | |
| - `inbrowser` - whether to automatically launch the interface in a new tab on the default browser. | |
| - `share` - whether to create a publicly shareable link from your computer for the interface. Kind of like a Google Drive link! | |
| We'll cover the `share` parameter in a lot more detail in the next section! | |
| ## ✏️ Let's apply it![[lets-apply-it]] | |
| Let's build an interface that allows you to demo a **speech-recognition** model. | |
| To make it interesting, we will accept *either* a mic input or an uploaded file. | |
| As usual, we'll load our speech recognition model using the `pipeline()` function from 🤗 Transformers. | |
| If you need a quick refresher, you can go back to [that section in Chapter 1](/course/chapter1/3). Next, we'll implement a `transcribe_audio()` function that processes the audio and returns the transcription. Finally, we'll wrap this function in an `Interface` with the `Audio` components for the inputs and just text for the output. Altogether, the code for this application is the following: | |
| ```py | |
| from transformers import pipeline | |
| import gradio as gr | |
| model = pipeline("automatic-speech-recognition") | |
| def transcribe_audio(audio): | |
| transcription = model(audio)["text"] | |
| return transcription | |
| gr.Interface( | |
| fn=transcribe_audio, | |
| inputs=gr.Audio(type="filepath"), | |
| outputs="text", | |
| ).launch() | |
| ``` | |
| If your browser doesn't ask you for microphone permissions, <a href="https://huggingface.co/spaces/course-demos/audio-reverse" target="_blank">open the demo in a separate tab</a>. | |
| <iframe src="https://course-demos-asr.hf.space" frameBorder="0" height="550" title="Gradio app" class="container p-0 flex-grow space-iframe" allow="accelerometer; ambient-light-sensor; autoplay; battery; camera; document-domain; encrypted-media; fullscreen; geolocation; gyroscope; layout-animations; legacy-image-formats; magnetometer; microphone; midi; oversized-images; payment; picture-in-picture; publickey-credentials-get; sync-xhr; usb; vr ; wake-lock; xr-spatial-tracking" sandbox="allow-forms allow-modals allow-popups allow-popups-to-escape-sandbox allow-same-origin allow-scripts allow-downloads"></iframe> | |
| That's it! You can now use this interface to transcribe audio. Notice here that | |
| by passing in the `optional` parameter as `True`, we allow the user to either | |
| provide a microphone or an audio file (or neither, but that will return an error message). | |
| Keep going to see how to share your interface with others! | |
| <EditOnGithub source="https://github.com/huggingface/course/blob/main/chapters/en/chapter9/3.mdx" /> |
Xet Storage Details
- Size:
- 9.26 kB
- Xet hash:
- 9b22fb402cde2f42e3935c654b69e92d6a51eb418c76aa767088a2ba5901b4c5
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.