# Building Your First AI App with Inference Providers

You've learned the basics and understand the provider ecosystem. Now let's build something practical: an **AI Meeting Notes** app that transcribes audio files and generates summaries with action items.

This project demonstrates real-world AI orchestration using multiple specialized providers within a single application.

## Project Overview

Our app will:

1. **Accept audio** through a web interface (file upload, or microphone input in the Gradio version)
2. **Transcribe speech** using a fast speech-to-text model
3. **Generate summaries** using a powerful language model
4. **Deploy to the web** for easy sharing

This guide shows two versions of the app, one in Python and one in JavaScript:

**Tech Stack (Python)**: Gradio (for the UI) + Inference Providers (for the AI)

**Tech Stack (JavaScript)**: HTML/JavaScript (for the UI) + Inference Providers (for the AI)

For the JavaScript version, we'll use plain HTML and JavaScript for the UI to keep things simple and framework-agnostic. If you want to see more mature examples, check out the [Hugging Face JS spaces](https://huggingface.co/huggingfacejs/spaces) page.

## Step 1: Set Up Authentication

For the Python app, authenticate with Hugging Face using the CLI before you start coding:

```bash
pip install huggingface_hub
hf auth login
```

When prompted, paste your Hugging Face token. This handles authentication automatically for all your inference calls. You can generate a token from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained).

For the JavaScript app, you'll also need your Hugging Face token. Get one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). We can set it as an environment variable in our app.

```bash
export HF_TOKEN="your_token_here"
```

```javascript
// Add your token at the top of your script
// (process.env is available in Node.js or a bundler, not in a plain browser page)
const HF_TOKEN = process.env.HF_TOKEN;
```

> [!WARNING]
> When we deploy our app to Hugging Face Spaces, we'll need to add our token as a secret. This is a secure way to handle the token and avoid exposing it in the code.

## Step 2: Build the User Interface

Now let's create a simple web interface using Gradio:

```python
import gradio as gr
from huggingface_hub import InferenceClient


def process_meeting_audio(audio_file):
    """Process uploaded audio file and return transcript + summary"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    # We'll implement the AI logic next
    return "Transcript will appear here...", "Summary will appear here..."


# Create the Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8),
    ],
    title="🎤 AI Meeting Notes",
    description="Upload an audio file to get an instant transcript and summary with action items.",
)

if __name__ == "__main__":
    app.launch()
```

Here we're using Gradio's `gr.Audio` component, which lets users either upload an audio file or record from the microphone. We're keeping things simple with two outputs: a transcript and a summary with action items.

For JavaScript, we'll create a clean HTML interface with native file upload and a simple loading state:

```html

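<!-- Minimal reconstruction: the original markup was lost in extraction.
     Element ids match the script used in this guide; the `hidden` attributes
     and the omission of styling are assumptions. -->
<h1>🎤 AI Meeting Notes</h1>

<label for="file">Upload audio file</label>
<input type="file" id="file" accept="audio/*" />

<div id="loading" hidden>Processing...</div>
<div id="error" hidden></div>

<div id="results" hidden>
  <h2>📝 Transcript</h2>
  <div id="transcript"></div>

  <h2>📋 Summary</h2>
  <div id="summary"></div>
</div>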
```

This creates a simple upload interface with separate results sections for the transcript and the summary.

Our application can then use the `InferenceClient` from `huggingface.js` to call the transcription and summarization functions.

```javascript
import { InferenceClient } from "https://esm.sh/@huggingface/inference";

// Access the token from Hugging Face Spaces secrets
const HF_TOKEN = window.huggingface?.variables?.HF_TOKEN;
// Or if you're running locally, you can set it as an environment variable
// const HF_TOKEN = process.env.HF_TOKEN;

// show, hide, and showError are assumed to be small DOM helpers defined elsewhere in the page
document.getElementById("file").onchange = async (e) => {
  if (!e.target.files[0]) return;

  const file = e.target.files[0];

  show(document.getElementById("loading"));
  hide(document.getElementById("results"), document.getElementById("error"));

  try {
    const transcript = await transcribe(file);
    const summary = await summarize(transcript);

    document.getElementById("transcript").textContent = transcript;
    document.getElementById("summary").textContent = summary;

    hide(document.getElementById("loading"));
    show(document.getElementById("results"));
  } catch (error) {
    hide(document.getElementById("loading"));
    showError(`Error: ${error.message}`);
  }
};
```

We'll also need to implement the `transcribe` and `summarize` functions.

## Step 3: Add Speech Transcription

Now let's implement the transcription using OpenAI's `whisper-large-v3` model for fast, reliable speech processing.

> [!TIP]
> We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own priority list of providers in the [Inference Providers](https://huggingface.co/settings/inference-providers) page.
```python
def transcribe_audio(audio_file_path):
    """Transcribe audio using an Inference Provider"""
    client = InferenceClient(provider="auto")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path, model="openai/whisper-large-v3"
    )

    return transcript.text
```

The JavaScript version makes the same call through `automaticSpeechRecognition`, here with the `whisper-large-v3-turbo` variant of the model:

```javascript
import { InferenceClient } from "https://esm.sh/@huggingface/inference";

async function transcribe(file) {
  const client = new InferenceClient(HF_TOKEN);

  const output = await client.automaticSpeechRecognition({
    data: file,
    model: "openai/whisper-large-v3-turbo",
    provider: "auto",
  });

  return output.text || output || "Transcription completed";
}
```

## Step 4: Add AI Summarization

Next, we'll use a powerful language model like `deepseek-ai/DeepSeek-R1-0528` from DeepSeek via an Inference Provider. The `:fastest` policy is the default for chat completions, automatically selecting the best-performing provider for this model.

We will define a custom prompt to ensure the output is formatted as a summary with action items and decisions made:

```python
def generate_summary(transcript):
    """Generate summary using an Inference Provider"""
    client = InferenceClient(provider="auto")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-0528:fastest",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000,
    )

    return response.choices[0].message.content
```

The JavaScript version sends the same prompt through `chatCompletion`:

```javascript
async function summarize(transcript) {
  const client = new InferenceClient(HF_TOKEN);

  const prompt = `Analyze this meeting transcript and provide:
1. A concise summary of key points
2. Action items with responsible parties
3. Important decisions made

Transcript: ${transcript}

Format with clear sections:
## Summary
## Action Items
## Decisions Made`;

  const response = await client.chatCompletion(
    {
      model: "deepseek-ai/DeepSeek-R1-0528:fastest",
      messages: [
        {
          role: "user",
          content: prompt,
        },
      ],
      max_tokens: 1000,
    },
    {
      provider: "auto",
    }
  );

  return (
    response.choices?.[0]?.message?.content || response || "No summary available"
  );
}
```

## Step 5: Deploy on Hugging Face Spaces

To deploy, we'll need to create an `app.py` file and upload it to Hugging Face Spaces.
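
Spaces exposes secrets to the running app as environment variables, so the deployed Python app can pick up `HF_TOKEN` from its environment. The sketch below shows one way to make that explicit. The helper name `resolve_hf_token` is hypothetical and not part of `huggingface_hub`, which can also read `HF_TOKEN` from the environment on its own.

```python
import os


def resolve_hf_token(env=None):
    """Return the token stored in HF_TOKEN, or None if it is unset or blank.

    `resolve_hf_token` is an illustrative helper, not a huggingface_hub API.
    `env` defaults to the process environment and can be swapped for a dict in tests.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    # Report where the credentials would come from, without printing the token itself.
    source = "HF_TOKEN secret" if resolve_hf_token() else "local CLI login (if any)"
    print(f"Credentials source: {source}")
```

The result could then be passed along as `InferenceClient(provider="auto", token=resolve_hf_token())`. When it is `None`, the client falls back to the token saved by `hf auth login`, which matches the local setup from Step 1.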
The complete `app.py` file:

```python
import gradio as gr
from huggingface_hub import InferenceClient


def transcribe_audio(audio_file_path):
    """Transcribe audio using an Inference Provider"""
    client = InferenceClient(provider="auto")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path, model="openai/whisper-large-v3"
    )

    return transcript.text


def generate_summary(transcript):
    """Generate summary using an Inference Provider"""
    client = InferenceClient(provider="auto")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-0528:fastest",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000,
    )

    return response.choices[0].message.content


def process_meeting_audio(audio_file):
    """Main processing function"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    try:
        # Step 1: Transcribe
        transcript = transcribe_audio(audio_file)

        # Step 2: Summarize
        summary = generate_summary(transcript)

        return transcript, summary

    except Exception as e:
        return f"Error processing audio: {str(e)}", ""


# Create Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8),
    ],
    title="🎤 AI Meeting Notes",
    description="Upload audio to get instant transcripts and summaries.",
)

if __name__ == "__main__":
    app.launch()
```

Our app will run on port 7860 and look like this:

![Gradio app](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/gradio-app.png)

To deploy, we'll need to create a
new Space and upload our files.

1. **Create a new Space**: Go to [huggingface.co/new-space](https://huggingface.co/new-space)
2. **Choose Gradio SDK** and make it public
3. **Upload your files**: Upload `app.py`
4. **Add your token**: In Space settings, add `HF_TOKEN` as a secret (get it from [your settings](https://huggingface.co/settings/tokens))
5. **Launch**: Your app will be live at `https://huggingface.co/spaces/your-username/your-space-name`

> **Note**: While we used CLI authentication locally, Spaces requires the token as a secret for the deployment environment.

For JavaScript deployment, create a simple static HTML file. The complete `index.html` file:

```html
<title>🎤 AI Meeting Notes</title>

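<!-- Reconstructed sketch: the original markup was lost in extraction.
     Element ids and the transcribe/summarize/onchange code come from Steps 2-4;
     the `hidden` attributes and the show/hide/showError helpers are assumptions,
     and styling is omitted. -->
<h1>🎤 AI Meeting Notes</h1>

<label for="file">Upload audio file</label>
<input type="file" id="file" accept="audio/*" />

<div id="loading" hidden>Processing...</div>
<div id="error" hidden></div>

<div id="results" hidden>
  <h2>📝 Transcript</h2>
  <div id="transcript"></div>

  <h2>📋 Summary</h2>
  <div id="summary"></div>
</div>

<script type="module">
  import { InferenceClient } from "https://esm.sh/@huggingface/inference";

  // Access the token from Hugging Face Spaces secrets
  const HF_TOKEN = window.huggingface?.variables?.HF_TOKEN;

  // Assumed helpers: toggle visibility through the `hidden` attribute
  const show = (...elements) => elements.forEach((el) => (el.hidden = false));
  const hide = (...elements) => elements.forEach((el) => (el.hidden = true));
  const showError = (message) => {
    const box = document.getElementById("error");
    box.textContent = message;
    show(box);
  };

  async function transcribe(file) {
    const client = new InferenceClient(HF_TOKEN);

    const output = await client.automaticSpeechRecognition({
      data: file,
      model: "openai/whisper-large-v3-turbo",
      provider: "auto",
    });

    return output.text || output || "Transcription completed";
  }

  async function summarize(transcript) {
    const client = new InferenceClient(HF_TOKEN);

    const prompt = `Analyze this meeting transcript and provide:
1. A concise summary of key points
2. Action items with responsible parties
3. Important decisions made

Transcript: ${transcript}

Format with clear sections:
## Summary
## Action Items
## Decisions Made`;

    const response = await client.chatCompletion(
      {
        model: "deepseek-ai/DeepSeek-R1-0528:fastest",
        messages: [{ role: "user", content: prompt }],
        max_tokens: 1000,
      },
      {
        provider: "auto",
      }
    );

    return (
      response.choices?.[0]?.message?.content || response || "No summary available"
    );
  }

  document.getElementById("file").onchange = async (e) => {
    if (!e.target.files[0]) return;

    const file = e.target.files[0];

    show(document.getElementById("loading"));
    hide(document.getElementById("results"), document.getElementById("error"));

    try {
      const transcript = await transcribe(file);
      const summary = await summarize(transcript);

      document.getElementById("transcript").textContent = transcript;
      document.getElementById("summary").textContent = summary;

      hide(document.getElementById("loading"));
      show(document.getElementById("results"));
    } catch (error) {
      hide(document.getElementById("loading"));
      showError(`Error: ${error.message}`);
    }
  };
</script>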
```

We can run our app locally by opening the file in our browser.

![Local app](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/js-app.png)

To deploy:

1. **Create a new Space**: Go to [huggingface.co/new-space](https://huggingface.co/new-space)
2. **Choose Static SDK** and make it public
3. **Upload your file**: Upload `index.html`
4. **Add your token as a secret**: In Space settings, add `HF_TOKEN` as a **Secret**
5. **Launch**: Your app will be live at `https://huggingface.co/spaces/your-username/your-space-name`

> **Note**: The token is securely managed by Hugging Face Spaces and accessed via `window.huggingface.variables.HF_TOKEN`.

## Next Steps

Congratulations! You've created a production-ready AI application that handles real-world tasks, provides a professional interface, scales automatically, and is cost-efficient.

If you want to explore more providers, check out the [Inference Providers](https://huggingface.co/inference-providers) page. Here are some other ideas for next steps:

- **Improve your prompt**: Try different prompts to improve the quality for your use case
- **Try different models**: Experiment with various speech and text models
- **Compare performance**: Benchmark speed vs. accuracy across providers
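
For the speed side of that last idea, a small timing harness is one possible starting point. This is a minimal sketch, not part of the guide's app: `time_call` and `rank_by_speed` are hypothetical helper names, and the demo uses stand-in callables so it runs without any network access.

```python
import time
from statistics import mean


def time_call(fn, runs=3):
    """Call fn() `runs` times and return the mean wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def rank_by_speed(candidates, runs=3):
    """Time each zero-argument callable and return (name, seconds) pairs, fastest first."""
    timings = {name: time_call(fn, runs) for name, fn in candidates.items()}
    return sorted(timings.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Stand-in callables; in the real app each one would wrap a single provider/model call.
    demo = {
        "slow-stub": lambda: time.sleep(0.05),
        "fast-stub": lambda: time.sleep(0.001),
    }
    for name, seconds in rank_by_speed(demo):
        print(f"{name}: {seconds:.3f}s")
```

In the app, each callable might wrap `transcribe_audio` or `generate_summary` configured with a different `provider` or model. Timing alone says nothing about accuracy, which would need a separate comparison against a reference transcript.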