Space: emmajeed/transcriptinator_v2 · Build error

emmajeed committed on Dec 26, 2025

Commit 7ee2bc7 (verified) · 1 parent: 71a0fd4

Upload 5 files

Files changed (5):

README.md +119 -6
ai_providers.py +271 -0
app.py +189 -0
requirements.txt +6 -0
transcribe_core.py +365 -0

README.md CHANGED

@@ -1,12 +1,125 @@
 ---
-title: Transcriptinator V2
-emoji: 😻
-colorFrom: red
-colorTo: gray
+title: Transcriptinator
+emoji: 🎙️
+colorFrom: blue
+colorTo: purple
 sdk: gradio
-sdk_version: 6.2.0
+sdk_version: 4.16.0
 app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# 🎙️ Transcriptinator
+Simple, fast audio transcription powered by Google's Gemini AI.
+## Features
+- 🎯 **Simple & Fast** - Upload audio, get transcript in ~20-50 seconds
+- 📝 **Smart Summaries** - Automatic summary and key ideas extraction
+- 🔒 **Private** - Your API key, your data - nothing stored
+- 💰 **Free** - Uses your own Gemini API key (free tier: 15 requests/min)
+- 📄 **Markdown Output** - Clean, formatted transcripts ready to download
+## How to Use
+### 1. Get a Gemini API Key (Free)
+1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey)
+2. Click "Create API key"
+3. Copy the key
+### 2. Transcribe Audio
+1. Upload your audio file (max 10 minutes)
+   - Supported formats: MP3, WAV, M4A, OGG, FLAC, WEBM
+2. Paste your API key
+3. Click "🚀 Transcribe Audio"
+4. Wait ~20-50 seconds
+5. Download your transcript!
+## What You Get
+Your transcript includes:
+```markdown
+---
+title: "Your Audio File"
+audio_file: "your_audio_file.mp3"
+date_processed: "2025-12-24"
+summary: "Quick 2-3 sentence overview..."
+key_ideas:
+  - idea: "Main Point 1"
+    description: "Explanation..."
+  - idea: "Main Point 2"
+    description: "Explanation..."
+note_id: "unique-id"
+---
+## Key Ideas
+- **Main Point 1:** Explanation...
+- **Main Point 2:** Explanation...
+## Full Transcription
+[00:00] Speaker 1: Hello...
+[00:15] Speaker 2: Welcome...
+```
+## Limitations
+- **Maximum audio length:** 10 minutes (free HuggingFace tier timeout limit)
+- **Processing time:** ~20-50 seconds depending on audio length
+- **API rate limits:** 15 requests/minute (Gemini free tier)
+## Privacy & Security
+- ✅ **Your API key is never stored** - Used only for the current request
+- ✅ **Audio files are temporary** - Deleted immediately after processing
+- ✅ **No data collection** - Everything runs through your own API key
+## Technical Details
+**AI Calls per transcription:** 3
+1. Transcription (with timestamps and speakers)
+2. Summary generation
+3. Key ideas extraction
+**Processing time estimate:**
+- 2-minute audio: ~22 seconds
+- 5-minute audio: ~35 seconds
+- 10-minute audio: ~50 seconds
+## Troubleshooting
+**"Invalid API key"**
+- Make sure you copied the entire key
+- Generate a new key at [Google AI Studio](https://aistudio.google.com/app/apikey)
+**"Audio file too long"**
+- Maximum is 10 minutes for free tier
+- Split longer files or use the [CLI version](https://github.com/YOUR_USERNAME/transcriptinator)
+**"Processing timeout"**
+- Audio might be too long or corrupted
+- Try with a shorter, clearer audio file
+## Local Installation
+Want to run unlimited length audio? Clone the full version:
+```bash
+git clone https://github.com/YOUR_USERNAME/transcriptinator
+cd transcriptinator
+pip install -r requirements.txt
+python audio_process_and_transcribe.py your_audio_folder -o output_folder
+```
+## Credits
+Built with:
+- [Gradio](https://gradio.app/) - Web interface
+- [Google Gemini](https://ai.google.dev/) - AI transcription
+- [HuggingFace Spaces](https://huggingface.co/spaces) - Hosting
+## License
+MIT License - Feel free to use and modify!
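The call counts under Technical Details interact with the free-tier limit quoted under Limitations. A back-of-the-envelope sketch, assuming the README's figures (3 Gemini calls per file, 15 requests/minute) and that each chunk produced by the >30 MB chunking path in `transcribe_core.py` is processed as a separate file; when an OpenRouter key is supplied, `process_audio_file` sends only the transcription call to Gemini. `gemini_calls` and `min_minute_windows` are hypothetical helpers, not part of the app.

```python
import math


def gemini_calls(chunks: int, use_openrouter: bool = False) -> int:
    """Gemini requests needed for one upload (hypothetical helper).

    Each file or chunk gets 1 transcription call; the summary and key-ideas
    calls also go to Gemini only when no OpenRouter key is supplied.
    """
    per_chunk = 1 if use_openrouter else 3
    return chunks * per_chunk


def min_minute_windows(calls: int, rpm: int = 15) -> int:
    """Minimum number of one-minute windows the calls must span at `rpm` requests/minute."""
    return math.ceil(calls / rpm)


if __name__ == "__main__":
    # One short file, Gemini for everything: 3 calls fit in one minute window
    print(gemini_calls(1), min_minute_windows(gemini_calls(1)))  # 3 1
    # Six chunks, Gemini for everything: 18 calls spill into a second window
    print(gemini_calls(6), min_minute_windows(gemini_calls(6)))  # 18 2
```

This is only a lower bound: it ignores per-call latency and any other quotas the account may have.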

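Tools that consume the transcript described under "What You Get" usually need to separate the YAML frontmatter from the Markdown body. A minimal stdlib sketch, assuming the file starts with a `---` line and the frontmatter closes at the next line that is exactly `---` (which should hold for the `yaml.dump` output in `create_transcript_markdown`, since multi-line values are quoted and indented); `split_frontmatter` is a hypothetical helper, and the returned frontmatter string still needs a YAML parser such as PyYAML's `yaml.safe_load`.

```python
from typing import Tuple


def split_frontmatter(markdown_text: str) -> Tuple[str, str]:
    """Split '---'-delimited frontmatter from the body (hypothetical helper).

    Returns (frontmatter, body). If the text does not start with a '---'
    line, or the closing '---' is missing, frontmatter is '' and the whole
    text is returned unchanged as the body.
    """
    lines = markdown_text.splitlines(keepends=True)
    if not lines or lines[0].rstrip("\r\n") != "---":
        return "", markdown_text
    for i in range(1, len(lines)):
        # Only an unindented '---' line closes the frontmatter
        if lines[i].rstrip("\r\n") == "---":
            frontmatter = "".join(lines[1:i])
            body = "".join(lines[i + 1:]).lstrip("\n")
            return frontmatter, body
    return "", markdown_text


if __name__ == "__main__":
    sample = "---\ntitle: demo\nnote_id: abc\n---\n\n## Key Ideas\n\n- **A:** b\n"
    meta, body = split_frontmatter(sample)
    print(repr(meta))             # 'title: demo\nnote_id: abc\n'
    print(body.splitlines()[0])   # ## Key Ideas
```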
ai_providers.py ADDED

	@@ -0,0 +1,271 @@

+"""
+AI Provider Abstraction Layer for Transcriptinator
+Supports multiple AI providers: Gemini (transcription, summaries, key ideas) and OpenRouter (summaries and key ideas only)
+"""
+from abc import ABC, abstractmethod
+from typing import Dict, List
+import google.generativeai as genai
+import requests
+class TranscriptionProvider(ABC):
+    """Base class for AI transcription providers"""
+    @abstractmethod
+    def transcribe(self, audio_file_path: str) -> str:
+        """Generate transcription from audio file"""
+        pass
+    @abstractmethod
+    def generate_summary(self, text: str) -> str:
+        """Generate summary from transcription text"""
+        pass
+    @abstractmethod
+    def generate_key_ideas(self, text: str) -> List[Dict[str, str]]:
+        """Extract key ideas from transcription text"""
+        pass
+class GeminiProvider(TranscriptionProvider):
+    """Google Gemini provider with configurable models"""
+    AVAILABLE_MODELS = {
+        "Gemini 2.5 Flash": "models/gemini-2.5-flash",
+        "Gemini 2.0 Flash": "models/gemini-2.0-flash-exp",
+        "Gemini 1.5 Flash": "models/gemini-1.5-flash"
+    }
+    def __init__(self, api_key: str, model_name: str):
+        self.api_key = api_key
+        self.model_name = model_name
+        genai.configure(api_key=api_key)
+        self.model = genai.GenerativeModel(self.AVAILABLE_MODELS[model_name])
+    def transcribe(self, audio_file_path: str) -> str:
+        """Generate transcription using Gemini API with timestamps and speakers"""
+        try:
+            with open(audio_file_path, "rb") as audio_file:
+                audio_data = audio_file.read()
+            contents = [
+                {
+                    "role": "user",
+                    "parts": [
+                        {
+                            "mime_type": "audio/mp3",
+                            "data": audio_data
+                        },
+                        "Create a clean transcription of the audio file in English. Tag timestamps and speakers separately within the transcription. If speakers can be identified, use their names; otherwise, use 'Speaker 1', 'Speaker 2', etc. **Return ONLY the raw transcription text, starting directly with the first line of the transcription.** Do not include any introductory phrases, speaker identification plans, completion messages, or any text other than the transcription itself."
+                    ]
+                },
+                {
+                    "role": "model",
+                    "parts": [
+                        "Understood. I will provide a clean, timestamped, and speaker-tagged transcription of the audio file, returning only the transcription text as requested."
+                    ]
+                }
+            ]
+            response = self.model.generate_content(contents)
+            return response.text
+        except Exception as e:
+            raise Exception(f"Error during Gemini transcription: {e}")
+    def generate_summary(self, text: str) -> str:
+        """Generate a concise 2-3 sentence summary using Gemini"""
+        try:
+            prompt_text = f"""
+            Please read the following transcription text and write a concise summary of the main points in 2-3 sentences.
+            Transcription Text:
+            {text}
+            Summary:
+            """
+            response = self.model.generate_content(prompt_text)
+            return response.text.strip()
+        except Exception as e:
+            return f"Error generating summary: {e}"
+    def generate_key_ideas(self, text: str) -> List[Dict[str, str]]:
+        """Identify 3-5 key ideas from the transcription using Gemini"""
+        try:
+            prompt_text = f"""
+            Please read the following transcription text and identify 3-5 key ideas or concepts discussed.
+            Return these key ideas as a bulleted list, with each item in the list being an idea followed by a short (1-sentence) description of the idea.
+            Transcription Text:
+            {text}
+            Key Ideas:
+            """
+            response = self.model.generate_content(prompt_text)
+            key_ideas_text = response.text.strip()
+            key_ideas_list = []
+            for item in key_ideas_text.split('\n'):
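+                # Drop Markdown bold markers and bullet characters, e.g. "* **Idea:** text"
+                item = item.replace('**', '').strip().lstrip('-*• ')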
+                if item:
+                    parts = item.split(':', 1)
+                    if len(parts) == 2:
+                        idea = parts[0].strip()
+                        description = parts[1].strip()
+                        key_ideas_list.append({'idea': idea, 'description': description})
+                    else:
+                        key_ideas_list.append({'idea': item.strip(), 'description': ''})
+            return key_ideas_list
+        except Exception as e:
+            return [{'idea': 'Error generating key ideas', 'description': str(e)}]
+class OpenRouterProvider(TranscriptionProvider):
+    """OpenRouter API provider for text generation (summary/key ideas)"""
+    # Using DeepSeek R1 - excellent free model for reasoning and text generation
+    MODEL_ID = "deepseek/deepseek-r1-0528:free"
+    API_URL = "https://openrouter.ai/api/v1/chat/completions"
+    def __init__(self, api_key: str, model_name: str = None):
+        # model_name is ignored for OpenRouter since we use fixed DeepSeek R1
+        self.api_key = api_key
+    def transcribe(self, audio_file_path: str) -> str:
+        """Not supported - OpenRouter doesn't handle audio"""
+        raise NotImplementedError("OpenRouter doesn't support audio transcription. Use Gemini provider.")
+    def generate_summary(self, text: str) -> str:
+        """Generate summary using OpenRouter DeepSeek R1"""
+        try:
+            # Truncate text if too long
+            max_chars = 8000
+            text_to_summarize = text[:max_chars] if len(text) > max_chars else text
+            headers = {
+                "Authorization": f"Bearer {self.api_key}",
+                "Content-Type": "application/json"
+            }
+            payload = {
+                "model": self.MODEL_ID,
+                "messages": [
+                    {
+                        "role": "user",
+                        "content": f"Please provide a concise 2-3 sentence summary of the following transcription:\n\n{text_to_summarize}"
+                    }
+                ]
+            }
+            # Fail with an error instead of hanging if OpenRouter does not respond
+            response = requests.post(self.API_URL, headers=headers, json=payload, timeout=180)
+            # Handle errors
+            if response.status_code != 200:
+                return f"Summary unavailable: OpenRouter API error (status {response.status_code})"
+            result = response.json()
+            # Extract the response
+            if "choices" in result and len(result["choices"]) > 0:
+                return result["choices"][0]["message"]["content"].strip()
+            return "Summary generation completed but format unexpected."
+        except Exception as e:
+            return f"Error generating summary: {e}"
+    def generate_key_ideas(self, text: str) -> List[Dict[str, str]]:
+        """Generate key ideas using OpenRouter DeepSeek R1"""
+        try:
+            # Truncate text if too long
+            max_chars = 6000
+            text_to_analyze = text[:max_chars] if len(text) > max_chars else text
+            headers = {
+                "Authorization": f"Bearer {self.api_key}",
+                "Content-Type": "application/json"
+            }
+            payload = {
+                "model": self.MODEL_ID,
+                "messages": [
+                    {
+                        "role": "user",
+                        "content": f"""Extract 3-5 key ideas from this transcription. Format each as:
+Idea: Brief title
+Description: One sentence explanation
+Transcription:
+{text_to_analyze}"""
+                    }
+                ]
+            }
+            # Fail with an error instead of hanging if OpenRouter does not respond
+            response = requests.post(self.API_URL, headers=headers, json=payload, timeout=180)
+            if response.status_code != 200:
+                return [{'idea': 'Key ideas unavailable', 'description': f'OpenRouter API error (status {response.status_code})'}]
+            result = response.json()
+            # Extract and parse the response
+            if "choices" in result and len(result["choices"]) > 0:
+                content = result["choices"][0]["message"]["content"]
+                # Parse the response into structured key ideas
+                key_ideas_list = []
+                lines = content.split('\n')
+                current_idea = None
+                for line in lines:
+                    line = line.strip()
+                    if line.startswith(("Idea:", "**Idea:")):
+                        if current_idea:
+                            key_ideas_list.append(current_idea)
+                        idea_text = line.replace("Idea:", "").replace("**", "").strip()
+                        current_idea = {'idea': idea_text, 'description': ''}
+                    elif line.startswith(("Description:", "**Description:")) and current_idea:
+                        desc_text = line.replace("Description:", "").replace("**", "").strip()
+                        current_idea['description'] = desc_text
+                    elif ':' in line and not current_idea:
+                        # Fallback parsing
+                        parts = line.split(':', 1)
+                        if len(parts) == 2:
+                            key_ideas_list.append({
+                                'idea': parts[0].strip('- •*123456789.').strip(),
+                                'description': parts[1].strip()
+                            })
+                # Add last idea if exists
+                if current_idea and current_idea['idea']:
+                    key_ideas_list.append(current_idea)
+                # Fallback if parsing fails
+                if not key_ideas_list:
+                    # Just use first few sentences
+                    sentences = [s.strip() for s in content.split('.') if s.strip()][:5]
+                    for i, sent in enumerate(sentences, 1):
+                        if sent:
+                            key_ideas_list.append({'idea': f'Key Point {i}', 'description': sent})
+                return key_ideas_list[:5]
+            return [{'idea': 'Key ideas extraction', 'description': 'Unable to parse response'}]
+        except Exception as e:
+            return [{'idea': 'Error generating key ideas', 'description': str(e)}]
+def get_provider(provider_type: str, api_key: str, model_name: str) -> TranscriptionProvider:
+    """Factory function to create appropriate provider"""
+    if provider_type == "Gemini":
+        return GeminiProvider(api_key, model_name)
+    elif provider_type == "OpenRouter":
+        return OpenRouterProvider(api_key, model_name)
+    else:
+        raise ValueError(f"Unknown provider: {provider_type}")
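Both providers turn free-form model output into `{'idea', 'description'}` dicts with hand-written line parsing. A slightly more defensive sketch that also tolerates numbered lists and Markdown bold (e.g. `1. **Remote work:** ...` or `* **Remote work**: ...`); `parse_key_ideas` is a hypothetical helper, and the accepted line shapes are assumptions about typical model output, not a format either API guarantees.

```python
import re
from typing import Dict, List

# Optional bullet or number prefix followed by whitespace, e.g. "-", "*", "•", "1.", "2)"
_PREFIX = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def parse_key_ideas(text: str, limit: int = 5) -> List[Dict[str, str]]:
    """Parse 'idea: description' lines from LLM output (hypothetical helper)."""
    ideas: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = _PREFIX.sub("", raw.strip())
        line = line.replace("**", "").strip()  # drop Markdown bold markers
        if not line or ":" not in line:
            continue
        idea, description = (part.strip() for part in line.split(":", 1))
        # Skip headings such as "Key Ideas:" that have no description
        if idea and description:
            ideas.append({"idea": idea, "description": description})
    return ideas[:limit]
```

Unlike the committed parsers, this drops lines without an `idea: description` shape instead of keeping them with an empty description.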

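`OpenRouterProvider` builds an OpenAI-style chat-completions payload, truncates long transcripts by character count, and reads `choices[0].message.content` from the JSON reply. The same steps as pure functions, which makes them testable without network access; `build_payload` and `extract_content` are hypothetical helpers, and the response shape is the one the provider code already assumes.

```python
from typing import Any, Dict, Optional

MODEL_ID = "deepseek/deepseek-r1-0528:free"  # same model ID as OpenRouterProvider


def build_payload(prompt: str, text: str, max_chars: int = 8000) -> Dict[str, Any]:
    """Build a chat-completions request body, truncating text to max_chars."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": f"{prompt}\n\n{text[:max_chars]}"}],
    }


def extract_content(result: Dict[str, Any]) -> Optional[str]:
    """Return choices[0].message.content stripped, or None for any other shape."""
    choices = result.get("choices") or []
    if not choices:
        return None
    content = choices[0].get("message", {}).get("content")
    return content.strip() if isinstance(content, str) else None
```

In the provider, the request itself would still go through `requests.post(...)` with a timeout. Slicing with `text[:max_chars]` already returns short strings unchanged, so a separate length check is not needed.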
app.py ADDED

	@@ -0,0 +1,189 @@

+"""
+Transcriptinator - HuggingFace Spaces Gradio Interface
+Audio transcription with Gemini + OpenRouter
+"""
+import gradio as gr
+import os
+from transcribe_core import process_audio_file, get_audio_duration
+from ai_providers import GeminiProvider, OpenRouterProvider
+def transcribe_audio(audio_file, gemini_key, openrouter_key, model_name):
+    """
+    Main transcription function for Gradio interface.
+    Args:
+        audio_file: Uploaded audio file
+        gemini_key: Gemini API key for transcription
+        openrouter_key: OpenRouter API key for summary/ideas
+        model_name: Gemini model to use
+    Returns:
+        Tuple of (status_message, download_file_path)
+    """
+    if not audio_file:
+        return "❌ Please upload an audio file.", None
+    gemini_key = (gemini_key or "").strip()
+    if len(gemini_key) < 10:
+        return "❌ Please provide a valid Gemini API key.", None
+    try:
+        # Create Gemini provider for transcription
+        gemini_provider = GeminiProvider(gemini_key, model_name)
+        # Create OpenRouter provider for summary/ideas (optional)
+        openrouter_provider = None
+        if openrouter_key and len(openrouter_key.strip()) > 10:
+            openrouter_provider = OpenRouterProvider(openrouter_key.strip())
+        # Get audio duration and file size for estimate
+        duration = get_audio_duration(audio_file)
+        duration_min = duration / 60
+        file_size_mb = os.path.getsize(audio_file) / (1024 * 1024)
+        # Process the audio file
+        output_path, is_zip = process_audio_file(
+            audio_file,
+            gemini_provider,
+            openrouter_provider,
+            progress_callback=lambda msg, progress: None
+        )
+        # Determine file type for success message
+        if is_zip == "True":
+            file_type = "ZIP archive"
+            file_desc = "Multiple transcript files (chunked audio)"
+        else:
+            file_type = "Markdown file"
+            file_desc = "Single transcript file"
+        text_provider = "OpenRouter (DeepSeek R1)" if openrouter_provider else "Gemini"
+        success_msg = f"""✅ **Transcription Complete!**
+📝 Original file: {os.path.basename(audio_file)}
+⏱️ Duration: {duration_min:.1f} minutes
+💾 Size: {file_size_mb:.1f} MB
+🎙️ Transcription: Gemini ({model_name})
+💡 Summary/Ideas: {text_provider}
+📄 Output: {file_type}
+{file_desc}
+Click below to download your transcript(s)."""
+        # Return the file path directly - Gradio handles the download
+        return success_msg, output_path
+    except Exception as e:
+        error_msg = f"""❌ **Error during transcription:**
+{str(e)}
+**Common issues:**
+- Invalid API key
+- Audio file too large or corrupted
+- Network connection issues"""
+        return error_msg, None
+# Create Gradio interface
+with gr.Blocks(title="Transcriptinator", theme=gr.themes.Soft()) as app:
+    gr.Markdown("""
+    # 🎙️ Transcriptinator
+    ### AI-Powered Audio Transcription
+    **Powered by:** Gemini (transcription) + OpenRouter DeepSeek R1 (summarization)
+    """)
+    with gr.Row():
+        with gr.Column(scale=2):
+            # Audio upload
+            audio_input = gr.Audio(
+                label="Upload Audio File",
+                type="filepath",
+                sources=["upload"],
+            )
+            gr.Markdown("""
+            **Supported formats:** MP3, WAV, M4A, OGG, FLAC, WEBM
+            **Large files (>30MB):** Automatically chunked and processed
+            """)
+            # Model selection
+            model_dropdown = gr.Dropdown(
+                choices=list(GeminiProvider.AVAILABLE_MODELS.keys()),
+                value="Gemini 2.5 Flash",
+                label="Gemini Model",
+                info="Select which Gemini model to use for transcription"
+            )
+            # API keys
+            gemini_key_input = gr.Textbox(
+                label="Gemini API Key (Required)",
+                placeholder="Enter your Gemini API key...",
+                type="password",
+                info="Get one free at: https://aistudio.google.com/app/apikey"
+            )
+            openrouter_key_input = gr.Textbox(
+                label="OpenRouter API Key (Optional)",
+                placeholder="Enter your OpenRouter key for better summaries...",
+                type="password",
+                info="Leave empty to use Gemini for all tasks | Get free at: https://openrouter.ai"
+            )
+            # Submit button
+            submit_btn = gr.Button("🚀 Transcribe Audio", variant="primary", size="lg")
+        with gr.Column(scale=1):
+            # Status output
+            status_output = gr.Markdown(label="Status")
+            # Download button
+            download_output = gr.File(label="📥 Download Transcript", interactive=False)
+    # Information section
+    gr.Markdown("""
+    ---
+    ### 🎯 What you'll get:
+    - 📝 **Full transcription** with timestamps and speaker detection
+    - 📊 **Summary** in 2-3 sentences
+    - 💡 **Key ideas** with descriptions
+    - 📄 **Markdown file** ready to download
+    ### 🤖 AI Models:
+    **Gemini** (Google) - Transcription:
+    - Gemini 2.5 Flash (recommended - fastest, best quality)
+    - Gemini 2.0 Flash (experimental)
+    - Gemini 1.5 Flash (stable)
+    - Native audio support with timestamps and speakers
+    **OpenRouter** (Optional) - Summarization:
+    - Uses DeepSeek R1 (free, excellent reasoning)
+    - Better summaries and key ideas extraction
+    - Leave API key empty to use Gemini for everything
+    ### 🔒 Privacy:
+    - Your API keys are never stored
+    - Audio files are processed temporarily and deleted
+    - All processing happens through your own credentials
+    ### 💡 Tips:
+    - **New users:** Start with just Gemini API key
+    - **Better summaries:** Add OpenRouter key (optional, free)
+    - **Large files:** App automatically chunks files >30MB
+    """)
+    # Connect the transcription function
+    submit_btn.click(
+        fn=transcribe_audio,
+        inputs=[audio_input, gemini_key_input, openrouter_key_input, model_dropdown],
+        outputs=[status_output, download_output]
+    )
+# Launch the app with queuing enabled
+if __name__ == "__main__":
+    app.queue().launch()
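The upload panel advertises MP3, WAV, M4A, OGG, FLAC and WEBM, but `GeminiProvider.transcribe` always labels the bytes `audio/mp3`. A sketch of choosing the MIME type from the file extension instead; the mapping is an assumption (Gemini's documentation lists types such as `audio/wav`, `audio/mp3`, `audio/ogg` and `audio/flac`, while the M4A and WEBM entries here are guesses to verify), and `guess_audio_mime` is a hypothetical helper.

```python
import os

# Extension -> MIME type. The m4a/webm entries are assumptions to verify
# against the Gemini documentation; the others follow its supported-format list.
AUDIO_MIME_TYPES = {
    ".mp3": "audio/mp3",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg",
    ".flac": "audio/flac",
    ".m4a": "audio/mp4",    # assumption
    ".webm": "audio/webm",  # assumption
}


def guess_audio_mime(path: str, default: str = "audio/mp3") -> str:
    """Pick a MIME type from the file extension (hypothetical helper)."""
    ext = os.path.splitext(path)[1].lower()
    return AUDIO_MIME_TYPES.get(ext, default)
```

Inside `transcribe`, the inline part would then use `"mime_type": guess_audio_mime(audio_file_path)`. Chunks written by `chunk_audio_file` are always re-encoded with `libmp3lame`, so they keep the MP3 label.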

requirements.txt ADDED

	@@ -0,0 +1,6 @@

+gradio
+google-generativeai==0.8.3
+pyyaml==6.0.1
+ffmpeg-python==0.2.0
+psutil==5.9.0
+requests==2.31.0

transcribe_core.py ADDED

	@@ -0,0 +1,365 @@

+"""
+Simplified transcription core for HuggingFace Spaces deployment.
+Version with chunking support for large files (>30MB).
+Now supports multiple AI providers via provider abstraction.
+"""
+import os
+from datetime import date, timedelta
+import yaml
+import uuid
+from typing import List, Dict, Tuple
+import ffmpeg
+import gc
+import psutil
+import zipfile
+import time
+from ai_providers import TranscriptionProvider
+def format_timestamp(seconds: float) -> str:
+    """Convert seconds to ffmpeg time format (HH:MM:SS.xxx)."""
+    hours = int(seconds // 3600)
+    minutes = int((seconds % 3600) // 60)
+    secs = seconds % 60
+    return f"{hours:02d}:{minutes:02d}:{secs:06.3f}"
+def check_memory_usage() -> bool:
+    """Check current memory usage and print warning if too high."""
+    process = psutil.Process()
+    memory_percent = process.memory_percent()
+    if memory_percent > 80:
+        print(f"Warning: High memory usage ({memory_percent:.1f}%)")
+        return False
+    return True
+def clean_partial_chunks(base_file_path: str) -> None:
+    """Clean up any existing partial chunks before starting."""
+    try:
+        base_name = os.path.splitext(os.path.basename(base_file_path))[0]
+        output_folder = os.path.dirname(base_file_path)
+        pattern = f"{base_name}_part*"
+        print(f"Cleaning up any existing chunks matching: {pattern}")
+        for file in os.listdir(output_folder):
+            if file.startswith(f"{base_name}_part") and file.endswith(".mp3"):
+                file_path = os.path.join(output_folder, file)
+                try:
+                    os.remove(file_path)
+                    print(f"Removed existing chunk: {file}")
+                except Exception as e:
+                    print(f"Warning: Could not remove {file}: {e}")
+    except Exception as e:
+        print(f"Warning: Error during cleanup: {e}")
+def chunk_audio_file(audio_file_path: str, chunk_duration_minutes: int = 25, overlap_seconds: int = 5) -> List[str]:
+    """Chunks an audio file into smaller parts using ffmpeg streaming."""
+    chunked_files = []
+    try:
+        # Clean up any existing chunks first
+        clean_partial_chunks(audio_file_path)
+        # Get audio duration
+        print("\nAnalyzing audio file duration...")
+        duration = get_audio_duration(audio_file_path)
+        if duration is None:
+            print("Error: Could not determine audio file duration.")
+            return chunked_files
+        chunk_length = chunk_duration_minutes * 60
+        overlap = overlap_seconds
+        start_time = 0
+        chunk_index = 1
+        base_name = os.path.splitext(os.path.basename(audio_file_path))[0]
+        output_folder = os.path.dirname(audio_file_path)
+        total_chunks = int((duration - overlap) / (chunk_length - overlap)) + 1
+        print(f"\nChunking audio file: {audio_file_path}")
+        print(f"Total duration: {format_timestamp(duration)}")
+        print(f"Chunk duration: {chunk_duration_minutes} minutes, Overlap: {overlap_seconds} seconds")
+        print(f"Estimated number of chunks: {total_chunks}\n")
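+        memory_waits = 0
+        while start_time < duration:
+            # Back off while memory is high, but stop waiting after ~60 s in total
+            # so a busy machine cannot stall the request forever
+            if not check_memory_usage() and memory_waits < 12:
+                memory_waits += 1
+                print("Memory usage too high, waiting before continuing...")
+                time.sleep(5)
+                gc.collect()
+                continue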
+            # Calculate end time for current chunk
+            end_time = min(start_time + chunk_length, duration)
+            # Fold a short tail (< 30 seconds) into this chunk instead of creating a
+            # tiny final chunk or dropping the leftover audio at the end
+            if duration - end_time < 30:
+                end_time = duration
+            chunk_file_name = f"{base_name}_part{chunk_index}.mp3"
+            chunk_file_path = os.path.join(output_folder, chunk_file_name)
+            print(f"Creating chunk {chunk_index}/{total_chunks}: {chunk_file_name}")
+            print(f"  Time range: {format_timestamp(start_time)} to {format_timestamp(end_time)}")
+            try:
+                # Use ffmpeg to extract chunk
+                if os.path.exists(chunk_file_path):
+                    os.remove(chunk_file_path)
+                stream = ffmpeg.input(audio_file_path, ss=start_time, t=end_time-start_time)
+                stream = ffmpeg.output(stream, chunk_file_path, acodec='libmp3lame', loglevel='error')
+                ffmpeg.run(stream, capture_stdout=True, capture_stderr=True, overwrite_output=True)
+                if os.path.exists(chunk_file_path):
+                    chunk_size = os.path.getsize(chunk_file_path) / (1024 * 1024)
+                    print(f"  ✓ Saved chunk: {chunk_file_path} ({chunk_size:.2f}MB)")
+                    chunked_files.append(chunk_file_path)
+                    chunk_index += 1
+                else:
+                    print(f"  ✗ Error: Chunk file was not created")
+                    break
+            except ffmpeg.Error as e:
+                print(f"  ✗ Error processing chunk: {e.stderr.decode() if e.stderr else str(e)}")
+                break
+            # Update start time for next chunk, considering overlap
+            if end_time == duration:  # If this was the last chunk
+                break
+            start_time = end_time - overlap
+            # Force garbage collection after each chunk
+            gc.collect()
+        created_chunks = chunk_index - 1
+        print(f"\nAudio file chunking completed:")
+        print(f"- Created {created_chunks} out of {total_chunks} expected chunks")
+        print(f"- Final chunk duration: {format_timestamp(end_time - start_time)}")
+    except Exception as e:
+        print(f"Error during audio chunking: {e}")
+    return chunked_files
+def get_audio_duration(file_path: str) -> float:
+    """Get the duration of an audio file using ffmpeg."""
+    try:
+        probe = ffmpeg.probe(file_path)
+        duration = float(probe['format']['duration'])
+        return duration
+    except Exception as e:
+        raise Exception(f"Error getting audio duration: {e}")
+def generate_transcription(audio_file_path: str, provider: TranscriptionProvider) -> str:
+    """
+    Generate transcription using the configured AI provider.
+    Args:
+        audio_file_path: Path to audio file
+        provider: TranscriptionProvider instance (e.g. GeminiProvider)
+    Returns:
+        Transcription text (with timestamps and speaker tags when using Gemini)
+    """
+    try:
+        return provider.transcribe(audio_file_path)
+    except Exception as e:
+        raise Exception(f"Error during transcription: {e}")
+def generate_summary(transcription_text: str, provider: TranscriptionProvider) -> str:
+    """
+    Generate a concise 2-3 sentence summary using the configured provider.
+    Args:
+        transcription_text: Full transcription
+        provider: TranscriptionProvider instance
+    Returns:
+        Summary text
+    """
+    try:
+        return provider.generate_summary(transcription_text)
+    except Exception as e:
+        return f"Error generating summary: {e}"
+def generate_key_ideas(transcription_text: str, provider: TranscriptionProvider) -> List[Dict[str, str]]:
+    """
+    Identify 3-5 key ideas from the transcription using the configured provider.
+    Args:
+        transcription_text: Full transcription
+        provider: TranscriptionProvider instance
+    Returns:
+        List of {idea, description} dictionaries
+    """
+    try:
+        return provider.generate_key_ideas(transcription_text)
+    except Exception as e:
+        return [{'idea': 'Error generating key ideas', 'description': str(e)}]
+def create_transcript_markdown(audio_filename: str, transcription: str, summary: str, key_ideas: List[Dict[str, str]]) -> str:
+    """
+    Create a formatted markdown file with YAML frontmatter.
+    Args:
+        audio_filename: Name of the audio file
+        transcription: Full transcription text
+        summary: Summary text
+        key_ideas: List of key ideas
+    Returns:
+        Formatted markdown content
+    """
+    base_name = os.path.splitext(audio_filename)[0]
+    # Build YAML frontmatter
+    yaml_metadata = {
+        'title': base_name,
+        'audio_file': audio_filename,
+        'date_processed': str(date.today()),
+        'summary': summary,
+        'key_ideas': key_ideas,
+        'note_id': str(uuid.uuid4())
+    }
+    yaml_frontmatter = "---\n" + yaml.dump(yaml_metadata, sort_keys=False, indent=2, allow_unicode=True) + "---\n\n"
+    # Build content sections
+    content = yaml_frontmatter
+    # Key ideas section
+    content += "## Key Ideas\n\n"
+    if key_ideas:
+        for idea_item in key_ideas:
+            if idea_item['description']:
+                content += f"- **{idea_item['idea']}:** {idea_item['description']}\n"
+            else:
+                content += f"- **{idea_item['idea']}**\n"
+    else:
+        content += "*(No key ideas generated)*\n"
+    content += "\n## Full Transcription\n\n"
+    content += transcription
+    return content
+def process_audio_file(audio_file_path: str, gemini_provider: TranscriptionProvider, openrouter_provider: TranscriptionProvider = None, progress_callback=None) -> Tuple[str, str]:
+    """
+    Process an audio file and return the markdown content or ZIP of multiple files.
+    Args:
+        audio_file_path: Path to audio file
+        gemini_provider: GeminiProvider for transcription
+        openrouter_provider: Optional OpenRouterProvider for summary/ideas (if None, uses gemini_provider)
+        progress_callback: Optional callback function for progress updates
+    Returns:
+        Tuple of (output_file_path, is_zip_boolean_as_string)
+        - If single file: ("path/to/file.md", "False")
+        - If chunked: ("path/to/file.zip", "True")
+    """
+    audio_filename = os.path.basename(audio_file_path)
+    base_name = os.path.splitext(audio_filename)[0]
+    # Check file size
+    file_size_mb = os.path.getsize(audio_file_path) / (1024 * 1024)
+    print(f"\nProcessing: {audio_filename} ({file_size_mb:.2f}MB)")
+    # Determine if chunking is needed
+    files_to_transcribe = []
+    if file_size_mb > 30:
+        print(f"File is larger than 30MB. Chunking into smaller parts...")
+        if progress_callback:
+            progress_callback("📦 Chunking large audio file...", 0.1)
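+        chunked_files = chunk_audio_file(audio_file_path)
+        if not chunked_files:
+            raise RuntimeError(f"Could not split {audio_filename} into chunks; nothing to transcribe.")
+        files_to_transcribe.extend(chunked_files)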
+    else:
+        print("File is small enough to process directly")
+        files_to_transcribe.append(audio_file_path)
+    # Process each file (chunk or original)
+    markdown_files = []
+    total_files = len(files_to_transcribe)
+    for idx, file_path in enumerate(files_to_transcribe, 1):
+        file_name = os.path.basename(file_path)
+        print(f"\nTranscribing {idx}/{total_files}: {file_name}")
+        if progress_callback:
+            progress = 0.2 + (0.6 * (idx - 1) / total_files)
+            progress_callback(f"🎙️ Transcribing part {idx}/{total_files}...", progress)
+        # Transcribe using Gemini
+        transcription = generate_transcription(file_path, gemini_provider)
+        if progress_callback:
+            progress_callback(f"📝 Generating metadata for part {idx}/{total_files}...", progress + 0.1)
+        # Generate metadata using OpenRouter if available, otherwise Gemini
+        text_provider = openrouter_provider if openrouter_provider else gemini_provider
+        summary = generate_summary(transcription, text_provider)
+        key_ideas = generate_key_ideas(transcription, text_provider)
+        # Create markdown
+        markdown_content = create_transcript_markdown(file_name, transcription, summary, key_ideas)
+        # Save markdown file to outputs directory
+        output_dir = "outputs"
+        os.makedirs(output_dir, exist_ok=True)
+        output_filename = os.path.splitext(file_name)[0] + ".md"
+        markdown_path = os.path.join(output_dir, output_filename)
+        with open(markdown_path, 'w', encoding='utf-8') as f:
+            f.write(markdown_content)
+        markdown_files.append(markdown_path)
+        # Clean up chunk audio files (never the original upload)
+        if file_path != audio_file_path:
+            try:
+                os.remove(file_path)
+                print(f"Deleted chunk: {file_name}")
+            except Exception as e:
+                print(f"Warning: Could not delete chunk {file_name}: {e}")
+    # Return result
+    if len(markdown_files) == 1:
+        # Single file - return as-is
+        return markdown_files[0], "False"
+    else:
+        # Multiple files - create ZIP
+        if progress_callback:
+            progress_callback("📦 Creating ZIP file...", 0.9)
+        output_dir = "outputs"
+        os.makedirs(output_dir, exist_ok=True)
+        zip_filename = f"{base_name}_transcripts.zip"
+        zip_path = os.path.join(output_dir, zip_filename)
+        with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
+            for md_file in markdown_files:
+                # Add with proper filename
+                basename = os.path.basename(md_file)
+                zipf.write(md_file, basename)
+                # Delete individual md files after adding to ZIP
+                try:
+                    os.remove(md_file)
+                except Exception as e:
+                    print(f"Warning: Could not delete {md_file}: {e}")
+        print(f"\n✅ Created ZIP with {len(markdown_files)} transcripts: {zip_filename}")
+        return zip_path, "True"
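The window arithmetic in `chunk_audio_file` can be checked in isolation. A pure-function sketch of the schedule, assuming the defaults used there (25-minute chunks, 5-second overlap) and folding any tail shorter than 30 seconds into the last window so no audio at the end is left uncovered; `chunk_windows` is a hypothetical helper, and ffmpeg would still cut each `(start, end)` window.

```python
from typing import List, Tuple


def chunk_windows(duration: float, chunk_minutes: int = 25,
                  overlap_seconds: int = 5,
                  min_tail: float = 30.0) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds covering [0, duration] with overlap."""
    chunk_length = chunk_minutes * 60
    if overlap_seconds >= chunk_length:
        raise ValueError("overlap must be shorter than the chunk length")
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_length, duration)
        # Fold a short remainder into this window rather than emitting a tiny chunk
        if duration - end < min_tail:
            end = duration
        windows.append((start, end))
        if end == duration:
            break
        # The next window starts `overlap_seconds` before this one ends
        start = end - overlap_seconds
    return windows


if __name__ == "__main__":
    # A 75-minute recording: three overlapping windows, the last one absorbing a 10 s tail
    print(chunk_windows(4500))
```

Because every window ends where the next one starts plus the overlap, consecutive windows always share `overlap_seconds` of audio, and the last window always ends exactly at `duration`.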