{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Whisper API → Whisper.cpp Migration\n",
        "\n",
        "This notebook demonstrates how to migrate from OpenAI's hosted Whisper API to Whisper.cpp for local speech-to-text processing.\n",
        "\n",
        "## Benefits of Whisper.cpp\n",
        "- **Local processing**: No API calls; audio never leaves your machine\n",
        "- **Cost savings**: No per-minute charges\n",
        "- **Offline capability**: Works without internet once the model has been downloaded\n",
        "- **Customization**: Choose the model size (e.g. `tiny`, `base`, `small`) that fits your accuracy and hardware needs\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Installation\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Install a Whisper.cpp Python binding and audio-loading dependencies.\n",
        "# Assumption: the `whispercpp` binding, whose `Whisper.from_pretrained` API the cells below rely on.\n",
        "# Other bindings (e.g. `whisper-cpp-python`) use different module names and constructors.\n",
        "%pip install whispercpp\n",
        "%pip install librosa soundfile\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import whispercpp\n",
        "import librosa\n",
        "import soundfile as sf\n",
        "import numpy as np\n",
        "from pathlib import Path\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Model Loading\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Load Whisper model (downloads automatically on first run)\n",
        "model = whispercpp.Whisper.from_pretrained(\"base\")\n",
        "print(\"Model loaded successfully!\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Audio Processing Function\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def transcribe_audio(audio_file_path):\n",
        "    \"\"\"\n",
        "    Transcribe an audio file using Whisper.cpp.\n",
        "    \n",
        "    Args:\n",
        "        audio_file_path (str): Path to the audio file\n",
        "    \n",
        "    Returns:\n",
        "        dict: Transcription with \"text\", \"language\", and \"segments\" keys\n",
        "    \"\"\"\n",
        "    # Load as 16 kHz mono float32, the input format Whisper models expect\n",
        "    audio, sr = librosa.load(audio_file_path, sr=16000)\n",
        "    \n",
        "    # Transcribe using Whisper.cpp\n",
        "    result = model.transcribe(audio)\n",
        "    \n",
        "    # Some bindings return a plain string instead of a dict, so normalize before indexing\n",
        "    if isinstance(result, str):\n",
        "        result = {\"text\": result}\n",
        "    \n",
        "    return {\n",
        "        \"text\": result[\"text\"],\n",
        "        \"language\": result.get(\"language\", \"auto\"),\n",
        "        \"segments\": result.get(\"segments\", [])\n",
        "    }\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Interactive Demo\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#nbgradio name=\"whisper_api_to_whisper_cpp\"\n",
        "import gradio as gr\n",
        "\n",
        "def process_audio(audio_file):\n",
        "    if audio_file is None:\n",
        "        return \"Please upload an audio file.\"\n",
        "    \n",
        "    try:\n",
        "        result = transcribe_audio(audio_file)\n",
        "        # Blank lines separate Markdown paragraphs in the rendered output\n",
        "        return f\"**Transcription:**\\n\\n{result['text']}\\n\\n**Language:** {result['language']}\"\n",
        "    except Exception as e:\n",
        "        return f\"Error processing audio: {str(e)}\"\n",
        "\n",
        "# Create Gradio interface\n",
        "demo = gr.Interface(\n",
        "    fn=process_audio,\n",
        "    inputs=gr.Audio(type=\"filepath\"),\n",
        "    outputs=gr.Markdown(),\n",
        "    title=\"Whisper.cpp Speech-to-Text\",\n",
        "    description=\"Upload an audio file to transcribe it using Whisper.cpp (local processing)\",\n",
        "    examples=[\n",
        "        # Add example audio files here\n",
        "    ]\n",
        ")\n",
        "\n",
        "demo.launch()\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": []
    }
  ],
  "metadata": {
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}