File size: 59,739 Bytes

becc14c

{
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "## Predspracovanie dát\n",
        "\n",
        "Pomocné funkcie využité pre vytvorenie trénovacieho/evaluačného datasetu v tomto notebooku, predspracovávali dáta kombináciou delenia audia na menšie časti na základe detekovania ticha v nahrávkach, resampling dát na 16kHz formát požadovaný modelom, automatickú trasnkripciu ladeným modelom s prvotných tréningov, a navrhnutie rozdelenia nahrávok podľa počtu tokenov na ukončené vety.\n",
        "\n",
        "### Výstupný formát\n",
        "- TSV súbor obsahujúci:\n",
        "    - cestu k zvukovému záznamu\n",
        "    - dĺžku záznamu v sekundách\n",
        "    - transkripciu záznamu\n",
        "\n",
        "\n",
        "### Využité knižnice\n",
        "  - os\n",
        "  - torch\n",
        "  - torchaudio  - Na načitanie a resampling zvukových záznamov\n",
        "  - pydub (AudioSegment, silence)\t- Na spracovanie audia, detekciu ticha, exportovanie častí\n",
        "  - transformers -\tNa načítanie procesora, modelu, a tokenizáciu pre transkripciu\n",
        "  - re - Regulárne výrazy na rozdelenie textu podľa viet.\n",
        "  - pandas - export dát do TSV\n",
        "  - tqdm - monitorovanie priebehu pri spracovaní tokenov\n",
        "\n",
        "\n"
      ],
      "metadata": {
        "id": "dB-uebOX2p8j"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Popis funckii\n",
        "\n",
        "### load_whisper_custom_model\n",
        "- **Parametre :** cesta k modelu\n",
        "- načitanie vlastného modelu pomocou transformers\n",
        "- použitie WhisperProcessor a WhisperForConditionalGeneration\n",
        "- prenesenie modelu na GPU\n",
        "\n",
        "### transcribe_function\n",
        "- **Parametre :** cesta k súboru so zvukovým záznamom, processor, model\n",
        "- Načítanie audia pomocou torchaudio.load.\n",
        "- Ak sampling rate audia ≠ 16000 Hz (cieľ), resampluje sa pomocou torchaudio.transforms.Resample.\n",
        "- Audio sa premení na vstupné features cez processor.feature_extractor.\n",
        "- Forced decoding nastavuje jazyk (\"sk\" = slovenčina) a úlohu (\"transcribe\") cez get_decoder_prompt_ids.\n",
        "- Model generuje text cez beam search (num_beams=5, skoré ukončenie).\n",
        "- Výsledok sa dekóduje cez tokenizer a vráti čistý text.\n",
        "\n",
        "### split_text_by_sentence\n",
        "- **Parametre :** text, maximálny počet tokenov, tokenizer\n",
        "- používa regulárne výrazy na delenie textu po vetách (bodka, otáznik, výkričník)\n",
        "- ak vetu už nemožno pridať do chunku bez prekročenia limitu tokenov (max_tokens), začne nový chunk\n",
        "- okenizácia sa robí pomocou processor.tokenizer(sentence).input_ids\n",
        "\n",
        "\n",
        "### split_and_transcribe_with_token_limit\n",
        "- **Parametre :** cesta k súboru so zvukovým záznamom, processor, model\n",
        "- načítanie audia cez pydub (AudioSegment.from_mp3).\n",
        "- detekcia ticha:\n",
        "  -  pomocou silence.detect_silence(audio, min_silence_len=MIN_SILENCE_LEN, silence_thresh=SILENCE_THRESH)\n",
        "  - detekované tiché body (v strede medzi začiatkom a koncom ticha) sa berú ako potenciálne body delenia\n",
        "- fallback rozdelenie pre dlhé časti (> MAX_CHUNK_MS):\n",
        "- ak chunk je príliš dlhý, skúsi sa ešte jemnejšie delenie s mäkšími kritériami (ALT_MIN_SILENCE_LEN).\n",
        "- export chunkov do .wav formátu.\n",
        "- transkripcia chunkov\n",
        "- v prípade, že text presahuje token limit, text sa rozdelí na viac častí (iba textové rozdelenie).\n",
        "\n",
        "\n",
        "### process_all_mp3\n",
        "- spracovanie všetkých súborov vo formáte mp3 v definovanom priečinku\n",
        "- volá funkcie na spracovanie dát\n",
        "- výstup TSV súbor vo formáte : **path**, **duration**, **sentence**"
      ],
      "metadata": {
        "id": "MV7Dh_Qw-nWV"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Import knižníc"
      ],
      "metadata": {
        "id": "Ybedar-N-957"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "kDNwj9t1qm8w",
        "outputId": "12aca0b1-41c4-4eeb-9dc7-f13a057bde20",
        "collapsed": true
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Collecting pydub\n",
            "  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)\n",
            "Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n",
            "Installing collected packages: pydub\n",
            "Successfully installed pydub-0.25.1\n"
          ]
        }
      ],
      "source": [
        "!pip install pydub"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "D2MGL2U4q8Iu"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import pandas as pd\n",
        "import nltk\n",
        "from pydub import AudioSegment, silence\n",
        "from nltk.tokenize import sent_tokenize\n",
        "import re\n",
        "import gc\n",
        "import torch\n",
        "import torchaudio\n",
        "import pandas as pd\n",
        "import soundfile as sf\n",
        "from tqdm import tqdm\n",
        "from pydub import AudioSegment, silence\n",
        "from transformers import WhisperProcessor, WhisperForConditionalGeneration"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Import vlastného modelu implemenácia funkcií\n"
      ],
      "metadata": {
        "id": "ynqsRY58V1ts"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from google.colab import drive\n",
        "\n",
        "drive.mount('/content/drive')"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "wiQ3EoJwV6c9",
        "outputId": "6218351e-5d3b-47c8-a6dc-e0bec12abf1a"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# === CONFIG ===\n",
        "\n",
        "AUDIO_FOLDER = \"/content/drive/MyDrive/DP_data/audio_files\"\n",
        "OUTPUT_FOLDER = \"/content/drive/MyDrive/DP_data/audio_chunks\"\n",
        "TSV_OUTPUT = \"/content/drive/MyDrive/DP_data/adpocia3.tsv\"\n",
        "MODEL_DIR = \"/content/drive/MyDrive/DP_data/whisper_medium_3d\"\n",
        "\n",
        "TARGET_SAMPLE_RATE = 16000\n",
        "TOKEN_LIMIT = 100\n",
        "MIN_SILENCE_LEN = 400  # Minimum silence length v ms\n",
        "SILENCE_THRESH = -40    # Silence threshold v dB\n",
        "\n",
        "MAX_CHUNK_MS = 28000\n",
        "ALT_MIN_SILENCE_LEN = 200  # mäkšie kritérium pre fallback\n",
        "\n",
        "os.makedirs(OUTPUT_FOLDER, exist_ok=True)\n",
        "\n",
        "# ==== NACITANIE MODELU ====\n",
        "def load_whisper_custom_model(model_dir):\n",
        "    processor = WhisperProcessor.from_pretrained(model_dir)\n",
        "    model = WhisperForConditionalGeneration.from_pretrained(model_dir)\n",
        "    model.eval()\n",
        "    model.to(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
        "    return processor, model\n",
        "\n",
        "# ==== TRANSKRIPCIA ====\n",
        "def transcribe_function(path_file, processor, model):\n",
        "    speech_array, sampling_rate = torchaudio.load(path_file)\n",
        "    if sampling_rate != TARGET_SAMPLE_RATE:\n",
        "        resampler = torchaudio.transforms.Resample(orig_freq=sampling_rate, new_freq=TARGET_SAMPLE_RATE)\n",
        "        speech_array = resampler(speech_array)\n",
        "\n",
        "    inputs = processor.feature_extractor(\n",
        "        speech_array.squeeze().numpy(),\n",
        "        sampling_rate=TARGET_SAMPLE_RATE,\n",
        "        return_tensors=\"pt\"\n",
        "    )\n",
        "\n",
        "    forced_decoder_ids = processor.get_decoder_prompt_ids(language=\"sk\", task=\"transcribe\")\n",
        "    input_features = inputs[\"input_features\"].to(model.device)\n",
        "\n",
        "    with torch.no_grad():\n",
        "        generated_ids = model.generate(\n",
        "            input_features,\n",
        "            forced_decoder_ids=forced_decoder_ids,\n",
        "            num_beams=5,\n",
        "            early_stopping=True\n",
        "        )\n",
        "\n",
        "    transcription = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]\n",
        "    return transcription.strip()\n",
        "\n",
        "# ==== DELENIE TEXTU PODLA VIET ====\n",
        "def split_text_by_sentence(text, max_tokens, tokenizer):\n",
        "    sentences = re.split(r'(?<=[.!?])\\s+', text.strip())\n",
        "    chunks, current_chunk, current_tokens = [], [], 0\n",
        "\n",
        "    for sentence in sentences:\n",
        "        tokens = tokenizer(sentence).input_ids\n",
        "        if current_tokens + len(tokens) > max_tokens and current_chunk:\n",
        "            chunks.append(\" \".join(current_chunk))\n",
        "            current_chunk = [sentence]\n",
        "            current_tokens = len(tokens)\n",
        "        else:\n",
        "            current_chunk.append(sentence)\n",
        "            current_tokens += len(tokens)\n",
        "\n",
        "    if current_chunk:\n",
        "        chunks.append(\" \".join(current_chunk))\n",
        "\n",
        "    return chunks\n",
        "\n",
        "def split_and_transcribe_with_token_limit(mp3_path, processor, model):\n",
        "    audio = AudioSegment.from_mp3(mp3_path)\n",
        "    filename = os.path.splitext(os.path.basename(mp3_path))[0]\n",
        "\n",
        "    silence_points = silence.detect_silence(audio, min_silence_len=MIN_SILENCE_LEN, silence_thresh=SILENCE_THRESH)\n",
        "    silence_points = [((start + end) // 2) for start, end in silence_points]\n",
        "\n",
        "    chunk_starts = [0]\n",
        "    for point in silence_points:\n",
        "        if point - chunk_starts[-1] >= 20000:\n",
        "            chunk_starts.append(point)\n",
        "    chunk_starts.append(len(audio))\n",
        "\n",
        "    adjusted_starts = [chunk_starts[0]]\n",
        "    for i in range(1, len(chunk_starts)):\n",
        "        prev = adjusted_starts[-1]\n",
        "        curr = chunk_starts[i]\n",
        "        duration = curr - prev\n",
        "\n",
        "        if duration > MAX_CHUNK_MS:\n",
        "            long_chunk = audio[prev:curr]\n",
        "            extra_silences = silence.detect_silence(\n",
        "                long_chunk, min_silence_len=ALT_MIN_SILENCE_LEN, silence_thresh=SILENCE_THRESH\n",
        "            )\n",
        "            extra_points = [((start + end) // 2 + prev) for start, end in extra_silences if (start + end)//2 + prev < curr]\n",
        "\n",
        "            if extra_points:\n",
        "                for p in extra_points:\n",
        "                    if p - adjusted_starts[-1] >= 8000:\n",
        "                        adjusted_starts.append(p)\n",
        "                if adjusted_starts[-1] != curr:\n",
        "                    adjusted_starts.append(curr)\n",
        "            else:\n",
        "                adjusted_starts.append(curr)\n",
        "        else:\n",
        "            adjusted_starts.append(curr)\n",
        "\n",
        "    # Transkripcia a tokenové delenie\n",
        "    rows = []\n",
        "    chunk_index = 0\n",
        "\n",
        "    for i in range(len(adjusted_starts) - 1):\n",
        "        start, end = adjusted_starts[i], adjusted_starts[i + 1]\n",
        "        chunk = audio[start:end]\n",
        "        chunk_path = os.path.join(OUTPUT_FOLDER, f\"{filename}_chunk_{chunk_index}.wav\")\n",
        "        chunk.export(chunk_path, format=\"wav\")\n",
        "\n",
        "        transcription = transcribe_function(chunk_path, processor, model)\n",
        "        token_count = len(processor.tokenizer(transcription).input_ids)\n",
        "\n",
        "        if token_count > TOKEN_LIMIT:\n",
        "            sub_chunks = split_text_by_sentence(transcription, TOKEN_LIMIT, processor.tokenizer)\n",
        "\n",
        "            for j, sub_text in enumerate(sub_chunks):\n",
        "                # Vytvorenie navrhu rozdelenia povodneho chunk\n",
        "                new_path = f\"{filename}_chunk_{chunk_index}_p{j+1}.wav\"\n",
        "                rows.append([new_path, round((end - start)/1000, 3), sub_text])\n",
        "        else:\n",
        "            duration_sec = round((end - start) / 1000, 3)\n",
        "            rows.append([os.path.basename(chunk_path), duration_sec, transcription])\n",
        "\n",
        "        chunk_index += 1\n",
        "\n",
        "    return rows\n",
        "\n",
        "def process_all_mp3():\n",
        "\n",
        "    all_rows = []\n",
        "    for file in tqdm(sorted(os.listdir(AUDIO_FOLDER))):\n",
        "        if file.endswith(\".mp3\"):\n",
        "            full_path = os.path.join(AUDIO_FOLDER, file)\n",
        "            rows = split_and_transcribe_with_token_limit(full_path, processor, model)\n",
        "            all_rows.extend(rows)\n",
        "\n",
        "    df = pd.DataFrame(all_rows, columns=[\"path\", \"duration\", \"sentence\"])\n",
        "    df.to_csv(TSV_OUTPUT, sep=\"\\t\", index=False)\n",
        "    print(f\" TSV saved to: {TSV_OUTPUT}\")\n"
      ],
      "metadata": {
        "id": "9ZCdSEiplhyd"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "processor, model = load_whisper_custom_model(MODEL_DIR)"
      ],
      "metadata": {
        "id": "cSCPsy8RljSq"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "pip install git+https://github.com/m-bain/whisperx.git"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "collapsed": true,
        "id": "xNvUOIGyyfF_",
        "outputId": "5f89b7d6-e356-44b5-8d43-f1fcc63bb13c"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Collecting git+https://github.com/m-bain/whisperx.git\n",
            "  Cloning https://github.com/m-bain/whisperx.git to /tmp/pip-req-build-y6876pd_\n",
            "  Running command git clone --filter=blob:none --quiet https://github.com/m-bain/whisperx.git /tmp/pip-req-build-y6876pd_\n",
            "  Resolved https://github.com/m-bain/whisperx.git to commit 0aed8745890f12ecfe0b2d9c4ba62bcdfb16f94e\n",
            "  Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
            "  Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
            "  Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting ctranslate2>=4.5.0 (from whisperx==3.3.1)\n",
            "  Downloading ctranslate2-4.6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)\n",
            "Collecting faster-whisper>=1.1.1 (from whisperx==3.3.1)\n",
            "  Downloading faster_whisper-1.1.1-py3-none-any.whl.metadata (16 kB)\n",
            "Requirement already satisfied: nltk>=3.9.1 in /usr/local/lib/python3.11/dist-packages (from whisperx==3.3.1) (3.9.1)\n",
            "Requirement already satisfied: numpy>=2.0.2 in /usr/local/lib/python3.11/dist-packages (from whisperx==3.3.1) (2.0.2)\n",
            "Collecting onnxruntime==1.19 (from whisperx==3.3.1)\n",
            "  Downloading onnxruntime-1.19.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.3 kB)\n",
            "Collecting pandas>=2.2.3 (from whisperx==3.3.1)\n",
            "  Downloading pandas-2.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.9/89.9 kB\u001b[0m \u001b[31m8.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting pyannote-audio>=3.3.2 (from whisperx==3.3.1)\n",
            "  Downloading pyannote.audio-3.3.2-py2.py3-none-any.whl.metadata (11 kB)\n",
            "Requirement already satisfied: torch>=2.5.1 in /usr/local/lib/python3.11/dist-packages (from whisperx==3.3.1) (2.6.0+cu124)\n",
            "Requirement already satisfied: torchaudio>=2.5.1 in /usr/local/lib/python3.11/dist-packages (from whisperx==3.3.1) (2.6.0+cu124)\n",
            "Requirement already satisfied: transformers>=4.48.0 in /usr/local/lib/python3.11/dist-packages (from whisperx==3.3.1) (4.51.3)\n",
            "Collecting coloredlogs (from onnxruntime==1.19->whisperx==3.3.1)\n",
            "  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)\n",
            "Requirement already satisfied: flatbuffers in /usr/local/lib/python3.11/dist-packages (from onnxruntime==1.19->whisperx==3.3.1) (25.2.10)\n",
            "Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from onnxruntime==1.19->whisperx==3.3.1) (24.2)\n",
            "Requirement already satisfied: protobuf in /usr/local/lib/python3.11/dist-packages (from onnxruntime==1.19->whisperx==3.3.1) (5.29.4)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.11/dist-packages (from onnxruntime==1.19->whisperx==3.3.1) (1.13.1)\n",
            "Requirement already satisfied: setuptools in /usr/local/lib/python3.11/dist-packages (from ctranslate2>=4.5.0->whisperx==3.3.1) (75.2.0)\n",
            "Requirement already satisfied: pyyaml<7,>=5.3 in /usr/local/lib/python3.11/dist-packages (from ctranslate2>=4.5.0->whisperx==3.3.1) (6.0.2)\n",
            "Requirement already satisfied: huggingface-hub>=0.13 in /usr/local/lib/python3.11/dist-packages (from faster-whisper>=1.1.1->whisperx==3.3.1) (0.30.2)\n",
            "Requirement already satisfied: tokenizers<1,>=0.13 in /usr/local/lib/python3.11/dist-packages (from faster-whisper>=1.1.1->whisperx==3.3.1) (0.21.1)\n",
            "Collecting av>=11 (from faster-whisper>=1.1.1->whisperx==3.3.1)\n",
            "  Downloading av-14.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.7 kB)\n",
            "Requirement already satisfied: tqdm in /usr/local/lib/python3.11/dist-packages (from faster-whisper>=1.1.1->whisperx==3.3.1) (4.67.1)\n",
            "Requirement already satisfied: click in /usr/local/lib/python3.11/dist-packages (from nltk>=3.9.1->whisperx==3.3.1) (8.1.8)\n",
            "Requirement already satisfied: joblib in /usr/local/lib/python3.11/dist-packages (from nltk>=3.9.1->whisperx==3.3.1) (1.4.2)\n",
            "Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.11/dist-packages (from nltk>=3.9.1->whisperx==3.3.1) (2024.11.6)\n",
            "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas>=2.2.3->whisperx==3.3.1) (2.9.0.post0)\n",
            "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas>=2.2.3->whisperx==3.3.1) (2025.2)\n",
            "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas>=2.2.3->whisperx==3.3.1) (2025.2)\n",
            "Collecting asteroid-filterbanks>=0.4 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading asteroid_filterbanks-0.4.0-py3-none-any.whl.metadata (3.3 kB)\n",
            "Requirement already satisfied: einops>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pyannote-audio>=3.3.2->whisperx==3.3.1) (0.8.1)\n",
            "Collecting lightning>=2.0.1 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading lightning-2.5.1.post0-py3-none-any.whl.metadata (39 kB)\n",
            "Collecting omegaconf<3.0,>=2.1 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)\n",
            "Collecting pyannote.core>=5.0.0 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading pyannote.core-5.0.0-py3-none-any.whl.metadata (1.4 kB)\n",
            "Collecting pyannote.database>=5.0.1 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading pyannote.database-5.1.3-py3-none-any.whl.metadata (1.1 kB)\n",
            "Collecting pyannote.metrics>=3.2 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading pyannote.metrics-3.2.1-py3-none-any.whl.metadata (1.3 kB)\n",
            "Collecting pyannote.pipeline>=3.0.1 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading pyannote.pipeline-3.0.1-py3-none-any.whl.metadata (897 bytes)\n",
            "Collecting pytorch-metric-learning>=2.1.0 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading pytorch_metric_learning-2.8.1-py3-none-any.whl.metadata (18 kB)\n",
            "Requirement already satisfied: rich>=12.0.0 in /usr/local/lib/python3.11/dist-packages (from pyannote-audio>=3.3.2->whisperx==3.3.1) (13.9.4)\n",
            "Collecting semver>=3.0.0 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading semver-3.0.4-py3-none-any.whl.metadata (6.8 kB)\n",
            "Requirement already satisfied: soundfile>=0.12.1 in /usr/local/lib/python3.11/dist-packages (from pyannote-audio>=3.3.2->whisperx==3.3.1) (0.13.1)\n",
            "Collecting speechbrain>=1.0.0 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading speechbrain-1.0.3-py3-none-any.whl.metadata (24 kB)\n",
            "Collecting tensorboardX>=2.6 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl.metadata (5.8 kB)\n",
            "Collecting torch-audiomentations>=0.11.0 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading torch_audiomentations-0.12.0-py3-none-any.whl.metadata (15 kB)\n",
            "Collecting torchmetrics>=0.11.0 (from pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading torchmetrics-1.7.1-py3-none-any.whl.metadata (21 kB)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from torch>=2.5.1->whisperx==3.3.1) (3.18.0)\n",
            "Requirement already satisfied: typing-extensions>=4.10.0 in /usr/local/lib/python3.11/dist-packages (from torch>=2.5.1->whisperx==3.3.1) (4.13.2)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.11/dist-packages (from torch>=2.5.1->whisperx==3.3.1) (3.4.2)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.11/dist-packages (from torch>=2.5.1->whisperx==3.3.1) (3.1.6)\n",
            "Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from torch>=2.5.1->whisperx==3.3.1) (2025.3.2)\n",
            "Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=2.5.1->whisperx==3.3.1)\n",
            "  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=2.5.1->whisperx==3.3.1)\n",
            "  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=2.5.1->whisperx==3.3.1)\n",
            "  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)\n",
            "Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=2.5.1->whisperx==3.3.1)\n",
            "  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)\n",
            "Collecting nvidia-cublas-cu12==12.4.5.8 (from torch>=2.5.1->whisperx==3.3.1)\n",
            "  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting nvidia-cufft-cu12==11.2.1.3 (from torch>=2.5.1->whisperx==3.3.1)\n",
            "  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting nvidia-curand-cu12==10.3.5.147 (from torch>=2.5.1->whisperx==3.3.1)\n",
            "  Downloading nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting nvidia-cusolver-cu12==11.6.1.9 (from torch>=2.5.1->whisperx==3.3.1)\n",
            "  Downloading nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)\n",
            "Collecting nvidia-cusparse-cu12==12.3.1.170 (from torch>=2.5.1->whisperx==3.3.1)\n",
            "  Downloading nvidia_cusparse_cu12-12.3.1.170-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)\n",
            "Requirement already satisfied: nvidia-cusparselt-cu12==0.6.2 in /usr/local/lib/python3.11/dist-packages (from torch>=2.5.1->whisperx==3.3.1) (0.6.2)\n",
            "Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /usr/local/lib/python3.11/dist-packages (from torch>=2.5.1->whisperx==3.3.1) (2.21.5)\n",
            "Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch>=2.5.1->whisperx==3.3.1) (12.4.127)\n",
            "Collecting nvidia-nvjitlink-cu12==12.4.127 (from torch>=2.5.1->whisperx==3.3.1)\n",
            "  Downloading nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n",
            "Requirement already satisfied: triton==3.2.0 in /usr/local/lib/python3.11/dist-packages (from torch>=2.5.1->whisperx==3.3.1) (3.2.0)\n",
            "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy->onnxruntime==1.19->whisperx==3.3.1) (1.3.0)\n",
            "Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from transformers>=4.48.0->whisperx==3.3.1) (2.32.3)\n",
            "Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.11/dist-packages (from transformers>=4.48.0->whisperx==3.3.1) (0.5.3)\n",
            "Collecting lightning-utilities<2.0,>=0.10.0 (from lightning>=2.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading lightning_utilities-0.14.3-py3-none-any.whl.metadata (5.6 kB)\n",
            "Collecting pytorch-lightning (from lightning>=2.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading pytorch_lightning-2.5.1.post0-py3-none-any.whl.metadata (20 kB)\n",
            "Collecting antlr4-python3-runtime==4.9.* (from omegaconf<3.0,>=2.1->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m117.0/117.0 kB\u001b[0m \u001b[31m13.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Requirement already satisfied: sortedcontainers>=2.0.4 in /usr/local/lib/python3.11/dist-packages (from pyannote.core>=5.0.0->pyannote-audio>=3.3.2->whisperx==3.3.1) (2.4.0)\n",
            "Requirement already satisfied: scipy>=1.1 in /usr/local/lib/python3.11/dist-packages (from pyannote.core>=5.0.0->pyannote-audio>=3.3.2->whisperx==3.3.1) (1.15.2)\n",
            "Requirement already satisfied: typer>=0.12.1 in /usr/local/lib/python3.11/dist-packages (from pyannote.database>=5.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (0.15.2)\n",
            "Requirement already satisfied: scikit-learn>=0.17.1 in /usr/local/lib/python3.11/dist-packages (from pyannote.metrics>=3.2->pyannote-audio>=3.3.2->whisperx==3.3.1) (1.6.1)\n",
            "Collecting docopt>=0.6.2 (from pyannote.metrics>=3.2->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading docopt-0.6.2.tar.gz (25 kB)\n",
            "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Requirement already satisfied: tabulate>=0.7.7 in /usr/local/lib/python3.11/dist-packages (from pyannote.metrics>=3.2->pyannote-audio>=3.3.2->whisperx==3.3.1) (0.9.0)\n",
            "Requirement already satisfied: matplotlib>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from pyannote.metrics>=3.2->pyannote-audio>=3.3.2->whisperx==3.3.1) (3.10.0)\n",
            "Collecting optuna>=3.1 (from pyannote.pipeline>=3.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading optuna-4.3.0-py3-none-any.whl.metadata (17 kB)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.11/dist-packages (from python-dateutil>=2.8.2->pandas>=2.2.3->whisperx==3.3.1) (1.17.0)\n",
            "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.11/dist-packages (from rich>=12.0.0->pyannote-audio>=3.3.2->whisperx==3.3.1) (3.0.0)\n",
            "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.11/dist-packages (from rich>=12.0.0->pyannote-audio>=3.3.2->whisperx==3.3.1) (2.19.1)\n",
            "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.11/dist-packages (from soundfile>=0.12.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (1.17.1)\n",
            "Collecting hyperpyyaml (from speechbrain>=1.0.0->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading HyperPyYAML-1.2.2-py3-none-any.whl.metadata (7.6 kB)\n",
            "Requirement already satisfied: sentencepiece in /usr/local/lib/python3.11/dist-packages (from speechbrain>=1.0.0->pyannote-audio>=3.3.2->whisperx==3.3.1) (0.2.0)\n",
            "Collecting julius<0.3,>=0.2.3 (from torch-audiomentations>=0.11.0->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading julius-0.2.7.tar.gz (59 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m59.6/59.6 kB\u001b[0m \u001b[31m6.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting torch-pitch-shift>=1.2.2 (from torch-audiomentations>=0.11.0->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading torch_pitch_shift-1.2.5-py3-none-any.whl.metadata (2.5 kB)\n",
            "Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime==1.19->whisperx==3.3.1)\n",
            "  Downloading humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2->torch>=2.5.1->whisperx==3.3.1) (3.0.2)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->transformers>=4.48.0->whisperx==3.3.1) (3.4.1)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests->transformers>=4.48.0->whisperx==3.3.1) (3.10)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests->transformers>=4.48.0->whisperx==3.3.1) (2.4.0)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests->transformers>=4.48.0->whisperx==3.3.1) (2025.1.31)\n",
            "Requirement already satisfied: pycparser in /usr/local/lib/python3.11/dist-packages (from cffi>=1.0->soundfile>=0.12.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (2.22)\n",
            "Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/lib/python3.11/dist-packages (from fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (3.11.15)\n",
            "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.11/dist-packages (from markdown-it-py>=2.2.0->rich>=12.0.0->pyannote-audio>=3.3.2->whisperx==3.3.1) (0.1.2)\n",
            "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=2.0.0->pyannote.metrics>=3.2->pyannote-audio>=3.3.2->whisperx==3.3.1) (1.3.2)\n",
            "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=2.0.0->pyannote.metrics>=3.2->pyannote-audio>=3.3.2->whisperx==3.3.1) (0.12.1)\n",
            "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=2.0.0->pyannote.metrics>=3.2->pyannote-audio>=3.3.2->whisperx==3.3.1) (4.57.0)\n",
            "Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=2.0.0->pyannote.metrics>=3.2->pyannote-audio>=3.3.2->whisperx==3.3.1) (1.4.8)\n",
            "Requirement already satisfied: pillow>=8 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=2.0.0->pyannote.metrics>=3.2->pyannote-audio>=3.3.2->whisperx==3.3.1) (11.2.1)\n",
            "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib>=2.0.0->pyannote.metrics>=3.2->pyannote-audio>=3.3.2->whisperx==3.3.1) (3.2.3)\n",
            "Collecting alembic>=1.5.0 (from optuna>=3.1->pyannote.pipeline>=3.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading alembic-1.15.2-py3-none-any.whl.metadata (7.3 kB)\n",
            "Collecting colorlog (from optuna>=3.1->pyannote.pipeline>=3.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)\n",
            "Requirement already satisfied: sqlalchemy>=1.4.2 in /usr/local/lib/python3.11/dist-packages (from optuna>=3.1->pyannote.pipeline>=3.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (2.0.40)\n",
            "Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn>=0.17.1->pyannote.metrics>=3.2->pyannote-audio>=3.3.2->whisperx==3.3.1) (3.6.0)\n",
            "Collecting primePy>=1.3 (from torch-pitch-shift>=1.2.2->torch-audiomentations>=0.11.0->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading primePy-1.3-py3-none-any.whl.metadata (4.8 kB)\n",
            "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.11/dist-packages (from typer>=0.12.1->pyannote.database>=5.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (1.5.4)\n",
            "Collecting ruamel.yaml>=0.17.28 (from hyperpyyaml->speechbrain>=1.0.0->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading ruamel.yaml-0.18.10-py3-none-any.whl.metadata (23 kB)\n",
            "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (2.6.1)\n",
            "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.11/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (1.3.2)\n",
            "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (25.3.0)\n",
            "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.11/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (1.6.0)\n",
            "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.11/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (6.4.3)\n",
            "Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (0.3.1)\n",
            "Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (1.20.0)\n",
            "Requirement already satisfied: Mako in /usr/lib/python3/dist-packages (from alembic>=1.5.0->optuna>=3.1->pyannote.pipeline>=3.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (1.1.3)\n",
            "Collecting ruamel.yaml.clib>=0.2.7 (from ruamel.yaml>=0.17.28->hyperpyyaml->speechbrain>=1.0.0->pyannote-audio>=3.3.2->whisperx==3.3.1)\n",
            "  Downloading ruamel.yaml.clib-0.2.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.7 kB)\n",
            "Requirement already satisfied: greenlet>=1 in /usr/local/lib/python3.11/dist-packages (from sqlalchemy>=1.4.2->optuna>=3.1->pyannote.pipeline>=3.0.1->pyannote-audio>=3.3.2->whisperx==3.3.1) (3.2.1)\n",
            "Downloading onnxruntime-1.19.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (13.2 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.2/13.2 MB\u001b[0m \u001b[31m103.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading ctranslate2-4.6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.6 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m38.6/38.6 MB\u001b[0m \u001b[31m16.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading faster_whisper-1.1.1-py3-none-any.whl (1.1 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.1/1.1 MB\u001b[0m \u001b[31m60.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pandas-2.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.1 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.1/13.1 MB\u001b[0m \u001b[31m126.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pyannote.audio-3.3.2-py2.py3-none-any.whl (898 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m898.7/898.7 kB\u001b[0m \u001b[31m58.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl (363.4 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m363.4/363.4 MB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (13.8 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.8/13.8 MB\u001b[0m \u001b[31m70.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (24.6 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m24.6/24.6 MB\u001b[0m \u001b[31m37.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (883 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m883.7/883.7 kB\u001b[0m \u001b[31m54.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m664.8/664.8 MB\u001b[0m \u001b[31m2.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl (211.5 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m211.5/211.5 MB\u001b[0m \u001b[31m6.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl (56.3 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m56.3/56.3 MB\u001b[0m \u001b[31m14.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl (127.9 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m127.9/127.9 MB\u001b[0m \u001b[31m7.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading nvidia_cusparse_cu12-12.3.1.170-py3-none-manylinux2014_x86_64.whl (207.5 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m207.5/207.5 MB\u001b[0m \u001b[31m5.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (21.1 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.1/21.1 MB\u001b[0m \u001b[31m76.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading asteroid_filterbanks-0.4.0-py3-none-any.whl (29 kB)\n",
            "Downloading av-14.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (35.2 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m35.2/35.2 MB\u001b[0m \u001b[31m16.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading lightning-2.5.1.post0-py3-none-any.whl (819 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m819.0/819.0 kB\u001b[0m \u001b[31m49.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading omegaconf-2.3.0-py3-none-any.whl (79 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.5/79.5 kB\u001b[0m \u001b[31m7.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pyannote.core-5.0.0-py3-none-any.whl (58 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.5/58.5 kB\u001b[0m \u001b[31m5.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pyannote.database-5.1.3-py3-none-any.whl (48 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.1/48.1 kB\u001b[0m \u001b[31m4.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pyannote.metrics-3.2.1-py3-none-any.whl (51 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m51.4/51.4 kB\u001b[0m \u001b[31m5.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pyannote.pipeline-3.0.1-py3-none-any.whl (31 kB)\n",
            "Downloading pytorch_metric_learning-2.8.1-py3-none-any.whl (125 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m125.9/125.9 kB\u001b[0m \u001b[31m12.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading semver-3.0.4-py3-none-any.whl (17 kB)\n",
            "Downloading speechbrain-1.0.3-py3-none-any.whl (864 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m864.1/864.1 kB\u001b[0m \u001b[31m51.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading tensorboardX-2.6.2.2-py2.py3-none-any.whl (101 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m101.7/101.7 kB\u001b[0m \u001b[31m10.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading torch_audiomentations-0.12.0-py3-none-any.whl (48 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.5/48.5 kB\u001b[0m \u001b[31m4.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading torchmetrics-1.7.1-py3-none-any.whl (961 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m961.5/961.5 kB\u001b[0m \u001b[31m55.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m46.0/46.0 kB\u001b[0m \u001b[31m4.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m86.8/86.8 kB\u001b[0m \u001b[31m8.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading lightning_utilities-0.14.3-py3-none-any.whl (28 kB)\n",
            "Downloading optuna-4.3.0-py3-none-any.whl (386 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m386.6/386.6 kB\u001b[0m \u001b[31m33.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading torch_pitch_shift-1.2.5-py3-none-any.whl (5.0 kB)\n",
            "Downloading HyperPyYAML-1.2.2-py3-none-any.whl (16 kB)\n",
            "Downloading pytorch_lightning-2.5.1.post0-py3-none-any.whl (823 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m823.1/823.1 kB\u001b[0m \u001b[31m54.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading alembic-1.15.2-py3-none-any.whl (231 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m231.9/231.9 kB\u001b[0m \u001b[31m23.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading primePy-1.3-py3-none-any.whl (4.0 kB)\n",
            "Downloading ruamel.yaml-0.18.10-py3-none-any.whl (117 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m117.7/117.7 kB\u001b[0m \u001b[31m11.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading colorlog-6.9.0-py3-none-any.whl (11 kB)\n",
            "Downloading ruamel.yaml.clib-0.2.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (739 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m739.1/739.1 kB\u001b[0m \u001b[31m52.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hBuilding wheels for collected packages: whisperx, antlr4-python3-runtime, docopt, julius\n",
            "  Building wheel for whisperx (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for whisperx: filename=whisperx-3.3.1-py3-none-any.whl size=16482009 sha256=ce99924c11e21b4f22fc01e88822fc918268d56bd7f675119ccdf611d1f78640\n",
            "  Stored in directory: /tmp/pip-ephem-wheel-cache-l7xwr65x/wheels/a7/c5/cb/f337e8d88ff15af9ece963912a153e4132d00e7cdd61f48416\n",
            "  Building wheel for antlr4-python3-runtime (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554 sha256=b51000eededb4a85c6a278733c1a114ae6ffe6d4761728c5c5a60d8a5fe3c45e\n",
            "  Stored in directory: /root/.cache/pip/wheels/1a/97/32/461f837398029ad76911109f07047fde1d7b661a147c7c56d1\n",
            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13706 sha256=9da05d25e10ee22a692e794b10e0ddb5a5740525a248a2b6bab8281a4a6f3dfe\n",
            "  Stored in directory: /root/.cache/pip/wheels/1a/b0/8c/4b75c4116c31f83c8f9f047231251e13cc74481cca4a78a9ce\n",
            "  Building wheel for julius (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for julius: filename=julius-0.2.7-py3-none-any.whl size=21870 sha256=e53b0a316db08563d285c0acba335e19c47be9589ec9f5ede7aafdb611fd9436\n",
            "  Stored in directory: /root/.cache/pip/wheels/16/15/d4/edd724cefe78050a6ba3344b8b0c6672db829a799dbb9f81ff\n",
            "Successfully built whisperx antlr4-python3-runtime docopt julius\n",
            "Installing collected packages: primePy, docopt, antlr4-python3-runtime, tensorboardX, semver, ruamel.yaml.clib, omegaconf, nvidia-nvjitlink-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, lightning-utilities, humanfriendly, ctranslate2, colorlog, av, ruamel.yaml, pyannote.core, pandas, nvidia-cusparse-cu12, nvidia-cudnn-cu12, coloredlogs, alembic, optuna, onnxruntime, nvidia-cusolver-cu12, hyperpyyaml, pyannote.database, faster-whisper, torchmetrics, pytorch-metric-learning, pyannote.pipeline, pyannote.metrics, julius, asteroid-filterbanks, torch-pitch-shift, speechbrain, pytorch-lightning, torch-audiomentations, lightning, pyannote-audio, whisperx\n",
            "  Attempting uninstall: nvidia-nvjitlink-cu12\n",
            "    Found existing installation: nvidia-nvjitlink-cu12 12.5.82\n",
            "    Uninstalling nvidia-nvjitlink-cu12-12.5.82:\n",
            "      Successfully uninstalled nvidia-nvjitlink-cu12-12.5.82\n",
            "  Attempting uninstall: nvidia-curand-cu12\n",
            "    Found existing installation: nvidia-curand-cu12 10.3.6.82\n",
            "    Uninstalling nvidia-curand-cu12-10.3.6.82:\n",
            "      Successfully uninstalled nvidia-curand-cu12-10.3.6.82\n",
            "  Attempting uninstall: nvidia-cufft-cu12\n",
            "    Found existing installation: nvidia-cufft-cu12 11.2.3.61\n",
            "    Uninstalling nvidia-cufft-cu12-11.2.3.61:\n",
            "      Successfully uninstalled nvidia-cufft-cu12-11.2.3.61\n",
            "  Attempting uninstall: nvidia-cuda-runtime-cu12\n",
            "    Found existing installation: nvidia-cuda-runtime-cu12 12.5.82\n",
            "    Uninstalling nvidia-cuda-runtime-cu12-12.5.82:\n",
            "      Successfully uninstalled nvidia-cuda-runtime-cu12-12.5.82\n",
            "  Attempting uninstall: nvidia-cuda-nvrtc-cu12\n",
            "    Found existing installation: nvidia-cuda-nvrtc-cu12 12.5.82\n",
            "    Uninstalling nvidia-cuda-nvrtc-cu12-12.5.82:\n",
            "      Successfully uninstalled nvidia-cuda-nvrtc-cu12-12.5.82\n",
            "  Attempting uninstall: nvidia-cuda-cupti-cu12\n",
            "    Found existing installation: nvidia-cuda-cupti-cu12 12.5.82\n",
            "    Uninstalling nvidia-cuda-cupti-cu12-12.5.82:\n",
            "      Successfully uninstalled nvidia-cuda-cupti-cu12-12.5.82\n",
            "  Attempting uninstall: nvidia-cublas-cu12\n",
            "    Found existing installation: nvidia-cublas-cu12 12.5.3.2\n",
            "    Uninstalling nvidia-cublas-cu12-12.5.3.2:\n",
            "      Successfully uninstalled nvidia-cublas-cu12-12.5.3.2\n",
            "  Attempting uninstall: pandas\n",
            "    Found existing installation: pandas 2.2.2\n",
            "    Uninstalling pandas-2.2.2:\n",
            "      Successfully uninstalled pandas-2.2.2\n",
            "  Attempting uninstall: nvidia-cusparse-cu12\n",
            "    Found existing installation: nvidia-cusparse-cu12 12.5.1.3\n",
            "    Uninstalling nvidia-cusparse-cu12-12.5.1.3:\n",
            "      Successfully uninstalled nvidia-cusparse-cu12-12.5.1.3\n",
            "  Attempting uninstall: nvidia-cudnn-cu12\n",
            "    Found existing installation: nvidia-cudnn-cu12 9.3.0.75\n",
            "    Uninstalling nvidia-cudnn-cu12-9.3.0.75:\n",
            "      Successfully uninstalled nvidia-cudnn-cu12-9.3.0.75\n",
            "  Attempting uninstall: nvidia-cusolver-cu12\n",
            "    Found existing installation: nvidia-cusolver-cu12 11.6.3.83\n",
            "    Uninstalling nvidia-cusolver-cu12-11.6.3.83:\n",
            "      Successfully uninstalled nvidia-cusolver-cu12-11.6.3.83\n",
            "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 2.2.3 which is incompatible.\u001b[0m\u001b[31m\n",
            "\u001b[0mSuccessfully installed alembic-1.15.2 antlr4-python3-runtime-4.9.3 asteroid-filterbanks-0.4.0 av-14.3.0 coloredlogs-15.0.1 colorlog-6.9.0 ctranslate2-4.6.0 docopt-0.6.2 faster-whisper-1.1.1 humanfriendly-10.0 hyperpyyaml-1.2.2 julius-0.2.7 lightning-2.5.1.post0 lightning-utilities-0.14.3 nvidia-cublas-cu12-12.4.5.8 nvidia-cuda-cupti-cu12-12.4.127 nvidia-cuda-nvrtc-cu12-12.4.127 nvidia-cuda-runtime-cu12-12.4.127 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.2.1.3 nvidia-curand-cu12-10.3.5.147 nvidia-cusolver-cu12-11.6.1.9 nvidia-cusparse-cu12-12.3.1.170 nvidia-nvjitlink-cu12-12.4.127 omegaconf-2.3.0 onnxruntime-1.19.0 optuna-4.3.0 pandas-2.2.3 primePy-1.3 pyannote-audio-3.3.2 pyannote.core-5.0.0 pyannote.database-5.1.3 pyannote.metrics-3.2.1 pyannote.pipeline-3.0.1 pytorch-lightning-2.5.1.post0 pytorch-metric-learning-2.8.1 ruamel.yaml-0.18.10 ruamel.yaml.clib-0.2.12 semver-3.0.4 speechbrain-1.0.3 tensorboardX-2.6.2.2 torch-audiomentations-0.12.0 torch-pitch-shift-1.2.5 torchmetrics-1.7.1 whisperx-3.3.1\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "application/vnd.colab-display-data+json": {
              "pip_warning": {
                "packages": [
                  "nvidia",
                  "pydevd_plugins"
                ]
              },
              "id": "1fe374a7d4d44a3fa775a6ea8224e42e"
            }
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Spracovanie zvukového záznamu"
      ],
      "metadata": {
        "id": "xYfl52w6k6ZW"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "process_all_mp3()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "XW4oyHCYmSuD",
        "outputId": "0a3c717d-dfce-4206-9300-315a10e3fe6b"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "100%|██████████| 1/1 [33:09<00:00, 1989.32s/it]"
          ]
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            " TSV saved to: /content/drive/MyDrive/DP_data/adpocia3.tsv\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "\n"
          ]
        }
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}