{ "cells": [ { "cell_type": "markdown", "source": [
 "## Data preprocessing\n",
 "\n",
 "The helper functions in this notebook build the training/evaluation dataset. They preprocess the data in four steps: splitting the audio into smaller chunks at silences detected in the recordings, resampling the audio to the 16 kHz format required by the model, automatically transcribing each chunk with a model fine-tuned in earlier training runs, and proposing a split of the recordings into complete sentences based on token count.\n",
 "\n",
 "### Output format\n",
 "- A TSV file containing:\n",
 "  - the path to the audio recording\n",
 "  - the recording duration in seconds\n",
 "  - the transcription of the recording\n",
 "\n",
 "\n",
 "### Libraries used\n",
 " - os\n",
 " - torch\n",
 " - torchaudio - loading and resampling audio recordings\n",
 " - pydub (AudioSegment, silence) - audio processing, silence detection, chunk export\n",
 " - transformers - loading the processor and model, tokenization for transcription\n",
 " - re - regular expressions for splitting text into sentences\n",
 " - pandas - exporting the data to TSV\n",
 " - tqdm - progress monitoring while processing the files\n",
 "\n",
 "\n" ], "metadata": { "id": "dB-uebOX2p8j" } }, { "cell_type": "markdown", "source": [
 "## Function descriptions\n",
 "\n",
 "### load_whisper_custom_model\n",
 "- **Parameters:** path to the model\n",
 "- loads the custom model with transformers\n",
 "- uses WhisperProcessor and WhisperForConditionalGeneration\n",
 "- moves the model to the GPU when one is available, otherwise to the CPU\n",
 "\n",
 "### transcribe_function\n",
 "- **Parameters:** path to the audio file, processor, model\n",
 "- Loads the audio with torchaudio.load.\n",
 "- If the sampling rate differs from the 16000 Hz target, the audio is resampled with torchaudio.transforms.Resample.\n",
 "- The audio is converted to input features via processor.feature_extractor.\n",
 "- Forced decoding sets the language (\"sk\" = Slovak) and the task (\"transcribe\") via get_decoder_prompt_ids.\n",
 "- The model generates text with beam search (num_beams=5, early stopping).\n",
 "- The result is decoded with the tokenizer and returned as clean text.\n",
 "\n",
 "### split_text_by_sentence\n",
 "- **Parameters:** text, maximum number of tokens, tokenizer\n",
 "- uses a regular expression to split the text into sentences (period, question mark, exclamation mark)\n",
 "- when a sentence cannot be added to the current chunk without exceeding the token limit (max_tokens), a new chunk is started\n",
 "- tokenization uses the tokenizer passed in (processor.tokenizer), via tokenizer(sentence).input_ids\n",
 "\n",
 "\n",
 "### split_and_transcribe_with_token_limit\n",
 "- **Parameters:** path to the audio file, processor, model\n",
 "- loads the audio with pydub (AudioSegment.from_mp3).\n",
 "- silence detection:\n",
 "  - uses silence.detect_silence(audio, min_silence_len=MIN_SILENCE_LEN, silence_thresh=SILENCE_THRESH)\n",
 "  - the midpoint of each detected silence (halfway between its start and end) is taken as a candidate split point\n",
 "- fallback split for long chunks (> MAX_CHUNK_MS):\n",
 "  - if a chunk is too long, a finer split is attempted with a more lenient criterion (ALT_MIN_SILENCE_LEN).\n",
 "- exports the chunks to .wav format.\n",
 "- transcribes the chunks\n",
 "- if the transcription exceeds the token limit, the text is split into several parts (text-only split; the audio chunk itself is not cut again).\n",
 "\n",
 "\n",
 "### process_all_mp3\n",
 "- processes all mp3 files in the configured folder\n",
 "- calls the data-processing functions\n",
 "- output: a TSV file with the columns **path**, **duration**, **sentence**" ], "metadata": { "id": "MV7Dh_Qw-nWV" } }, { "cell_type": "markdown", "source": [ "## Library imports" ], "metadata": { "id": "Ybedar-N-957" } }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "kDNwj9t1qm8w", "outputId": "12aca0b1-41c4-4eeb-9dc7-f13a057bde20", "collapsed": true }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Collecting pydub\n", " Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)\n", "Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n", "Installing 
collected packages: pydub\n", "Successfully installed pydub-0.25.1\n" ] } ], "source": [ "!pip install pydub" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "D2MGL2U4q8Iu" }, "outputs": [], "source": [
 "import os\n",
 "import re\n",
 "import gc\n",
 "import nltk\n",
 "import pandas as pd\n",
 "import soundfile as sf\n",
 "import torch\n",
 "import torchaudio\n",
 "from nltk.tokenize import sent_tokenize\n",
 "from pydub import AudioSegment, silence\n",
 "from tqdm import tqdm\n",
 "from transformers import WhisperProcessor, WhisperForConditionalGeneration" ] }, { "cell_type": "markdown", "source": [ "## Loading the custom model and implementing the functions\n" ], "metadata": { "id": "ynqsRY58V1ts" } }, { "cell_type": "code", "source": [ "from google.colab import drive\n", "\n", "drive.mount('/content/drive')" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "wiQ3EoJwV6c9", "outputId": "6218351e-5d3b-47c8-a6dc-e0bec12abf1a" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" ] } ] }, { "cell_type": "code", "source": [
 "# === CONFIG ===\n",
 "\n",
 "AUDIO_FOLDER = \"/content/drive/MyDrive/DP_data/audio_files\"\n",
 "OUTPUT_FOLDER = \"/content/drive/MyDrive/DP_data/audio_chunks\"\n",
 "TSV_OUTPUT = \"/content/drive/MyDrive/DP_data/adpocia3.tsv\"\n",
 "MODEL_DIR = \"/content/drive/MyDrive/DP_data/whisper_medium_3d\"\n",
 "\n",
 "TARGET_SAMPLE_RATE = 16000\n",
 "TOKEN_LIMIT = 100\n",
 "MIN_SILENCE_LEN = 400  # minimum silence length in ms\n",
 "SILENCE_THRESH = -40  # silence threshold in dBFS\n",
 "\n",
 "MAX_CHUNK_MS = 28000  # chunks longer than this (in ms) go through the fallback split\n",
 "ALT_MIN_SILENCE_LEN = 200  # more lenient minimum silence length (ms) for the fallback split\n",
 "\n",
 "os.makedirs(OUTPUT_FOLDER, exist_ok=True)\n",
 "\n",
 "# ==== LOAD MODEL ====\n",
 "def 
load_whisper_custom_model(model_dir):\n",
 "    processor = WhisperProcessor.from_pretrained(model_dir)\n",
 "    model = WhisperForConditionalGeneration.from_pretrained(model_dir)\n",
 "    model.eval()\n",
 "    model.to(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
 "    return processor, model\n",
 "\n",
 "# ==== TRANSCRIPTION ====\n",
 "def transcribe_function(path_file, processor, model):\n",
 "    speech_array, sampling_rate = torchaudio.load(path_file)\n",
 "    if sampling_rate != TARGET_SAMPLE_RATE:\n",
 "        resampler = torchaudio.transforms.Resample(orig_freq=sampling_rate, new_freq=TARGET_SAMPLE_RATE)\n",
 "        speech_array = resampler(speech_array)\n",
 "\n",
 "    # Average the channels so that stereo input becomes a single mono waveform\n",
 "    speech_array = speech_array.mean(dim=0)\n",
 "\n",
 "    inputs = processor.feature_extractor(\n",
 "        speech_array.numpy(),\n",
 "        sampling_rate=TARGET_SAMPLE_RATE,\n",
 "        return_tensors=\"pt\"\n",
 "    )\n",
 "\n",
 "    forced_decoder_ids = processor.get_decoder_prompt_ids(language=\"sk\", task=\"transcribe\")\n",
 "    input_features = inputs[\"input_features\"].to(model.device)\n",
 "\n",
 "    with torch.no_grad():\n",
 "        generated_ids = model.generate(\n",
 "            input_features,\n",
 "            forced_decoder_ids=forced_decoder_ids,\n",
 "            num_beams=5,\n",
 "            early_stopping=True\n",
 "        )\n",
 "\n",
 "    transcription = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]\n",
 "    return transcription.strip()\n",
 "\n",
 "# ==== SPLIT TEXT BY SENTENCE ====\n",
 "def split_text_by_sentence(text, max_tokens, tokenizer):\n",
 "    # Split after sentence-ending punctuation followed by whitespace\n",
 "    sentences = re.split(r'(?<=[.!?])\\s+', text.strip())\n",
 "    chunks, current_chunk, current_tokens = [], [], 0\n",
 "\n",
 "    for sentence in sentences:\n",
 "        tokens = tokenizer(sentence).input_ids\n",
 "        if current_tokens + len(tokens) > max_tokens and current_chunk:\n",
 "            chunks.append(\" \".join(current_chunk))\n",
 "            current_chunk = [sentence]\n",
 "            current_tokens = len(tokens)\n",
 "        else:\n",
 "            current_chunk.append(sentence)\n",
 "            current_tokens += len(tokens)\n",
 "\n",
 "    if current_chunk:\n",
 "        chunks.append(\" \".join(current_chunk))\n",
 "\n",
 "    return chunks\n",
 "\n",
 "# ==== SPLIT AUDIO AT SILENCES AND TRANSCRIBE ====\n",
 "def split_and_transcribe_with_token_limit(mp3_path, processor, model):\n",
 "    audio = AudioSegment.from_mp3(mp3_path)\n",
 "    filename = os.path.splitext(os.path.basename(mp3_path))[0]\n",
 "\n",
 "    # Candidate split points: the midpoint of every detected silence (in ms)\n",
 "    silence_points = silence.detect_silence(audio, min_silence_len=MIN_SILENCE_LEN, silence_thresh=SILENCE_THRESH)\n",
 "    silence_points = [((start + end) // 2) for start, end in silence_points]\n",
 "\n",
 "    # Keep only split points that leave at least 20 s since the previous one\n",
 "    chunk_starts = [0]\n",
 "    for point in silence_points:\n",
 "        if point - chunk_starts[-1] >= 20000:\n",
 "            chunk_starts.append(point)\n",
 "    chunk_starts.append(len(audio))\n",
 "\n",
 "    # Fallback: re-split chunks longer than MAX_CHUNK_MS using shorter silences\n",
 "    adjusted_starts = [chunk_starts[0]]\n",
 "    for i in range(1, len(chunk_starts)):\n",
 "        prev = adjusted_starts[-1]\n",
 "        curr = chunk_starts[i]\n",
 "        duration = curr - prev\n",
 "\n",
 "        if duration > MAX_CHUNK_MS:\n",
 "            long_chunk = audio[prev:curr]\n",
 "            extra_silences = silence.detect_silence(\n",
 "                long_chunk, min_silence_len=ALT_MIN_SILENCE_LEN, silence_thresh=SILENCE_THRESH\n",
 "            )\n",
 "            extra_points = [((start + end) // 2 + prev) for start, end in extra_silences if (start + end)//2 + prev < curr]\n",
 "\n",
 "            if extra_points:\n",
 "                for p in extra_points:\n",
 "                    if p - adjusted_starts[-1] >= 8000:\n",
 "                        adjusted_starts.append(p)\n",
 "                if adjusted_starts[-1] != curr:\n",
 "                    adjusted_starts.append(curr)\n",
 "            else:\n",
 "                adjusted_starts.append(curr)\n",
 "        else:\n",
 "            adjusted_starts.append(curr)\n",
 "\n",
 "    # Transcription and token-based splitting\n",
 "    rows = []\n",
 "    chunk_index = 0\n",
 "\n",
 "    for i in range(len(adjusted_starts) - 1):\n",
 "        start, end = adjusted_starts[i], adjusted_starts[i + 1]\n",
 "        chunk = audio[start:end]\n",
 "        chunk_path = os.path.join(OUTPUT_FOLDER, f\"{filename}_chunk_{chunk_index}.wav\")\n",
 "        chunk.export(chunk_path, format=\"wav\")\n",
 "\n",
 "        transcription = transcribe_function(chunk_path, processor, model)\n",
 "        token_count = len(processor.tokenizer(transcription).input_ids)\n",
 "\n",
 "        if token_count > TOKEN_LIMIT:\n",
 "            sub_chunks = split_text_by_sentence(transcription, TOKEN_LIMIT, processor.tokenizer)\n",
 "\n",
 "            for j, sub_text in enumerate(sub_chunks):\n",
 "                # Proposed split of the original chunk; no new audio file is written,\n",
 "                # and every part carries the duration of the whole chunk\n",
 "                new_path = f\"{filename}_chunk_{chunk_index}_p{j+1}.wav\"\n",
 "                rows.append([new_path, round((end - start)/1000, 3), sub_text])\n",
 "        else:\n",
 "            duration_sec = round((end - start) / 1000, 3)\n",
 "            rows.append([os.path.basename(chunk_path), duration_sec, transcription])\n",
 "\n",
 "        chunk_index += 1\n",
 "\n",
 "    return rows\n",
 "\n",
 "# ==== PROCESS ALL MP3 FILES ====\n",
 "def process_all_mp3():\n",
 "    # Uses the global processor and model loaded in the next cell\n",
 "    all_rows = []\n",
 "    for file in tqdm(sorted(os.listdir(AUDIO_FOLDER))):\n",
 "        if file.endswith(\".mp3\"):\n",
 "            full_path = os.path.join(AUDIO_FOLDER, file)\n",
 "            rows = split_and_transcribe_with_token_limit(full_path, processor, model)\n",
 "            all_rows.extend(rows)\n",
 "\n",
 "    df = pd.DataFrame(all_rows, columns=[\"path\", \"duration\", \"sentence\"])\n",
 "    df.to_csv(TSV_OUTPUT, sep=\"\\t\", index=False)\n",
 "    print(f\" TSV saved to: {TSV_OUTPUT}\")\n" ], "metadata": { "id": "9ZCdSEiplhyd" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "processor, model = load_whisper_custom_model(MODEL_DIR)" ], "metadata": { "id": "cSCPsy8RljSq" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "pip install git+https://github.com/m-bain/whisperx.git" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "collapsed": true, "id": "xNvUOIGyyfF_", "outputId": "5f89b7d6-e356-44b5-8d43-f1fcc63bb13c" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Collecting git+https://github.com/m-bain/whisperx.git\n", " Cloning https://github.com/m-bain/whisperx.git to /tmp/pip-req-build-y6876pd_\n", " Running command git clone --filter=blob:none --quiet https://github.com/m-bain/whisperx.git /tmp/pip-req-build-y6876pd_\n", " Resolved https://github.com/m-bain/whisperx.git to commit 
0aed8745890f12ecfe0b2d9c4ba62bcdfb16f94e\n", "
\u001b[32m819.0/819.0 kB\u001b[0m \u001b[31m49.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading omegaconf-2.3.0-py3-none-any.whl (79 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.5/79.5 kB\u001b[0m \u001b[31m7.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading pyannote.core-5.0.0-py3-none-any.whl (58 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.5/58.5 kB\u001b[0m \u001b[31m5.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading pyannote.database-5.1.3-py3-none-any.whl (48 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.1/48.1 kB\u001b[0m \u001b[31m4.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading pyannote.metrics-3.2.1-py3-none-any.whl (51 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m51.4/51.4 kB\u001b[0m \u001b[31m5.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading pyannote.pipeline-3.0.1-py3-none-any.whl (31 kB)\n", "Downloading pytorch_metric_learning-2.8.1-py3-none-any.whl (125 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m125.9/125.9 kB\u001b[0m \u001b[31m12.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading semver-3.0.4-py3-none-any.whl (17 kB)\n", "Downloading speechbrain-1.0.3-py3-none-any.whl (864 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m864.1/864.1 kB\u001b[0m \u001b[31m51.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading tensorboardX-2.6.2.2-py2.py3-none-any.whl (101 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m101.7/101.7 kB\u001b[0m \u001b[31m10.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading torch_audiomentations-0.12.0-py3-none-any.whl (48 
kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.5/48.5 kB\u001b[0m \u001b[31m4.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading torchmetrics-1.7.1-py3-none-any.whl (961 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m961.5/961.5 kB\u001b[0m \u001b[31m55.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m46.0/46.0 kB\u001b[0m \u001b[31m4.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m86.8/86.8 kB\u001b[0m \u001b[31m8.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading lightning_utilities-0.14.3-py3-none-any.whl (28 kB)\n", "Downloading optuna-4.3.0-py3-none-any.whl (386 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m386.6/386.6 kB\u001b[0m \u001b[31m33.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading torch_pitch_shift-1.2.5-py3-none-any.whl (5.0 kB)\n", "Downloading HyperPyYAML-1.2.2-py3-none-any.whl (16 kB)\n", "Downloading pytorch_lightning-2.5.1.post0-py3-none-any.whl (823 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m823.1/823.1 kB\u001b[0m \u001b[31m54.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading alembic-1.15.2-py3-none-any.whl (231 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m231.9/231.9 kB\u001b[0m \u001b[31m23.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading primePy-1.3-py3-none-any.whl (4.0 kB)\n", "Downloading ruamel.yaml-0.18.10-py3-none-any.whl (117 kB)\n", "\u001b[2K 
\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m117.7/117.7 kB\u001b[0m \u001b[31m11.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hDownloading colorlog-6.9.0-py3-none-any.whl (11 kB)\n", "Downloading ruamel.yaml.clib-0.2.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (739 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m739.1/739.1 kB\u001b[0m \u001b[31m52.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hBuilding wheels for collected packages: whisperx, antlr4-python3-runtime, docopt, julius\n", " Building wheel for whisperx (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", " Created wheel for whisperx: filename=whisperx-3.3.1-py3-none-any.whl size=16482009 sha256=ce99924c11e21b4f22fc01e88822fc918268d56bd7f675119ccdf611d1f78640\n", " Stored in directory: /tmp/pip-ephem-wheel-cache-l7xwr65x/wheels/a7/c5/cb/f337e8d88ff15af9ece963912a153e4132d00e7cdd61f48416\n", " Building wheel for antlr4-python3-runtime (setup.py) ... \u001b[?25l\u001b[?25hdone\n", " Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554 sha256=b51000eededb4a85c6a278733c1a114ae6ffe6d4761728c5c5a60d8a5fe3c45e\n", " Stored in directory: /root/.cache/pip/wheels/1a/97/32/461f837398029ad76911109f07047fde1d7b661a147c7c56d1\n", " Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n", " Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13706 sha256=9da05d25e10ee22a692e794b10e0ddb5a5740525a248a2b6bab8281a4a6f3dfe\n", " Stored in directory: /root/.cache/pip/wheels/1a/b0/8c/4b75c4116c31f83c8f9f047231251e13cc74481cca4a78a9ce\n", " Building wheel for julius (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", " Created wheel for julius: filename=julius-0.2.7-py3-none-any.whl size=21870 sha256=e53b0a316db08563d285c0acba335e19c47be9589ec9f5ede7aafdb611fd9436\n", " Stored in directory: /root/.cache/pip/wheels/16/15/d4/edd724cefe78050a6ba3344b8b0c6672db829a799dbb9f81ff\n", "Successfully built whisperx antlr4-python3-runtime docopt julius\n", "Installing collected packages: primePy, docopt, antlr4-python3-runtime, tensorboardX, semver, ruamel.yaml.clib, omegaconf, nvidia-nvjitlink-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, lightning-utilities, humanfriendly, ctranslate2, colorlog, av, ruamel.yaml, pyannote.core, pandas, nvidia-cusparse-cu12, nvidia-cudnn-cu12, coloredlogs, alembic, optuna, onnxruntime, nvidia-cusolver-cu12, hyperpyyaml, pyannote.database, faster-whisper, torchmetrics, pytorch-metric-learning, pyannote.pipeline, pyannote.metrics, julius, asteroid-filterbanks, torch-pitch-shift, speechbrain, pytorch-lightning, torch-audiomentations, lightning, pyannote-audio, whisperx\n", " Attempting uninstall: nvidia-nvjitlink-cu12\n", " Found existing installation: nvidia-nvjitlink-cu12 12.5.82\n", " Uninstalling nvidia-nvjitlink-cu12-12.5.82:\n", " Successfully uninstalled nvidia-nvjitlink-cu12-12.5.82\n", " Attempting uninstall: nvidia-curand-cu12\n", " Found existing installation: nvidia-curand-cu12 10.3.6.82\n", " Uninstalling nvidia-curand-cu12-10.3.6.82:\n", " Successfully uninstalled nvidia-curand-cu12-10.3.6.82\n", " Attempting uninstall: nvidia-cufft-cu12\n", " Found existing installation: nvidia-cufft-cu12 11.2.3.61\n", " Uninstalling nvidia-cufft-cu12-11.2.3.61:\n", " Successfully uninstalled nvidia-cufft-cu12-11.2.3.61\n", " Attempting uninstall: nvidia-cuda-runtime-cu12\n", " Found existing installation: nvidia-cuda-runtime-cu12 12.5.82\n", " Uninstalling nvidia-cuda-runtime-cu12-12.5.82:\n", " Successfully uninstalled 
nvidia-cuda-runtime-cu12-12.5.82\n", " Attempting uninstall: nvidia-cuda-nvrtc-cu12\n", " Found existing installation: nvidia-cuda-nvrtc-cu12 12.5.82\n", " Uninstalling nvidia-cuda-nvrtc-cu12-12.5.82:\n", " Successfully uninstalled nvidia-cuda-nvrtc-cu12-12.5.82\n", " Attempting uninstall: nvidia-cuda-cupti-cu12\n", " Found existing installation: nvidia-cuda-cupti-cu12 12.5.82\n", " Uninstalling nvidia-cuda-cupti-cu12-12.5.82:\n", " Successfully uninstalled nvidia-cuda-cupti-cu12-12.5.82\n", " Attempting uninstall: nvidia-cublas-cu12\n", " Found existing installation: nvidia-cublas-cu12 12.5.3.2\n", " Uninstalling nvidia-cublas-cu12-12.5.3.2:\n", " Successfully uninstalled nvidia-cublas-cu12-12.5.3.2\n", " Attempting uninstall: pandas\n", " Found existing installation: pandas 2.2.2\n", " Uninstalling pandas-2.2.2:\n", " Successfully uninstalled pandas-2.2.2\n", " Attempting uninstall: nvidia-cusparse-cu12\n", " Found existing installation: nvidia-cusparse-cu12 12.5.1.3\n", " Uninstalling nvidia-cusparse-cu12-12.5.1.3:\n", " Successfully uninstalled nvidia-cusparse-cu12-12.5.1.3\n", " Attempting uninstall: nvidia-cudnn-cu12\n", " Found existing installation: nvidia-cudnn-cu12 9.3.0.75\n", " Uninstalling nvidia-cudnn-cu12-9.3.0.75:\n", " Successfully uninstalled nvidia-cudnn-cu12-9.3.0.75\n", " Attempting uninstall: nvidia-cusolver-cu12\n", " Found existing installation: nvidia-cusolver-cu12 11.6.3.83\n", " Uninstalling nvidia-cusolver-cu12-11.6.3.83:\n", " Successfully uninstalled nvidia-cusolver-cu12-11.6.3.83\n", "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", "google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 2.2.3 which is incompatible.\u001b[0m\u001b[31m\n", "\u001b[0mSuccessfully installed alembic-1.15.2 antlr4-python3-runtime-4.9.3 asteroid-filterbanks-0.4.0 av-14.3.0 coloredlogs-15.0.1 colorlog-6.9.0 ctranslate2-4.6.0 docopt-0.6.2 faster-whisper-1.1.1 humanfriendly-10.0 hyperpyyaml-1.2.2 julius-0.2.7 lightning-2.5.1.post0 lightning-utilities-0.14.3 nvidia-cublas-cu12-12.4.5.8 nvidia-cuda-cupti-cu12-12.4.127 nvidia-cuda-nvrtc-cu12-12.4.127 nvidia-cuda-runtime-cu12-12.4.127 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.2.1.3 nvidia-curand-cu12-10.3.5.147 nvidia-cusolver-cu12-11.6.1.9 nvidia-cusparse-cu12-12.3.1.170 nvidia-nvjitlink-cu12-12.4.127 omegaconf-2.3.0 onnxruntime-1.19.0 optuna-4.3.0 pandas-2.2.3 primePy-1.3 pyannote-audio-3.3.2 pyannote.core-5.0.0 pyannote.database-5.1.3 pyannote.metrics-3.2.1 pyannote.pipeline-3.0.1 pytorch-lightning-2.5.1.post0 pytorch-metric-learning-2.8.1 ruamel.yaml-0.18.10 ruamel.yaml.clib-0.2.12 semver-3.0.4 speechbrain-1.0.3 tensorboardX-2.6.2.2 torch-audiomentations-0.12.0 torch-pitch-shift-1.2.5 torchmetrics-1.7.1 whisperx-3.3.1\n" ] }, { "output_type": "display_data", "data": { "application/vnd.colab-display-data+json": { "pip_warning": { "packages": [ "nvidia", "pydevd_plugins" ] }, "id": "1fe374a7d4d44a3fa775a6ea8224e42e" } }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "## Processing the audio recording" ], "metadata": { "id": "xYfl52w6k6ZW" } }, { "cell_type": "code", "source": [ "process_all_mp3()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "XW4oyHCYmSuD", "outputId": "0a3c717d-dfce-4206-9300-315a10e3fe6b" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "100%|██████████| 1/1 [33:09<00:00, 1989.32s/it]" ] }, { "output_type": "stream", "name": "stdout", "text": [ " TSV saved to: 
/content/drive/MyDrive/DP_data/adpocia3.tsv\n" ] }, { "output_type": "stream", "name": "stderr", "text": [ "\n" ] } ] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }