{
 "nbformat": 4,
 "nbformat_minor": 0,
 "metadata": {
  "colab": {
   "provenance": [],
   "gpuType": "T4"
  },
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3"
  },
  "language_info": {
   "name": "python"
  },
  "accelerator": "GPU"
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Rade-ASR-CTC-3B-fa — Persian Speech-to-Text\n",
    "\n",
    "Run Meta's **Omnilingual ASR CTC-3B**, fine-tuned on **Persian** by [Rade AI](https://huggingface.co/RadeAI).\n",
    "\n",
    "**Steps:** set a **GPU** runtime (`Runtime ▸ Change runtime type ▸ T4 GPU`), run **Cell 1**, then **`Runtime ▸ Restart session`** so Python loads the freshly installed packages, then run the rest. Audio clips must be **< 40 s**.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cell 1 — install (then RESTART the session)\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "!apt-get -qq install -y libsndfile1\n",
    "# need omnilingual-asr 0.2.0 (it registers the 3b_v2 architecture this model uses).\n",
    "# --ignore-requires-python: 0.2.0's metadata declares Requires-Python '<=3.12'. Under PEP 440 that\n",
    "# means <=3.12.0, so pip rejects Colab's 3.12.x; the flag installs it anyway (it works on 3.12).\n",
    "!pip install -q --ignore-requires-python omnilingual-asr==0.2.0 huggingface_hub\n",
    "# fairseq2 needs the CUDA 12.8 torch build; pin all three or you hit libcudart/torchvision errors\n",
    "!pip install -q torch==2.8.0 torchaudio==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128\n",
    "print('installed — now click Runtime ▸ Restart session, then run the cells below')\n"
   ]
  },
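  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optional: why `--ignore-requires-python` is needed\n",
    "\n",
    "A minimal, stdlib-only sketch of the PEP 440 rule behind the flag in Cell 1: release numbers are padded with zeros before they are compared, so `<=3.12` means `<=3.12.0` and every 3.12.x patch release falls outside it. The helper names are hypothetical, and the sketch handles only plain release numbers (no pre-, post-, dev or local versions); pip itself delegates this to the `packaging` library. Running it prints `True` for 3.11.13 and 3.12.0 and `False` for 3.12.11. Safe to run before or after the restart.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Simplified PEP 440 release comparison (zero-padding), stdlib only.\n",
    "# Hypothetical helpers for illustration; real tools use the 'packaging' library.\n",
    "\n",
    "def release_tuple(version: str) -> tuple:\n",
    "    # '3.12.4' -> (3, 12, 4); plain release numbers only\n",
    "    return tuple(int(part) for part in version.split('.'))\n",
    "\n",
    "def satisfies_le(version: str, bound: str) -> bool:\n",
    "    # True if version <= bound after padding the shorter release with zeros\n",
    "    v, b = release_tuple(version), release_tuple(bound)\n",
    "    width = max(len(v), len(b))\n",
    "    v += (0,) * (width - len(v))\n",
    "    b += (0,) * (width - len(b))\n",
    "    return v <= b\n",
    "\n",
    "for py in ['3.11.13', '3.12.0', '3.12.11']:\n",
    "    print(py, 'satisfies <=3.12:', satisfies_le(py, '3.12'))\n"
   ]
  },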
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ⚠️ Now do `Runtime ▸ Restart session`, then continue ↓\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cell 2 — check GPU\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "import torch\n",
    "print('torch', torch.__version__, '| CUDA', torch.cuda.is_available(),\n",
    "      torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'NO GPU')\n"
   ]
  },
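  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optional: check free GPU memory\n",
    "\n",
    "A small sketch that compares the GPU's free memory, read with `torch.cuda.mem_get_info()`, against the ~6.4 GB this notebook quotes for fp16 inference in Cell 6. The `enough_vram` helper and the decimal-GB threshold are assumptions for illustration; actual usage grows with clip length and batch size. A T4 (16 GB) should pass comfortably.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Hypothetical helper: compare free GPU memory with the ~6.4 GB quoted in Cell 6.\n",
    "NEEDED_GB = 6.4  # assumption: decimal GB; usage grows with clip length and batch size\n",
    "\n",
    "def enough_vram(free_bytes: int, needed_gb: float = NEEDED_GB) -> bool:\n",
    "    # True if the free memory, in decimal GB, covers the requirement\n",
    "    return free_bytes / 1e9 >= needed_gb\n",
    "\n",
    "try:\n",
    "    import torch\n",
    "    if torch.cuda.is_available():\n",
    "        free, total = torch.cuda.mem_get_info()  # (free, total) in bytes\n",
    "        print(f'free {free / 1e9:.1f} of {total / 1e9:.1f} GB, enough: {enough_vram(free)}')\n",
    "    else:\n",
    "        print('no CUDA device visible')\n",
    "except ImportError:\n",
    "    print('torch is not installed')\n"
   ]
  },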
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cell 3 — download the fine-tuned weights (single fp16 file, ~6.2 GB)\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "from huggingface_hub import hf_hub_download\n",
    "# single consolidated fp16 file — half the download of the fp32 shards, identical output\n",
    "ckpt = hf_hub_download('RadeAI/Rade-ASR-CTC-3B-fa', 'model_fp16.pt')\n",
    "print('weights at', ckpt)\n"
   ]
  },
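  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optional: where the ~6.2 GB comes from\n",
    "\n",
    "Checkpoint size is roughly the parameter count times the bytes per parameter: 2 in fp16, 4 in fp32. The sketch below inverts that for the ~6.2 GB fp16 file, which implies roughly 3.1 billion parameters (an estimate from the file size, not an official count) and about 12.4 GB in fp32, which is why the fp16 file is half the download. The helper name is hypothetical. To compare with the real download, `os.path.getsize(ckpt) / 1e9` gives the file size in GB.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Back-of-envelope checkpoint size: parameters x bytes per parameter.\n",
    "BYTES_PER_PARAM = {'fp32': 4, 'fp16': 2}\n",
    "\n",
    "def checkpoint_gb(n_params: float, dtype: str) -> float:\n",
    "    # size in decimal GB for the given parameter count and dtype\n",
    "    return n_params * BYTES_PER_PARAM[dtype] / 1e9\n",
    "\n",
    "# estimate inferred from the ~6.2 GB fp16 file, not an official parameter count\n",
    "n_params = 6.2e9 / BYTES_PER_PARAM['fp16']\n",
    "for dtype in ('fp16', 'fp32'):\n",
    "    print(f'{dtype}: ~{checkpoint_gb(n_params, dtype):.1f} GB for ~{n_params / 1e9:.1f}B params')\n"
   ]
  },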
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cell 4 — register the model with fairseq2\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "import pathlib\n",
    "# user asset directory where fairseq2 looks for model cards\n",
    "ad = pathlib.Path.home()/'.config/fairseq2/assets'\n",
    "ad.mkdir(parents=True, exist_ok=True)\n",
    "(ad/'rade.yaml').write_text(f'''name: rade_CTC_3B_fa\n",
    "model_family: wav2vec2_asr\n",
    "model_arch: 3b_v2\n",
    "checkpoint: \"{ckpt}\"\n",
    "tokenizer_ref: omniASR_tokenizer_written_v2\n",
    "''')\n",
    "print('asset card ready')\n"
   ]
  },
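  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optional: sanity-check the asset card\n",
    "\n",
    "A rough sketch that reads `rade.yaml` back, confirms that the five fields written in Cell 4 are present, and checks that the checkpoint path points at a real file. It parses only flat `key: value` lines, so it is not a YAML parser, and the helper names are hypothetical; fairseq2 may validate the card differently.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "import pathlib\n",
    "\n",
    "# the fields Cell 4 writes; fairseq2 may support or expect others\n",
    "REQUIRED_KEYS = ('name', 'model_family', 'model_arch', 'checkpoint', 'tokenizer_ref')\n",
    "\n",
    "def read_flat_card(path) -> dict:\n",
    "    # parse flat 'key: value' lines; strips double quotes around values\n",
    "    card = {}\n",
    "    for line in pathlib.Path(path).read_text().splitlines():\n",
    "        if ':' in line and not line.lstrip().startswith('#'):\n",
    "            key, _, value = line.partition(':')\n",
    "            card[key.strip()] = value.strip().strip('\"')\n",
    "    return card\n",
    "\n",
    "def missing_fields(card: dict) -> list:\n",
    "    # required keys that are absent or empty\n",
    "    return [key for key in REQUIRED_KEYS if not card.get(key)]\n",
    "\n",
    "card_path = pathlib.Path.home() / '.config/fairseq2/assets/rade.yaml'\n",
    "if card_path.exists():\n",
    "    card = read_flat_card(card_path)\n",
    "    print('missing fields:', missing_fields(card) or 'none')\n",
    "    print('checkpoint file found:', pathlib.Path(card.get('checkpoint', '')).is_file())\n",
    "else:\n",
    "    print('no asset card at', card_path)\n"
   ]
  },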
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cell 5 — get a Persian clip (< 40 s)\n",
    "\n",
    "By default this grabs the sample clip shipped in the repo. To use **your own** audio instead, uncomment the `files.upload()` lines.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# default: use the sample clip from the repo\n",
    "audio_path = hf_hub_download('RadeAI/Rade-ASR-CTC-3B-fa', 'sample_fa.wav')\n",
    "\n",
    "# --- or upload your own (uncomment) ---\n",
    "# from google.colab import files\n",
    "# up = files.upload(); audio_path = list(up.keys())[0]\n",
    "print('using', audio_path)\n"
   ]
  },
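  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optional: check the clip length (and split longer WAVs)\n",
    "\n",
    "A stdlib-only sketch for the < 40 s limit: `wav_duration` reports a WAV clip's length, and `split_wav` cuts a longer recording into consecutive chunks of at most `chunk_seconds` (30 s by default, an arbitrary margin under the limit). Both helper names are hypothetical. The splitting is naive fixed-length cutting, so a word on a boundary ends up split between two chunks; cutting at silences would transcribe better. Python's `wave` module reads only PCM WAV, so convert other formats first (for example with ffmpeg).\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "import wave\n",
    "from pathlib import Path\n",
    "\n",
    "MAX_SECONDS = 40.0  # clip limit stated at the top of this notebook\n",
    "\n",
    "def wav_duration(path) -> float:\n",
    "    # length in seconds of a PCM WAV file (the stdlib wave module cannot read mp3/ogg/m4a)\n",
    "    with wave.open(str(path), 'rb') as w:\n",
    "        return w.getnframes() / w.getframerate()\n",
    "\n",
    "def split_wav(path, out_dir, chunk_seconds: float = 30.0) -> list:\n",
    "    # naive fixed-length split into chunks of at most chunk_seconds;\n",
    "    # a word that straddles a boundary is cut in two\n",
    "    out_dir = Path(out_dir)\n",
    "    out_dir.mkdir(parents=True, exist_ok=True)\n",
    "    chunks = []\n",
    "    with wave.open(str(path), 'rb') as src:\n",
    "        params = src.getparams()\n",
    "        frames_per_chunk = int(chunk_seconds * src.getframerate())\n",
    "        while True:\n",
    "            frames = src.readframes(frames_per_chunk)\n",
    "            if not frames:\n",
    "                break\n",
    "            out_path = out_dir / f'{Path(path).stem}_{len(chunks):03d}.wav'\n",
    "            with wave.open(str(out_path), 'wb') as dst:\n",
    "                dst.setparams(params)  # wave patches the header's frame count after writing\n",
    "                dst.writeframes(frames)\n",
    "            chunks.append(str(out_path))\n",
    "    return chunks\n",
    "\n",
    "if 'audio_path' in globals():\n",
    "    try:\n",
    "        seconds = wav_duration(audio_path)\n",
    "        print(f'{seconds:.1f} s:', 'OK' if seconds < MAX_SECONDS else 'too long, try split_wav()')\n",
    "    except (wave.Error, EOFError):\n",
    "        print('not a PCM WAV file; check its length with another tool')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each chunk can then be transcribed like any other clip, passing one language code per file, e.g. `lang=['pes_Arab'] * len(chunks)`.\n"
   ]
  },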
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cell 6 — transcribe 🎙️ → 📝\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline\n",
    "pipe = ASRInferencePipeline(model_card='rade_CTC_3B_fa',\n",
    "                            device='cuda' if torch.cuda.is_available() else 'cpu',\n",
    "                            dtype=torch.float16)   # fp16: ~199x faster than real time, ~6.4 GB VRAM\n",
    "# one language code per input file; pes_Arab = Iranian Persian (ISO 639-3 'pes') in Arabic script\n",
    "text = pipe.transcribe([audio_path], lang=['pes_Arab'], batch_size=1)\n",
    "print('📝', text[0])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "Made by [Rade AI](https://huggingface.co/RadeAI) · base: [facebook/omniASR-CTC-3B](https://huggingface.co/facebook/omniASR-CTC-3B) · Apache-2.0\n"
   ]
  }
 ]
}