algorembrant
/

youtube-transcript-fetcher

@@ -1,199 +1,214 @@
----
-license: mit
-sdk: static
-colorFrom: blue
-colorTo: red
-tags:
-- youtube
-- transcript
-- api
-- python
-- tools
----
-![Python](https://img.shields.io/badge/Python-3.8%2B-blue?style=flat-square&logo=python&logoColor=white)
-![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)
-![Dependencies](https://img.shields.io/badge/Dependencies-1-orange?style=flat-square)
-![Platform](https://img.shields.io/badge/Platform-Windows%20%7C%20macOS%20%7C%20Linux-lightgrey?style=flat-square)
-![No Scraping](https://img.shields.io/badge/No%20Scraping-Direct%20API-brightgreen?style=flat-square)
----
-# YouTube Transcript Fetcher
-A fast, zero-scraping Python command-line tool that pulls transcripts directly
-from YouTube videos using the official caption delivery API.
-No Selenium. No BeautifulSoup. No headless browsers. Just the raw transcript
-data returned by YouTube's own caption endpoint — in milliseconds.
----
-## How It Works
-YouTube serves captions through a dedicated timedtext API endpoint. The
-`youtube-transcript-api` library calls that endpoint directly, bypassing all
-HTML parsing entirely. This makes fetches nearly instant regardless of video length.
----
-## Features
-- Direct API access — no HTML parsing, no browser automation
-- Supports full YouTube URLs, short youtu.be links, Shorts URLs, embed URLs, and raw video IDs
-- Output formats: plain text, JSON, SRT (SubRip), WebVTT
-- Optional timestamp preservation in plain-text output
-- Language selection with ordered fallback (e.g. try Japanese, then English)
-- Batch processing — fetch transcripts for multiple videos in one command
-- Auto-saves to file or directory with correct file extension
-- Lists all available transcript languages for any video
----
-## Installation
-```bash
-git clone https://github.com/your-username/youtube-transcript-fetcher.git
-cd youtube-transcript-fetcher
-python -m venv .venv
-source .venv/bin/activate      # Windows: .venv\Scripts\activate
-pip install -r requirements.txt
-```
----
-## Quick Start
-```bash
-# Print transcript to terminal
-python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
-# Save as plain text
-python main.py dQw4w9WgXcQ -o transcript.txt
-# Save as SRT subtitles
-python main.py dQw4w9WgXcQ -f srt -o transcript.srt
-# Save as JSON (includes start time + duration per segment)
-python main.py dQw4w9WgXcQ -f json -o transcript.json
-# Include timestamps in plain-text output
-python main.py dQw4w9WgXcQ -t
-# Request Spanish transcript, fall back to English if unavailable
-python main.py dQw4w9WgXcQ -l es en
-# List every available language for a video
-python main.py dQw4w9WgXcQ --list
-# Batch: fetch three videos and save each to ./transcripts/
-python main.py ID1 ID2 ID3 -o ./transcripts/
-```
----
-## CLI Reference
-```
-usage: main.py [-h] [-l LANG [LANG ...]] [-f {text,json,srt,vtt}]
-               [-t] [-o PATH] [--list]
-               video [video ...]
-positional arguments:
-  video                 YouTube video URL(s) or video ID(s)
-optional arguments:
-  -h, --help            show this help message and exit
-  -l, --languages       Language codes in order of preference (default: en)
-  -f, --format          Output format: text, json, srt, vtt (default: text)
-  -t, --timestamps      Add timestamps to plain-text output
-  -o, --output          Output file (single video) or directory (batch)
-  --list                List all available transcript languages and exit
-```
----
-## JSON Output Structure
-Each entry in the JSON array contains:
-```json
-[
-  {
-    "text": "Never gonna give you up",
-    "start": 43.08,
-    "duration": 2.16
-  }
-]
-```
-| Field      | Type  | Description                      |
-|------------|-------|----------------------------------|
-| `text`     | str   | Caption text for the segment     |
-| `start`    | float | Start time in seconds            |
-| `duration` | float | Duration of the segment in seconds |
----
-## Supported URL Formats
-```
-https://www.youtube.com/watch?v=VIDEO_ID
-https://youtu.be/VIDEO_ID
-https://www.youtube.com/shorts/VIDEO_ID
-https://www.youtube.com/embed/VIDEO_ID
-VIDEO_ID  (raw 11-character ID)
-```
----
-## Error Reference
-| Exception              | Meaning                                            |
-|------------------------|----------------------------------------------------|
-| `TranscriptsDisabled`  | The video owner disabled captions                  |
-| `VideoUnavailable`     | Video is private, deleted, or region-locked        |
-| `NoTranscriptFound`    | Requested language(s) do not exist for this video  |
-| `NoTranscriptAvailable`| No captions exist at all for this video            |
----
-## Dependencies
-| Package                  | Version | Purpose                               |
-|--------------------------|---------|---------------------------------------|
-| youtube-transcript-api   | 1.2.4   | Direct YouTube caption API access     |
-No other dependencies. The standard library handles everything else.
----
-## License
-MIT License. See `LICENSE` for details.
----
-## Citation
-If you use this tool in your research or project, please cite it as follows:
-```bibtex
-@software{albeos2026yttfetcher,
-  author = {Rembrant Oyangoren Albeos},
-  title = {YouTube Transcript Fetcher: High-speed, Zero-scraping Caption Extraction},
-  year = {2026},
-  publisher = {Hugging Face},
-  journal = {Hugging Face Repository},
-  howpublished = {\url{https://huggingface.co/algorembrant/youtube-transcript-fetcher}},
-  version = {1.2.4}
-}
-```
----
-## Disclaimer
-This tool uses YouTube's publicly accessible caption endpoint for personal,
-educational, and research use. Review YouTube's Terms of Service before
 using this tool in a production or commercial context.

+---
+license: mit
+sdk: static
+colorFrom: blue
+colorTo: red
+tags:
+- youtube
+- transcript
+- api
+- python
+- tools
+---
+![Python](https://img.shields.io/badge/Python-3.8%2B-blue?style=flat-square&logo=python&logoColor=white)
+![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)
+![Dependencies](https://img.shields.io/badge/Dependencies-1-orange?style=flat-square)
+![Platform](https://img.shields.io/badge/Platform-Windows%20%7C%20macOS%20%7C%20Linux-lightgrey?style=flat-square)
+![No Scraping](https://img.shields.io/badge/No%20Scraping-Direct%20API-brightgreen?style=flat-square)
+---
+# YouTube Transcript Fetcher
+A fast, zero-scraping Python command-line tool that pulls transcripts directly
+from YouTube videos using the official caption delivery API.
+No Selenium. No BeautifulSoup. No headless browsers. Just the raw transcript
+data returned by YouTube's own caption endpoint — in milliseconds.
+---
+## How It Works
+YouTube serves captions through a dedicated timedtext API endpoint. The
+`youtube-transcript-api` library calls that endpoint directly, bypassing all
+HTML parsing entirely. This makes fetches nearly instant regardless of video length.
+---
+## System Overview
+```mermaid
+graph TD
+    A[User Commands] --> B[main.py CLI Handler]
+    B --> C[YouTubeTranscriptApi Instance]
+    C --> D[YouTube timedtext Endpoint]
+    D -- XML/JSON Data --> C
+    C -- List of Snippets --> B
+    B --> E{Output Mode}
+    E -->|Write to File| F[Exported Transcript]
+    E -->|Terminal| G[Standard Output]
+```
+---
+## Features
+- Direct API access — no HTML parsing, no browser automation
+- Supports full YouTube URLs, short youtu.be links, Shorts URLs, embed URLs, and raw video IDs
+- Output formats: plain text, JSON, SRT (SubRip), WebVTT
+- Optional timestamp preservation in plain-text output
+- Language selection with ordered fallback (e.g. try Japanese, then English)
+- Batch processing — fetch transcripts for multiple videos in one command
+- Auto-saves to file or directory with correct file extension
+- Lists all available transcript languages for any video
+---
+## Installation
+```bash
+git clone https://github.com/your-username/youtube-transcript-fetcher.git
+cd youtube-transcript-fetcher
+python -m venv .venv
+source .venv/bin/activate      # Windows: .venv\Scripts\activate
+pip install -r requirements.txt
+```
+---
+## Quick Start
+```bash
+# Print transcript to terminal
+python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
+# Save as plain text
+python main.py dQw4w9WgXcQ -o transcript.txt
+# Save as SRT subtitles
+python main.py dQw4w9WgXcQ -f srt -o transcript.srt
+# Save as JSON (includes start time + duration per segment)
+python main.py dQw4w9WgXcQ -f json -o transcript.json
+# Include timestamps in plain-text output
+python main.py dQw4w9WgXcQ -t
+# Request Spanish transcript, fall back to English if unavailable
+python main.py dQw4w9WgXcQ -l es en
+# List every available language for a video
+python main.py dQw4w9WgXcQ --list
+# Batch: fetch three videos and save each to ./transcripts/
+python main.py ID1 ID2 ID3 -o ./transcripts/
+```
+---
+## CLI Reference
+```
+usage: main.py [-h] [-l LANG [LANG ...]] [-f {text,json,srt,vtt}]
+               [-t] [-o PATH] [--list]
+               video [video ...]
+positional arguments:
+  video                 YouTube video URL(s) or video ID(s)
+optional arguments:
+  -h, --help            show this help message and exit
+  -l, --languages       Language codes in order of preference (default: en)
+  -f, --format          Output format: text, json, srt, vtt (default: text)
+  -t, --timestamps      Add timestamps to plain-text output
+  -o, --output          Output file (single video) or directory (batch)
+  --list                List all available transcript languages and exit
+```
+---
+## JSON Output Structure
+Each entry in the JSON array contains:
+```json
+[
+  {
+    "text": "Never gonna give you up",
+    "start": 43.08,
+    "duration": 2.16
+  }
+]
+```
+| Field      | Type  | Description                      |
+|------------|-------|----------------------------------|
+| `text`     | str   | Caption text for the segment     |
+| `start`    | float | Start time in seconds            |
+| `duration` | float | Duration of the segment in seconds |
+---
+## Supported URL Formats
+```
+https://www.youtube.com/watch?v=VIDEO_ID
+https://youtu.be/VIDEO_ID
+https://www.youtube.com/shorts/VIDEO_ID
+https://www.youtube.com/embed/VIDEO_ID
+VIDEO_ID  (raw 11-character ID)
+```
+---
+## Error Reference
+| Exception              | Meaning                                            |
+|------------------------|----------------------------------------------------|
+| `TranscriptsDisabled`  | The video owner disabled captions                  |
+| `VideoUnavailable`     | Video is private, deleted, or region-locked        |
+| `NoTranscriptFound`    | Requested language(s) do not exist for this video  |
+| `NoTranscriptAvailable`| No captions exist at all for this video            |
+---
+## Dependencies
+| Package                  | Version | Purpose                               |
+|--------------------------|---------|---------------------------------------|
+| youtube-transcript-api   | 1.2.4   | Direct YouTube caption API access     |
+No other dependencies. The standard library handles everything else.
+---
+## License
+MIT License. See `LICENSE` for details.
+---
+## Citation
+If you use this tool in your research or project, please cite it as follows:
+```bibtex
+@software{albeos2026yttfetcher,
+  author = {Rembrant Oyangoren Albeos},
+  title = {YouTube Transcript Fetcher: High-speed, Zero-scraping Caption Extraction},
+  year = {2026},
+  publisher = {Hugging Face},
+  journal = {Hugging Face Repository},
+  howpublished = {\url{https://huggingface.co/algorembrant/youtube-transcript-fetcher}},
+  version = {1.2.4}
+}
+```
+---
+## Disclaimer
+This tool uses YouTube's publicly accessible caption endpoint for personal,
+educational, and research use. Review YouTube's Terms of Service before
 using this tool in a production or commercial context.