---
license: mit
sdk: static
colorFrom: blue
colorTo: red
tags:
- youtube
- transcript
- api
- python
- tools
---

![Python](https://img.shields.io/badge/Python-3.8%2B-blue?style=flat-square&logo=python&logoColor=white)
![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)
![Dependencies](https://img.shields.io/badge/Dependencies-1-orange?style=flat-square)
![Platform](https://img.shields.io/badge/Platform-Windows%20%7C%20macOS%20%7C%20Linux-lightgrey?style=flat-square)
![No Scraping](https://img.shields.io/badge/No%20Scraping-Direct%20API-brightgreen?style=flat-square)

---

# YouTube Transcript Fetcher

A fast, zero-scraping Python command-line tool that pulls transcripts directly from YouTube videos using YouTube's own caption delivery endpoint.

No Selenium. No BeautifulSoup. No headless browsers. Just the raw transcript data returned by YouTube's own caption endpoint, in milliseconds.

---

## How It Works

YouTube serves captions through a dedicated timedtext API endpoint. The `youtube-transcript-api` library calls that endpoint directly, bypassing HTML parsing entirely. This makes fetches nearly instant regardless of video length.

---

## System Overview

```mermaid
graph TD
    A[User Commands] --> B[main.py CLI Handler]
    B --> C[YouTubeTranscriptApi Instance]
    C --> D[YouTube timedtext Endpoint]
    D -- XML/JSON Data --> C
    C -- List of Snippets --> B
    B --> E{Output Mode}
    E -->|Write to File| F[Exported Transcript]
    E -->|Terminal| G[Standard Output]
```

---

## Features

- Direct API access: no HTML parsing, no browser automation
- Supports full YouTube URLs, short youtu.be links, Shorts URLs, embed URLs, and raw video IDs
- Output formats: plain text, JSON, SRT (SubRip), WebVTT
- Optional timestamp preservation in plain-text output
- Language selection with ordered fallback (e.g. try Japanese, then English)
- Batch processing: fetch transcripts for multiple videos in one command
- Auto-saves to a file or directory with the correct file extension
- Lists all available transcript languages for any video

---

## Installation

```bash
git clone https://github.com/your-username/youtube-transcript-fetcher.git
cd youtube-transcript-fetcher
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

---

## Quick Start

```bash
# Print transcript to terminal
python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Save as plain text
python main.py dQw4w9WgXcQ -o transcript.txt

# Save as SRT subtitles
python main.py dQw4w9WgXcQ -f srt -o transcript.srt

# Save as JSON (includes start time + duration per segment)
python main.py dQw4w9WgXcQ -f json -o transcript.json

# Include timestamps in plain-text output
python main.py dQw4w9WgXcQ -t

# Request Spanish transcript, fall back to English if unavailable
python main.py dQw4w9WgXcQ -l es en

# List every available language for a video
python main.py dQw4w9WgXcQ --list

# Batch: fetch three videos and save each to ./transcripts/
python main.py ID1 ID2 ID3 -o ./transcripts/
```

---

## CLI Reference

```
usage: main.py [-h] [-l LANG [LANG ...]] [-f {text,json,srt,vtt}] [-t] [-o PATH] [--list] video [video ...]

positional arguments:
  video              YouTube video URL(s) or video ID(s)

optional arguments:
  -h, --help         show this help message and exit
  -l, --languages    Language codes in order of preference (default: en)
  -f, --format       Output format: text, json, srt, vtt (default: text)
  -t, --timestamps   Add timestamps to plain-text output
  -o, --output       Output file (single video) or directory (batch)
  --list             List all available transcript languages and exit
```

---

## JSON Output Structure

Each entry in the JSON array contains:

```json
[
  {
    "text": "Never gonna give you up",
    "start": 43.08,
    "duration": 2.16
  }
]
```

| Field      | Type  | Description                        |
|------------|-------|------------------------------------|
| `text`     | str   | Caption text for the segment       |
| `start`    | float | Start time in seconds              |
| `duration` | float | Duration of the segment in seconds |

---

## Supported URL Formats

```
https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://www.youtube.com/shorts/VIDEO_ID
https://www.youtube.com/embed/VIDEO_ID
VIDEO_ID   (raw 11-character ID)
```

---

## Error Reference

| Exception               | Meaning                                           |
|-------------------------|---------------------------------------------------|
| `TranscriptsDisabled`   | The video owner disabled captions                 |
| `VideoUnavailable`      | Video is private, deleted, or region-locked       |
| `NoTranscriptFound`     | Requested language(s) do not exist for this video |
| `NoTranscriptAvailable` | No captions exist at all for this video           |

---

## Dependencies

| Package                | Version | Purpose                           |
|------------------------|---------|-----------------------------------|
| youtube-transcript-api | 1.2.4   | Direct YouTube caption API access |

No other dependencies. The standard library handles everything else.

---

## License

MIT License. See `LICENSE` for details.

---

## Citation

If you use this tool in your research or project, please cite it as follows:

```bibtex
@software{albeos2026yttfetcher,
  author       = {Rembrant Oyangoren Albeos},
  title        = {YouTube Transcript Fetcher: High-speed, Zero-scraping Caption Extraction},
  year         = {2026},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Repository},
  howpublished = {\url{https://huggingface.co/algorembrant/youtube-transcript-fetcher}},
  version      = {1.2.4}
}
```

---

## Disclaimer

This tool uses YouTube's publicly accessible caption endpoint for personal, educational, and research use. Review YouTube's Terms of Service before using this tool in a production or commercial context.
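## Appendix: JSON Segments to SRT (Sketch)

The JSON output described in this README stores a `start` time and a `duration` for each segment, while SRT cues need a start and an end timestamp in `HH:MM:SS,mmm` form. The sketch below shows one way that mapping could be done with the standard library only. It is an illustration under stated assumptions, not necessarily how `main.py` produces its `-f srt` output; the helper names `srt_timestamp` and `segments_to_srt` are hypothetical and do not come from this project or from `youtube-transcript-api`.

```python
from typing import Dict, List, Union

# One transcript segment, shaped like an entry in the JSON output:
# {"text": str, "start": float, "duration": float}
Segment = Dict[str, Union[str, float]]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    # Work in whole milliseconds; rounding absorbs float noise such as
    # 45.239999999999995 instead of 45.24.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = float(seg["start"])
        # Assumption: a cue ends exactly at start + duration.
        end = start + float(seg["duration"])
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    sample = [{"text": "Never gonna give you up", "start": 43.08, "duration": 2.16}]
    print(segments_to_srt(sample))
```

Converting to integer milliseconds before splitting into hours, minutes, and seconds keeps the arithmetic exact. The same approach would carry over to WebVTT, which uses a period instead of a comma before the milliseconds.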