---
license: mit
sdk: static
colorFrom: blue
colorTo: red
tags:
- youtube
- transcript
- api
- python
- tools
---

---

# YouTube Transcript Fetcher

A fast, zero-scraping Python command-line tool that pulls video transcripts
directly from YouTube's caption delivery (timedtext) endpoint.

No Selenium. No BeautifulSoup. No headless browsers. Just the raw transcript
data returned by YouTube's own caption endpoint, usually in milliseconds.

---

## How It Works

YouTube serves captions through a dedicated timedtext API endpoint. The
`youtube-transcript-api` library calls that endpoint directly, so this tool
never parses page HTML itself. Because only caption data is transferred,
fetches stay fast even for long videos.

---

## System Overview

```mermaid
graph TD
    A[User Commands] --> B[main.py CLI Handler]
    B --> C[YouTubeTranscriptApi Instance]
    C --> D[YouTube timedtext Endpoint]
    D -- XML/JSON Data --> C
    C -- List of Snippets --> B
    B --> E{Output Mode}
    E -->|Write to File| F[Exported Transcript]
    E -->|Terminal| G[Standard Output]
```
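
The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration rather than the contents of `main.py`: the helper name `fetch_segments` and its `api` parameter are hypothetical, and the `fetch()` call assumes the instance-based interface of `youtube-transcript-api` 1.x, where the returned object iterates over snippets exposing `text`, `start`, and `duration`. Calling `fetch_segments("dQw4w9WgXcQ", ["es", "en"])` without the `api` argument would perform a real network request.

```python
from typing import Any, Iterable


def fetch_segments(
    video_id: str,
    languages: Iterable[str] = ("en",),
    api: Any = None,
) -> list[dict]:
    """Fetch one transcript and return plain dicts (hypothetical helper)."""
    if api is None:
        # Imported lazily so the helper can be exercised with a stand-in object.
        from youtube_transcript_api import YouTubeTranscriptApi

        api = YouTubeTranscriptApi()

    # Language codes are passed in order of preference.
    fetched = api.fetch(video_id, languages=list(languages))

    # Assumption: each snippet exposes .text, .start and .duration (seconds).
    return [
        {"text": s.text, "start": s.start, "duration": s.duration}
        for s in fetched
    ]
```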

---

## Features

- Direct API access: no HTML parsing, no browser automation
- Supports full YouTube URLs, short youtu.be links, Shorts URLs, embed URLs, and raw video IDs
- Output formats: plain text, JSON, SRT (SubRip), WebVTT
- Optional timestamp preservation in plain-text output
- Language selection with ordered fallback (e.g. try Japanese, then English)
- Batch processing: fetch transcripts for multiple videos in one command
- Saves to a file or directory with the correct file extension
- Lists all available transcript languages for any video
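
One way the file-or-directory saving behaviour in the list above could work is sketched below. Everything here is an assumption for illustration: the helper name `resolve_output_path`, the `<video_id>.<ext>` naming inside a directory, and the extension table are not taken from `main.py`.

```python
from pathlib import Path

# Assumed mapping from the -f/--format choices to file extensions.
EXTENSIONS = {"text": ".txt", "json": ".json", "srt": ".srt", "vtt": ".vtt"}


def resolve_output_path(output: str, video_id: str, fmt: str, batch: bool = False) -> Path:
    """Decide where one transcript is written (hypothetical helper)."""
    path = Path(output)

    # Batch runs, existing directories, and paths ending in a separator
    # are treated as directories: one file per video, named after its ID.
    if batch or path.is_dir() or output.endswith(("/", "\\")):
        return path / f"{video_id}{EXTENSIONS[fmt]}"

    # Single-file target without an extension: add the one matching the format.
    if path.suffix == "":
        return path.with_suffix(EXTENSIONS[fmt])

    # Otherwise respect the file name exactly as given.
    return path
```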

---

## Installation

```bash
git clone https://github.com/your-username/youtube-transcript-fetcher.git
cd youtube-transcript-fetcher
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

---

## Quick Start

```bash
# Print transcript to terminal
python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Save as plain text
python main.py dQw4w9WgXcQ -o transcript.txt

# Save as SRT subtitles
python main.py dQw4w9WgXcQ -f srt -o transcript.srt

# Save as JSON (includes start time + duration per segment)
python main.py dQw4w9WgXcQ -f json -o transcript.json

# Include timestamps in plain-text output
python main.py dQw4w9WgXcQ -t

# Request Spanish transcript, fall back to English if unavailable
python main.py dQw4w9WgXcQ -l es en

# List every available language for a video
python main.py dQw4w9WgXcQ --list

# Batch: fetch three videos and save each to ./transcripts/
python main.py ID1 ID2 ID3 -o ./transcripts/
```
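
For the `-t` option, a plain-text renderer might prefix each segment with its start time. The sketch below assumes a `[MM:SS]` (or `[H:MM:SS]`) prefix and segments given as dicts with `text` and `start` keys; the function names and the exact timestamp layout are hypothetical, not the tool's documented format.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or H:MM:SS once an hour is reached."""
    total = int(seconds)  # drop the fractional part
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours:d}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def to_plain_text(segments: list[dict], timestamps: bool = False) -> str:
    """Join segments into one line each, optionally prefixed with a timestamp."""
    lines = []
    for segment in segments:
        # Captions may contain embedded newlines; collapse them to single spaces.
        text = " ".join(segment["text"].split())
        if timestamps:
            lines.append(f"[{format_timestamp(segment['start'])}] {text}")
        else:
            lines.append(text)
    return "\n".join(lines)
```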

---

## CLI Reference

```
usage: main.py [-h] [-l LANG [LANG ...]] [-f {text,json,srt,vtt}]
               [-t] [-o PATH] [--list]
               video [video ...]

positional arguments:
  video             YouTube video URL(s) or video ID(s)

optional arguments:
  -h, --help        show this help message and exit
  -l, --languages   Language codes in order of preference (default: en)
  -f, --format      Output format: text, json, srt, vtt (default: text)
  -t, --timestamps  Add timestamps to plain-text output
  -o, --output      Output file (single video) or directory (batch)
  --list            List all available transcript languages and exit
```
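
A parser producing roughly this usage can be written with the standard-library `argparse` module. This is a reconstruction from the help text above, not the tool's actual source; the function name `build_parser` and the description string are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Fetch YouTube transcripts.",  # assumed wording
    )
    parser.add_argument("video", nargs="+",
                        help="YouTube video URL(s) or video ID(s)")
    # nargs="+" lets -l take several codes; because it consumes every value
    # that follows, video arguments should come before -l on the command line.
    parser.add_argument("-l", "--languages", nargs="+", default=["en"],
                        metavar="LANG",
                        help="Language codes in order of preference (default: en)")
    parser.add_argument("-f", "--format", default="text",
                        choices=["text", "json", "srt", "vtt"],
                        help="Output format (default: text)")
    parser.add_argument("-t", "--timestamps", action="store_true",
                        help="Add timestamps to plain-text output")
    parser.add_argument("-o", "--output", metavar="PATH",
                        help="Output file (single video) or directory (batch)")
    parser.add_argument("--list", action="store_true",
                        help="List all available transcript languages and exit")
    return parser
```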

---

## JSON Output Structure

With `-f json`, the output is an array with one object per caption segment:

```json
[
  {
    "text": "Never gonna give you up",
    "start": 43.08,
    "duration": 2.16
  }
]
```

| Field      | Type  | Description                        |
|------------|-------|------------------------------------|
| `text`     | str   | Caption text for the segment       |
| `start`    | float | Start time in seconds              |
| `duration` | float | Duration of the segment in seconds |

---

## Supported URL Formats

```
https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://www.youtube.com/shorts/VIDEO_ID
https://www.youtube.com/embed/VIDEO_ID
VIDEO_ID                                  (raw 11-character ID)
```
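
Reducing each of these forms to the bare ID can be done with `urllib.parse`. The sketch below is one possible approach, not the tool's actual parser; `extract_video_id` is a hypothetical name, and it expects URLs to include the `https://` scheme.

```python
import re
from urllib.parse import parse_qs, urlparse

# A raw ID is 11 characters drawn from letters, digits, '-' and '_'.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> str:
    """Return the 11-character video ID from a URL or a raw ID."""
    value = value.strip()
    if _VIDEO_ID.fullmatch(value):
        return value  # already a raw ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # https://youtu.be/VIDEO_ID
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            # https://www.youtube.com/watch?v=VIDEO_ID
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # https://www.youtube.com/shorts/VIDEO_ID or /embed/VIDEO_ID
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in {"shorts", "embed"}:
                candidate = parts[1]

    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    raise ValueError(f"Could not extract a video ID from {value!r}")
```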

---

## Error Reference

| Exception               | Meaning                                           |
|-------------------------|---------------------------------------------------|
| `TranscriptsDisabled`   | The video owner disabled captions                 |
| `VideoUnavailable`      | Video is private, deleted, or region-locked       |
| `NoTranscriptFound`     | Requested language(s) do not exist for this video |
| `NoTranscriptAvailable` | No captions exist at all for this video           |
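
In a batch run it is usually preferable to report a failed video and continue with the rest. The sketch below shows that pattern under some assumptions: `fetch_all`, `describe_error`, and the `fetch_one` callback are hypothetical names, and exceptions are matched by class name so the example does not depend on which exception classes a given library version exports. A real implementation would more likely catch the library's exception classes directly.

```python
import sys

# Messages keyed by the exception names from the table above.
MESSAGES = {
    "TranscriptsDisabled": "captions are disabled for this video",
    "VideoUnavailable": "video is private, deleted, or region-locked",
    "NoTranscriptFound": "no transcript in the requested language(s)",
    "NoTranscriptAvailable": "no captions exist for this video",
}


def describe_error(exc: Exception) -> str:
    """Translate an exception into a short, user-facing message."""
    # Walk the class hierarchy so subclasses of a listed exception match too.
    for cls in type(exc).__mro__:
        if cls.__name__ in MESSAGES:
            return MESSAGES[cls.__name__]
    return f"unexpected error: {exc}"


def fetch_all(video_ids, fetch_one):
    """Call fetch_one(video_id) for each ID, collecting results and failures."""
    results, failures = {}, {}
    for video_id in video_ids:
        try:
            results[video_id] = fetch_one(video_id)
        except Exception as exc:  # broad on purpose: keep the batch going
            failures[video_id] = describe_error(exc)
            print(f"[{video_id}] {failures[video_id]}", file=sys.stderr)
    return results, failures
```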

---

## Dependencies

| Package                | Version | Purpose                           |
|------------------------|---------|-----------------------------------|
| youtube-transcript-api | 1.2.4   | Direct YouTube caption API access |

No other direct dependencies. The standard library handles everything else.

---

## License

MIT License. See `LICENSE` for details.

---

## Citation

If you use this tool in your research or project, please cite it as follows:

```bibtex
@software{albeos2026yttfetcher,
  author       = {Rembrant Oyangoren Albeos},
  title        = {YouTube Transcript Fetcher: High-speed, Zero-scraping Caption Extraction},
  year         = {2026},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Repository},
  howpublished = {\url{https://huggingface.co/algorembrant/youtube-transcript-fetcher}},
  version      = {1.2.4}
}
```

---

## Disclaimer

This tool reads YouTube's publicly accessible caption endpoint and is intended
for personal, educational, and research use. Review YouTube's Terms of Service
before using it in a production or commercial context.