license: mit
sdk: static
colorFrom: blue
colorTo: red
tags:
  - youtube
  - transcript
  - api
  - python
  - tools



YouTube Transcript Fetcher

A fast, zero-scraping Python command-line tool that pulls transcripts for YouTube videos directly from YouTube's caption delivery endpoint.

No Selenium. No BeautifulSoup. No headless browsers. Just the raw transcript data returned by YouTube's own caption endpoint, typically within milliseconds.


How It Works

YouTube serves captions through a dedicated timedtext endpoint. The youtube-transcript-api library requests caption data from that endpoint directly, so no browser is launched and no rendered page is scraped. As a result, fetches are nearly instant for videos of any length.
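
The direct call described above might look like the following minimal sketch. It assumes youtube-transcript-api 1.x is installed and that `YouTubeTranscriptApi().fetch()` yields snippet objects with `text`, `start`, and `duration` attributes. The helper names `snippets_to_dicts` and `fetch_segments` are illustrative and are not taken from main.py.

```python
from typing import Any, Iterable


def snippets_to_dicts(snippets: Iterable[Any]) -> list[dict]:
    """Convert snippet objects (text/start/duration attributes) to plain dicts."""
    return [
        {"text": s.text, "start": float(s.start), "duration": float(s.duration)}
        for s in snippets
    ]


def fetch_segments(video_id: str, languages: tuple[str, ...] = ("en",)) -> list[dict]:
    """Fetch a transcript over the network and return it as a list of dicts."""
    # Imported lazily so the pure helper above works without the dependency.
    from youtube_transcript_api import YouTubeTranscriptApi

    # Assumption: the 1.x instance API, with languages tried in the given order.
    transcript = YouTubeTranscriptApi().fetch(video_id, languages=list(languages))
    return snippets_to_dicts(transcript)
```

Calling `fetch_segments("dQw4w9WgXcQ", ("es", "en"))` would request Spanish first and fall back to English; it needs network access.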


System Overview

graph TD
    A[User Commands] --> B[main.py CLI Handler]
    B --> C[YouTubeTranscriptApi Instance]
    C --> D[YouTube timedtext Endpoint]
    D -- XML/JSON Data --> C
    C -- List of Snippets --> B
    B --> E{Output Mode}
    E -->|Write to File| F[Exported Transcript]
    E -->|Terminal| G[Standard Output]
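
The output branch at the bottom of the diagram could be implemented along these lines. This is a sketch under assumptions: the extension mapping and the function names `resolve_output_path` and `emit` are hypothetical, and the real main.py may decide between file and directory differently.

```python
import sys
from pathlib import Path
from typing import Optional

# Assumed mapping from the -f value to a file extension.
EXTENSIONS = {"text": ".txt", "json": ".json", "srt": ".srt", "vtt": ".vtt"}


def resolve_output_path(
    output: Optional[str], video_id: str, fmt: str, batch: bool
) -> Optional[Path]:
    """Return the file to write, or None when output goes to the terminal."""
    if output is None:
        return None
    path = Path(output)
    # In batch mode, or when given an existing directory, write one file per video.
    if batch or path.is_dir():
        return path / f"{video_id}{EXTENSIONS[fmt]}"
    return path


def emit(rendered: str, target: Optional[Path]) -> None:
    """Write the rendered transcript to a file or to standard output."""
    if target is None:
        sys.stdout.write(rendered + "\n")
        return
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(rendered, encoding="utf-8")
```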

Features

  • Direct API access — no HTML parsing, no browser automation
  • Supports full YouTube URLs, short youtu.be links, Shorts URLs, embed URLs, and raw video IDs
  • Output formats: plain text, JSON, SRT (SubRip), WebVTT
  • Optional timestamp preservation in plain-text output
  • Language selection with ordered fallback (e.g. try Japanese, then English)
  • Batch processing — fetch transcripts for multiple videos in one command
  • Auto-saves to file or directory with correct file extension
  • Lists all available transcript languages for any video

Installation

git clone https://github.com/your-username/youtube-transcript-fetcher.git
cd youtube-transcript-fetcher
python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Quick Start

# Print transcript to terminal
python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Save as plain text
python main.py dQw4w9WgXcQ -o transcript.txt

# Save as SRT subtitles
python main.py dQw4w9WgXcQ -f srt -o transcript.srt

# Save as JSON (includes start time + duration per segment)
python main.py dQw4w9WgXcQ -f json -o transcript.json

# Include timestamps in plain-text output
python main.py dQw4w9WgXcQ -t

# Request Spanish transcript, fall back to English if unavailable
python main.py dQw4w9WgXcQ -l es en

# List every available language for a video
python main.py dQw4w9WgXcQ --list

# Batch: fetch three videos and save each to ./transcripts/
python main.py ID1 ID2 ID3 -o ./transcripts/

CLI Reference

usage: main.py [-h] [-l LANG [LANG ...]] [-f {text,json,srt,vtt}]
               [-t] [-o PATH] [--list]
               video [video ...]

positional arguments:
  video                 YouTube video URL(s) or video ID(s)

optional arguments:
  -h, --help            show this help message and exit
  -l, --languages       Language codes in order of preference (default: en)
  -f, --format          Output format: text, json, srt, vtt (default: text)
  -t, --timestamps      Add timestamps to plain-text output
  -o, --output          Output file (single video) or directory (batch)
  --list                List all available transcript languages and exit
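
For readers who want to reproduce this interface, an argparse setup consistent with the usage text above might look like the sketch below. It is reconstructed from the help output only; the function name `build_parser` and the description string are assumptions, and the actual main.py may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented options."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Fetch YouTube transcripts."
    )
    parser.add_argument(
        "video", nargs="+", help="YouTube video URL(s) or video ID(s)"
    )
    parser.add_argument(
        "-l", "--languages", nargs="+", default=["en"], metavar="LANG",
        help="Language codes in order of preference (default: en)",
    )
    parser.add_argument(
        "-f", "--format", choices=["text", "json", "srt", "vtt"], default="text",
        help="Output format: text, json, srt, vtt (default: text)",
    )
    parser.add_argument(
        "-t", "--timestamps", action="store_true",
        help="Add timestamps to plain-text output",
    )
    parser.add_argument(
        "-o", "--output", metavar="PATH",
        help="Output file (single video) or directory (batch)",
    )
    parser.add_argument(
        "--list", action="store_true",
        help="List all available transcript languages and exit",
    )
    return parser
```

Because `-l` accepts one or more values, video arguments placed directly after it would be read as language codes, which is why the Quick Start examples put the video first. On Python 3.10 and later, argparse prints the heading `options:` instead of `optional arguments:`.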

JSON Output Structure

Each entry in the JSON array contains:

[
  {
    "text": "Never gonna give you up",
    "start": 43.08,
    "duration": 2.16
  }
]

| Field | Type | Description |
| --- | --- | --- |
| text | str | Caption text for the segment |
| start | float | Start time in seconds |
| duration | float | Duration of the segment in seconds |
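
Because each segment carries a start time and a duration, the subtitle formats can be derived from this structure. The sketch below shows one way to do it; the function names are hypothetical and this is not necessarily how the tool's own SRT and WebVTT writers work.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SubRip puts a comma before milliseconds)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SubRip cues; end time is start + duration."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["start"] + seg["duration"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)


def to_vtt(segments: list[dict]) -> str:
    """Render segments as WebVTT: a WEBVTT header and a period before milliseconds."""
    cues = []
    for seg in segments:
        start = srt_timestamp(seg["start"]).replace(",", ".")
        end = srt_timestamp(seg["start"] + seg["duration"]).replace(",", ".")
        cues.append(f"{start} --> {end}\n{seg['text']}\n")
    return "WEBVTT\n\n" + "\n".join(cues)
```

For the example entry above, the cue runs from 00:00:43,080 to 00:00:45,240.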

Supported URL Formats

https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://www.youtube.com/shorts/VIDEO_ID
https://www.youtube.com/embed/VIDEO_ID
VIDEO_ID  (raw 11-character ID)
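
One way to normalize all five input forms to a bare video ID, using only the standard library, is sketched below. The function name `extract_video_id` and the exact validation rules are assumptions; the tool's own parser may accept more variants.

```python
import re
from urllib.parse import parse_qs, urlparse

# A raw ID is 11 characters drawn from letters, digits, "_" and "-".
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> str:
    """Return the 11-character video ID from a URL or raw ID, else raise ValueError."""
    value = value.strip()
    if _ID_PATTERN.fullmatch(value):
        return value

    parsed = urlparse(value)  # URLs need a scheme (https://) to be recognized
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed"):
                candidate = parts[1]

    if candidate and _ID_PATTERN.fullmatch(candidate):
        return candidate
    raise ValueError(f"Could not extract a video ID from {value!r}")
```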

Error Reference

| Exception | Meaning |
| --- | --- |
| TranscriptsDisabled | The video owner disabled captions |
| VideoUnavailable | Video is private, deleted, or region-locked |
| NoTranscriptFound | Requested language(s) do not exist for this video |
| NoTranscriptAvailable | No captions exist at all for this video |
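
In batch mode it is usually preferable to report a failure for one video and continue with the rest. The sketch below maps exception class names from the table above to their meanings. It matches on the class name so that it runs without importing the library, and `describe_error` and `fetch_many` are hypothetical helpers, not part of main.py or of youtube-transcript-api.

```python
from typing import Any, Callable, Iterable

# Meanings keyed by exception class name, copied from the table above.
ERROR_MESSAGES = {
    "TranscriptsDisabled": "The video owner disabled captions",
    "VideoUnavailable": "Video is private, deleted, or region-locked",
    "NoTranscriptFound": "Requested language(s) do not exist for this video",
    "NoTranscriptAvailable": "No captions exist at all for this video",
}


def describe_error(exc: Exception) -> str:
    """Return a one-line description, falling back to the exception's own message."""
    name = type(exc).__name__
    meaning = ERROR_MESSAGES.get(name)
    return f"{name}: {meaning}" if meaning else f"{name}: {exc}"


def fetch_many(
    video_ids: Iterable[str], fetch: Callable[[str], Any]
) -> tuple[dict, dict]:
    """Call fetch(video_id) for each video, collecting failures instead of aborting."""
    results, failures = {}, {}
    for video_id in video_ids:
        try:
            results[video_id] = fetch(video_id)
        except Exception as exc:  # broad on purpose: one bad video must not stop the batch
            failures[video_id] = describe_error(exc)
    return results, failures
```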

Dependencies

| Package | Version | Purpose |
| --- | --- | --- |
| youtube-transcript-api | 1.2.4 | Direct YouTube caption API access |

No other dependencies. The standard library handles everything else.


License

MIT License. See LICENSE for details.


Citation

If you use this tool in your research or project, please cite it as follows:

@software{albeos2026yttfetcher,
  author = {Rembrant Oyangoren Albeos},
  title = {YouTube Transcript Fetcher: High-speed, Zero-scraping Caption Extraction},
  year = {2026},
  publisher = {Hugging Face},
  journal = {Hugging Face Repository},
  howpublished = {\url{https://huggingface.co/algorembrant/youtube-transcript-fetcher}},
  version = {1.2.4}
}

Disclaimer

This tool uses YouTube's publicly accessible caption endpoint and is intended for personal, educational, and research use. Review YouTube's Terms of Service before using it in a production or commercial context.