Description
The YouTube Transcript Fetcher is a command-line utility that extracts transcripts from YouTube videos through YouTube's internal caption delivery (timedtext) endpoint, accessed via the youtube-transcript-api library. Because it avoids HTML scraping and headless browser automation, retrieval is fast: no page rendering or browser startup is involved. The tool supports four output formats (plain text, JSON, SRT, WebVTT), handles batch processing, and honors a language priority list with automatic fallback.
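The language priority behaviour can be pictured as "first preferred language that is actually available wins". The sketch below is illustrative only: `pick_language` is a hypothetical helper, not a function from `main.py` or from youtube-transcript-api, and it assumes language codes are plain strings such as `"en"`.

```python
from typing import Iterable, Optional, Sequence


def pick_language(available: Iterable[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first code in `preferred` that also appears in `available`.

    Hypothetical helper: the real tool may resolve fallbacks differently.
    Returns None when no preferred language is available.
    """
    available_set = set(available)
    for code in preferred:  # earlier entries have higher priority
        if code in available_set:
            return code
    return None


# Spanish is preferred but missing, so the lookup falls back to English.
print(pick_language(["de", "en", "fr"], ["es", "en"]))  # -> en
```

Keeping the priority list ordered and walking it from the front means a caller only has to reorder the list to change which language is tried first.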
System Overview
graph TD
A[User Commands] --> B[main.py CLI Handler]
B --> C[YouTubeTranscriptApi Instance]
C --> D[YouTube timedtext Endpoint]
D -- XML/JSON Data --> C
C -- List of Snippets --> B
B --> E{Output Mode}
E -->|Write to File| F[Exported Transcript]
E -->|Terminal| G[Standard Output]
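To make the "List of Snippets" to "Output Mode" step concrete, here is a minimal sketch of turning snippets into SRT text. It assumes each snippet carries `text`, `start`, and `duration` (in seconds); the `Snippet` dataclass, `srt_timestamp`, and `to_srt` are hypothetical names used for illustration, not the code in `main.py`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Snippet:
    """Assumed shape of one caption entry (illustrative only)."""
    text: str
    start: float     # seconds from the beginning of the video
    duration: float  # seconds the caption stays on screen


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(snippets: Iterable[Snippet]) -> str:
    """Render snippets as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, s in enumerate(snippets, start=1):
        start = srt_timestamp(s.start)
        end = srt_timestamp(s.start + s.duration)
        blocks.append(f"{index}\n{start} --> {end}\n{s.text}\n")
    return "\n".join(blocks)


print(to_srt([Snippet("Hello", 0.0, 1.5), Snippet("world", 1.5, 2.0)]))
```

The resulting string can then be written to a file or printed to the terminal, which corresponds to the two branches of the output mode in the diagram.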
Project Structure
youtube-transcript-fetcher/
├── .gitignore
├── GUIDE.md
├── LICENSE
├── main.py
├── README.md
├── STACKS.md
└── requirements.txt
Tech Stack
Audit of project files (excluding environment and cache):
| File Type | Count | Size (KB) |
|---|---|---|
| Python (.py) | 1 | 9.9 |
| Markdown (.md) | 2 | 8.6 |
| Text (.txt) | 1 | 0.1 |
| Gitignore (.gitignore) | 1 | 0.1 |
| License | 1 | 1.1 |
Total Files: 6
Dependencies
- Python:
  - youtube-transcript-api: Core caption data retrieval and formatting.
  - argparse: Command-line interface definition and parsing.
  - requests: Underlying HTTP request handling (via the API library).
  - re: URL parsing and video ID extraction.
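As an illustration of the `re`-based URL parsing mentioned above, the following sketch pulls a video ID out of a few common YouTube URL shapes. The patterns and the `extract_video_id` name are assumptions made for this example (including the conventional 11-character ID length); the actual expressions in `main.py` may differ.

```python
import re
from typing import Optional

# Assumed URL shapes: watch?v=ID, youtu.be/ID, /embed/ID, /shorts/ID.
_URL_ID = re.compile(r"(?:v=|youtu\.be/|/embed/|/shorts/)([A-Za-z0-9_-]{11})")
# A bare ID passed directly on the command line.
_BARE_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID found in a URL or bare ID string, else None."""
    value = value.strip()
    if _BARE_ID.fullmatch(value):
        return value
    match = _URL_ID.search(value)
    return match.group(1) if match else None


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # -> dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/dQw4w9WgXcQ?t=5"))             # -> dQw4w9WgXcQ
```

Accepting both full URLs and bare IDs keeps batch input flexible, since a list of videos can mix the two forms.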