Description

The YouTube Transcript Fetcher is a command-line utility that extracts transcripts from YouTube videos through YouTube's internal caption delivery API (the timedtext endpoint). Because it requests caption data directly instead of scraping rendered HTML or driving a headless browser, retrieval needs only lightweight HTTP requests. The tool supports four output formats (plain text, JSON, SRT, WebVTT), handles batch processing, and applies a language priority list with automatic fallback.
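
To illustrate two of the behaviours described above, here is a minimal standard-library sketch of language selection with fallback and of SRT rendering. It is not the code in main.py. The function names, the snippet shape (dicts with `text`, `start`, `duration`), and the rule of falling back to the first available language are all assumptions made for illustration; the actual tool delegates formatting to youtube-transcript-api.

```python
def pick_language(available, priority):
    """Return the first code in `priority` that also appears in `available`."""
    for code in priority:
        if code in available:
            return code
    # Assumed fallback rule: if no preferred language exists,
    # take the first language that is offered, or None if there is none.
    return available[0] if available else None


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(snippets):
    """Render snippets (assumed dicts with text/start/duration) as SRT cues."""
    cues = []
    for index, snip in enumerate(snippets, start=1):
        start = srt_timestamp(snip["start"])
        end = srt_timestamp(snip["start"] + snip["duration"])
        cues.append(f"{index}\n{start} --> {end}\n{snip['text']}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)
```

For example, `pick_language(["de", "en"], ["en", "fr"])` selects `"en"`, while `pick_language(["de"], ["en"])` falls back to `"de"`.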

System Overview

graph TD
    A[User Commands] --> B[main.py CLI Handler]
    B --> C[YouTubeTranscriptApi Instance]
    C --> D[YouTube timedtext Endpoint]
    D -- XML/JSON Data --> C
    C -- List of Snippets --> B
    B --> E{Output Mode}
    E -->|Write to File| F[Exported Transcript]
    E -->|Terminal| G[Standard Output]
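
The flow in the diagram can be sketched roughly as follows. This is an assumption-laden outline, not the contents of main.py: the `-o/--output` flag, the function names, and the injected `fetch` callable (a stand-in for the YouTubeTranscriptApi call) are hypothetical.

```python
import argparse
import sys
from pathlib import Path


def build_parser():
    """CLI handler: one positional video argument and an optional output file."""
    parser = argparse.ArgumentParser(description="Fetch a YouTube transcript.")
    parser.add_argument("video", help="video URL or ID")
    # Hypothetical flag name; the real CLI may differ.
    parser.add_argument("-o", "--output", help="write to this file instead of the terminal")
    return parser


def emit(text, output_path=None, stream=None):
    """Output mode: write to a file if a path is given, else to the terminal."""
    if output_path:
        Path(output_path).write_text(text, encoding="utf-8")
        return
    if stream is None:
        stream = sys.stdout
    stream.write(text + "\n")


def run(argv, fetch):
    """Parse arguments, fetch snippets through `fetch`, and route the result."""
    args = build_parser().parse_args(argv)
    snippets = fetch(args.video)  # expected: list of dicts with a "text" key
    text = "\n".join(snippet["text"] for snippet in snippets)
    emit(text, args.output)
    return text
```

Passing the fetcher in as a parameter keeps the sketch runnable without network access; in the real tool that role is played by the YouTubeTranscriptApi instance shown in the diagram.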

Project Structure

youtube-transcript-fetcher/
├── .gitignore
├── GUIDE.md
├── LICENSE
├── main.py
├── README.md
├── STACKS.md
└── requirements.txt

Tech Stack

Audit of project files (excluding environment and cache):

File Type                 Count   Size (KB)
Python (.py)              1       9.9
Markdown (.md)            2       8.6
Text (.txt)               1       0.1
Gitignore (.gitignore)    1       0.1
License                   1       1.1

Total Files: 6

Dependencies

  • Python:
    • youtube-transcript-api: Core caption data retrieval and formatting.
    • argparse (standard library): Command-line interface definition and parsing.
    • requests: Underlying HTTP request handling (used through youtube-transcript-api).
    • re (standard library): URL parsing and video ID extraction.
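
As an illustration of the `re`-based step, the sketch below pulls a video ID out of common YouTube URL shapes. The pattern, the list of URL forms covered, and the 11-character ID assumption are mine, not taken from main.py, so treat it as one possible approach.

```python
import re

# Assumption: a video ID is 11 characters from letters, digits, "-" and "_".
_ID = r"[A-Za-z0-9_-]{11}"

# Assumed URL forms: watch?v=, /embed/, /shorts/ and youtu.be short links.
_URL_PATTERN = re.compile(
    rf"(?:[?&]v=|/embed/|/shorts/|youtu\.be/)({_ID})(?![A-Za-z0-9_-])"
)


def extract_video_id(value):
    """Return the video ID found in a URL or bare ID, or None if none is found."""
    value = value.strip()
    # A bare ID is accepted as-is.
    if re.fullmatch(_ID, value):
        return value
    match = _URL_PATTERN.search(value)
    return match.group(1) if match else None
```

With this sketch, both `https://www.youtube.com/watch?v=dQw4w9WgXcQ` and `https://youtu.be/dQw4w9WgXcQ?t=42` yield `dQw4w9WgXcQ`, and an unrelated URL yields `None`.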