	## Description
	The `YouTube Transcript Fetcher` is a command-line utility that extracts transcripts from YouTube videos through YouTube's internal caption delivery endpoint (`timedtext`), accessed via the `youtube-transcript-api` library. Because it avoids HTML scraping and headless browser automation, it retrieves caption data without page-rendering overhead. The tool supports four output formats (plain text, JSON, SRT, WebVTT), batch processing of multiple videos, and a language priority list with automatic fallback to the next available language.
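To make the output formats concrete, here is a minimal sketch of how a list of caption snippets could be rendered as SRT. The `Snippet` class, `srt_timestamp`, and `to_srt` are hypothetical names for illustration; the sketch assumes each snippet carries a text, a start time, and a duration in seconds, and it is not taken from `main.py`.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """Hypothetical stand-in for one caption entry."""
    text: str
    start: float      # seconds from the start of the video
    duration: float   # seconds the caption stays on screen


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used by SRT."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(snippets: list[Snippet]) -> str:
    """Render snippets as numbered SRT cues separated by blank lines."""
    cues = []
    for index, snippet in enumerate(snippets, start=1):
        end = snippet.start + snippet.duration
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(snippet.start)} --> {srt_timestamp(end)}\n"
            f"{snippet.text}"
        )
    return "\n\n".join(cues) + "\n"
```

WebVTT differs mainly in its `WEBVTT` header line and in using a period instead of a comma before the milliseconds, so the same structure would likely carry over with small changes.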

	## System Overview

	```mermaid
	graph TD
	A[User Commands] --> B[main.py CLI Handler]
	B --> C[YouTubeTranscriptApi Instance]
	C --> D[YouTube timedtext Endpoint]
	D -- XML/JSON Data --> C
	C -- List of Snippets --> B
	B --> E{Output Mode}
	E -->|Write to File| F[Exported Transcript]
	E -->|Terminal| G[Standard Output]
	```
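The last step of the diagram, choosing between a file and the terminal, can be sketched as follows. This is an illustration under assumptions, not the actual `main.py`: the `video` argument, the `-o/--output` flag, and the `build_parser` and `emit` helpers are hypothetical names.

```python
import argparse
import sys
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Define a minimal CLI; argument and flag names are assumptions."""
    parser = argparse.ArgumentParser(
        description="Fetch a YouTube transcript (illustrative sketch)."
    )
    parser.add_argument("video", help="video URL or video ID")
    parser.add_argument(
        "-o", "--output", default=None,
        help="write the transcript to this file instead of the terminal",
    )
    return parser


def emit(text: str, output_path: Optional[str]) -> None:
    """Route the rendered transcript to a file or to standard output."""
    if output_path:
        # "Write to File" branch of the diagram
        with open(output_path, "w", encoding="utf-8") as handle:
            handle.write(text)
    else:
        # "Terminal" branch of the diagram
        sys.stdout.write(text)
```

A caller would parse the arguments with `build_parser().parse_args()`, fetch and format the transcript, and pass the resulting string to `emit` together with `args.output`.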

	## Project Structure

	```text
	youtube-transcript-fetcher/
	├── .gitignore
	├── GUIDE.md
	├── LICENSE
	├── main.py
	├── README.md
	├── STACKS.md
	└── requirements.txt
	```

	## Tech Stack
	Audit of project files (excluding environment and cache):

	| File Type | Count | Size (KB) |
	| :--- | :--- | :--- |
	| Python (.py) | 1 | 9.9 |
	| Markdown (.md) | 2 | 8.6 |
	| Text (.txt) | 1 | 0.1 |
	| Gitignore (.gitignore) | 1 | 0.1 |
	| License | 1 | 1.1 |

	Total Files: 6

	## Dependencies
	- Python:
	  - `youtube-transcript-api`: Core caption data retrieval and formatting.
	  - `argparse` (standard library): Command-line interface definition and parsing.
	  - `requests`: Underlying HTTP request handling, used indirectly through `youtube-transcript-api`.
	  - `re` (standard library): URL parsing and video ID extraction.
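As an illustration of the `re`-based video ID extraction listed above, the sketch below accepts either a bare ID or a common YouTube URL form. It assumes that video IDs are 11 characters drawn from letters, digits, `_`, and `-`; the pattern and the `extract_video_id` name are hypothetical and may differ from what `main.py` does.

```python
import re
from typing import Optional

# Assumption: a video ID is exactly 11 characters from [A-Za-z0-9_-].
_ID = r"[A-Za-z0-9_-]{11}"

# Matches the ID after common URL markers and requires that it is not
# followed by further ID characters.
_URL_PATTERN = re.compile(
    r"(?:v=|youtu\.be/|/embed/|/shorts/)(" + _ID + r")(?![A-Za-z0-9_-])"
)
_BARE_ID_PATTERN = re.compile(_ID)


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID found in a URL or bare ID, or None."""
    value = value.strip()
    if _BARE_ID_PATTERN.fullmatch(value):
        return value
    match = _URL_PATTERN.search(value)
    return match.group(1) if match else None
```

Returning `None` rather than raising lets a batch run skip malformed inputs and continue with the remaining videos.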