algorembrant/youtube-transcript-fetcher

algorembrant committed on Mar 4

Commit 17df267 · verified · 1 Parent(s): c9f8d95

Upload STACKS.md

Files changed (1)

STACKS.md +51 -0

STACKS.md ADDED

	@@ -0,0 +1,51 @@

+## Description
+The `YouTube Transcript Fetcher` is a command-line utility that extracts transcripts from YouTube videos through YouTube's internal caption delivery endpoint (`timedtext`). This endpoint is not an officially documented public API, so its behavior may change without notice. Because the tool issues direct HTTP requests instead of scraping rendered HTML or driving a headless browser, it avoids browser start-up and page-rendering overhead. It supports four output formats (Text, JSON, SRT, WebVTT), processes multiple videos in one batch, and accepts a prioritized list of languages, falling back to the next language when the preferred one is unavailable.
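To make the SRT output format concrete, here is a minimal sketch of turning caption snippets into SRT text. It assumes each snippet is a dict with `text`, `start`, and `duration` keys (times in seconds). The helper names `srt_timestamp` and `to_srt` are hypothetical and are not taken from `main.py`, which may simply delegate formatting to `youtube-transcript-api`.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(snippets):
    """Render snippets as SRT cues.

    Assumption: each snippet is a dict with 'text', 'start' and
    'duration' keys, where the times are given in seconds.
    """
    blocks = []
    for index, snippet in enumerate(snippets, start=1):
        start = snippet["start"]
        end = start + snippet["duration"]
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{snippet['text']}\n"
        )
    # Cues are separated by a single blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"text": "Hello", "start": 0.0, "duration": 1.5},
        {"text": "World", "start": 1.5, "duration": 2.0},
    ]
    print(to_srt(demo))
```

Working in integer milliseconds keeps the hour, minute, and second fields from drifting due to repeated floating-point division.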
+## System Overview
+```mermaid
+graph TD
+    A[User Commands] --> B[main.py CLI Handler]
+    B --> C[YouTubeTranscriptApi Instance]
+    C --> D[YouTube timedtext Endpoint]
+    D -- XML/JSON Data --> C
+    C -- List of Snippets --> B
+    B --> E{Output Mode}
+    E -->|Write to File| F[Exported Transcript]
+    E -->|Terminal| G[Standard Output]
+```
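The flow in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `main.py`: `fetch_snippets` is a hypothetical stand-in that returns canned data instead of calling `YouTubeTranscriptApi`, and the `-o/--output` flag name is assumed.

```python
import argparse
from pathlib import Path


def fetch_snippets(video_id):
    """Hypothetical stand-in for the API call; returns canned data."""
    return [{"text": f"transcript for {video_id}", "start": 0.0, "duration": 1.0}]


def build_parser():
    """CLI handler: one positional video ID and an optional output file."""
    parser = argparse.ArgumentParser(description="Fetch a transcript (sketch).")
    parser.add_argument("video_id", help="ID of the video to fetch")
    parser.add_argument("-o", "--output", help="write to this file instead of stdout")
    return parser


def run(argv, fetch=fetch_snippets):
    """Parse arguments, fetch snippets, then pick the output mode."""
    args = build_parser().parse_args(argv)
    text = "\n".join(snippet["text"] for snippet in fetch(args.video_id))
    if args.output:
        # Output mode: write to file.
        Path(args.output).write_text(text + "\n", encoding="utf-8")
    else:
        # Output mode: terminal (standard output).
        print(text)
    return text


if __name__ == "__main__":
    import sys

    run(sys.argv[1:])
```

Passing the fetch function as a parameter keeps the CLI logic testable without network access; a real implementation would supply a function that wraps the library call.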
+## Project Structure
+```text
+youtube-transcript-fetcher/
+├── .gitignore
+├── GUIDE.md
+├── LICENSE
+├── main.py
+├── README.md
+├── STACKS.md
+└── requirements.txt
+```
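+## Tech Stack
+Audit of project files (excluding environment and cache directories). The Markdown count appears to exclude `STACKS.md` itself, so the totals are one lower than the tree above: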
+| File Type | Count | Size (KB) |
+| :--- | :--- | :--- |
+| Python (.py) | 1 | 9.9 |
+| Markdown (.md) | 2 | 8.6 |
+| Text (.txt) | 1 | 0.1 |
+| Gitignore (.gitignore) | 1 | 0.1 |
+| License | 1 | 1.1 |
+
+**Total Files**: 6
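An audit table like the one above could be produced with a short script. The sketch below is one possible approach, not the tool the author used: the excluded directory names are assumptions, and `summarize` and `scan_project` are hypothetical helper names.

```python
from collections import defaultdict
from pathlib import Path

# Assumed names for environment and cache directories to skip.
EXCLUDED_DIRS = {".git", ".venv", "venv", "__pycache__"}


def summarize(entries):
    """Group (filename, size_in_bytes) pairs by file type.

    Returns {file_type: (count, size_kb)}. Files without an extension,
    such as LICENSE or .gitignore, are keyed by their full name.
    """
    totals = defaultdict(lambda: [0, 0])
    for name, size in entries:
        path = Path(name)
        key = path.suffix.lower() or path.name
        totals[key][0] += 1
        totals[key][1] += size
    return {key: (count, round(size / 1024, 1)) for key, (count, size) in totals.items()}


def scan_project(root):
    """Yield (filename, size_in_bytes) for files outside excluded directories."""
    root = Path(root)
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if path.is_file() and not EXCLUDED_DIRS.intersection(relative.parts):
            yield path.name, path.stat().st_size


if __name__ == "__main__":
    for file_type, (count, size_kb) in sorted(summarize(scan_project(".")).items()):
        print(f"| {file_type} | {count} | {size_kb} |")
```

Re-running such a script after adding a file keeps the table and the project tree in agreement.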
+## Dependencies
+- **Python**:
+  - `youtube-transcript-api` (third-party): Core caption data retrieval and formatting.
+  - `argparse` (standard library): Command-line interface definition and parsing.
+  - `requests` (third-party): Underlying HTTP request handling, used indirectly through `youtube-transcript-api`.
+  - `re` (standard library): URL parsing and video ID extraction.
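As an example of the video ID extraction that `re` is listed for, here is a minimal sketch. It assumes video IDs are 11 characters drawn from letters, digits, `_`, and `-`, and it covers only a few common URL shapes; the function name `extract_video_id` is hypothetical, and the patterns in `main.py` may differ.

```python
import re

# Assumption: a video ID is exactly 11 characters from this alphabet.
_ID = r"[A-Za-z0-9_-]{11}"

# Covers watch?v=..., youtu.be/..., /shorts/... and /embed/... URLs.
# The negative lookahead rejects longer tokens that merely start with 11 valid characters.
_URL_PATTERN = re.compile(
    rf"(?:v=|youtu\.be/|/shorts/|/embed/)({_ID})(?![A-Za-z0-9_-])"
)


def extract_video_id(url_or_id):
    """Return the video ID found in a URL or bare ID, or None if none is found."""
    candidate = url_or_id.strip()
    if re.fullmatch(_ID, candidate):
        return candidate  # Already a bare ID.
    match = _URL_PATTERN.search(candidate)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
```

Returning `None` instead of raising lets a batch run skip malformed inputs and continue with the remaining videos.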