Upload STACKS.md
STACKS.md
ADDED
## Description
The `YouTube Transcript Fetcher` is a command-line utility that extracts transcripts from YouTube videos through YouTube's internal caption delivery endpoint (timedtext). Because it avoids HTML scraping and headless browser automation, it retrieves caption data with lightweight HTTP requests rather than a full page render. The tool supports four output formats (Text, JSON, SRT, WebVTT), handles batch processing, and applies a language priority list with automatic fallback.
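To make the output formats concrete, here is a minimal sketch of how a list of caption snippets could be rendered as SRT. It assumes each snippet is a dict with `text`, `start`, and `duration` keys (seconds); the helper names `srt_timestamp` and `to_srt` are hypothetical and are not taken from `main.py`, which may rely on the library's own formatters instead.

```python
from typing import Dict, List, Union

Snippet = Dict[str, Union[str, float]]  # assumed shape: text, start, duration


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(snippets: List[Snippet]) -> str:
    """Render snippets as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, snippet in enumerate(snippets, start=1):
        start = float(snippet["start"])
        end = start + float(snippet["duration"])
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{snippet['text']}"
        )
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    demo = [
        {"text": "Hello", "start": 0.0, "duration": 1.5},
        {"text": "World", "start": 1.5, "duration": 2.0},
    ]
    print(to_srt(demo), end="")
```

WebVTT differs mainly in its `WEBVTT` header line and in using a period rather than a comma before the milliseconds, so the same structure would likely carry over.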

## System Overview

```mermaid
graph TD
    A[User Commands] --> B[main.py CLI Handler]
    B --> C[YouTubeTranscriptApi Instance]
    C --> D[YouTube timedtext Endpoint]
    D -- XML/JSON Data --> C
    C -- List of Snippets --> B
    B --> E{Output Mode}
    E -->|Write to File| F[Exported Transcript]
    E -->|Terminal| G[Standard Output]
```
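
The flow in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `main.py`: `fetch_snippets` is a hypothetical stand-in for the call into the `YouTubeTranscriptApi` instance, and `run` and `emit` are invented names. Injecting the fetcher keeps the sketch runnable without network access.

```python
import sys
from pathlib import Path
from typing import Callable, Dict, List, Optional

Snippet = Dict[str, object]  # assumed shape; only "text" is used here


def emit(transcript_text: str, output_path: Optional[str] = None) -> None:
    """Output Mode branch: write to a file if a path is given, else to stdout."""
    if output_path:
        Path(output_path).write_text(transcript_text, encoding="utf-8")
    else:
        sys.stdout.write(transcript_text)


def run(
    video_id: str,
    fetch_snippets: Callable[[str], List[Snippet]],  # hypothetical stand-in for the API call
    output_path: Optional[str] = None,
) -> None:
    """CLI handler step: fetch snippets, join them as plain text, then emit."""
    snippets = fetch_snippets(video_id)
    text = "\n".join(str(snippet["text"]) for snippet in snippets) + "\n"
    emit(text, output_path)


if __name__ == "__main__":
    # A fake fetcher so the example runs offline.
    run("example-id", lambda _vid: [{"text": "first line"}, {"text": "second line"}])
```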

## Project Structure

```text
youtube-transcript-fetcher/
├── .gitignore
├── GUIDE.md
├── LICENSE
├── main.py
├── README.md
├── STACKS.md
└── requirements.txt
```

## Tech Stack
Audit of project files (excluding environment and cache):

| File Type | Count | Size (KB) |
| :--- | :--- | :--- |
| Python (.py) | 1 | 9.9 |
| Markdown (.md) | 2 | 8.6 |
| Text (.txt) | 1 | 0.1 |
| Gitignore (.gitignore) | 1 | 0.1 |
| License | 1 | 1.1 |

**Total Files**: 6
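
The source does not say how this audit was produced. As one possible approach, the sketch below tallies file counts and sizes per extension with `pathlib`. The `audit` function and the `EXCLUDED_DIRS` set are assumptions made for illustration; the actual exclusion rules may differ.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Tuple

# Assumed exclusions standing in for "environment and cache".
EXCLUDED_DIRS = {".git", ".venv", "venv", "__pycache__"}


def audit(root: str) -> Dict[str, Tuple[int, float]]:
    """Map each file type to (file count, total size in KB rounded to 0.1)."""
    totals = defaultdict(lambda: [0, 0])  # type -> [count, bytes]
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if any(part in EXCLUDED_DIRS for part in path.relative_to(root).parts):
            continue
        # Files without a suffix (LICENSE, .gitignore) are keyed by their name.
        key = path.suffix or path.name
        totals[key][0] += 1
        totals[key][1] += path.stat().st_size
    return {key: (count, round(size / 1024, 1)) for key, (count, size) in totals.items()}


if __name__ == "__main__":
    for file_type, (count, size_kb) in sorted(audit(".").items()):
        print(f"{file_type}\t{count}\t{size_kb}")
```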

## Dependencies
- **Python**:
  - `youtube-transcript-api`: Core caption data retrieval and formatting.
  - `argparse` (standard library): Command-line interface definition and parsing.
  - `requests`: Underlying HTTP request handling (used indirectly, through the API library).
  - `re` (standard library): URL parsing and video ID extraction.
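
As an illustration of the roles `re` and `argparse` play, here is a minimal sketch of video ID extraction and argument parsing. The regular expression, the `extract_video_id` and `build_parser` names, and the `-o/--output` flag are all assumptions for this example; the patterns and options in `main.py` may differ.

```python
import argparse
import re
from typing import Optional

# Assumption: video IDs are 11 characters from this alphabet.
_VIDEO_ID = r"[A-Za-z0-9_-]{11}"
_URL_PATTERN = re.compile(
    r"(?:youtube\.com/(?:watch\?(?:.*&)?v=|embed/|shorts/)|youtu\.be/)(" + _VIDEO_ID + r")"
)


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a URL or bare ID, or None if nothing matches."""
    value = value.strip()
    if re.fullmatch(_VIDEO_ID, value):
        return value
    match = _URL_PATTERN.search(value)
    return match.group(1) if match else None


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: one or more videos plus an optional output file."""
    parser = argparse.ArgumentParser(description="Fetch YouTube transcripts (sketch).")
    parser.add_argument("videos", nargs="+", help="video URLs or IDs")
    parser.add_argument("-o", "--output", help="write to this file instead of the terminal")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    for video in args.videos:
        print(video, "->", extract_video_id(video))
```

Accepting several positional arguments is one simple way to support the batch processing mentioned in the description.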