---
license: mit
sdk: static
colorFrom: blue
colorTo: red
tags:
- youtube
- transcript
- api
- python
- tools
---

![Python](https://img.shields.io/badge/Python-3.8%2B-blue?style=flat-square&logo=python&logoColor=white)
![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)
![Dependencies](https://img.shields.io/badge/Dependencies-1-orange?style=flat-square)
![Platform](https://img.shields.io/badge/Platform-Windows%20%7C%20macOS%20%7C%20Linux-lightgrey?style=flat-square)
![No Scraping](https://img.shields.io/badge/No%20Scraping-Direct%20API-brightgreen?style=flat-square)

---

# YouTube Transcript Fetcher

A fast, zero-scraping Python command-line tool that pulls transcripts directly
from YouTube videos through YouTube's own caption delivery endpoint.

No Selenium. No BeautifulSoup. No headless browsers. Just the raw transcript
data returned by the caption endpoint, usually within milliseconds.

---

## How It Works

YouTube serves captions through a dedicated timedtext endpoint. The
`youtube-transcript-api` library calls that endpoint directly instead of
scraping the rendered watch page, so fetches stay fast even for long videos.
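
As a rough illustration of the call described above, the sketch below wraps the library's instance method `fetch` (as exposed by `youtube-transcript-api` 1.x). The helper name `fetch_segments` and its `api` parameter are hypothetical and not taken from `main.py`; the optional `api` argument is only there so the function can be tried with a stand-in object.

```python
from typing import Any, Dict, Iterable, List, Optional


def fetch_segments(
    video_id: str,
    languages: Iterable[str] = ("en",),
    api: Optional[Any] = None,
) -> List[Dict[str, Any]]:
    """Return transcript segments as plain dicts (hypothetical helper)."""
    if api is None:
        # Imported lazily so this module loads even without the dependency.
        from youtube_transcript_api import YouTubeTranscriptApi

        api = YouTubeTranscriptApi()

    # Language codes are passed in order of preference.
    fetched = api.fetch(video_id, languages=list(languages))

    # Each snippet exposes .text, .start and .duration.
    return [
        {"text": s.text, "start": s.start, "duration": s.duration}
        for s in fetched
    ]
```

Calling `fetch_segments("dQw4w9WgXcQ", ("es", "en"))` would request a Spanish transcript first and fall back to English, assuming the video has captions in one of those languages.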

---

## System Overview

```mermaid
graph TD
    A[User Commands] --> B[main.py CLI Handler]
    B --> C[YouTubeTranscriptApi Instance]
    C --> D[YouTube timedtext Endpoint]
    D -- XML/JSON Data --> C
    C -- List of Snippets --> B
    B --> E{Output Mode}
    E -->|Write to File| F[Exported Transcript]
    E -->|Terminal| G[Standard Output]
```

---

## Features

- Direct API access — no HTML parsing, no browser automation
- Supports full YouTube URLs, short youtu.be links, Shorts URLs, embed URLs, and raw video IDs
- Output formats: plain text, JSON, SRT (SubRip), WebVTT
- Optional timestamp preservation in plain-text output
- Language selection with ordered fallback (e.g. try Japanese, then English)
- Batch processing — fetch transcripts for multiple videos in one command
- Auto-saves to file or directory with correct file extension
- Lists all available transcript languages for any video
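
One way the "correct file extension" behaviour could work is sketched below. The `EXTENSIONS` mapping and the `output_path` helper are assumptions made for illustration; the actual logic in `main.py` may differ.

```python
from pathlib import Path

# Assumed mapping from the -f choices to file extensions.
EXTENSIONS = {"text": ".txt", "json": ".json", "srt": ".srt", "vtt": ".vtt"}


def output_path(output: str, video_id: str, fmt: str, batch: bool) -> Path:
    """Pick where a transcript is written (hypothetical helper)."""
    target = Path(output)
    if batch:
        # Batch mode: treat the target as a directory, one file per video.
        return target / f"{video_id}{EXTENSIONS[fmt]}"
    # Single video: use the path exactly as the user gave it.
    return target
```

In batch mode the caller would also need to create the directory first, for example with `Path(output).mkdir(parents=True, exist_ok=True)`.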

---

## Installation

```bash
git clone https://github.com/your-username/youtube-transcript-fetcher.git
cd youtube-transcript-fetcher
python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

---

## Quick Start

```bash
# Print transcript to terminal
python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Save as plain text
python main.py dQw4w9WgXcQ -o transcript.txt

# Save as SRT subtitles
python main.py dQw4w9WgXcQ -f srt -o transcript.srt

# Save as JSON (includes start time + duration per segment)
python main.py dQw4w9WgXcQ -f json -o transcript.json

# Include timestamps in plain-text output
python main.py dQw4w9WgXcQ -t

# Request Spanish transcript, fall back to English if unavailable
python main.py dQw4w9WgXcQ -l es en

# List every available language for a video
python main.py dQw4w9WgXcQ --list

# Batch: fetch three videos and save each to ./transcripts/
python main.py ID1 ID2 ID3 -o ./transcripts/
```

---

## CLI Reference

```
usage: main.py [-h] [-l LANG [LANG ...]] [-f {text,json,srt,vtt}]
               [-t] [-o PATH] [--list]
               video [video ...]

positional arguments:
  video                 YouTube video URL(s) or video ID(s)

optional arguments:
  -h, --help            show this help message and exit
  -l, --languages       Language codes in order of preference (default: en)
  -f, --format          Output format: text, json, srt, vtt (default: text)
  -t, --timestamps      Add timestamps to plain-text output
  -o, --output          Output file (single video) or directory (batch)
  --list                List all available transcript languages and exit
```
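
An `argparse` setup that would produce a usage message like the one above might look as follows. This is a reconstruction from the help text, not the code in `main.py`; the function name `build_parser` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented options (reconstruction)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("video", nargs="+",
                        help="YouTube video URL(s) or video ID(s)")
    parser.add_argument("-l", "--languages", nargs="+", default=["en"],
                        metavar="LANG",
                        help="Language codes in order of preference")
    parser.add_argument("-f", "--format", default="text",
                        choices=["text", "json", "srt", "vtt"],
                        help="Output format")
    parser.add_argument("-t", "--timestamps", action="store_true",
                        help="Add timestamps to plain-text output")
    parser.add_argument("-o", "--output", metavar="PATH",
                        help="Output file (single video) or directory (batch)")
    parser.add_argument("--list", action="store_true",
                        help="List all available transcript languages and exit")
    return parser
```

Because `-l` accepts one or more values, the video arguments are best placed before it (as in the Quick Start commands) so they are not consumed as language codes.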

---

## JSON Output Structure

Each entry in the JSON array contains:

```json
[
  {
    "text": "Never gonna give you up",
    "start": 43.08,
    "duration": 2.16
  }
]
```

| Field      | Type  | Description                        |
|------------|-------|------------------------------------|
| `text`     | str   | Caption text for the segment       |
| `start`    | float | Start time in seconds              |
| `duration` | float | Duration of the segment in seconds |
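
The three fields are enough to derive subtitle cues. The sketch below shows one way to turn such a list into SRT text; `srt_timestamp` and `to_srt` are hypothetical names, and the tool's own SRT writer may format cues differently.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm as used by SubRip."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: List[Dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        # The end time is not stored; it is start plus duration.
        end = srt_timestamp(seg["start"] + seg["duration"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(cues)
```

For the sample entry above, the cue would span `00:00:43,080 --> 00:00:45,240`.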

---

## Supported URL Formats

```
https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://www.youtube.com/shorts/VIDEO_ID
https://www.youtube.com/embed/VIDEO_ID
VIDEO_ID  (raw 11-character ID)
```
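
A minimal sketch of how these inputs could be normalized to a video ID using only the standard library is shown below. The function name `extract_video_id` and the accepted host list are assumptions for illustration, not the parser used in `main.py`.

```python
import re
from urllib.parse import parse_qs, urlparse

# Video IDs are 11 characters drawn from letters, digits, "-" and "_".
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(value: str) -> str:
    """Return the video ID from a URL or raw ID, or raise ValueError."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value  # already a raw ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in _YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed"):
                candidate = parts[1]

    if _ID_RE.fullmatch(candidate):
        return candidate
    raise ValueError(f"Could not find a video ID in {value!r}")
```

Extra query parameters such as `&t=42s` are ignored because only the `v` parameter is read.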

---

## Error Reference

| Exception               | Meaning                                            |
|-------------------------|----------------------------------------------------|
| `TranscriptsDisabled`   | The video owner disabled captions                  |
| `VideoUnavailable`      | Video is private, deleted, or region-locked        |
| `NoTranscriptFound`     | Requested language(s) do not exist for this video  |
| `NoTranscriptAvailable` | No captions exist at all for this video            |
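
The table can be mirrored in code as a lookup from exception name to a user-facing message. The sketch below matches on the class name so that it runs without the library installed; that choice, and the `describe_error` helper itself, are assumptions for illustration. Real code would more likely catch the exception classes imported from `youtube_transcript_api`.

```python
# Messages copied from the table above, keyed by exception class name.
ERROR_MESSAGES = {
    "TranscriptsDisabled": "The video owner disabled captions",
    "VideoUnavailable": "Video is private, deleted, or region-locked",
    "NoTranscriptFound": "Requested language(s) do not exist for this video",
    "NoTranscriptAvailable": "No captions exist at all for this video",
}


def describe_error(exc: Exception) -> str:
    """Map an exception to a short message (hypothetical helper)."""
    # Walk the class hierarchy so subclasses of a listed error also match.
    for cls in type(exc).__mro__:
        if cls.__name__ in ERROR_MESSAGES:
            return ERROR_MESSAGES[cls.__name__]
    return f"Unexpected error: {exc}"
```

In batch mode, a helper like this would let the tool report a failure for one video and continue with the rest.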

---

## Dependencies

| Package                  | Version | Purpose                               |
|--------------------------|---------|---------------------------------------|
| youtube-transcript-api   | 1.2.4   | Direct YouTube caption API access     |

No other dependencies. The standard library handles everything else.

---

## License

MIT License. See `LICENSE` for details.

---

## Citation

If you use this tool in your research or project, please cite it as follows:

```bibtex
@software{albeos2026yttfetcher,
  author = {Rembrant Oyangoren Albeos},
  title = {YouTube Transcript Fetcher: High-speed, Zero-scraping Caption Extraction},
  year = {2026},
  publisher = {Hugging Face},
  journal = {Hugging Face Repository},
  howpublished = {\url{https://huggingface.co/algorembrant/youtube-transcript-fetcher}},
  version = {1.2.4}
}
```

---

## Disclaimer

This tool uses YouTube's publicly accessible caption endpoint for personal,
educational, and research use. Review YouTube's Terms of Service before
using this tool in a production or commercial context.