# Step-by-Step Setup and Usage Guide

Author: algorembrant

---

## Prerequisites

| Requirement          | Minimum Version | Notes                                      |
|----------------------|-----------------|--------------------------------------------|
| Python               | 3.8             | 3.10+ recommended                          |
| pip                  | 21.0            |                                            |
| Anthropic API Key    | --              | Required for clean, summarize, pipeline    |

You need an Anthropic API key to use the `clean`, `summarize`, and `pipeline` commands.
Obtain one at: https://console.anthropic.com
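
To confirm that the interpreter meets the minimum in the table, a quick check like the one below can be run first. This is a minimal sketch; `meets_minimum` is a hypothetical helper and not part of the toolkit.

```python
import sys

MINIMUM = (3, 8)       # minimum version from the prerequisites table
RECOMMENDED = (3, 10)  # recommended version from the prerequisites table


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if a (major, minor) version is at least `minimum`."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= tuple(minimum)


current = sys.version_info[:2]
print("Python %d.%d" % current)
print("meets minimum:", meets_minimum(current))
print("meets recommended:", meets_minimum(current, RECOMMENDED))
```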

---

## Step 1: Get the Code

**Option A: Git clone**
```bash
git clone https://github.com/algorembrant/youtube-transcript-toolkit.git
cd youtube-transcript-toolkit
```

**Option B: Download ZIP**
Download and unzip, then open a terminal inside the project folder.

---

## Step 2: Create a Virtual Environment

**macOS / Linux**
```bash
python3 -m venv .venv
source .venv/bin/activate
```

**Windows (Command Prompt)**
```cmd
python -m venv .venv
.venv\Scripts\activate.bat
```

**Windows (PowerShell)**
```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
```

You should see `(.venv)` at the start of your terminal prompt.
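
If the prompt prefix is hidden by a custom shell theme, the same thing can be checked from Python. Inside an environment created with `python -m venv`, `sys.prefix` points at the environment while `sys.base_prefix` points at the base installation. The helper name `in_virtualenv` is only illustrative.

```python
import sys


def in_virtualenv():
    """Return True when running inside an environment created by `python -m venv`."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```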

---

## Step 3: Install Dependencies

```bash
pip install -r requirements.txt
```

Verify:
```bash
pip show anthropic
pip show youtube-transcript-api
```
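
The same verification can be scripted with the standard library's `importlib.metadata`, which is available from Python 3.8. This is a sketch; `installed_versions` is a hypothetical helper, and the package names are the two shown in the `pip show` commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the `pip show` commands above.
REQUIRED = ("anthropic", "youtube-transcript-api")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```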

---

## Step 4: Set Your Anthropic API Key

**macOS / Linux (current session)**
```bash
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
```

**macOS / Linux (permanent: add to shell profile)**
```bash
# zsh shown; for bash, use ~/.bashrc in both lines
echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.zshrc
source ~/.zshrc
```

**Windows (Command Prompt)**
```cmd
set ANTHROPIC_API_KEY=sk-ant-your-key-here
```

**Windows (PowerShell)**
```powershell
$env:ANTHROPIC_API_KEY = "sk-ant-your-key-here"
```

**Windows (permanent via System Settings)**
1. Search "Environment Variables" in Start Menu
2. Click "Edit the system environment variables"
3. Add a new variable: `ANTHROPIC_API_KEY` = your key
4. Open a new terminal so the variable is picked up

The `fetch` and `list` commands do NOT require an API key.
Only `clean`, `summarize`, and `pipeline` need it.
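
The rule above can be expressed as a small pre-flight check. This is a sketch under the assumption that the key is read from the `ANTHROPIC_API_KEY` environment variable as set in this step; `missing_key` is a hypothetical helper, not a function from the toolkit.

```python
import os

# Commands that call the Anthropic API, per this guide.
KEY_COMMANDS = {"clean", "summarize", "pipeline"}


def missing_key(command, env=None):
    """Return True if `command` needs an API key and none is set in `env`."""
    env = os.environ if env is None else env
    has_key = bool(env.get("ANTHROPIC_API_KEY", "").strip())
    return command in KEY_COMMANDS and not has_key


for command in ("fetch", "list", "clean", "summarize", "pipeline"):
    status = "key missing" if missing_key(command) else "ok"
    print(f"{command}: {status}")
```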

---

## Step 5: Run Your First Commands

### Fetch a raw transcript (no API key needed)

```bash
python main.py fetch "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```
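
The command above takes a full URL, while the remaining examples pass the bare video ID. The sketch below shows one way to reduce a URL to an ID with the standard library. It is not necessarily how `fetcher.py` does it: `extract_video_id` is a hypothetical helper, and the 11-character ID pattern and `youtu.be` handling are assumptions.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters drawn from letters, digits, "_" and "-".
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value):
    """Return the video ID from a bare ID, a watch URL, or a youtu.be short link."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        candidate = ""
    if _ID_RE.fullmatch(candidate):
        return candidate
    raise ValueError(f"could not find a video ID in {value!r}")


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
```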

### See what languages are available

```bash
python main.py list dQw4w9WgXcQ
```

### Clean the transcript into paragraphs

```bash
python main.py clean dQw4w9WgXcQ
```

### Summarize the transcript

```bash
python main.py summarize dQw4w9WgXcQ -m brief
python main.py summarize dQw4w9WgXcQ -m detailed
python main.py summarize dQw4w9WgXcQ -m bullets
python main.py summarize dQw4w9WgXcQ -m outline
```

### Run the full pipeline (fetch + clean + summarize)

```bash
python main.py pipeline dQw4w9WgXcQ -m bullets
```

---

## Step 6: Save Output to Files

### Single video: specify a file path

```bash
python main.py clean dQw4w9WgXcQ -o cleaned.txt
python main.py summarize dQw4w9WgXcQ -m detailed -o summary.txt
```

### Pipeline: specify a directory (creates 3 files per video)

```bash
python main.py pipeline dQw4w9WgXcQ -o ./output/
```

Files created:
```
./output/
  dQw4w9WgXcQ_transcript.txt
  dQw4w9WgXcQ_cleaned.txt
  dQw4w9WgXcQ_summary.txt
```
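
For scripts that post-process the results, the naming pattern in the listing above can be reproduced with `pathlib`. This is a sketch; `expected_outputs` is a hypothetical helper that only mirrors the three file names shown, and it does not create any files.

```python
from pathlib import Path

# Suffixes taken from the file listing above.
SUFFIXES = ("transcript", "cleaned", "summary")


def expected_outputs(video_id, out_dir):
    """Return the three paths the pipeline is documented to write for one video."""
    base = Path(out_dir)
    return [base / f"{video_id}_{suffix}.txt" for suffix in SUFFIXES]


for path in expected_outputs("dQw4w9WgXcQ", "./output/"):
    print(path, "(exists)" if path.exists() else "(not found)")
```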

### Batch: multiple videos at once

```bash
python main.py pipeline VIDEO_ID_1 VIDEO_ID_2 VIDEO_ID_3 -o ./batch_output/
```
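
For longer lists, the batch command can be assembled from a text file of IDs instead of being typed by hand. The sketch below assumes a hypothetical file with one ID per line and builds exactly the command form shown above. The helper names are illustrative, and `run_batch` must be called from the project folder with the virtual environment active.

```python
import subprocess
import sys


def read_ids(lines):
    """Return video IDs from an iterable of lines, skipping blanks and # comments."""
    ids = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def build_pipeline_command(video_ids, out_dir):
    """Build the argument list for `python main.py pipeline ID... -o DIR`."""
    if not video_ids:
        raise ValueError("at least one video ID is required")
    return [sys.executable, "main.py", "pipeline", *video_ids, "-o", out_dir]


def run_batch(ids_file, out_dir):
    """Read IDs from `ids_file` and run the pipeline once for all of them."""
    with open(ids_file, encoding="utf-8") as handle:
        ids = read_ids(handle)
    # check=True raises CalledProcessError if the CLI exits with an error.
    subprocess.run(build_pipeline_command(ids, out_dir), check=True)
```

Usage would look like `run_batch("ids.txt", "./batch_output/")`, where `ids.txt` is a file you create yourself.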

---

## Step 7: Advanced Options

### Use the higher-quality model

```bash
python main.py clean dQw4w9WgXcQ --quality
python main.py summarize dQw4w9WgXcQ -m detailed --quality
```

Default model: `claude-haiku-4-5` (fast, cost-efficient)
Quality model: `claude-sonnet-4-6` (better for complex or long transcripts)

### Disable streaming (show output only after completion)

```bash
python main.py clean dQw4w9WgXcQ --no-stream
```

### Request a non-English transcript

```bash
python main.py clean dQw4w9WgXcQ -l ja       # Japanese only
python main.py clean dQw4w9WgXcQ -l es en    # Spanish, fall back to English
```
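
The fallback order in the second command can be pictured as "first preferred language that exists wins". The sketch below illustrates only that idea; the actual resolution is done by the toolkit and its caption library, and `pick_language` is a hypothetical helper.

```python
def pick_language(preferred, available):
    """Return the first code in `preferred` that appears in `available`, else None."""
    available_set = set(available)
    for code in preferred:
        if code in available_set:
            return code
    return None


# Spanish is requested first but only English and Japanese exist here.
print(pick_language(["es", "en"], ["en", "ja"]))
```

A result of `None` corresponds to the `NoTranscriptFound` case in the troubleshooting table; running `list` shows which codes are available.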

### Fetch raw transcript as SRT, JSON, or VTT

```bash
python main.py fetch dQw4w9WgXcQ -f srt -o captions.srt
python main.py fetch dQw4w9WgXcQ -f json -o transcript.json
python main.py fetch dQw4w9WgXcQ -f vtt -o captions.vtt
```
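
To make the SRT option concrete, the sketch below shows how caption segments map onto the SubRip layout: a counter, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then the text. It assumes each segment is a dict with `text`, `start`, and `duration` (in seconds); the toolkit's own formatter may differ, and both function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments (dicts with text/start/duration) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{seg['text']}"
        )
    # Cues are separated by one blank line.
    return "\n\n".join(blocks) + "\n"


sample = [
    {"text": "First caption", "start": 0.0, "duration": 1.5},
    {"text": "Second caption", "start": 1.5, "duration": 2.0},
]
print(to_srt(sample))
```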

### Fetch with timestamps

```bash
python main.py fetch dQw4w9WgXcQ -t
python main.py pipeline dQw4w9WgXcQ -t -o ./output/
```

### Pipeline: skip individual steps

```bash
# Fetch and summarize without cleaning
python main.py pipeline dQw4w9WgXcQ --skip-clean -m bullets

# Fetch and clean without summarizing
python main.py pipeline dQw4w9WgXcQ --skip-summary
```
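
The effect of the two skip flags can be summarized as a list of stages that still run. This is only an illustration of the behavior described above, not code from `pipeline.py`; `plan_steps` is a hypothetical helper.

```python
def plan_steps(skip_clean=False, skip_summary=False):
    """Return the pipeline stages that run for a given combination of skip flags."""
    steps = ["fetch"]  # fetching always happens first
    if not skip_clean:
        steps.append("clean")
    if not skip_summary:
        steps.append("summarize")
    return steps


print(plan_steps())                   # full pipeline
print(plan_steps(skip_clean=True))    # --skip-clean
print(plan_steps(skip_summary=True))  # --skip-summary
```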

---

## Troubleshooting

| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| `TranscriptsDisabled` error | Video owner disabled captions | Use a different video |
| `VideoUnavailable` error | Private, deleted, or region-locked | Check URL; try VPN if region-locked |
| `NoTranscriptFound` | Requested language missing | Run `list` to see available languages |
| `AuthenticationError` | API key missing or wrong | Check `ANTHROPIC_API_KEY` env variable |
| `ModuleNotFoundError` | Dependencies not installed | Run `pip install -r requirements.txt` |
| Chunking messages in stderr | Transcript very long | Normal; multi-pass processing is automatic |
| Output cuts off mid-sentence | max_tokens limit hit | This is rare; open an issue if it occurs |

---

## Project File Reference

```
main.py          CLI entry point: all five commands
fetcher.py       YouTube direct caption API (no scraping)
cleaner.py       AI paragraph reformatter
summarizer.py    AI summarizer (4 modes)
pipeline.py      Orchestrates the full fetch -> clean -> summarize chain
ai_client.py     Anthropic API wrapper with chunking and streaming
config.py        Constants: model names, chunk size, summary modes
requirements.txt Two dependencies
README.md        Full project documentation
GUIDE.md         This file
LICENSE          MIT License
```
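
The chunking that `ai_client.py` is described as performing, and that produces the stderr messages noted under Troubleshooting, can be pictured with the sketch below. It is an assumption about the general approach rather than the toolkit's implementation: `chunk_text` is a hypothetical helper, and the 8000-character default is an arbitrary placeholder, not the value from `config.py`.

```python
def chunk_text(text, max_chars=8000):
    """Split text on whitespace into chunks of at most `max_chars` characters.

    A single word longer than `max_chars` is kept whole in its own chunk.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks, current, length = [], [], 0
    for word in text.split():
        # Account for the joining space when the chunk already has words.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


parts = chunk_text("word " * 5000, max_chars=8000)
print(len(parts), "chunks; longest is", max(len(p) for p in parts), "characters")
```

Each chunk would then be sent to the model in its own pass and the results joined, which is why a long transcript takes several passes.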