# Step-by-Step Setup and Usage Guide
Author: algorembrant

---
## Prerequisites

| Requirement       | Minimum Version | Notes                                                          |
|-------------------|-----------------|----------------------------------------------------------------|
| Python            | 3.8             | 3.10+ recommended                                              |
| pip               | 21.0            |                                                                |
| Anthropic API key | --              | Required for the `clean`, `summarize`, and `pipeline` commands |

You need an Anthropic API key to use the `clean`, `summarize`, and `pipeline` commands. Obtain one at https://console.anthropic.com.
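To confirm your interpreter meets the minimum before going further, a small standalone check works. This is a generic sketch, not part of the toolkit; it simply encodes the 3.8 minimum and 3.10 recommendation from the table, and `python_status` is a hypothetical helper name.

```python
import sys

# Thresholds copied from the Prerequisites table.
MINIMUM = (3, 8)
RECOMMENDED = (3, 10)


def python_status(version=None):
    """Classify a (major, minor) version tuple against the table's thresholds."""
    if version is None:
        version = tuple(sys.version_info[:2])
    if version < MINIMUM:
        return "too old"
    if version < RECOMMENDED:
        return "supported"
    return "recommended"


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {python_status()}")
```

Run it with the same `python` executable you plan to use for the project.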
---
## Step 1: Get the Code
**Option A: Git clone**
```bash
git clone https://github.com/algorembrant/youtube-transcript-toolkit.git
cd youtube-transcript-toolkit
```
**Option B: Download ZIP**
Download the repository as a ZIP archive, unzip it, and open a terminal inside the project folder.

---
## Step 2: Create a Virtual Environment
**macOS / Linux**
```bash
python3 -m venv .venv
source .venv/bin/activate
```
**Windows (Command Prompt)**
```cmd
python -m venv .venv
.venv\Scripts\activate.bat
```
**Windows (PowerShell)**
```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
```
You should see `(.venv)` at the start of your terminal prompt.
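Some prompt themes hide the `(.venv)` marker. As an alternative check, Python itself can report whether it is running inside a virtual environment: `venv` points `sys.prefix` at the environment directory while `sys.base_prefix` keeps pointing at the base installation. This is a generic check, not a toolkit feature.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix is the environment directory, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("inside a virtual environment" if in_virtualenv() else "NOT in a virtual environment")
```

The one-line equivalent is `python -c "import sys; print(sys.prefix != sys.base_prefix)"`, which prints `True` once the environment is activated.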
---
## Step 3: Install Dependencies
```bash
pip install -r requirements.txt
```
Verify that both packages are installed:
```bash
pip show anthropic
pip show youtube-transcript-api
```
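To check both dependencies from Python in one go, the standard-library `importlib.metadata` module (available since Python 3.8, the minimum listed above) can look up installed versions. This is a sketch, not a script shipped with the toolkit; `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# The two packages checked with `pip show` above.
REQUIRED = ("anthropic", "youtube-transcript-api")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

A `NOT INSTALLED` line usually means the virtual environment was not active when you ran `pip install`.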
---
## Step 4: Set Your Anthropic API Key
**macOS / Linux (current session)**
```bash
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
```
**macOS / Linux (permanent: add to your shell profile)**
```bash
# zsh is the default shell on macOS; for bash, use ~/.bashrc instead of ~/.zshrc
echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.zshrc
source ~/.zshrc
```
**Windows (Command Prompt, current session)**
```cmd
set ANTHROPIC_API_KEY=sk-ant-your-key-here
```
**Windows (PowerShell, current session)**
```powershell
$env:ANTHROPIC_API_KEY = "sk-ant-your-key-here"
```
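**Windows (permanent via System Settings)**
1. Search for "Environment Variables" in the Start menu.
2. Click "Edit the system environment variables".
3. In the dialog that opens, click "Environment Variables...", then click "New" under "User variables".
4. Set the variable name to `ANTHROPIC_API_KEY` and the value to your key.
5. Open a new terminal so the variable is picked up.

The `fetch` and `list` commands do NOT require an API key. Only `clean`, `summarize`, and `pipeline` need it.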
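Before running `clean`, `summarize`, or `pipeline`, you can check that the variable is visible to Python without printing the secret itself. The sketch below is illustrative only: `api_key_status` is a hypothetical helper, and the `sk-ant-` prefix test is a heuristic based on the placeholder used above. The official `anthropic` Python SDK reads `ANTHROPIC_API_KEY` from the environment when no key is passed explicitly.

```python
import os

# Heuristic prefix, matching the "sk-ant-your-key-here" placeholder above.
EXPECTED_PREFIX = "sk-ant-"


def api_key_status(env=None):
    """Describe ANTHROPIC_API_KEY without ever revealing its value."""
    if env is None:
        env = os.environ
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        return "missing"
    if not key.startswith(EXPECTED_PREFIX):
        return "set, but does not start with 'sk-ant-'"
    return "set"


if __name__ == "__main__":
    print(f"ANTHROPIC_API_KEY: {api_key_status()}")
```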
---
## Step 5: Run Your First Commands
### Fetch a raw transcript (no API key needed)
```bash
python main.py fetch "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```
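This guide passes a full URL in the first command and the bare ID `dQw4w9WgXcQ` in the later ones. If you script around the CLI and want to normalize input yourself, a sketch like the following covers the common URL shapes. It is hypothetical and not taken from `fetcher.py`; the toolkit's own parsing may accept more or fewer forms. Keep full URLs quoted in the shell, as above, because an `&` in the query string would otherwise be interpreted by the shell.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id):
    """Return the video ID from a YouTube URL, or the input unchanged otherwise."""
    text = url_or_id.strip()
    parsed = urlparse(text)
    host = parsed.netloc.lower()
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0]
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>&t=...
            return parse_qs(parsed.query).get("v", [""])[0]
        # Path-style links such as /shorts/<id> or /embed/<id>
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            return parts[1]
    # Anything not recognized above (e.g. a bare ID) is returned unchanged.
    return text
```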
### See what languages are available
```bash
python main.py list dQw4w9WgXcQ
```
### Clean the transcript into paragraphs
```bash
python main.py clean dQw4w9WgXcQ
```
### Summarize the transcript
```bash
python main.py summarize dQw4w9WgXcQ -m brief
python main.py summarize dQw4w9WgXcQ -m detailed
python main.py summarize dQw4w9WgXcQ -m bullets
python main.py summarize dQw4w9WgXcQ -m outline
```
### Run the full pipeline (fetch + clean + summarize)
```bash
python main.py pipeline dQw4w9WgXcQ -m bullets
```
---
## Step 6: Save Output to Files
### Single video: specify a file path
```bash
python main.py clean dQw4w9WgXcQ -o cleaned.txt
python main.py summarize dQw4w9WgXcQ -m detailed -o summary.txt
```
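To save all four summary modes for one video, you can loop over the `-m` values and give each run its own `-o` file. The sketch below drives the documented CLI through `subprocess`; the `<id>_<mode>.txt` naming is an arbitrary choice for this example, and `summary_commands` / `run_all_modes` are hypothetical helpers. Each iteration is a separate API call.

```python
import subprocess
import sys

# The four modes documented for the summarize command.
MODES = ("brief", "detailed", "bullets", "outline")


def summary_commands(video_id):
    """Build one `main.py summarize` command per mode, each with its own output file."""
    return [
        [sys.executable, "main.py", "summarize", video_id,
         "-m", mode, "-o", f"{video_id}_{mode}.txt"]
        for mode in MODES
    ]


def run_all_modes(video_id):
    for cmd in summary_commands(video_id):
        # check=True stops at the first failing run (e.g. a missing API key).
        subprocess.run(cmd, check=True)
```

Call `run_all_modes("dQw4w9WgXcQ")` from the project folder with the virtual environment active.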
### Pipeline: specify a directory (creates three files per video)
```bash
python main.py pipeline dQw4w9WgXcQ -o ./output/
```
Files created:
```
./output/
  dQw4w9WgXcQ_transcript.txt
  dQw4w9WgXcQ_cleaned.txt
  dQw4w9WgXcQ_summary.txt
```
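After a pipeline run you can confirm that the expected files exist. This sketch relies only on the `<video_id>_transcript.txt`, `_cleaned.txt`, and `_summary.txt` naming shown above; `expected_outputs` and `missing_outputs` are hypothetical helpers.

```python
from pathlib import Path

# File suffixes written per video by `pipeline -o <dir>`, as listed above.
SUFFIXES = ("transcript", "cleaned", "summary")


def expected_outputs(video_id, out_dir):
    """Return the three paths a pipeline run should create for one video."""
    return [Path(out_dir) / f"{video_id}_{suffix}.txt" for suffix in SUFFIXES]


def missing_outputs(video_ids, out_dir):
    """List expected files that do not exist, across several videos."""
    return [
        path
        for vid in video_ids
        for path in expected_outputs(vid, out_dir)
        if not path.exists()
    ]
```

For example, `missing_outputs(["dQw4w9WgXcQ"], "./output/")` should return an empty list after the run above succeeds.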
### Batch: multiple videos at once
```bash
python main.py pipeline VIDEO_ID_1 VIDEO_ID_2 VIDEO_ID_3 -o ./batch_output/
```
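For longer batches it can be easier to keep video IDs (or URLs) in a text file, one per line. This hypothetical helper reads such a file, skips blank lines and lines starting with `#`, and passes everything to a single `pipeline` call, since the command accepts several IDs at once. The file format is an assumption for this example; the toolkit itself does not read ID files.

```python
import subprocess
import sys
from pathlib import Path


def read_video_ids(text):
    """Parse one ID or URL per line, ignoring blank lines and # comment lines."""
    ids = []
    for line in text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            ids.append(entry)
    return ids


def batch_command(video_ids, out_dir):
    """Build one `main.py pipeline` command covering every ID."""
    return [sys.executable, "main.py", "pipeline", *video_ids, "-o", out_dir]


def run_batch(id_file, out_dir="./batch_output/"):
    ids = read_video_ids(Path(id_file).read_text(encoding="utf-8"))
    if not ids:
        raise ValueError(f"no video IDs found in {id_file}")
    subprocess.run(batch_command(ids, out_dir), check=True)
```

Usage: `run_batch("videos.txt")`, where `videos.txt` is a file you create yourself.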
---
## Step 7: Advanced Options
### Use the higher-quality model
```bash
python main.py clean dQw4w9WgXcQ --quality
python main.py summarize dQw4w9WgXcQ -m detailed --quality
```
- Default model: `claude-haiku-4-5` (fast, cost-efficient)
- Quality model (`--quality`): `claude-sonnet-4-6` (better for complex or long transcripts)
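Conceptually, `--quality` swaps one model name for the other. The sketch below shows how such a switch could be wired with `argparse`; the flags mirror this guide, while `build_parser`, `choose_model`, and the parser layout are hypothetical. The real model constants live in `config.py` and may change between versions.

```python
import argparse

# Model names as listed in this guide; config.py holds the real values.
DEFAULT_MODEL = "claude-haiku-4-5"
QUALITY_MODEL = "claude-sonnet-4-6"


def build_parser():
    parser = argparse.ArgumentParser(prog="summarize-sketch")
    parser.add_argument("video")
    parser.add_argument("-m", dest="mode",
                        choices=["brief", "detailed", "bullets", "outline"])
    parser.add_argument("--quality", action="store_true",
                        help="use the higher-quality model")
    # store_false: streaming stays on unless --no-stream is given.
    parser.add_argument("--no-stream", dest="stream", action="store_false",
                        help="print output only when finished")
    return parser


def choose_model(quality):
    return QUALITY_MODEL if quality else DEFAULT_MODEL
```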
### Disable streaming (show output only after completion)
```bash
python main.py clean dQw4w9WgXcQ --no-stream
```
### Request a non-English transcript
```bash
python main.py clean dQw4w9WgXcQ -l ja # Japanese only
python main.py clean dQw4w9WgXcQ -l es en # Spanish, fall back to English
```
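`-l` takes language codes in priority order, which is why `-l es en` falls back to English when no Spanish transcript exists. A minimal sketch of that selection rule, assuming you already know the available codes (for example from `python main.py list <id>`); `pick_language` is hypothetical. The no-match case corresponds to the `NoTranscriptFound` row in Troubleshooting.

```python
def pick_language(preferred, available):
    """Return the first preferred language code that is available."""
    for code in preferred:
        if code in available:
            return code
    # None of the requested languages exist (the NoTranscriptFound situation).
    raise LookupError(f"none of {list(preferred)} available; found {sorted(available)}")
```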
### Fetch raw transcript as SRT, JSON, or VTT
```bash
python main.py fetch dQw4w9WgXcQ -f srt -o captions.srt
python main.py fetch dQw4w9WgXcQ -f json -o transcript.json
python main.py fetch dQw4w9WgXcQ -f vtt -o captions.vtt
```
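SRT and WebVTT differ mainly in cue headers and the millisecond separator: SRT writes `00:01:02,500` and numbers each cue, while WebVTT writes `00:01:02.500` and the file begins with a `WEBVTT` line. The sketch below formats one cue from a `start`/`duration` pair in seconds, the shape `youtube-transcript-api` uses for caption snippets. It illustrates the two formats and is not the toolkit's exporter.

```python
def format_timestamp(seconds, fmt="srt"):
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{millis:03d}"


def srt_cue(index, start, duration, text):
    """Build one numbered SRT cue block."""
    begin = format_timestamp(start, "srt")
    end = format_timestamp(start + duration, "srt")
    return f"{index}\n{begin} --> {end}\n{text}\n"


def vtt_cue(start, duration, text):
    """Build one WebVTT cue (the file itself must start with a WEBVTT line)."""
    begin = format_timestamp(start, "vtt")
    end = format_timestamp(start + duration, "vtt")
    return f"{begin} --> {end}\n{text}\n"
```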
### Fetch with timestamps
```bash
python main.py fetch dQw4w9WgXcQ -t
python main.py pipeline dQw4w9WgXcQ -t -o ./output/
```
### Pipeline: skip individual steps
```bash
# Fetch and summarize without cleaning
python main.py pipeline dQw4w9WgXcQ --skip-clean -m bullets
# Fetch and clean without summarizing
python main.py pipeline dQw4w9WgXcQ --skip-summary
```
---
## Troubleshooting
| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| `TranscriptsDisabled` error | Video owner disabled captions | Use a different video |
| `VideoUnavailable` error | Private, deleted, or region-locked | Check the URL; try a VPN if the video is region-locked |
| `NoTranscriptFound` | Requested language missing | Run `list` to see available languages |
| `AuthenticationError` | API key missing or wrong | Check `ANTHROPIC_API_KEY` env variable |
| `ModuleNotFoundError` | Dependencies not installed | Run `pip install -r requirements.txt` |
| Chunking messages in stderr | Transcript very long | Normal: long transcripts are processed in multiple passes automatically |
| Output cuts off mid-sentence | max_tokens limit hit | This is rare; open an issue if it occurs |
---
## Project File Reference
```
main.py           CLI entry point for all five commands (fetch, list, clean, summarize, pipeline)
fetcher.py        YouTube direct caption API (no scraping)
cleaner.py        AI paragraph reformatter
summarizer.py     AI summarizer (4 modes)
pipeline.py       Orchestrates the full fetch -> clean -> summarize chain
ai_client.py      Anthropic API wrapper with chunking and streaming
config.py         Constants: model names, chunk size, summary modes
requirements.txt  Two dependencies
README.md         Full project documentation
GUIDE.md          This file
LICENSE           MIT License
```
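`ai_client.py` handles chunking of long transcripts, which is what the "Chunking messages" row in Troubleshooting refers to, and `config.py` sets the chunk size. The toolkit's exact splitting strategy is not described in this guide. As an illustration only, a simple greedy splitter that breaks on whitespace and keeps each chunk within a character budget looks like this; `chunk_text` and `max_chars` are hypothetical names.

```python
def chunk_text(text, max_chars):
    """Greedily pack whitespace-separated words into chunks of at most max_chars.

    A single word longer than max_chars is kept whole as its own chunk.
    """
    chunks, current, length = [], [], 0
    for word in text.split():
        # Count the joining space for every word after the first in a chunk.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
            extra = len(word)
        current.append(word)
        length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk is then sent as its own request, and the partial results are combined, which is why very long transcripts take several passes.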