# Step-by-Step Setup and Usage Guide
Author: algorembrant
---
## Prerequisites
| Requirement | Minimum Version | Notes |
|----------------------|-----------------|--------------------------------------------|
| Python | 3.8 | 3.10+ recommended |
| pip | 21.0 | |
| Anthropic API Key | -- | Required for the clean, summarize, and pipeline commands |
You need an Anthropic API key to use the `clean`, `summarize`, and `pipeline` commands.
Obtain one at: https://console.anthropic.com
---
## Step 1: Get the Code
**Option A: Git clone**
```bash
git clone https://github.com/algorembrant/youtube-transcript-toolkit.git
cd youtube-transcript-toolkit
```
**Option B: Download ZIP**
Download and unzip, then open a terminal inside the project folder.
---
## Step 2: Create a Virtual Environment
**macOS / Linux**
```bash
python3 -m venv .venv
source .venv/bin/activate
```
**Windows (Command Prompt)**
```cmd
python -m venv .venv
.venv\Scripts\activate.bat
```
**Windows (PowerShell)**
```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
```
You should see `(.venv)` at the start of your terminal prompt.
---
## Step 3: Install Dependencies
```bash
pip install -r requirements.txt
```
Verify:
```bash
pip show anthropic
pip show youtube-transcript-api
```
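The same check can be scripted from Python. This is a minimal sketch, not part of the toolkit: it relies only on the standard library's `importlib.metadata` (available since Python 3.8), and the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names as they appear in the pip commands above.
REQUIRED = ("anthropic", "youtube-transcript-api")


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in REQUIRED:
        found = installed_version(name)
        print(f"{name}: {found if found else 'NOT INSTALLED'}")
```

Run it inside the activated virtual environment; a `NOT INSTALLED` line means the `pip install` step did not complete for that package.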
---
## Step 4: Set Your Anthropic API Key
**macOS / Linux (current session)**
```bash
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
```
**macOS / Linux (permanent, add to shell profile)**
```bash
# zsh shown here; Bash users should use ~/.bashrc in both commands
echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.zshrc
source ~/.zshrc
```
**Windows (Command Prompt)**
```cmd
set ANTHROPIC_API_KEY=sk-ant-your-key-here
```
**Windows (PowerShell)**
```powershell
$env:ANTHROPIC_API_KEY = "sk-ant-your-key-here"
```
**Windows (permanent via System Settings)**
1. Search "Environment Variables" in Start Menu
2. Click "Edit the system environment variables"
3. Click "Environment Variables...", then add a new variable: `ANTHROPIC_API_KEY` = your key
4. Open a new terminal so the variable is picked up
The `fetch` and `list` commands do NOT require an API key.
Only `clean`, `summarize`, and `pipeline` need it.
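To catch a missing key before a command fails with `AuthenticationError`, a pre-flight check along these lines can help. This is a sketch under the command split described above; the helper name `missing_api_key` is hypothetical, and the toolkit may handle this differently.

```python
import os

# Commands that call the Anthropic API, per this guide.
NEEDS_KEY = {"clean", "summarize", "pipeline"}


def missing_api_key(command: str, env=None) -> bool:
    """Return True when `command` needs ANTHROPIC_API_KEY and it is unset or blank."""
    env = os.environ if env is None else env
    if command not in NEEDS_KEY:
        return False
    return not env.get("ANTHROPIC_API_KEY", "").strip()
```

Passing `env` explicitly keeps the function easy to test; in normal use it reads the real process environment.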
---
## Step 5: Run Your First Commands
### Fetch a raw transcript (no API key needed)
```bash
python main.py fetch "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```
### See what languages are available
```bash
python main.py list dQw4w9WgXcQ
```
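The two examples above pass a full URL in one case and a bare video ID in the other. How `main.py` normalizes these inputs is not shown in this guide; the sketch below is one plausible approach using `urllib.parse`. The helper name `extract_video_id` and the 11-character ID pattern are assumptions.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed ID shape: 11 characters from the URL-safe alphabet.
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a watch URL, a youtu.be link, or a bare ID."""
    value = value.strip()
    if _ID_RE.match(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/")
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None
    return candidate if _ID_RE.match(candidate) else None
```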
### Clean the transcript into paragraphs
```bash
python main.py clean dQw4w9WgXcQ
```
### Summarize the transcript
```bash
python main.py summarize dQw4w9WgXcQ -m brief
python main.py summarize dQw4w9WgXcQ -m detailed
python main.py summarize dQw4w9WgXcQ -m bullets
python main.py summarize dQw4w9WgXcQ -m outline
```
### Run the full pipeline (fetch + clean + summarize)
```bash
python main.py pipeline dQw4w9WgXcQ -m bullets
```
---
## Step 6: Save Output to Files
### Single video: specify a file path
```bash
python main.py clean dQw4w9WgXcQ -o cleaned.txt
python main.py summarize dQw4w9WgXcQ -m detailed -o summary.txt
```
### Pipeline: specify a directory (creates 3 files per video)
```bash
python main.py pipeline dQw4w9WgXcQ -o ./output/
```
Files created:
```
./output/
    dQw4w9WgXcQ_transcript.txt
    dQw4w9WgXcQ_cleaned.txt
    dQw4w9WgXcQ_summary.txt
```
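If you post-process the results in your own scripts, the naming pattern in the listing above can be reproduced as follows. This is a sketch based only on the three file names shown; the helper name `pipeline_output_paths` is hypothetical and is not the toolkit's own code.

```python
from pathlib import Path
from typing import Dict

# Suffixes taken from the file listing above.
SUFFIXES = ("transcript", "cleaned", "summary")


def pipeline_output_paths(video_id: str, out_dir: str) -> Dict[str, Path]:
    """Map each pipeline stage to the file it would write for one video."""
    base = Path(out_dir)
    return {stage: base / f"{video_id}_{stage}.txt" for stage in SUFFIXES}
```

For example, `pipeline_output_paths("dQw4w9WgXcQ", "./output/")["summary"]` points at `output/dQw4w9WgXcQ_summary.txt`.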
### Batch: multiple videos at once
```bash
python main.py pipeline VIDEO_ID_1 VIDEO_ID_2 VIDEO_ID_3 -o ./batch_output/
```
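To drive a batch from another Python script, the command above can be assembled and launched with `subprocess`. This is a sketch that assumes it runs from the project folder; `build_pipeline_command` and `run_batch` are hypothetical helper names.

```python
import subprocess
import sys
from typing import List, Sequence


def build_pipeline_command(video_ids: Sequence[str], out_dir: str) -> List[str]:
    """Build the argv for one batch `pipeline` call, mirroring the command above."""
    if not video_ids:
        raise ValueError("at least one video ID is required")
    # sys.executable keeps the call inside the active virtual environment.
    return [sys.executable, "main.py", "pipeline", *video_ids, "-o", out_dir]


def run_batch(video_ids: Sequence[str], out_dir: str) -> int:
    """Run the batch from the project folder and return the CLI's exit code."""
    return subprocess.run(build_pipeline_command(video_ids, out_dir)).returncode
```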
---
## Step 7: Advanced Options
### Use the higher-quality model
```bash
python main.py clean dQw4w9WgXcQ --quality
python main.py summarize dQw4w9WgXcQ -m detailed --quality
```
Default model: `claude-haiku-4-5` (fast, cost-efficient)
Quality model: `claude-sonnet-4-6` (better for complex or long transcripts)
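
The switch between the two models can be pictured as a boolean flag choosing between two constants. The sketch below uses the model names listed above; the constant names, `select_model`, and `build_parser` are hypothetical and may not match what `config.py` and `main.py` actually define.

```python
import argparse

DEFAULT_MODEL = "claude-haiku-4-5"   # fast, cost-efficient
QUALITY_MODEL = "claude-sonnet-4-6"  # better for complex or long transcripts


def select_model(quality: bool = False) -> str:
    """Return the model name for a run, honoring a --quality style switch."""
    return QUALITY_MODEL if quality else DEFAULT_MODEL


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the --quality flag."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--quality", action="store_true",
                        help="use the higher-quality model")
    return parser
```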
### Disable streaming (show output only after completion)
```bash
python main.py clean dQw4w9WgXcQ --no-stream
```
### Request a non-English transcript
```bash
python main.py clean dQw4w9WgXcQ -l ja # Japanese only
python main.py clean dQw4w9WgXcQ -l es en # Spanish, fall back to English
```
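The fallback behavior of `-l es en` amounts to taking the first requested language that the video actually offers. This is a sketch of that rule only; `pick_language` is a hypothetical helper, and the real lookup is presumably delegated to the transcript library.

```python
from typing import Iterable, Optional, Sequence


def pick_language(preferred: Sequence[str], available: Iterable[str]) -> Optional[str]:
    """Return the first preferred language code that is available, else None."""
    available_set = set(available)
    for code in preferred:
        if code in available_set:
            return code
    return None
```

A `None` result corresponds to the `NoTranscriptFound` case: none of the requested languages exist for that video.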
### Fetch raw transcript as SRT, JSON, or VTT
```bash
python main.py fetch dQw4w9WgXcQ -f srt -o captions.srt
python main.py fetch dQw4w9WgXcQ -f json -o transcript.json
python main.py fetch dQw4w9WgXcQ -f vtt -o captions.vtt
```
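For orientation, SRT output is a sequence of numbered cues with `HH:MM:SS,mmm` timestamps. The sketch below shows that conversion. It assumes each segment is a dict with `text`, `start`, and `duration` keys (times in seconds); that shape and the helper names are assumptions, not confirmed details of `fetcher.py`.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[str, float]]  # assumed keys: text, start, duration


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = float(seg["start"])
        end = start + float(seg["duration"])
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{seg['text']}"
        )
    return "\n\n".join(cues) + ("\n" if cues else "")
```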
### Fetch with timestamps
```bash
python main.py fetch dQw4w9WgXcQ -t
python main.py pipeline dQw4w9WgXcQ -t -o ./output/
```
### Pipeline: skip individual steps
```bash
# Fetch and summarize without cleaning
python main.py pipeline dQw4w9WgXcQ --skip-clean -m bullets
# Fetch and clean without summarizing
python main.py pipeline dQw4w9WgXcQ --skip-summary
```
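The effect of the two skip flags can be summarized as a small planning function: fetching always runs, and each flag removes one later stage. This is a sketch of the behavior described above; `plan_steps` is a hypothetical name, not a function from `pipeline.py`.

```python
from typing import List


def plan_steps(skip_clean: bool = False, skip_summary: bool = False) -> List[str]:
    """Return the ordered pipeline stages that remain after applying skip flags."""
    steps = ["fetch"]  # fetching is always required
    if not skip_clean:
        steps.append("clean")
    if not skip_summary:
        steps.append("summarize")
    return steps
```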
---
## Troubleshooting
| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| `TranscriptsDisabled` error | Video owner disabled captions | Use a different video |
| `VideoUnavailable` error | Private, deleted, or region-locked | Check URL; try VPN if region-locked |
| `NoTranscriptFound` | Requested language missing | Run `list` to see available languages |
| `AuthenticationError` | API key missing or wrong | Check `ANTHROPIC_API_KEY` env variable |
| `ModuleNotFoundError` | Dependencies not installed | Run `pip install -r requirements.txt` |
| Chunking messages in stderr | Transcript very long | Normal: multi-pass processing is automatic |
| Output cuts off mid-sentence | max_tokens limit hit | This is rare; open an issue if it occurs |
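
The chunking mentioned in the table splits a long transcript into pieces small enough to process one at a time. The sketch below shows one simple way to do that, breaking at whitespace under a character budget. The 8000-character default and the name `chunk_text` are assumptions; the actual chunk size lives in `config.py`, and `ai_client.py` may split differently.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 8000) -> List[str]:
    """Split text into chunks of at most max_chars, breaking at whitespace."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the budget is hard-split to respect the limit.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Note that this version normalizes runs of whitespace to single spaces, which is usually acceptable for caption text.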
---
## Project File Reference
```
main.py            CLI entry point: all five commands
fetcher.py         YouTube direct caption API (no scraping)
cleaner.py         AI paragraph reformatter
summarizer.py      AI summarizer (4 modes)
pipeline.py        Orchestrates the full fetch -> clean -> summarize chain
ai_client.py       Anthropic API wrapper with chunking and streaming
config.py          Constants: model names, chunk size, summary modes
requirements.txt   Two dependencies
README.md          Full project documentation
GUIDE.md           This file
LICENSE            MIT License
```