# Step-by-Step Setup and Usage Guide

Author: algorembrant

---

## Prerequisites

| Requirement          | Minimum Version | Notes                                      |
|----------------------|-----------------|--------------------------------------------|
| Python               | 3.8             | 3.10+ recommended                          |
| pip                  | 21.0            |                                            |
| Anthropic API Key    | --              | Required for clean and summarize commands  |

You need an Anthropic API key to use the `clean`, `summarize`, and `pipeline` commands.
Obtain one at: https://console.anthropic.com

---

## Step 1 — Get the Code

**Option A: Git clone**
```bash
git clone https://github.com/algorembrant/youtube-transcript-toolkit.git
cd youtube-transcript-toolkit
```

**Option B: Download ZIP**
Download and unzip, then open a terminal inside the project folder.

---

## Step 2 — Create a Virtual Environment

**macOS / Linux**
```bash
python3 -m venv .venv
source .venv/bin/activate
```

**Windows (Command Prompt)**
```cmd
python -m venv .venv
.venv\Scripts\activate.bat
```

**Windows (PowerShell)**
```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
```

You should see `(.venv)` at the start of your terminal prompt.

---

## Step 3 — Install Dependencies

```bash
pip install -r requirements.txt
```

Verify:
```bash
pip show anthropic
pip show youtube-transcript-api
```

---

## Step 4 — Set Your Anthropic API Key

**macOS / Linux (current session)**
```bash
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
```

**macOS / Linux (permanent — add to shell profile)**
```bash
echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.zshrc
source ~/.zshrc
```

**Windows (Command Prompt)**
```cmd
set ANTHROPIC_API_KEY=sk-ant-your-key-here
```

**Windows (PowerShell)**
```powershell
$env:ANTHROPIC_API_KEY = "sk-ant-your-key-here"
```

**Windows (permanent via System Settings)**
1. Search "Environment Variables" in Start Menu
2. Click "Edit the system environment variables"
3. Add a new variable: `ANTHROPIC_API_KEY` = your key

The `fetch` and `list` commands do NOT require an API key.
Only `clean`, `summarize`, and `pipeline` need it.

---

## Step 5 — Run Your First Commands

### Fetch a raw transcript (no API key needed)

```bash
python main.py fetch "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```

### See what languages are available

```bash
python main.py list dQw4w9WgXcQ
```

### Clean the transcript into paragraphs

```bash
python main.py clean dQw4w9WgXcQ
```

### Summarize the transcript

```bash
python main.py summarize dQw4w9WgXcQ -m brief
python main.py summarize dQw4w9WgXcQ -m detailed
python main.py summarize dQw4w9WgXcQ -m bullets
python main.py summarize dQw4w9WgXcQ -m outline
```

### Run the full pipeline (fetch + clean + summarize)

```bash
python main.py pipeline dQw4w9WgXcQ -m bullets
```

---

## Step 6 — Save Output to Files

### Single video — specify a file path

```bash
python main.py clean dQw4w9WgXcQ -o cleaned.txt
python main.py summarize dQw4w9WgXcQ -m detailed -o summary.txt
```

### Pipeline — specify a directory (creates 3 files per video)

```bash
python main.py pipeline dQw4w9WgXcQ -o ./output/
```

Files created:
```
./output/
  dQw4w9WgXcQ_transcript.txt
  dQw4w9WgXcQ_cleaned.txt
  dQw4w9WgXcQ_summary.txt
```

### Batch — multiple videos at once

```bash
python main.py pipeline VIDEO_ID_1 VIDEO_ID_2 VIDEO_ID_3 -o ./batch_output/
```

---

## Step 7 — Advanced Options

### Use the higher-quality model

```bash
python main.py clean dQw4w9WgXcQ --quality
python main.py summarize dQw4w9WgXcQ -m detailed --quality
```

Default model: `claude-haiku-4-5` (fast, cost-efficient)
Quality model: `claude-sonnet-4-6` (better for complex or long transcripts)

### Disable streaming (show output only after completion)

```bash
python main.py clean dQw4w9WgXcQ --no-stream
```

### Request a non-English transcript

```bash
python main.py clean dQw4w9WgXcQ -l ja       # Japanese only
python main.py clean dQw4w9WgXcQ -l es en    # Spanish, fall back to English
```

### Fetch raw transcript as SRT or JSON

```bash
python main.py fetch dQw4w9WgXcQ -f srt -o captions.srt
python main.py fetch dQw4w9WgXcQ -f json -o transcript.json
python main.py fetch dQw4w9WgXcQ -f vtt -o captions.vtt
```

### Fetch with timestamps

```bash
python main.py fetch dQw4w9WgXcQ -t
python main.py pipeline dQw4w9WgXcQ -t -o ./output/
```

### Pipeline — skip individual steps

```bash
# Fetch and summarize without cleaning
python main.py pipeline dQw4w9WgXcQ --skip-clean -m bullets

# Fetch and clean without summarizing
python main.py pipeline dQw4w9WgXcQ --skip-summary
```

---

## Troubleshooting

| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| `TranscriptsDisabled` error | Video owner disabled captions | Use a different video |
| `VideoUnavailable` error | Private, deleted, or region-locked | Check URL; try VPN if region-locked |
| `NoTranscriptFound` | Requested language missing | Run `list` to see available languages |
| `AuthenticationError` | API key missing or wrong | Check `ANTHROPIC_API_KEY` env variable |
| `ModuleNotFoundError` | Dependencies not installed | Run `pip install -r requirements.txt` |
| Chunking messages in stderr | Transcript very long | Normal — multi-pass processing is automatic |
| Output cuts off mid-sentence | max_tokens limit hit | This is rare; open an issue if it occurs |

---

## Project File Reference

```
main.py          CLI entry point — all five commands
fetcher.py       YouTube direct caption API (no scraping)
cleaner.py       AI paragraph reformatter
summarizer.py    AI summarizer (4 modes)
pipeline.py      Orchestrates the full fetch -> clean -> summarize chain
ai_client.py     Anthropic API wrapper with chunking and streaming
config.py        Constants: model names, chunk size, summary modes
requirements.txt Two dependencies
README.md        Full project documentation
GUIDE.md         This file
LICENSE          MIT License
```