Upload 12 files

d2bfe97 verified about 21 hours ago

6.07 kB

	# Step-by-Step Setup and Usage Guide

	Author: algorembrant

	---

	## Prerequisites

	\| Requirement \| Minimum Version \| Notes \|
	\|----------------------\|-----------------\|--------------------------------------------\|
	\| Python \| 3.8 \| 3.10+ recommended \|
	\| pip \| 21.0 \| \|
	\| Anthropic API Key \| -- \| Required for clean and summarize commands \|

	You need an Anthropic API key to use the `clean`, `summarize`, and `pipeline` commands.
	Obtain one at: https://console.anthropic.com

	---

	## Step 1 — Get the Code

	Option A: Git clone
	```bash
	git clone https://github.com/algorembrant/youtube-transcript-toolkit.git
	cd youtube-transcript-toolkit
	```

	Option B: Download ZIP
	Download and unzip, then open a terminal inside the project folder.

	---

	## Step 2 — Create a Virtual Environment

	macOS / Linux
	```bash
	python3 -m venv .venv
	source .venv/bin/activate
	```

	Windows (Command Prompt)
	```cmd
	python -m venv .venv
	.venv\Scripts\activate.bat
	```

	Windows (PowerShell)
	```powershell
	python -m venv .venv
	.venv\Scripts\Activate.ps1
	```

	You should see `(.venv)` at the start of your terminal prompt.

	---

	## Step 3 — Install Dependencies

	```bash
	pip install -r requirements.txt
	```

	Verify:
	```bash
	pip show anthropic
	pip show youtube-transcript-api
	```

	---

	## Step 4 — Set Your Anthropic API Key

	macOS / Linux (current session)
	```bash
	export ANTHROPIC_API_KEY="sk-ant-your-key-here"
	```

	macOS / Linux (permanent — add to shell profile)
	```bash
	echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.zshrc
	source ~/.zshrc
	```

	Windows (Command Prompt)
	```cmd
	set ANTHROPIC_API_KEY=sk-ant-your-key-here
	```

	Windows (PowerShell)
	```powershell
	$env:ANTHROPIC_API_KEY = "sk-ant-your-key-here"
	```

	Windows (permanent via System Settings)
	1. Search "Environment Variables" in Start Menu
	2. Click "Edit the system environment variables"
	3. Add a new variable: `ANTHROPIC_API_KEY` = your key

	The `fetch` and `list` commands do NOT require an API key.
	Only `clean`, `summarize`, and `pipeline` need it.

	---

	## Step 5 — Run Your First Commands

	### Fetch a raw transcript (no API key needed)

	```bash
	python main.py fetch "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
	```

	### See what languages are available

	```bash
	python main.py list dQw4w9WgXcQ
	```

	### Clean the transcript into paragraphs

	```bash
	python main.py clean dQw4w9WgXcQ
	```

	### Summarize the transcript

	```bash
	python main.py summarize dQw4w9WgXcQ -m brief
	python main.py summarize dQw4w9WgXcQ -m detailed
	python main.py summarize dQw4w9WgXcQ -m bullets
	python main.py summarize dQw4w9WgXcQ -m outline
	```

	### Run the full pipeline (fetch + clean + summarize)

	```bash
	python main.py pipeline dQw4w9WgXcQ -m bullets
	```

	---

	## Step 6 — Save Output to Files

	### Single video — specify a file path

	```bash
	python main.py clean dQw4w9WgXcQ -o cleaned.txt
	python main.py summarize dQw4w9WgXcQ -m detailed -o summary.txt
	```

	### Pipeline — specify a directory (creates 3 files per video)

	```bash
	python main.py pipeline dQw4w9WgXcQ -o ./output/
	```

	Files created:
	```
	./output/
	dQw4w9WgXcQ_transcript.txt
	dQw4w9WgXcQ_cleaned.txt
	dQw4w9WgXcQ_summary.txt
	```

	### Batch — multiple videos at once

	```bash
	python main.py pipeline VIDEO_ID_1 VIDEO_ID_2 VIDEO_ID_3 -o ./batch_output/
	```

	---

	## Step 7 — Advanced Options

	### Use the higher-quality model

	```bash
	python main.py clean dQw4w9WgXcQ --quality
	python main.py summarize dQw4w9WgXcQ -m detailed --quality
	```

	Default model: `claude-haiku-4-5` (fast, cost-efficient)
	Quality model: `claude-sonnet-4-6` (better for complex or long transcripts)

	### Disable streaming (show output only after completion)

	```bash
	python main.py clean dQw4w9WgXcQ --no-stream
	```

	### Request a non-English transcript

	```bash
	python main.py clean dQw4w9WgXcQ -l ja # Japanese only
	python main.py clean dQw4w9WgXcQ -l es en # Spanish, fall back to English
	```

	### Fetch raw transcript as SRT or JSON

	```bash
	python main.py fetch dQw4w9WgXcQ -f srt -o captions.srt
	python main.py fetch dQw4w9WgXcQ -f json -o transcript.json
	python main.py fetch dQw4w9WgXcQ -f vtt -o captions.vtt
	```

	### Fetch with timestamps

	```bash
	python main.py fetch dQw4w9WgXcQ -t
	python main.py pipeline dQw4w9WgXcQ -t -o ./output/
	```

	### Pipeline — skip individual steps

	```bash
	# Fetch and summarize without cleaning
	python main.py pipeline dQw4w9WgXcQ --skip-clean -m bullets

	# Fetch and clean without summarizing
	python main.py pipeline dQw4w9WgXcQ --skip-summary
	```

	---

	## Troubleshooting

	\| Symptom \| Likely Cause \| Fix \|
	\|---------\|-------------\|-----\|
	\| `TranscriptsDisabled` error \| Video owner disabled captions \| Use a different video \|
	\| `VideoUnavailable` error \| Private, deleted, or region-locked \| Check URL; try VPN if region-locked \|
	\| `NoTranscriptFound` \| Requested language missing \| Run `list` to see available languages \|
	\| `AuthenticationError` \| API key missing or wrong \| Check `ANTHROPIC_API_KEY` env variable \|
	\| `ModuleNotFoundError` \| Dependencies not installed \| Run `pip install -r requirements.txt` \|
	\| Chunking messages in stderr \| Transcript very long \| Normal — multi-pass processing is automatic \|
	\| Output cuts off mid-sentence \| max_tokens limit hit \| This is rare; open an issue if it occurs \|

	---

	## Project File Reference

	```
	main.py CLI entry point — all five commands
	fetcher.py YouTube direct caption API (no scraping)
	cleaner.py AI paragraph reformatter
	summarizer.py AI summarizer (4 modes)
	pipeline.py Orchestrates the full fetch -> clean -> summarize chain
	ai_client.py Anthropic API wrapper with chunking and streaming
	config.py Constants: model names, chunk size, summary modes
	requirements.txt Two dependencies
	README.md Full project documentation
	GUIDE.md This file
	LICENSE MIT License
	```