Step-by-Step Setup and Usage Guide

Author: algorembrant

Prerequisites

Requirement	Minimum Version	Notes
Python	3.8	3.10+ recommended
pip	21.0
Anthropic API Key	--	Required for the clean, summarize, and pipeline commands

You need an Anthropic API key to use the clean, summarize, and pipeline commands. Obtain one at: https://console.anthropic.com
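
The Python version floor in the prerequisites table can be checked from Python itself before installing anything. The following is a minimal sketch; `python_status` and the two constants are illustrative names and are not part of the toolkit.

```python
import sys

# Versions taken from the prerequisites table in this guide.
MIN_PYTHON = (3, 8)
RECOMMENDED_PYTHON = (3, 10)


def python_status(version=None) -> str:
    """Classify a (major, minor, ...) version tuple against the guide's table."""
    major_minor = tuple(version or sys.version_info)[:2]
    if major_minor < MIN_PYTHON:
        return "unsupported"
    if major_minor < RECOMMENDED_PYTHON:
        return "supported"
    return "recommended"


print("This interpreter is:", python_status())
```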

Step 1 — Get the Code

Option A: Git clone

git clone https://github.com/algorembrant/youtube-transcript-toolkit.git
cd youtube-transcript-toolkit

Option B: Download ZIP

Download and unzip the archive, then open a terminal inside the project folder.

Step 2 — Create a Virtual Environment

macOS / Linux

python3 -m venv .venv
source .venv/bin/activate

Windows (Command Prompt)

python -m venv .venv
.venv\Scripts\activate.bat

Windows (PowerShell)

python -m venv .venv
.venv\Scripts\Activate.ps1

You should see (.venv) at the start of your terminal prompt.
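
If the prompt prefix is hidden by a custom shell theme, the interpreter can report whether it is running inside a `venv` environment. A small sketch (the helper name `in_virtualenv` is illustrative):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```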

Step 3 — Install Dependencies

pip install -r requirements.txt

Verify:

pip show anthropic
pip show youtube-transcript-api
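
The same check can be done from Python with the standard library's `importlib.metadata`, which is convenient in a setup script. This is a sketch only; `installed_versions` is a hypothetical helper, not something the toolkit ships.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The two distributions this guide verifies with `pip show`.
for name, version in installed_versions(["anthropic", "youtube-transcript-api"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```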

Step 4 — Set Your Anthropic API Key

macOS / Linux (current session)

export ANTHROPIC_API_KEY="sk-ant-your-key-here"

macOS / Linux (permanent — add to shell profile)

# zsh shown here; for bash, use ~/.bashrc instead
echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.zshrc
source ~/.zshrc

Windows (Command Prompt)

set ANTHROPIC_API_KEY=sk-ant-your-key-here

Windows (PowerShell)

$env:ANTHROPIC_API_KEY = "sk-ant-your-key-here"

Windows (permanent via System Settings)

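Search for "Environment Variables" in the Start Menu
Click "Edit the system environment variables", then "Environment Variables..."
Add a new variable named ANTHROPIC_API_KEY with your key as the value
Open a new terminal so the change takes effect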

The fetch and list commands do NOT require an API key. Only clean, summarize, and pipeline need it.
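
The rule above can be expressed as a small pre-flight check. This is a sketch of the idea, not the toolkit's own logic: `missing_api_key` and `COMMANDS_NEEDING_KEY` are illustrative names, and how `main.py` actually validates the key is not shown in this guide.

```python
import os

# Commands that call the Anthropic API, as listed in this guide.
COMMANDS_NEEDING_KEY = {"clean", "summarize", "pipeline"}


def missing_api_key(command, env=None):
    """Return True if `command` needs an API key and none is set."""
    env = os.environ if env is None else env
    if command not in COMMANDS_NEEDING_KEY:
        return False  # fetch and list work without a key
    return not env.get("ANTHROPIC_API_KEY", "").strip()


for cmd in ("fetch", "list", "clean", "summarize", "pipeline"):
    print(cmd, "-> key missing" if missing_api_key(cmd) else "-> ok")
```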

Step 5 — Run Your First Commands

Fetch a raw transcript (no API key needed)

python main.py fetch "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

See what languages are available

python main.py list dQw4w9WgXcQ
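
The two commands above pass the video once as a full URL and once as a bare ID. One plausible way to normalize both forms to an ID is sketched below; `extract_video_id` is a hypothetical helper, and the toolkit's real parsing may differ.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(value: str) -> str:
    """Return the video ID from a watch URL, a youtu.be URL, or a bare ID."""
    parsed = urlparse(value)
    if not parsed.scheme:
        return value  # no scheme: assume the caller passed a bare ID
    host = parsed.netloc.lower()
    if host in ("youtu.be", "www.youtu.be"):
        return parsed.path.strip("/")  # short links carry the ID in the path
    ids = parse_qs(parsed.query).get("v")
    if ids:
        return ids[0]  # standard watch URLs carry the ID in ?v=
    raise ValueError(f"no video ID found in {value!r}")


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
```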

Clean the transcript into paragraphs

python main.py clean dQw4w9WgXcQ

Summarize the transcript

python main.py summarize dQw4w9WgXcQ -m brief
python main.py summarize dQw4w9WgXcQ -m detailed
python main.py summarize dQw4w9WgXcQ -m bullets
python main.py summarize dQw4w9WgXcQ -m outline

Run the full pipeline (fetch + clean + summarize)

python main.py pipeline dQw4w9WgXcQ -m bullets

Step 6 — Save Output to Files

Single video — specify a file path

python main.py clean dQw4w9WgXcQ -o cleaned.txt
python main.py summarize dQw4w9WgXcQ -m detailed -o summary.txt

Pipeline — specify a directory (creates 3 files per video)

python main.py pipeline dQw4w9WgXcQ -o ./output/

Files created:

./output/
  dQw4w9WgXcQ_transcript.txt
  dQw4w9WgXcQ_cleaned.txt
  dQw4w9WgXcQ_summary.txt
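
The naming pattern in the listing above is the video ID followed by a suffix. A short sketch that reproduces it, useful for locating the files from another script (`pipeline_output_paths` is an illustrative name, not a toolkit function):

```python
from pathlib import Path

# Suffixes taken from the file listing in this guide.
SUFFIXES = ("transcript", "cleaned", "summary")


def pipeline_output_paths(video_id, output_dir):
    """Return the three file paths the pipeline is documented to create."""
    base = Path(output_dir)
    return [base / f"{video_id}_{suffix}.txt" for suffix in SUFFIXES]


for path in pipeline_output_paths("dQw4w9WgXcQ", "./output/"):
    print(path)
```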

Batch — multiple videos at once

python main.py pipeline VIDEO_ID_1 VIDEO_ID_2 VIDEO_ID_3 -o ./batch_output/
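
For longer batches it can be easier to build the command from a list of IDs in a script. The sketch below only assembles the argument list, using flags that appear in this guide (`-m`, `-o`). It assumes the two flags can be combined on one `pipeline` call, which the guide does not state explicitly, and `build_pipeline_command` is a hypothetical helper.

```python
import sys


def build_pipeline_command(video_ids, output_dir, mode=None):
    """Assemble the argv list for one batch `pipeline` invocation."""
    if not video_ids:
        raise ValueError("at least one video ID is required")
    # sys.executable keeps the call inside the active virtual environment.
    cmd = [sys.executable, "main.py", "pipeline", *video_ids]
    if mode:
        cmd += ["-m", mode]
    cmd += ["-o", output_dir]
    return cmd


print(build_pipeline_command(["VIDEO_ID_1", "VIDEO_ID_2"], "./batch_output/", "bullets"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the project folder.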

Step 7 — Advanced Options

Use the higher-quality model

python main.py clean dQw4w9WgXcQ --quality
python main.py summarize dQw4w9WgXcQ -m detailed --quality

Default model: claude-haiku-4-5 (fast, cost-efficient)
Quality model: claude-sonnet-4-6 (better for complex or long transcripts)
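
Conceptually, `--quality` is a switch between two model names. A sketch of that selection follows; the constant and function names are assumptions, and the real definitions in `config.py` may be organized differently.

```python
# Model names as stated in this guide.
DEFAULT_MODEL = "claude-haiku-4-5"
QUALITY_MODEL = "claude-sonnet-4-6"


def choose_model(quality: bool = False) -> str:
    """Return the model name matching the --quality flag."""
    return QUALITY_MODEL if quality else DEFAULT_MODEL


print(choose_model())              # default run
print(choose_model(quality=True))  # run with --quality
```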

Disable streaming (show output only after completion)

python main.py clean dQw4w9WgXcQ --no-stream

Request a non-English transcript

python main.py clean dQw4w9WgXcQ -l ja       # Japanese only
python main.py clean dQw4w9WgXcQ -l es en    # Spanish, fall back to English
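
The fallback behavior described in the comments above amounts to "first preferred language that is available wins". A sketch of that ordering, with `pick_language` as an illustrative helper (the toolkit presumably delegates this to its transcript library):

```python
def pick_language(preferred, available):
    """Return the first code in `preferred` that is also in `available`."""
    available = set(available)
    for code in preferred:
        if code in available:
            return code
    return None  # nothing matched; the CLI would report a missing transcript


# Equivalent of `-l es en` for a video that only has English and Japanese.
print(pick_language(["es", "en"], ["en", "ja"]))
```

When nothing matches, running the `list` command shows which language codes the video actually offers.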

Fetch raw transcript as SRT, JSON, or VTT

python main.py fetch dQw4w9WgXcQ -f srt -o captions.srt
python main.py fetch dQw4w9WgXcQ -f json -o transcript.json
python main.py fetch dQw4w9WgXcQ -f vtt -o captions.vtt
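
To make the SRT output concrete, here is a sketch of how caption snippets map onto SRT cues. It assumes each snippet is a dict with `text`, `start`, and `duration` keys (seconds as floats). Both helper names are illustrative, and the toolkit's own formatter may differ.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(snippets) -> str:
    """Render snippets as numbered SRT cues separated by blank lines."""
    cues = []
    for index, snip in enumerate(snippets, start=1):
        start = snip["start"]
        end = start + snip["duration"]
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{snip['text']}\n"
        )
    return "\n".join(cues)


print(to_srt([{"text": "Hello", "start": 0.0, "duration": 1.5}]))
```

WebVTT cues look similar but use a period instead of a comma before the milliseconds, and the file starts with a `WEBVTT` header line.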

Fetch with timestamps

python main.py fetch dQw4w9WgXcQ -t
python main.py pipeline dQw4w9WgXcQ -t -o ./output/

Pipeline — skip individual steps

# Fetch and summarize without cleaning
python main.py pipeline dQw4w9WgXcQ --skip-clean -m bullets

# Fetch and clean without summarizing
python main.py pipeline dQw4w9WgXcQ --skip-summary
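
The two skip flags simply drop stages from the fetch, clean, summarize chain. A sketch of the resulting stage list (`plan_steps` is a hypothetical name, and `pipeline.py` may structure this differently):

```python
def plan_steps(skip_clean: bool = False, skip_summary: bool = False):
    """Return the pipeline stages that would run for the given flags."""
    steps = ["fetch"]  # fetching always runs
    if not skip_clean:
        steps.append("clean")
    if not skip_summary:
        steps.append("summarize")
    return steps


print(plan_steps())                   # full pipeline
print(plan_steps(skip_clean=True))    # --skip-clean
print(plan_steps(skip_summary=True))  # --skip-summary
```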

Troubleshooting

Symptom	Likely Cause	Fix
`TranscriptsDisabled` error	Video owner disabled captions	Use a different video
`VideoUnavailable` error	Private, deleted, or region-locked	Check URL; try VPN if region-locked
`NoTranscriptFound`	Requested language missing	Run `list` to see available languages
`AuthenticationError`	API key missing or wrong	Check `ANTHROPIC_API_KEY` env variable
`ModuleNotFoundError`	Dependencies not installed	Run `pip install -r requirements.txt`
Chunking messages in stderr	Transcript very long	Normal — multi-pass processing is automatic
Output cuts off mid-sentence	max_tokens limit hit	This is rare; open an issue if it occurs
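
The chunking messages mentioned in the table come from splitting a long transcript into pieces that fit in one model request. The general idea can be sketched as follows; the 8000-character default is a made-up value for illustration (the real chunk size lives in `config.py`), and `chunk_text` is not the toolkit's function.

```python
def chunk_text(text: str, max_chars: int = 8000):
    """Split text on whitespace into chunks of up to max_chars characters.

    A single word longer than max_chars becomes its own oversized chunk.
    """
    chunks, current, length = [], [], 0
    for word in text.split():
        # The extra 1 accounts for the joining space in a non-empty chunk.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


print(len(chunk_text("word " * 5000)), "chunks")
```

Each chunk would then be processed in its own pass and the results joined, which is why multiple progress messages appear for long videos.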

Project File Reference

main.py          CLI entry point — all five commands
fetcher.py       YouTube direct caption API (no scraping)
cleaner.py       AI paragraph reformatter
summarizer.py    AI summarizer (4 modes)
pipeline.py      Orchestrates the full fetch -> clean -> summarize chain
ai_client.py     Anthropic API wrapper with chunking and streaming
config.py        Constants: model names, chunk size, summary modes
requirements.txt Two dependencies
README.md        Full project documentation
GUIDE.md         This file
LICENSE          MIT License