# Step-by-Step Setup and Usage Guide

Author: algorembrant

---

## Prerequisites

| Requirement          | Minimum Version | Notes                                               |
|----------------------|-----------------|-----------------------------------------------------|
| Python               | 3.8             | 3.10+ recommended                                   |
| pip                  | 21.0            |                                                     |
| Anthropic API Key    | --              | Required for `clean`, `summarize`, and `pipeline`   |

You need an Anthropic API key to use the `clean`, `summarize`, and `pipeline` commands.
Obtain one at: https://console.anthropic.com
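
If you are unsure which interpreter your terminal picks up, a small check like the one below can help. It is a minimal sketch and not part of the toolkit: `check_python` is a hypothetical helper, and the two version thresholds are simply copied from the table above.

```python
import sys

MIN_VERSION = (3, 8)           # minimum from the Prerequisites table
RECOMMENDED_VERSION = (3, 10)  # recommended from the Prerequisites table


def check_python(version=None):
    """Classify a (major, minor) tuple as 'too-old', 'upgrade-recommended', or 'ok'.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info[:2]
    version = tuple(version[:2])
    if version < MIN_VERSION:
        return "too-old"
    if version < RECOMMENDED_VERSION:
        return "upgrade-recommended"
    return "ok"


print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {check_python()}")
```

Run it with the same `python` or `python3` command you plan to use for the remaining steps.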

---

## Step 1: Get the Code

**Option A: Git clone**

```bash
git clone https://github.com/algorembrant/youtube-transcript-toolkit.git
cd youtube-transcript-toolkit
```

**Option B: Download ZIP**

Download and unzip, then open a terminal inside the project folder.

---

## Step 2: Create a Virtual Environment

**macOS / Linux**

```bash
python3 -m venv .venv
source .venv/bin/activate
```

**Windows (Command Prompt)**

```cmd
python -m venv .venv
.venv\Scripts\activate.bat
```

**Windows (PowerShell)**

```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
```

You should see `(.venv)` at the start of your terminal prompt.
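
If the prompt prefix is hidden by your shell theme, you can also ask Python directly. The sketch below relies on the standard behaviour that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment; `in_virtualenv` is a hypothetical helper name, not something the toolkit provides.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter appears to run inside a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base installation.
    """
    prefix = sys.prefix if prefix is None else prefix
    if base_prefix is None:
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


print("virtual environment active:", in_virtualenv())
```

If this prints `False` after activation, the `python` on your PATH is probably not the one inside `.venv`.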

---

## Step 3: Install Dependencies

```bash
pip install -r requirements.txt
```

Verify:

```bash
pip show anthropic
pip show youtube-transcript-api
```
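
The same verification can be done from Python with the standard-library `importlib.metadata` module (available from Python 3.8). This is a sketch for convenience; `installed_versions` is a hypothetical helper, and the two package names are the ones used in the `pip show` commands above.

```python
from importlib import metadata

REQUIRED = ["anthropic", "youtube-transcript-api"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

A `NOT INSTALLED` line usually means `pip` ran against a different interpreter than the one in your virtual environment.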

---

## Step 4: Set Your Anthropic API Key

**macOS / Linux (current session)**

```bash
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
```

**macOS / Linux (permanent: add to shell profile)**

```bash
echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.zshrc
source ~/.zshrc
```

If your shell is bash rather than zsh, use `~/.bashrc` in place of `~/.zshrc`.

**Windows (Command Prompt)**

```cmd
set ANTHROPIC_API_KEY=sk-ant-your-key-here
```

**Windows (PowerShell)**

```powershell
$env:ANTHROPIC_API_KEY = "sk-ant-your-key-here"
```

**Windows (permanent via System Settings)**

1. Search "Environment Variables" in the Start Menu
2. Click "Edit the system environment variables"
3. Click "Environment Variables...", then add a new variable: `ANTHROPIC_API_KEY` = your key

Open a new terminal afterwards; terminals that were already open keep the old environment.

The `fetch` and `list` commands do NOT require an API key.
Only `clean`, `summarize`, and `pipeline` need it.
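
To confirm the key is visible to Python before you spend time on a long transcript, a pre-flight check along these lines may be useful. It is only a sketch: `key_problem` is a hypothetical helper, it checks that the variable is present and non-empty, and it does not contact the API, so a wrong key will still pass.

```python
import os

# Commands that need the key, as listed in this step.
COMMANDS_NEEDING_KEY = {"clean", "summarize", "pipeline"}


def key_problem(command, env=None):
    """Return an error message if `command` needs a key that is missing, else None.

    `env` defaults to os.environ; pass a dict to test without touching
    the real environment.
    """
    env = os.environ if env is None else env
    if command not in COMMANDS_NEEDING_KEY:
        return None
    if not env.get("ANTHROPIC_API_KEY", "").strip():
        return "ANTHROPIC_API_KEY is not set"
    return None


for command in ("fetch", "clean"):
    print(command, "->", key_problem(command) or "ready")
```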

---

## Step 5: Run Your First Commands

### Fetch a raw transcript (no API key needed)

```bash
python main.py fetch "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```

### See what languages are available

```bash
python main.py list dQw4w9WgXcQ
```
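
The two commands above pass the video once as a full URL and once as a bare ID. How `main.py` normalizes these is not shown in this guide; the sketch below is one plausible way to do it, assuming the usual 11-character ID and only the `watch?v=` and `youtu.be/` URL shapes. `extract_video_id` is a hypothetical name.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from this alphabet.
_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{11}$")
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(value):
    """Return the video ID from a bare ID or a common YouTube URL.

    Raises ValueError when no ID can be found.
    """
    value = value.strip()
    if _ID_PATTERN.match(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in _YOUTUBE_HOSTS:
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        candidate = ""
    if not _ID_PATTERN.match(candidate):
        raise ValueError(f"could not find a video ID in {value!r}")
    return candidate


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
```

Quoting the URL on the command line, as in the `fetch` example, keeps the shell from interpreting `?` and `&`.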

### Clean the transcript into paragraphs

```bash
python main.py clean dQw4w9WgXcQ
```

### Summarize the transcript

```bash
python main.py summarize dQw4w9WgXcQ -m brief
python main.py summarize dQw4w9WgXcQ -m detailed
python main.py summarize dQw4w9WgXcQ -m bullets
python main.py summarize dQw4w9WgXcQ -m outline
```

### Run the full pipeline (fetch + clean + summarize)

```bash
python main.py pipeline dQw4w9WgXcQ -m bullets
```

---

## Step 6: Save Output to Files

### Single video: specify a file path

```bash
python main.py clean dQw4w9WgXcQ -o cleaned.txt
python main.py summarize dQw4w9WgXcQ -m detailed -o summary.txt
```

### Pipeline: specify a directory (creates 3 files per video)

```bash
python main.py pipeline dQw4w9WgXcQ -o ./output/
```

Files created:

```
./output/
  dQw4w9WgXcQ_transcript.txt
  dQw4w9WgXcQ_cleaned.txt
  dQw4w9WgXcQ_summary.txt
```
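
If you post-process these files in your own scripts, the naming pattern in the listing above can be reproduced with `pathlib`. This is a sketch based only on that listing; `pipeline_output_paths` is a hypothetical helper and does not reflect how `pipeline.py` builds its paths internally.

```python
from pathlib import Path

# Stage suffixes taken from the file listing above.
STAGES = ("transcript", "cleaned", "summary")


def pipeline_output_paths(video_id, output_dir):
    """Return the three per-video output paths, keyed by stage name."""
    base = Path(output_dir)
    return {stage: base / f"{video_id}_{stage}.txt" for stage in STAGES}


for stage, path in pipeline_output_paths("dQw4w9WgXcQ", "./output/").items():
    print(f"{stage:<10} {path}")
```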

### Batch: multiple videos at once

```bash
python main.py pipeline VIDEO_ID_1 VIDEO_ID_2 VIDEO_ID_3 -o ./batch_output/
```
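
For longer lists it can be easier to keep the IDs in a text file and build the command from it. The sketch below is a hypothetical wrapper, not part of the toolkit: `read_ids`, `build_pipeline_command`, and `run_batch` are illustrative names, the one-ID-per-line file format is an assumption, and combining `-m` with `-o` on `pipeline` is assumed to work as it does for `summarize`.

```python
import subprocess
import sys


def read_ids(lines):
    """Keep non-empty lines, skipping comment lines that start with '#'."""
    ids = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def build_pipeline_command(video_ids, output_dir, mode=None):
    """Build the argv list for a single `pipeline` call covering several videos."""
    if not video_ids:
        raise ValueError("at least one video ID is required")
    command = [sys.executable, "main.py", "pipeline", *video_ids, "-o", output_dir]
    if mode is not None:
        command += ["-m", mode]
    return command


def run_batch(id_file, output_dir, mode=None):
    """Read IDs from a file and run the pipeline from the project folder."""
    with open(id_file, encoding="utf-8") as handle:
        ids = read_ids(handle)
    return subprocess.run(build_pipeline_command(ids, output_dir, mode), check=True)
```

Calling `run_batch("ids.txt", "./batch_output/")` from the project folder would then be equivalent to typing the IDs out by hand.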

---

## Step 7: Advanced Options

### Use the higher-quality model

```bash
python main.py clean dQw4w9WgXcQ --quality
python main.py summarize dQw4w9WgXcQ -m detailed --quality
```

- Default model: `claude-haiku-4-5` (fast, cost-efficient)
- Quality model: `claude-sonnet-4-6` (better for complex or long transcripts)
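
Conceptually, `--quality` is a boolean switch between the two model names listed above. The sketch below shows that idea with `argparse`; the model names come from this guide, while the parser layout and `pick_model` are assumptions for illustration and may differ from what `main.py` and `config.py` actually contain.

```python
import argparse

DEFAULT_MODEL = "claude-haiku-4-5"   # default model named in this guide
QUALITY_MODEL = "claude-sonnet-4-6"  # model named for --quality in this guide


def pick_model(quality):
    """Return the model name for the given --quality setting."""
    return QUALITY_MODEL if quality else DEFAULT_MODEL


# Hypothetical, simplified parser: one positional video plus the flag.
parser = argparse.ArgumentParser(prog="main.py")
parser.add_argument("video")
parser.add_argument("--quality", action="store_true")

args = parser.parse_args(["dQw4w9WgXcQ", "--quality"])
print(pick_model(args.quality))
```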

### Disable streaming (show output only after completion)

```bash
python main.py clean dQw4w9WgXcQ --no-stream
```

### Request a non-English transcript

```bash
python main.py clean dQw4w9WgXcQ -l ja      # Japanese only
python main.py clean dQw4w9WgXcQ -l es en   # Spanish, fall back to English
```
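
The fallback comment above suggests that languages are tried in the order given. A minimal sketch of that selection rule, assuming first-match-wins semantics (the actual resolution happens inside the transcript library and may differ in detail); `pick_language` is a hypothetical name:

```python
def pick_language(preferred, available):
    """Return the first preferred language code that is available, else None."""
    available_set = set(available)
    for code in preferred:
        if code in available_set:
            return code
    return None


# Spanish is missing here, so the second preference wins.
print(pick_language(["es", "en"], ["en", "ja"]))
```

When this kind of lookup finds nothing, the `list` command shows which language codes the video actually offers.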

### Fetch raw transcript as SRT, JSON, or VTT

```bash
python main.py fetch dQw4w9WgXcQ -f srt -o captions.srt
python main.py fetch dQw4w9WgXcQ -f json -o transcript.json
python main.py fetch dQw4w9WgXcQ -f vtt -o captions.vtt
```
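
For readers unfamiliar with SRT, the sketch below shows how caption segments map onto that format: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then the text. The segment shape (`text`, `start`, `duration` in seconds) is an assumption for illustration, and `srt_timestamp` and `to_srt` are hypothetical names rather than the toolkit's own functions.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT puts a comma before the milliseconds)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render segments (dicts with 'text', 'start', 'duration') as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{seg['text']}\n"
        )
    # Blocks are separated by one blank line.
    return "\n".join(blocks)


print(to_srt([
    {"text": "Hello", "start": 0.0, "duration": 1.5},
    {"text": "World", "start": 1.5, "duration": 2.0},
]))
```

WebVTT uses nearly the same layout, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.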

### Fetch with timestamps

```bash
python main.py fetch dQw4w9WgXcQ -t
python main.py pipeline dQw4w9WgXcQ -t -o ./output/
```

### Pipeline: skip individual steps

```bash
# Fetch and summarize without cleaning
python main.py pipeline dQw4w9WgXcQ --skip-clean -m bullets

# Fetch and clean without summarizing
python main.py pipeline dQw4w9WgXcQ --skip-summary
```
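
The effect of the two skip flags can be pictured as removing stages from a fixed fetch, clean, summarize sequence. The following is a sketch of that idea only; `plan_stages` is a hypothetical function and is not taken from `pipeline.py`.

```python
def plan_stages(skip_clean=False, skip_summary=False):
    """Return the ordered list of stages that would run for the given flags."""
    stages = ["fetch"]  # fetching always happens first
    if not skip_clean:
        stages.append("clean")
    if not skip_summary:
        stages.append("summarize")
    return stages


print(plan_stages(skip_clean=True))    # fetch, then summarize
print(plan_stages(skip_summary=True))  # fetch, then clean
```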

---

## Troubleshooting

| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| `TranscriptsDisabled` error | Video owner disabled captions | Use a different video |
| `VideoUnavailable` error | Private, deleted, or region-locked | Check URL; try VPN if region-locked |
| `NoTranscriptFound` | Requested language missing | Run `list` to see available languages |
| `AuthenticationError` | API key missing or wrong | Check `ANTHROPIC_API_KEY` env variable |
| `ModuleNotFoundError` | Dependencies not installed | Run `pip install -r requirements.txt` |
| Chunking messages in stderr | Transcript very long | Normal; multi-pass processing is automatic |
| Output cuts off mid-sentence | `max_tokens` limit hit | This is rare; open an issue if it occurs |
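
To give a feel for what the chunking messages refer to, here is a minimal sketch of splitting a long transcript into size-limited pieces on word boundaries. It is an illustration under stated assumptions: `chunk_text` is a hypothetical function, the character budget is arbitrary, and the real logic in `ai_client.py` may split differently.

```python
def chunk_text(text, max_chars):
    """Split text into chunks of at most max_chars, breaking only at whitespace.

    A single word longer than max_chars is kept whole as its own chunk.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks, current, length = [], [], 0
    for word in text.split():
        # The extra 1 accounts for the joining space in a non-empty chunk.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


print(chunk_text("one two three four", 9))
```

Each chunk would then be sent to the model in its own pass, which is why several progress messages can appear for one video.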

---

## Project File Reference

```
main.py            CLI entry point: all five commands
fetcher.py         YouTube direct caption API (no scraping)
cleaner.py         AI paragraph reformatter
summarizer.py      AI summarizer (4 modes)
pipeline.py        Orchestrates the full fetch -> clean -> summarize chain
ai_client.py       Anthropic API wrapper with chunking and streaming
config.py          Constants: model names, chunk size, summary modes
requirements.txt   Two dependencies
README.md          Full project documentation
GUIDE.md           This file
LICENSE            MIT License
```