# Web Wizard — Playwright Automation Curriculum

## What This Is

A structured, progressive learning repository for advanced web automation and scraping with Playwright for Python. It covers a full 12-module curriculum, from browser automation basics through to distributed, AI-integrated crawler architectures.

## Problem

Playwright is far more powerful than most tutorials suggest. Existing resources teach basic navigation and clicking, but not stealth techniques, async concurrency, database integration, Celery-based orchestration, or using Playwright as an LLM agent tool. This repo is a self-directed curriculum that covers all of it.

## Curriculum Structure

- **Part 1 — Foundations:** Python async, HTTP internals, the DOM, DevTools, the Playwright core API, network interception, debugging
- **Part 2 — Advanced Scraping:** Anti-bot & stealth, async concurrency, PostgreSQL/pandas integration, Docker + pytest CI
- **Part 3 — Production & AI:** Infinite scroll, SPAs, multi-step auth, Celery/RabbitMQ orchestration, Playwright as an LLM agent tool

## Hands-On Projects Planned

1. Single-page scraper with CSV export
2. Login-protected scraper
3. Infinite-scroll scraper with deduplication
4. XHR-intercept scraper
5. Multi-user crawler writing to Postgres
6. Playwright-agent connector for LLM/RAG workflows
7. Capstone: distributed, Dockerized crawler with a queue + vector DB pipeline

## Tech Stack

Playwright, pytest, pandas, SQLAlchemy, PostgreSQL, Redis, Celery, Docker, ChromaDB, LangChain