Web Wizard — Playwright Automation Curriculum

What This Is

A structured, progressive learning repository for advanced web automation and scraping using Playwright for Python. Covers a full 12-module curriculum from browser automation basics through to distributed, AI-integrated crawler architectures.

Problem

Most Playwright tutorials cover only a fraction of what the library can do. They teach basic navigation and clicking, but not stealth techniques, async concurrency, database integration, Celery-based orchestration, or using Playwright as a tool for an LLM agent. This repo is a self-directed curriculum that works through all of those topics.

Curriculum Structure

  • Part 1 — Foundations: Python async, HTTP internals, DOM, DevTools, Playwright core API, network interception, debugging
  • Part 2 — Advanced Scraping: Anti-bot & stealth, async concurrency, Postgres/pandas integration, Docker + pytest CI
  • Part 3 — Production & AI: Infinite scroll, SPAs, multi-step auth, Celery/RabbitMQ orchestration, Playwright as an LLM agent tool
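In practice, the async concurrency topic in Part 2 mostly comes down to capping how many pages run at once. Below is a minimal sketch of that pattern using `asyncio.Semaphore`. It assumes a hypothetical `crawl_bounded` helper and a caller-supplied async `fetch(url)` coroutine. Both names are illustrative and not part of Playwright.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def crawl_bounded(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int = 5,
) -> Dict[str, str]:
    """Run fetch(url) for every URL with at most `limit` calls in flight.

    `fetch` is a hypothetical caller-supplied coroutine, e.g. one that opens a
    Playwright page, navigates, and returns some extracted text.
    """
    sem = asyncio.Semaphore(limit)

    async def worker(url: str) -> tuple:
        # Only `limit` workers can be inside this block at the same time.
        async with sem:
            return url, await fetch(url)

    # gather preserves input order; each result is a (url, value) pair.
    pairs = await asyncio.gather(*(worker(u) for u in urls))
    return dict(pairs)
```

With Playwright's async API, `fetch` could open a page with `await context.new_page()`, call `await page.goto(url)`, and return `await page.title()`, closing the page in a `finally` block. The semaphore limits how many tabs are open at once. However, every task is still created up front, so for very large URL lists a queue drained by a fixed pool of workers is the usual alternative.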

Hands-On Projects Planned

  1. Single-page scraper with CSV export
  2. Login-protected scraper
  3. Infinite-scroll scraper with deduplication
  4. XHR-intercept scraper
  5. Multi-user crawler writing to Postgres
  6. Playwright-agent connector for LLM/RAG workflows
  7. Capstone: Distributed, Dockerized crawler with queue + vector DB pipeline
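Project 3 combines scrolling with deduplication. One possible approach assumes Playwright's sync API and a hypothetical `scroll_and_collect` helper. The helper scrolls with `page.mouse.wheel`, waits a fixed pause with `page.wait_for_timeout`, and stops once a round turns up no unseen items. Keying on trimmed inner text is an assumption; a stable attribute such as a link `href` or an item ID is usually a safer deduplication key.

```python
from typing import Any, List, Set


def scroll_and_collect(
    page: Any,
    item_selector: str,
    max_rounds: int = 20,
    pause_ms: int = 1000,
) -> List[str]:
    """Scroll an infinite feed and return unique item texts in first-seen order.

    `page` is assumed to be a Playwright sync-API Page; it is typed as Any so the
    sketch does not need Playwright installed to import.
    """
    seen: Set[str] = set()
    items: List[str] = []
    for _ in range(max_rounds):
        added = 0
        for text in page.locator(item_selector).all_inner_texts():
            key = text.strip()  # assumption: trimmed text identifies an item
            if key and key not in seen:
                seen.add(key)
                items.append(key)
                added += 1
        if added == 0:
            break  # the last scroll surfaced nothing new: treat as end of feed
        page.mouse.wheel(0, 10_000)      # scroll down to trigger lazy loading
        page.wait_for_timeout(pause_ms)  # fixed pause for new items to render
    return items
```

A typical call would be `scroll_and_collect(page, "article.post")` after `page.goto(...)`, where the selector is a placeholder. A fixed pause is the simplest stopping rule. Waiting on the feed's own XHR response is more robust, but how to do that depends on the site.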
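Project 4 aims to read the JSON the page already fetches instead of parsing the rendered HTML. The sketch below assumes Playwright's async API, where `page.on("response", handler)` accepts a coroutine handler. `JsonResponseCollector` and the `/api/items` URL fragment are hypothetical.

```python
from typing import Any, List


class JsonResponseCollector:
    """Keep JSON bodies of XHR/fetch responses whose URL contains `url_fragment`.

    Hypothetical helper for Playwright's async API:
        page.on("response", collector.on_response)
    """

    def __init__(self, url_fragment: str) -> None:
        self.url_fragment = url_fragment
        self.payloads: List[Any] = []

    def matches(self, response: Any) -> bool:
        # Skip documents, scripts, images, etc.; XHR and fetch() calls are
        # reported with the "xhr" and "fetch" resource types.
        return (
            response.request.resource_type in ("xhr", "fetch")
            and self.url_fragment in response.url
        )

    async def on_response(self, response: Any) -> None:
        if not self.matches(response):
            return
        try:
            self.payloads.append(await response.json())
        except ValueError:
            pass  # body was not valid JSON; ignore it
```

Register the handler with `page.on("response", collector.on_response)` before `await page.goto(...)`. The handler runs as a scheduled task, so pending handlers may need time to finish (for example, after `await page.wait_for_load_state("networkidle")`) before you read `collector.payloads`. When only one specific response matters, `page.expect_response(...)` is the more direct tool.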

Tech Stack

Playwright, pytest, pandas, SQLAlchemy, PostgreSQL, Redis, Celery, Docker, ChromaDB, LangChain