Neural Arun

ArunCore Deployment

9ae77d7 about 1 month ago

Web Wizard — Playwright Automation Curriculum

What This Is

A structured, progressive learning repository for advanced web automation and scraping using Playwright for Python. It covers a 12-module curriculum, from browser automation basics through to distributed, AI-integrated crawler architectures.

Problem

Playwright can do far more than most tutorials cover. Existing resources teach basic navigation and clicking, but not stealth techniques, async concurrency, database integration, Celery-based orchestration, or using Playwright as an LLM agent tool. This repo is a self-directed curriculum that covers all of these topics.

Curriculum Structure

Part 1 — Foundations: Python async, HTTP internals, DOM, DevTools, Playwright core API, network interception, debugging
Part 2 — Advanced Scraping: Anti-bot & stealth, async concurrency, Postgres/Pandas integration, Docker + pytest CI
Part 3 — Production & AI: Infinite scroll, SPAs, multi-step auth, Celery/RabbitMQ orchestration, Playwright as an LLM agent tool
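As a rough illustration of the "async concurrency" topic in Part 2, the sketch below caps the number of simultaneous page fetches with an `asyncio.Semaphore`. It is not taken from the curriculum itself: `fetch_page`, `crawl`, and `max_concurrency` are hypothetical names, and `fetch_page` only simulates a page visit so the example runs without a browser. In a real module it would presumably be replaced by a Playwright call.

```python
import asyncio


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for a Playwright page visit; no browser is launched."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html>{url}</html>"


async def crawl(urls: list[str], max_concurrency: int = 3) -> dict[str, str]:
    """Fetch every URL, allowing at most `max_concurrency` fetches in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> tuple[str, str]:
        # Each task waits here until a semaphore slot is free.
        async with semaphore:
            return url, await fetch_page(url)

    pairs = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(pairs)


if __name__ == "__main__":
    demo_urls = [f"https://example.com/page/{i}" for i in range(5)]
    pages = asyncio.run(crawl(demo_urls))
    print(f"fetched {len(pages)} pages")
```

Bounding concurrency this way keeps a crawler from opening an unbounded number of pages at once, which matters for both memory use and politeness toward the target site.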

Hands-On Projects Planned

Single-page scraper with CSV export
Login-protected scraper
Infinite-scroll scraper with deduplication
XHR-intercept scraper
Multi-user crawler writing to Postgres
Playwright-agent connector for LLM/RAG workflows
Capstone: Distributed, Dockerized crawler with queue + vector DB pipeline
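Two of the planned projects mention deduplication and CSV export. A minimal sketch of how those pieces might fit together is shown below, using only the standard library. The helper names (`dedupe_batches`, `to_csv`) and the `url` and `title` fields are assumptions made for illustration; the repository may structure its data differently.

```python
import csv
import io


def dedupe_batches(batches, key="url"):
    """Merge batches of scraped items, keeping the first item seen per key.

    Each batch stands in for the items collected after one scroll step,
    where later batches often repeat items from earlier ones.
    """
    seen = set()
    unique = []
    for batch in batches:
        for item in batch:
            if item[key] not in seen:
                seen.add(item[key])
                unique.append(item)
    return unique


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical data: the second batch overlaps with the first.
    scroll_batches = [
        [{"url": "a", "title": "A"}, {"url": "b", "title": "B"}],
        [{"url": "b", "title": "B"}, {"url": "c", "title": "C"}],
    ]
    print(to_csv(dedupe_batches(scroll_batches), ["url", "title"]))
```

Keying deduplication on a stable field such as the item URL preserves first-seen order while dropping repeats that appear when a scrolled page re-renders earlier items.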

Tech Stack

Playwright, pytest, pandas, SQLAlchemy, PostgreSQL, Redis, Celery, Docker, ChromaDB, LangChain