---
title: UniversalScrap
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---
# Universal Web Scraper
A web scraping tool that aims to handle **any** website, including JavaScript-heavy single-page applications (SPAs). Built with Playwright and designed for Hugging Face Spaces.
## ✨ Features
- **🎯 Universal Compatibility**: Scrapes both static HTML and JavaScript-rendered content
- **Recursive Crawling**: Follows and scrapes internal links up to a specified depth
- **Smart Content Extraction**: Converts HTML to clean, readable text
- **💾 Multiple Export Options**: Individual TXT files plus a ZIP download
- **🛡️ Failure-Resistant**: Falls back to a plain aiohttp fetch when Playwright fails
- **⚡ Optimized Performance**: Rate limiting and timeout handling
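To make the content extraction and ZIP export features more concrete, here is a minimal sketch using only the Python standard library. It is not the code in `app.py`. The names `TextExtractor`, `html_to_text`, and `build_zip` are hypothetical, and the real app may use a different HTML-to-text approach.

```python
import io
import zipfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script>, <style>, and <noscript> bodies."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text chunks and collapse runs of whitespace.
        if not self._skip_depth and data.strip():
            self.parts.append(" ".join(data.split()))


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_zip(pages: dict[str, str]) -> bytes:
    """Bundle {filename: text} into an in-memory ZIP archive (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in pages.items():
            archive.writestr(name, text)
    return buffer.getvalue()
```

In this sketch, each scraped page would be passed through `html_to_text`, saved under its own `.txt` name, and the resulting mapping handed to `build_zip` to produce the download.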
## Perfect For
- Documentation websites
- E-commerce sites
- News portals
- Blogs and content sites
- Single-page applications (React, Vue, Angular)
- Any website with dynamic content
## 🛠️ How It Works
1. **Primary Method**: Uses Playwright to handle JavaScript-heavy sites
2. **Fallback Method**: Uses aiohttp for static content if Playwright fails
3. **Content Processing**: Extracts clean text and all internal links
4. **Recursive Discovery**: Follows links up to specified depth
5. **File Generation**: Creates individual TXT files for each page
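The fallback and recursive discovery steps above can be sketched as a breadth-first crawl with a depth limit. This is a simplified, synchronous illustration rather than the project's actual implementation. The `fetchers` argument is an assumption: an ordered list of plain callables standing in for the Playwright and aiohttp fetch paths, and all function names are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(base_url, html):
    """Return absolute, fragment-free links that stay on base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urldefrag(urljoin(base_url, href)).url
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            if absolute not in links:
                links.append(absolute)
    return links


def fetch_with_fallback(url, fetchers):
    """Try each fetcher in order; return the first HTML, or None if all fail."""
    for fetch in fetchers:
        try:
            return fetch(url)
        except Exception:
            continue  # fall through to the next method
    return None


def crawl(start_url, fetchers, max_depth=1):
    """Breadth-first crawl of internal links, up to max_depth hops from the start."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}
    while queue:
        url, depth = queue.popleft()
        html = fetch_with_fallback(url, fetchers)
        if html is None:
            continue
        pages[url] = html
        if depth < max_depth:
            for link in internal_links(url, html):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return pages
```

Passing the fetchers in as a list keeps the crawl logic independent of how pages are retrieved, so a browser-based fetcher can be listed first and a plain HTTP fetcher second. A real crawler would also add the rate limiting and timeouts mentioned in the feature list.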
Built with ❤️ for the web scraping community.