---
title: UniversalScrap
emoji: 👀
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---
# 🚀 Universal Web Scraper
A powerful web scraping tool designed to handle almost **any** website, including JavaScript-heavy single-page applications (SPAs). Built with Playwright and intended to run on Hugging Face Spaces.
## ✨ Features
- **🎯 Universal Compatibility**: Scrapes static HTML and JavaScript-rendered content
- **🔄 Recursive Crawling**: Automatically follows and scrapes internal links, up to a user-specified depth
- **📊 Smart Content Extraction**: Converts HTML to clean, readable text
- **💾 Multiple Export Options**: Individual TXT files + ZIP download
- **🛡️ Failure-Resistant**: Fallback methods (e.g., aiohttp when Playwright fails) reduce the chance that a page is missed
- **⚡ Optimized Performance**: Rate limiting and timeout handling
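
The README does not say how HTML becomes text or how the exports are packaged. Below is a minimal, standard-library-only sketch of those two features. It assumes one output line per HTML text node and a flat output directory. `html_to_text`, `url_to_filename`, `export_pages` and the `pages.zip` archive name are hypothetical, and `app.py` may use a different parser and naming scheme.

```python
import re
import zipfile
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class _TextExtractor(HTMLParser):
    """Collects visible text nodes, skipping <script>, <style> and <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """One line per non-empty text node, with whitespace collapsed and trimmed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(re.sub(r"\s+", " ", chunk).strip() for chunk in parser.chunks)


def url_to_filename(url: str) -> str:
    """Map a URL to a filesystem-safe .txt name (collisions are not handled)."""
    parsed = urlparse(url)
    name = (parsed.netloc + parsed.path).strip("/") or "index"
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name) + ".txt"


def export_pages(pages: dict[str, str], out_dir: str) -> Path:
    """Write one TXT file per page, then bundle them all into pages.zip."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    zip_path = out / "pages.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for url, html in pages.items():
            txt_path = out / url_to_filename(url)
            txt_path.write_text(html_to_text(html), encoding="utf-8")
            zf.write(txt_path, arcname=txt_path.name)
    return zip_path
```

Splitting on text nodes keeps the sketch short, but inline tags such as `<b>` will break a sentence across lines. A production extractor would merge inline runs and insert breaks only at block-level elements.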
## 🚀 Perfect For
- Documentation websites
- E-commerce sites
- News portals
- Blogs and content sites
- Single-page applications (React, Vue, Angular)
- Any website with dynamic content
## 🛠️ How It Works
1. **Primary Method**: Uses Playwright to handle JavaScript-heavy sites
2. **Fallback Method**: Uses aiohttp to fetch the raw HTML if Playwright fails (static content only; JavaScript is not executed)
3. **Content Processing**: Extracts clean text and all internal links
4. **Recursive Discovery**: Follows internal links up to the specified depth
5. **File Generation**: Creates individual TXT files for each page
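
Steps 1, 2 and 4 (fetching with a fallback, then following links to a fixed depth) can be sketched as a breadth-first crawl. This is an illustration under stated assumptions, not the code in `app.py`. The function names, the same-host rule for what counts as an "internal" link, the 1-second delay, the Playwright `networkidle` wait and the 30 s / 15 s timeouts are all hypothetical choices. Playwright and aiohttp are imported lazily, so the crawl logic can be exercised with stand-in fetchers.

```python
import asyncio
from html.parser import HTMLParser
from typing import Awaitable, Callable
from urllib.parse import urldefrag, urljoin, urlparse

Fetcher = Callable[[str], Awaitable[str]]  # async function: url -> HTML


class _LinkCollector(HTMLParser):
    """Collects raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


def internal_links(base_url: str, html: str) -> list[str]:
    """Absolute, fragment-free links that stay on base_url's host."""
    collector = _LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        absolute, _ = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.append(absolute)
    return links


async def fetch_with_playwright(url: str, timeout_ms: int = 30_000) -> str:
    """Primary method: render the page (including JavaScript) in Chromium."""
    from playwright.async_api import async_playwright  # lazy import

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return await page.content()
        finally:
            await browser.close()


async def fetch_with_aiohttp(url: str, timeout_s: float = 15.0) -> str:
    """Fallback method: plain HTTP GET; JavaScript is not executed."""
    import aiohttp  # lazy import

    timeout = aiohttp.ClientTimeout(total=timeout_s)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.text()


async def crawl(start_url: str, max_depth: int = 2, delay_s: float = 1.0,
                primary: Fetcher = fetch_with_playwright,
                fallback: Fetcher = fetch_with_aiohttp) -> dict[str, str]:
    """Breadth-first crawl of internal links; returns {url: html}."""
    pages: dict[str, str] = {}
    seen = {start_url}
    frontier = [start_url]
    for depth in range(max_depth + 1):  # depth 0 is the start page
        next_frontier: list[str] = []
        for url in frontier:
            await asyncio.sleep(delay_s)  # simple rate limit between requests
            try:
                html = await primary(url)
            except Exception:  # timeout, browser launch failure, etc.
                try:
                    html = await fallback(url)
                except Exception:
                    continue  # both methods failed: skip this page
            pages[url] = html
            if depth < max_depth:
                for link in internal_links(url, html):
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
        frontier = next_frontier
    return pages
```

For example, `asyncio.run(crawl("https://example.com/", max_depth=1))` fetches the start page plus the pages it links to directly. Launching a fresh browser for every page keeps the sketch short but is slow. Reusing one browser instance across the whole crawl would be the natural next step.
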

Built with ❤️ for the web scraping community.