---
title: UniversalScrap
emoji: 👀
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---

# 🚀 Universal Web Scraper

A web scraping tool built to handle both static pages and JavaScript-heavy single-page applications (SPAs). Built with Playwright and designed for Hugging Face Spaces.

## ✨ Features

- **🎯 Universal Compatibility**: Scrapes static HTML and JavaScript-rendered content
- **🔄 Recursive Crawling**: Automatically follows and scrapes internal links up to a specified depth
- **📊 Smart Content Extraction**: Converts HTML to clean, readable text
- **💾 Multiple Export Options**: Individual TXT files + ZIP download
- **🛡️ Failure-Resistant**: Falls back to a plain aiohttp request for static content if Playwright fails
- **⚡ Optimized Performance**: Rate limiting and timeout handling
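
To illustrate the content extraction and export features, here is a minimal sketch that uses only the Python standard library. It is not the code in `app.py`; the names `TextExtractor`, `html_to_text`, and `build_zip` are hypothetical, and the real app may use a different HTML-to-text approach.

```python
import io
import zipfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script>, <style>, and <noscript> contents."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def build_zip(pages: dict[str, str]) -> bytes:
    """Pack {filename: text} into an in-memory ZIP archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for filename, text in pages.items():
            archive.writestr(filename, text)
    return buffer.getvalue()
```

Calling `build_zip({"index.txt": html_to_text(html)})` yields bytes that can be written to disk or offered as a download.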

## 🚀 Perfect For

- Documentation websites
- E-commerce sites
- News portals
- Blogs and content sites
- Single-page applications (React, Vue, Angular)
- Other websites with dynamic content

## 🛠️ How It Works

1. **Primary Method**: Uses Playwright to handle JavaScript-heavy sites
2. **Fallback Method**: Uses aiohttp for static content if Playwright fails
3. **Content Processing**: Extracts clean text and all internal links
4. **Recursive Discovery**: Follows links up to specified depth
5. **File Generation**: Creates individual TXT files for each page
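
Steps 1 to 4 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `app.py`: the fetchers are passed in as plain async callables (in the Space they would presumably wrap Playwright and aiohttp), and the names `fetch_with_fallback`, `internal_links`, and `crawl` are hypothetical.

```python
import asyncio
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(base_url, html):
    """Return absolute, fragment-free links that share base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urldefrag(urljoin(base_url, href)).url
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            if absolute not in links:
                links.append(absolute)
    return links


async def fetch_with_fallback(url, fetchers):
    """Try each fetcher in order; return the first HTML obtained, or None."""
    for fetch in fetchers:
        try:
            return await fetch(url)
        except Exception:
            continue  # this method failed, fall through to the next one
    return None


async def crawl(start_url, fetchers, max_depth=1, delay=0.0):
    """Breadth-first crawl of internal links, at most max_depth hops from start_url."""
    pages = {}
    seen = {start_url}
    frontier = [start_url]
    for depth in range(max_depth + 1):
        next_frontier = []
        for url in frontier:
            html = await fetch_with_fallback(url, fetchers)
            if html is None:
                continue
            pages[url] = html
            if depth < max_depth:
                for link in internal_links(url, html):
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            await asyncio.sleep(delay)  # crude rate limiting between requests
        frontier = next_frontier
    return pages
```

Passing the fetchers as a list keeps the fallback order explicit: the browser-based fetcher goes first and the static one second, so a Playwright failure on one page does not abort the whole crawl.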

Built with ❤️ for the web scraping community.