---
title: Web Scraper API
emoji: 🕸️
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
pinned: true
---

# Web Scraping Service

This is a simple web scraping service built with FastAPI and BeautifulSoup, designed to be deployed on Hugging Face Spaces using Docker.

## Features

- **URL Scraping**: Extracts main content from a given URL.
- **Content Cleaning**: Removes ads, scripts, styles, and other clutter using heuristic rules.
- **JSON Output**: Returns clean text, title, and metadata.
- **Dockerized**: Easy to deploy and run anywhere.

## Local Development

1. **Install dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

2. **Run the application**:

   ```bash
   uvicorn main:app --reload
   ```

3. **Test**: Open your browser to `http://127.0.0.1:8000/docs` to see the interactive API documentation.

## Deployment on Hugging Face Spaces

1. Create a new Space on Hugging Face.
2. Select **Docker** as the SDK.
3. Upload the files in this repository to the Space.
   - `Dockerfile`
   - `requirements.txt`
   - `main.py`
   - `README.md`
4. The application will build and start automatically on port 7860.

## API Usage

### Endpoint: `POST /scrape`

**Request Body:**

```json
{
  "url": "https://example.com/article"
}
```

**Response:**

```json
{
  "url": "https://example.com/article",
  "title": "Example Article Title",
  "content": "Extracted text content...",
  "status": "success"
}
```

### Endpoint: `GET /scrape`

**Query Parameter:** `url`

Example: `https://your-space-url.hf.space/scrape?url=https://example.com`
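
### Example: Calling the API from Python

The two `/scrape` calls described above can be exercised from Python with only the standard library. The following is a minimal client sketch, not part of this repository: the helper names (`build_get_request`, `build_post_request`, `scrape`) are hypothetical, `BASE_URL` is the placeholder host from the example above, and it assumes that `GET /scrape` returns the same JSON shape as `POST /scrape`. The `GET` helper percent-encodes the target URL, which is the safer choice when the target itself contains `?` or `&`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder host from the example above; replace it with your Space's URL.
BASE_URL = "https://your-space-url.hf.space"


def build_get_request(target_url: str, base_url: str = BASE_URL) -> Request:
    """Build a GET /scrape request with the target URL percent-encoded."""
    query = urlencode({"url": target_url})
    return Request(f"{base_url}/scrape?{query}", method="GET")


def build_post_request(target_url: str, base_url: str = BASE_URL) -> Request:
    """Build a POST /scrape request carrying the JSON body {"url": ...}."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return Request(
        f"{base_url}/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(request: Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running Space, `scrape(build_post_request("https://example.com/article"))` should return a dictionary with the `url`, `title`, `content`, and `status` fields shown in the response example. For local development, pass `base_url="http://127.0.0.1:8000"` to the builder functions.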