	---
	title: Web Scraper API
	emoji: 🕸️
	colorFrom: indigo
	colorTo: blue
	sdk: docker
	app_port: 7860
	pinned: true
	---

	# Web Scraping Service

	This is a simple web scraping service built with FastAPI and BeautifulSoup, designed to be deployed on Hugging Face Spaces using Docker.

	## Features

	- URL Scraping: Extracts main content from a given URL.
	- Content Cleaning: Removes ads, scripts, styles, and other clutter using heuristic rules.
	- JSON Output: Returns clean text, title, and metadata.
	- Dockerized: Easy to deploy and run anywhere.

	## Local Development

	1. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	2. Run the application:
	```bash
	uvicorn main:app --reload
	```

	3. Test:
	Open `http://127.0.0.1:8000/docs` in your browser to see the interactive API documentation.

	## Deployment on Hugging Face Spaces

	1. Create a new Space on Hugging Face.
	2. Select Docker as the SDK.
	3. Upload the files in this repository to the Space:
	   - `Dockerfile`
	   - `requirements.txt`
	   - `main.py`
	   - `README.md`
	4. The Space builds the image and starts the application automatically on port 7860, the `app_port` set in the front matter.

	## API Usage

	### Endpoint: `POST /scrape`

	Request Body:
	```json
	{
	  "url": "https://example.com/article"
	}
	```

	Response:
	```json
	{
	  "url": "https://example.com/article",
	  "title": "Example Article Title",
	  "content": "Extracted text content...",
	  "status": "success"
	}
	```
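
As a minimal client sketch, the POST request above could be sent from Python using only the standard library. The host `https://your-space-url.hf.space` is the placeholder from this README, and the helper names `build_scrape_request` and `scrape` are hypothetical. The sketch assumes the service accepts a JSON body with a `Content-Type: application/json` header, which is the usual FastAPI behavior for JSON request models.

```python
import json
import urllib.request

# Placeholder host; replace with the URL of your own Space.
BASE_URL = "https://your-space-url.hf.space"


def build_scrape_request(target_url, base_url=BASE_URL):
    """Build (but do not send) the POST /scrape request for target_url."""
    payload = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/scrape",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(target_url, base_url=BASE_URL, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_scrape_request(target_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running Space, `scrape("https://example.com/article")` should return a dictionary with the `url`, `title`, `content`, and `status` fields shown in the response above. Error handling (timeouts, non-2xx responses) is left out to keep the sketch short.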

	### Endpoint: `GET /scrape`

	Query Parameter: `url`

	Example: `https://your-space-url.hf.space/scrape?url=https://example.com`
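
When the target URL has its own query string, it should be percent-encoded before being passed as the `url` parameter. Otherwise characters such as `&` and `?` would be read as part of the request to the Space instead of the target. The sketch below shows one way to build the GET URL safely. `build_get_url` is a hypothetical helper, and the host is the placeholder used in this README.

```python
from urllib.parse import urlencode


def build_get_url(target_url, base_url="https://your-space-url.hf.space"):
    """Return the GET /scrape URL with target_url safely percent-encoded."""
    query = urlencode({"url": target_url})
    return f"{base_url}/scrape?{query}"


print(build_get_url("https://example.com/search?q=web scraping&page=2"))
# prints https://your-space-url.hf.space/scrape?url=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dweb+scraping%26page%3D2
```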