Karim shoair committed on
Commit 0a39c7d · 1 Parent(s): 2ae7713
docs: update the website main page
docs/index.md +8 -8
@@ -18,7 +18,7 @@
 
 Scrapling isn't just another Web Scraping library. It's the first **adaptive** scraping library that learns from website changes and evolves with them. While other libraries break when websites update their structure, Scrapling automatically relocates your elements and keeps your scrapers running.
 
-Built for the modern Web, Scrapling features its own rapid parsing engine and fetchers to handle all Web Scraping challenges you face or will face. Built by Web Scrapers for Web Scrapers and regular users, there's something for everyone.
+Built for the modern Web, Scrapling features **its own rapid parsing engine** and fetchers to handle all Web Scraping challenges you face or will face. Built by Web Scrapers for Web Scrapers and regular users, there's something for everyone.
 
 ```python
 >> from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher
@@ -51,8 +51,8 @@ Built for the modern Web, Scrapling features its own rapid parsing engine and fe
 
 ### Advanced Websites Fetching with Session Support
 - **HTTP Requests**: Fast and stealthy HTTP requests with the `Fetcher` class. Can impersonate browsers' TLS fingerprint, headers, and use HTTP/3.
-- **Dynamic Loading**: Fetch dynamic websites with full browser automation through the `DynamicFetcher` class supporting Playwright's Chromium,
-- **Anti-bot Bypass**: Advanced stealth capabilities with `StealthyFetcher`
+- **Dynamic Loading**: Fetch dynamic websites with full browser automation through the `DynamicFetcher` class supporting Playwright's Chromium, and Google's Chrome.
+- **Anti-bot Bypass**: Advanced stealth capabilities with `StealthyFetcher` and fingerprint spoofing. Can bypass all types of Cloudflare's Turnstile/Interstitial with automation easily.
 - **Session Management**: Persistent session support with `FetcherSession`, `StealthySession`, and `DynamicSession` classes for cookie and state management across requests.
 - **Async Support**: Complete async support across all fetchers and dedicated async session classes.
 
@@ -60,7 +60,7 @@ Built for the modern Web, Scrapling features its own rapid parsing engine and fe
 - 🔄 **Smart Element Tracking**: Relocate elements after website changes using intelligent similarity algorithms.
 - 🎯 **Smart Flexible Selection**: CSS selectors, XPath selectors, filter-based search, text search, regex search, and more.
 - 🔍 **Find Similar Elements**: Automatically locate elements similar to found elements.
-- 🤖 **MCP Server to be used with AI**: Built-in MCP server for AI-assisted Web Scraping and data extraction. The MCP server features
+- 🤖 **MCP Server to be used with AI**: Built-in MCP server for AI-assisted Web Scraping and data extraction. The MCP server features powerful, custom capabilities that leverage Scrapling to extract targeted content before passing it to the AI (Claude/Cursor/etc), thereby speeding up operations and reducing costs by minimizing token usage. ([demo video](https://www.youtube.com/watch?v=qyFk3ZNwOxE))
 
 ### High-Performance & battle-tested Architecture
 - 🚀 **Lightning Fast**: Optimized performance outperforming most Python scraping libraries.
@@ -100,14 +100,14 @@ Starting with v0.3.2, this installation only includes the parser engine and its
 
 ### Optional Dependencies
 
-1. If you are going to use any of the extra features below, the fetchers, or their classes,
+1. If you are going to use any of the extra features below, the fetchers, or their classes, you will need to install fetchers' dependencies and their browser dependencies as follows:
 ```bash
 pip install "scrapling[fetchers]"
 
 scrapling install
 ```
 
-This downloads all browsers with their system dependencies and fingerprint manipulation dependencies.
+This downloads all browsers, along with their system dependencies and fingerprint manipulation dependencies.
 
 2. Extra features:
 
@@ -135,10 +135,10 @@ Or download it from the GitHub registry:
 ```bash
 docker pull ghcr.io/d4vinci/scrapling:latest
 ```
-This image is automatically built and pushed
+This image is automatically built and pushed using GitHub Actions and the repository's main branch.
 
 ## How the documentation is organized
-Scrapling has
+Scrapling has extensive documentation, so we try to follow the [Diátaxis documentation framework](https://diataxis.fr/).
 
 ## Support
 
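The "Smart Element Tracking" bullet in this diff describes relocating elements after a site changes by using similarity algorithms. As a rough illustration of that general idea only, and not Scrapling's actual implementation or API, the sketch below scores candidate elements against a previously saved one with Python's standard `difflib`. The dictionary layout for elements, the `signature` and `relocate` helpers, and the `0.5` threshold are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher


def signature(el: dict) -> str:
    """Flatten tag, sorted attributes, and text into one comparable string.

    The element layout (tag/attrs/text keys) is an assumption for this sketch.
    """
    attrs = " ".join(f"{k}={v}" for k, v in sorted(el.get("attrs", {}).items()))
    return f"{el['tag']} {attrs} {el.get('text', '')}"


def relocate(saved: dict, candidates: list, threshold: float = 0.5):
    """Return the candidate most similar to `saved`, or None if none is close enough."""
    best, best_score = None, 0.0
    for cand in candidates:
        # ratio() is a 0..1 similarity score between the two signature strings
        score = SequenceMatcher(None, signature(saved), signature(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None


# Element as it looked when the scraper was first written (hypothetical data)
saved = {"tag": "div", "attrs": {"class": "product-price", "id": "price"}, "text": "$19.99"}

# Elements on the redesigned page: the price moved to a <span> with a renamed class
new_page = [
    {"tag": "nav", "attrs": {"class": "menu"}, "text": "Home"},
    {"tag": "span", "attrs": {"class": "product-price-v2", "id": "price"}, "text": "$21.49"},
    {"tag": "footer", "attrs": {"class": "site-footer"}, "text": "Contact"},
]

match = relocate(saved, new_page)
print(match["attrs"]["class"] if match else None)
```

A plain string ratio is the simplest possible scoring rule; a real tracker would likely weigh tag, attributes, text, and position in the tree separately.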