Karim shoair committed on
Commit · be68867
1 Parent(s): 78b07c8
docs: update website main page
docs/index.md CHANGED (+65 -40)
|
@@ -4,29 +4,31 @@
|
|
| 4 |
}
|
| 5 |
</style>
|
| 6 |
|
| 7 |
-
<
|
| 8 |
<a href="https://scrapling.readthedocs.io/en/latest/" alt="poster">
|
| 9 |
<img alt="poster" src="assets/poster.png" style="width: 50%; height: 100%;"></a>
|
| 10 |
-
</
|
| 11 |
|
| 12 |
-
|
| 13 |
|
| 14 |
-
|
| 15 |
|
| 16 |
-
It
|
| 17 |
|
| 18 |
-
Scrapling
|
| 19 |
|
| 20 |
```python
|
| 21 |
-
>> from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher,
|
| 22 |
-
>> StealthyFetcher.
|
| 23 |
# Fetch websites' source under the radar!
|
| 24 |
>> page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
|
| 25 |
>> print(page.status)
|
| 26 |
200
|
| 27 |
>> products = page.css('.product', auto_save=True) # Scrape data that survives website design changes!
|
| 28 |
-
>> # Later, if the website structure changes, pass `
|
| 29 |
-
>> products = page.css('.product',
|
| 30 |
```
|
| 31 |
|
| 32 |
## Top Sponsors
|
|
@@ -38,31 +40,38 @@ Scrapling is built from the ground up by Web scraping experts for beginners and
|
|
| 38 |
</div>
|
| 39 |
<!-- /sponsors -->
|
| 40 |
|
| 41 |
-
<i><sub>Do you want to show your ad here? Click [here](https://github.com/sponsors/D4Vinci) and
|
| 42 |
|
| 43 |
## Key Features
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
- **
|
| 47 |
-
- **
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
- **
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
- **Smart
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
- **
|
| 57 |
-
|
| 58 |
-
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
- **
|
| 62 |
-
- **
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
- **
|
| 66 |
|
| 67 |
## Star History
|
| 68 |
Scrapling’s GitHub stars have grown steadily since its release (see chart below).
|
|
@@ -98,19 +107,35 @@ observer.observe(document.body, {
|
|
| 98 |
</script>
|
| 99 |
|
| 100 |
## Installation
|
| 101 |
-
Scrapling
|
| 102 |
|
| 103 |
-
Run this command to install it with Python's pip.
|
| 104 |
```bash
|
| 105 |
-
|
| 106 |
```
|
| 107 |
-
You are ready if you plan to use the parser only (the `Adaptor` class).
|
| 108 |
|
| 109 |
-
|
| 110 |
```bash
|
| 111 |
scrapling install
|
| 112 |
```
|
| 113 |
-
|
| 114 |
|
| 115 |
## How the documentation is organized
|
| 116 |
Scrapling has a lot of documentation, so we try to follow a guideline called the [Diátaxis documentation framework](https://diataxis.fr/).
|
|
@@ -121,7 +146,7 @@ If you like Scrapling and want to support its development:
|
|
| 121 |
|
| 122 |
- ⭐ Star the [GitHub repository](https://github.com/D4Vinci/Scrapling)
|
| 123 |
- 🚀 Follow us on [Twitter](https://x.com/Scrapling_dev) and join the [discord server](https://discord.gg/EMgGbDceNQ)
|
| 124 |
-
- 💝 Consider [sponsoring the project or buying me a
|
| 125 |
- 🐛 Report bugs and suggest features through [GitHub Issues](https://github.com/D4Vinci/Scrapling/issues)
|
| 126 |
|
| 127 |
## License
|
|
|
|
| 4 |
}
|
| 5 |
</style>
|
| 6 |
|
| 7 |
+
<div align="center">
|
| 8 |
<a href="https://scrapling.readthedocs.io/en/latest/" alt="poster">
|
| 9 |
<img alt="poster" src="assets/poster.png" style="width: 50%; height: 100%;"></a>
|
| 10 |
+
</div>
|
| 11 |
|
| 12 |
+
<div align="center">
|
| 13 |
+
<i><code>Easy, effortless Web Scraping as it should be!</code></i>
|
| 14 |
+
</div>
|
| 15 |
|
| 16 |
+
**Stop fighting anti-bot systems. Stop rewriting selectors after every website update.**
|
| 17 |
|
| 18 |
+
Scrapling isn't just another Web Scraping library. It's the first **adaptive** scraping library that learns from website changes and evolves with them. While other libraries break when websites update their structure, Scrapling automatically relocates your elements and keeps your scrapers running.
|
| 19 |
|
| 20 |
+
Built for the modern Web, Scrapling has its own rapid parsing engine and its fetchers to handle all Web Scraping challenges you are facing or will face. Built by Web Scrapers for Web Scrapers and regular users, there's something for everyone.
|
| 21 |
|
| 22 |
```python
|
| 23 |
+
>> from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher
|
| 24 |
+
>> StealthyFetcher.adaptive = True
|
| 25 |
# Fetch websites' source under the radar!
|
| 26 |
>> page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
|
| 27 |
>> print(page.status)
|
| 28 |
200
|
| 29 |
>> products = page.css('.product', auto_save=True) # Scrape data that survives website design changes!
|
| 30 |
+
>> # Later, if the website structure changes, pass `adaptive=True`
|
| 31 |
+
>> products = page.css('.product', adaptive=True) # and Scrapling still finds them!
|
| 32 |
```
|
| 33 |
|
| 34 |
## Top Sponsors
|
|
|
|
| 40 |
</div>
|
| 41 |
<!-- /sponsors -->
|
| 42 |
|
| 43 |
+
<i><sub>Do you want to show your ad here? Click [here](https://github.com/sponsors/D4Vinci/sponsorships?tier_id=435495) and enjoy the rest of the perks!</sub></i>
|
| 44 |
|
| 45 |
## Key Features
|
| 46 |
+
|
| 47 |
+
### Advanced Websites Fetching with Session Support
|
| 48 |
+
- **HTTP Requests**: Fast and stealthy HTTP requests with the `Fetcher` class. Can impersonate browsers' TLS fingerprint, headers, and use HTTP3.
|
| 49 |
+
- **Dynamic Loading**: Fetch dynamic websites with full browser automation through the `DynamicFetcher` class supporting Playwright's Chromium, real Chrome, and custom stealth mode.
|
| 50 |
+
- **Anti-bot Bypass**: Advanced stealth capabilities with `StealthyFetcher` using a modified version of Firefox and fingerprint spoofing. Can bypass all levels of Cloudflare's Turnstile with automation easily.
|
| 51 |
+
- **Session Management**: Persistent session support with `FetcherSession`, `StealthySession`, and `DynamicSession` classes for cookie and state management across requests.
|
| 52 |
+
- **Async Support**: Complete async support across all fetchers and dedicated async session classes.
|
| 53 |
+
|
| 54 |
+
### Adaptive Scraping & AI Integration
|
| 55 |
+
- 🔄 **Smart Element Tracking**: Relocate elements after website changes using intelligent similarity algorithms.
|
| 56 |
+
- 🎯 **Smart Flexible Selection**: CSS selectors, XPath selectors, filter-based search, text search, regex search, and more.
|
| 57 |
+
- 🔍 **Find Similar Elements**: Automatically locate elements similar to found elements.
|
| 58 |
+
- 🤖 **MCP Server to be used with AI**: Built-in MCP server for AI-assisted Web Scraping and data extraction. The MCP server features custom, powerful capabilities that utilize Scrapling to extract targeted content before passing it to the AI (Claude/Cursor/etc), thereby speeding up operations and reducing costs by minimizing token usage.
|
| 59 |
+
|
| 60 |
+
### High-Performance & battle-tested Architecture
|
| 61 |
+
- 🚀 **Lightning Fast**: Optimized performance outperforming most Python scraping libraries.
|
| 62 |
+
- 🔋 **Memory Efficient**: Optimized data structures and lazy loading for a minimal memory footprint.
|
| 63 |
+
- ⚡ **Fast JSON Serialization**: 10x faster than the standard library.
|
| 64 |
+
- 🏗️ **Battle tested**: Not only does Scrapling have 92% test coverage and full type hints coverage, but it has been used daily by hundreds of Web Scrapers over the past year.
|
| 65 |
+
|
| 66 |
+
### Developer/Web Scraper Friendly Experience
|
| 67 |
+
- 🎯 **Interactive Web Scraping Shell**: Optional built-in IPython shell with Scrapling integration, shortcuts, and new tools to speed up Web Scraping scripts development, like converting curl requests to Scrapling requests and viewing requests results in your browser.
|
| 68 |
+
- 🚀 **Use it directly from the Terminal**: Optionally, you can use Scrapling to scrape a URL without writing a single code!
|
| 69 |
+
- 🛠️ **Rich Navigation API**: Advanced DOM traversal with parent, sibling, and child navigation methods.
|
| 70 |
+
- 🧬 **Enhanced Text Processing**: Built-in regex, cleaning methods, and optimized string operations.
|
| 71 |
+
- 📝 **Auto Selector Generation**: Generate robust CSS/XPath selectors for any element.
|
| 72 |
+
- 🔌 **Familiar API**: Similar to Scrapy/BeautifulSoup with the same pseudo-elements used in Scrapy/Parsel.
|
| 73 |
+
- 📘 **Complete Type Coverage**: Full type hints for excellent IDE support and code completion.
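
The "Smart Element Tracking" bullet above describes relocating elements after a website changes by scoring similarity. The sketch below illustrates that general idea using only the Python standard library. It is not Scrapling's implementation: `ElementSignature`, `SignatureCollector`, `similarity`, and `relocate` are hypothetical names, and the equal-weight scoring is an assumption chosen for readability.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from html.parser import HTMLParser


@dataclass
class ElementSignature:
    """Hypothetical fingerprint of an element (not a Scrapling class)."""
    tag: str
    attrs: dict
    path: tuple  # tags of the ancestors, outermost first
    text: str = ""


class SignatureCollector(HTMLParser):
    """Builds one ElementSignature per element, in document order."""

    def __init__(self):
        super().__init__()
        self.stack = []     # elements that are currently open
        self.elements = []  # every element seen so far

    def handle_starttag(self, tag, attrs):
        path = tuple(el.tag for el in self.stack)
        element = ElementSignature(tag, dict(attrs), path)
        self.elements.append(element)
        self.stack.append(element)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Attach direct text to the innermost open element.
        if self.stack and data.strip():
            self.stack[-1].text += data.strip()


def collect(html):
    parser = SignatureCollector()
    parser.feed(html)
    parser.close()
    return parser.elements


def _ratio(a, b):
    return SequenceMatcher(None, a, b).ratio()


def _attr_string(attrs):
    return " ".join(sorted(f"{k}={v}" for k, v in attrs.items()))


def similarity(saved, candidate):
    """Average of four equally weighted scores (an assumed weighting)."""
    tag_score = 1.0 if saved.tag == candidate.tag else 0.0
    attr_score = _ratio(_attr_string(saved.attrs), _attr_string(candidate.attrs))
    path_score = _ratio(saved.path, candidate.path)
    text_score = _ratio(saved.text, candidate.text)
    return (tag_score + attr_score + path_score + text_score) / 4


def relocate(saved, html):
    """Return the element in `html` that best matches the saved signature."""
    return max(collect(html), key=lambda el: similarity(saved, el))


# The page before the redesign: the selector `.product` still matches.
old_html = '<div id="shop"><a class="product" href="/p/1">Blue Widget</a></div>'
# The page after the redesign: the class name and the nesting both changed.
new_html = (
    '<main id="store"><ul>'
    '<li><a class="item-link" href="/p/1">Blue Widget</a></li>'
    '<li><a class="item-link" href="/p/2">Red Gadget</a></li>'
    '</ul></main>'
)

saved = next(el for el in collect(old_html) if el.attrs.get("class") == "product")
found = relocate(saved, new_html)
print(found.tag, found.attrs, found.text)
```

The saved signature plays the role that `auto_save=True` suggests in the example near the top of the page, and `relocate` plays the role of the later `adaptive=True` lookup. A real implementation would likely weigh more signals and set a minimum score below which no match is returned.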
|
| 74 |
+
|
| 75 |
|
| 76 |
## Star History
|
| 77 |
Scrapling’s GitHub stars have grown steadily since its release (see chart below).
|
|
|
|
| 107 |
</script>
|
| 108 |
|
| 109 |
## Installation
|
| 110 |
+
Scrapling requires Python 3.10 or higher:
|
| 111 |
|
|
|
|
| 112 |
```bash
|
| 113 |
+
pip install scrapling
|
| 114 |
```
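
Before installing, it can help to confirm that the interpreter meets the Python 3.10 requirement stated above. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `scrapling_is_installed` are hypothetical and are not part of Scrapling.

```python
import sys
from importlib.util import find_spec

# Minimum version taken from the requirement stated in this section.
MINIMUM_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given version is at least MINIMUM_PYTHON."""
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


def scrapling_is_installed():
    """Return True if a top-level `scrapling` package can be found."""
    return find_spec("scrapling") is not None


print("Python version supported:", python_is_supported())
print("scrapling importable:", scrapling_is_installed())
```

Running it after `pip install scrapling` should report the package as importable. It does not check whether the browser dependencies needed by the fetchers are present.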
|
|
|
|
| 115 |
|
| 116 |
+
#### Fetchers Setup
|
| 117 |
+
|
| 118 |
+
If you are going to use any of the fetchers or their session classes, then install browser dependencies with
|
| 119 |
```bash
|
| 120 |
scrapling install
|
| 121 |
```
|
| 122 |
+
|
| 123 |
+
This downloads all browsers with their system dependencies and fingerprint manipulation dependencies.
|
| 124 |
+
|
| 125 |
+
### Optional Dependencies
|
| 126 |
+
|
| 127 |
+
- Install the MCP server feature:
|
| 128 |
+
```bash
|
| 129 |
+
pip install "scrapling[ai]"
|
| 130 |
+
```
|
| 131 |
+
- Install shell features (Web Scraping shell and the `extract` command):
|
| 132 |
+
```bash
|
| 133 |
+
pip install "scrapling[shell]"
|
| 134 |
+
```
|
| 135 |
+
- Install everything:
|
| 136 |
+
```bash
|
| 137 |
+
pip install "scrapling[all]"
|
| 138 |
+
```
|
| 139 |
|
| 140 |
## How the documentation is organized
|
| 141 |
Scrapling has a lot of documentation, so we try to follow a guideline called the [Diátaxis documentation framework](https://diataxis.fr/).
|
|
|
|
| 146 |
|
| 147 |
- ⭐ Star the [GitHub repository](https://github.com/D4Vinci/Scrapling)
|
| 148 |
- 🚀 Follow us on [Twitter](https://x.com/Scrapling_dev) and join the [discord server](https://discord.gg/EMgGbDceNQ)
|
| 149 |
+
- 💝 Consider [sponsoring the project or buying me a coffee](donate.md) :wink:
|
| 150 |
- 🐛 Report bugs and suggest features through [GitHub Issues](https://github.com/D4Vinci/Scrapling/issues)
|
| 151 |
|
| 152 |
## License
|