Karim shoair committed be68867
1 parent: 78b07c8

docs: update website main page

Files changed (1):
  1. docs/index.md +65 -40
docs/index.md CHANGED
@@ -4,29 +4,31 @@
  }
  </style>
 
- <p align="center">
  <a href="https://scrapling.readthedocs.io/en/latest/" alt="poster">
  <img alt="poster" src="assets/poster.png" style="width: 50%; height: 100%;"></a>
- </p>
 
- Scrapling is an Undetectable, high-performance, intelligent Web scraping library for Python 3 to make Web Scraping easy!
 
- Scrapling isn't only about making undetectable requests or fetching pages under the radar!
 
- It has its own parser that adapts to website changes and provides many element selection/querying options other than traditional selectors, powerful DOM traversal API, and many other features while significantly outperforming popular parsing alternatives.
 
- Scrapling is built from the ground up by Web scraping experts for beginners and experts. The goal is to provide powerful features while maintaining simplicity and minimal boilerplate code.
 
  ```python
- >> from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, PlayWrightFetcher
- >> StealthyFetcher.auto_match = True
  # Fetch websites' source under the radar!
  >> page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
  >> print(page.status)
  200
  >> products = page.css('.product', auto_save=True) # Scrape data that survives website design changes!
- >> # Later, if the website structure changes, pass `auto_match=True`
- >> products = page.css('.product', auto_match=True) # and Scrapling still finds them!
  ```
 
  ## Top Sponsors
@@ -38,31 +40,38 @@ Scrapling is built from the ground up by Web scraping experts for beginners and
  </div>
  <!-- /sponsors -->
 
- <i><sub>Do you want to show your ad here? Click [here](https://github.com/sponsors/D4Vinci) and choose the tier that suites you!</sub></i>
 
  ## Key Features
- ### Fetch websites as you prefer with async support
- - **HTTP Requests**: Fast and stealthy HTTP requests with the `Fetcher` class.
- - **Dynamic Loading & Automation**: Fetch dynamic websites with the `PlayWrightFetcher` class through your real browser, Scrapling's stealth mode, Playwright's Chromium browser, or [NSTbrowser](https://app.nstbrowser.io/r/1vO5e5)'s browserless!
- - **Anti-bot Protections Bypass**: Easily bypass protections with the `StealthyFetcher` and `PlayWrightFetcher` classes.
-
- ### Easy Scraping
- - **Smart Element Tracking**: Relocate elements after website changes using an intelligent similarity system and integrated storage.
- - **Flexible Selection**: CSS selectors, XPath selectors, filters-based search, text search, regex search, and more.
- - **Find Similar Elements**: Automatically locate elements similar to the element you found!
- - **Smart Content Scraping**: Extract data from multiple websites without specific selectors using Scrapling powerful features.
-
- ### High Performance
- - **Lightning Fast**: Built from the ground up with performance in mind, outperforming most popular Python scraping libraries.
- - **Memory Efficient**: Optimized data structures for minimal memory footprint.
- - **Fast JSON serialization**: 10x faster than standard library.
-
- ### Developer Friendly
- - **Powerful Navigation API**: Easy DOM traversal in all directions.
- - **Rich Text Processing**: All strings have built-in regex, cleaning methods, and more. All elements' attributes are optimized dictionaries that use less memory than standard dictionaries with added methods.
- - **Auto Selectors Generation**: Generate robust short and full CSS/XPath selectors for any element.
- - **Familiar API**: Similar to Scrapy/BeautifulSoup and the same CSS pseudo-elements used in Scrapy.
- - **Type hints**: Complete type/doc-strings coverage for future-proofing and best autocompletion support.
 
  ## Star History
  Scrapling’s GitHub stars have grown steadily since its release (see chart below).
@@ -98,19 +107,35 @@ observer.observe(document.body, {
  </script>
 
  ## Installation
- Scrapling is a breeze to get started with!<br/>Starting from version 0.2.9, we require at least Python 3.9 to work.
 
- Run this command to install it with Python's pip.
  ```bash
- pip3 install scrapling
  ```
- You are ready if you plan to use the parser only (the `Adaptor` class).
 
- But if you are going to make requests or fetch pages with Scrapling, then run this command to install browsers' dependencies needed to use the Fetchers
  ```bash
  scrapling install
  ```
- If you have any installation issues, please open an [issue](https://github.com/D4Vinci/Scrapling/issues/new/choose).
 
  ## How the documentation is organized
  Scrapling has a lot of documentation, so we try to follow a guideline called the [Diátaxis documentation framework](https://diataxis.fr/).
@@ -121,7 +146,7 @@ If you like Scrapling and want to support its development:
 
  - ⭐ Star the [GitHub repository](https://github.com/D4Vinci/Scrapling)
  - 🚀 Follow us on [Twitter](https://x.com/Scrapling_dev) and join the [discord server](https://discord.gg/EMgGbDceNQ)
- - 💝 Consider [sponsoring the project or buying me a coffe](donate.md) :wink:
  - 🐛 Report bugs and suggest features through [GitHub Issues](https://github.com/D4Vinci/Scrapling/issues)
 
  ## License

@@ -4,29 +4,31 @@
  }
  </style>
 
+ <div align="center">
  <a href="https://scrapling.readthedocs.io/en/latest/" alt="poster">
  <img alt="poster" src="assets/poster.png" style="width: 50%; height: 100%;"></a>
+ </div>
 
+ <div align="center">
+ <i><code>Easy, effortless Web Scraping as it should be!</code></i>
+ </div>
 
+ **Stop fighting anti-bot systems. Stop rewriting selectors after every website update.**
 
+ Scrapling isn't just another Web Scraping library. It's the first **adaptive** scraping library that learns from website changes and evolves with them. While other libraries break when websites update their structure, Scrapling automatically relocates your elements and keeps your scrapers running.
 
+ Built for the modern Web, Scrapling has its own rapid parsing engine and its fetchers to handle all Web Scraping challenges you are facing or will face. Built by Web Scrapers for Web Scrapers and regular users, there's something for everyone.
 
  ```python
+ >> from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher
+ >> StealthyFetcher.adaptive = True
  # Fetch websites' source under the radar!
  >> page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
  >> print(page.status)
  200
  >> products = page.css('.product', auto_save=True) # Scrape data that survives website design changes!
+ >> # Later, if the website structure changes, pass `adaptive=True`
+ >> products = page.css('.product', adaptive=True) # and Scrapling still finds them!
  ```
 
  ## Top Sponsors

@@ -38,31 +40,38 @@ Scrapling is built from the ground up by Web scraping experts for beginners and
  </div>
  <!-- /sponsors -->
 
+ <i><sub>Do you want to show your ad here? Click [here](https://github.com/sponsors/D4Vinci/sponsorships?tier_id=435495) and enjoy the rest of the perks!</sub></i>
 
  ## Key Features
+
+ ### Advanced Websites Fetching with Session Support
+ - **HTTP Requests**: Fast and stealthy HTTP requests with the `Fetcher` class. Can impersonate browsers' TLS fingerprint, headers, and use HTTP3.
+ - **Dynamic Loading**: Fetch dynamic websites with full browser automation through the `DynamicFetcher` class supporting Playwright's Chromium, real Chrome, and custom stealth mode.
+ - **Anti-bot Bypass**: Advanced stealth capabilities with `StealthyFetcher` using a modified version of Firefox and fingerprint spoofing. Can bypass all levels of Cloudflare's Turnstile with automation easily.
+ - **Session Management**: Persistent session support with `FetcherSession`, `StealthySession`, and `DynamicSession` classes for cookie and state management across requests.
+ - **Async Support**: Complete async support across all fetchers and dedicated async session classes.
+
+ ### Adaptive Scraping & AI Integration
+ - 🔄 **Smart Element Tracking**: Relocate elements after website changes using intelligent similarity algorithms.
+ - 🎯 **Smart Flexible Selection**: CSS selectors, XPath selectors, filter-based search, text search, regex search, and more.
+ - 🔍 **Find Similar Elements**: Automatically locate elements similar to found elements.
+ - 🤖 **MCP Server to be used with AI**: Built-in MCP server for AI-assisted Web Scraping and data extraction. The MCP server features custom, powerful capabilities that utilize Scrapling to extract targeted content before passing it to the AI (Claude/Cursor/etc), thereby speeding up operations and reducing costs by minimizing token usage.
+
+ ### High-Performance & battle-tested Architecture
+ - 🚀 **Lightning Fast**: Optimized performance outperforming most Python scraping libraries.
+ - 🔋 **Memory Efficient**: Optimized data structures and lazy loading for a minimal memory footprint.
+ - **Fast JSON Serialization**: 10x faster than the standard library.
+ - 🏗️ **Battle tested**: Not only does Scrapling have 92% test coverage and full type hints coverage, but it has been used daily by hundreds of Web Scrapers over the past year.
+
+ ### Developer/Web Scraper Friendly Experience
+ - 🎯 **Interactive Web Scraping Shell**: Optional built-in IPython shell with Scrapling integration, shortcuts, and new tools to speed up Web Scraping scripts development, like converting curl requests to Scrapling requests and viewing requests results in your browser.
+ - 🚀 **Use it directly from the Terminal**: Optionally, you can use Scrapling to scrape a URL without writing a single code!
+ - 🛠️ **Rich Navigation API**: Advanced DOM traversal with parent, sibling, and child navigation methods.
+ - 🧬 **Enhanced Text Processing**: Built-in regex, cleaning methods, and optimized string operations.
+ - 📝 **Auto Selector Generation**: Generate robust CSS/XPath selectors for any element.
+ - 🔌 **Familiar API**: Similar to Scrapy/BeautifulSoup with the same pseudo-elements used in Scrapy/Parsel.
+ - 📘 **Complete Type Coverage**: Full type hints for excellent IDE support and code completion.
+
 
  ## Star History
  Scrapling’s GitHub stars have grown steadily since its release (see chart below).

@@ -98,19 +107,35 @@ observer.observe(document.body, {
  </script>
 
  ## Installation
+ Scrapling requires Python 3.10 or higher:
 
  ```bash
+ pip install scrapling
  ```
 
+ #### Fetchers Setup
+
+ If you are going to use any of the fetchers or their session classes, then install browser dependencies with
  ```bash
  scrapling install
  ```
+
+ This downloads all browsers with their system dependencies and fingerprint manipulation dependencies.
+
+ ### Optional Dependencies
+
+ - Install the MCP server feature:
+ ```bash
+ pip install "scrapling[ai]"
+ ```
+ - Install shell features (Web Scraping shell and the `extract` command):
+ ```bash
+ pip install "scrapling[shell]"
+ ```
+ - Install everything:
+ ```bash
+ pip install "scrapling[all]"
+ ```
 
  ## How the documentation is organized
  Scrapling has a lot of documentation, so we try to follow a guideline called the [Diátaxis documentation framework](https://diataxis.fr/).

@@ -121,7 +146,7 @@ If you like Scrapling and want to support its development:
 
  - ⭐ Star the [GitHub repository](https://github.com/D4Vinci/Scrapling)
  - 🚀 Follow us on [Twitter](https://x.com/Scrapling_dev) and join the [discord server](https://discord.gg/EMgGbDceNQ)
+ - 💝 Consider [sponsoring the project or buying me a coffee](donate.md) :wink:
  - 🐛 Report bugs and suggest features through [GitHub Issues](https://github.com/D4Vinci/Scrapling/issues)
 
  ## License
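This commit renames identifiers in the documented example: `StealthyFetcher.auto_match` becomes `StealthyFetcher.adaptive`, the `auto_match=True` selector argument becomes `adaptive=True`, and `PlayWrightFetcher` is replaced by `DynamicFetcher`. If you have scripts written against the old page, a minimal sketch like the following can flag the old names. It assumes these documentation renames mirror the library's public API, which this diff alone does not confirm. `RENAMES`, `find_renamed_identifiers`, and `scan_tree` are hypothetical helpers for illustration, not part of Scrapling.

```python
import re
from pathlib import Path

# Old -> new names as they appear in the removed/added lines of this diff.
# Assumption: the library API follows the same renames; check the release notes.
RENAMES = {
    "auto_match": "adaptive",
    "PlayWrightFetcher": "DynamicFetcher",
}

# Whole-word match, so an identifier like "my_auto_match_flag" is not reported.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def find_renamed_identifiers(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, old_name, new_name) for each old name found in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, RENAMES[old]))
    return hits


def scan_tree(root: str = ".") -> None:
    """Print a hint for every .py file under `root` that still uses an old name."""
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, old, new in find_renamed_identifiers(text):
            print(f"{path}:{lineno}: '{old}' -> consider '{new}'")
```

For example, `scan_tree("my_scrapers")` prints one line per match, such as `my_scrapers/spider.py:3: 'auto_match' -> consider 'adaptive'`. The matching is purely textual, so it also reports names inside comments and strings; review each hit before editing.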