Karim shoair committed on
Commit 028ed17 · 1 Parent(s): 309c3e5

docs: Update all pages related to last changes

Files changed (4)
  1. README.md +28 -24
  2. docs/fetching/dynamic.md +4 -1
  3. docs/fetching/stealthy.md +2 -1
  4. docs/index.md +24 -20
README.md CHANGED
@@ -92,7 +92,7 @@ Built for the modern Web, Scrapling has its own rapid parsing engine and its fet
 - 🔄 **Smart Element Tracking**: Relocate elements after website changes using intelligent similarity algorithms.
 - 🎯 **Smart Flexible Selection**: CSS selectors, XPath selectors, filter-based search, text search, regex search, and more.
 - 🔍 **Find Similar Elements**: Automatically locate elements similar to found elements.
-- 🤖 **MCP Server to be used with AI**: Built-in MCP server for AI-assisted Web Scraping and data extraction. The MCP server features custom, powerful capabilities that utilize Scrapling to extract targeted content before passing it to the AI (Claude/Cursor/etc), thereby speeding up operations and reducing costs by minimizing token usage.
+- 🤖 **MCP Server to be used with AI**: Built-in MCP server for AI-assisted Web Scraping and data extraction. The MCP server features custom, powerful capabilities that utilize Scrapling to extract targeted content before passing it to the AI (Claude/Cursor/etc), thereby speeding up operations and reducing costs by minimizing token usage. ([demo video](https://www.youtube.com/watch?v=qyFk3ZNwOxE))
 
 ### High-Performance & battle-tested Architecture
 - 🚀 **Lightning Fast**: Optimized performance outperforming most Python scraping libraries.
@@ -134,7 +134,7 @@ quotes = page.css('.quote .text::text')
 
 # Advanced stealth mode (Keep the browser open until you finish)
 with StealthySession(headless=True, solve_cloudflare=True) as session:
-    page = session.fetch('https://nopecha.com/demo/cloudflare')
+    page = session.fetch('https://nopecha.com/demo/cloudflare', google_search=False)
     data = page.css('#padded_content a')
 
 # Or use one-off request style, it opens the browser for this request, then closes it after finishing
@@ -143,7 +143,7 @@ data = page.css('#padded_content a')
 
 # Full browser automation (Keep the browser open until you finish)
 with DynamicSession(headless=True, disable_resources=False, network_idle=True) as session:
-    page = session.fetch('https://quotes.toscrape.com/')
+    page = session.fetch('https://quotes.toscrape.com/', load_dom=False)
     data = page.xpath('//span[@class="text"]/text()')  # XPath selector if you prefer it
 
 # Or use one-off request style, it opens the browser for this request, then closes it after finishing
@@ -187,7 +187,7 @@ from scrapling.parser import Selector
 
 page = Selector("<html>...</html>")
 ```
-And it works exactly the same way!
+And it works precisely the same way!
 
 ### Async Session Management Examples
 ```python
@@ -271,29 +271,33 @@ Scrapling requires Python 3.10 or higher:
 pip install scrapling
 ```
 
-#### Fetchers Setup
-
-If you are going to use any of the fetchers or their classes, then install browser dependencies with
-```bash
-scrapling install
-```
-
-This downloads all browsers with their system dependencies and fingerprint manipulation dependencies.
+Starting with v0.3.2, this installation only includes the parser engine and its dependencies, without any fetchers.
 
 ### Optional Dependencies
 
-- Install the MCP server feature:
-```bash
-pip install "scrapling[ai]"
-```
-- Install shell features (Web Scraping shell and the `extract` command):
-```bash
-pip install "scrapling[shell]"
-```
-- Install everything:
-```bash
-pip install "scrapling[all]"
-```
+1. If you are going to use any of the fetchers, their session classes, or the extra features below, install the fetchers' dependencies first, then install the browser dependencies with
+   ```bash
+   pip install "scrapling[fetchers]"
+
+   scrapling install
+   ```
+
+   This downloads all browsers with their system dependencies and fingerprint manipulation dependencies.
+
+2. Extra features:
+   - Install the MCP server feature:
+     ```bash
+     pip install "scrapling[ai]"
+     ```
+   - Install shell features (Web Scraping shell and the `extract` command):
+     ```bash
+     pip install "scrapling[shell]"
+     ```
+   - Install everything:
+     ```bash
+     pip install "scrapling[all]"
+     ```
+   Don't forget that you still need to install the browser dependencies with `scrapling install` after any of these extras (if you haven't already).
 
 ## Contributing
 
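The two calling styles in the README snippets above differ mainly in browser lifetime: a session keeps one browser open across fetches, while the one-off style opens and closes it per request. This toy sketch (a hypothetical `DemoSession` and `one_off_fetch`, not Scrapling's API) makes the cost difference countable:

```python
# Toy model of the two calling styles shown in the README diff above.
# DemoSession stands in for StealthySession/DynamicSession; the counter
# plays the role of the expensive browser launch.

class DemoSession:
    launches = 0  # how many times a "browser" was started

    def __enter__(self):
        DemoSession.launches += 1  # session style: one launch per session
        return self

    def __exit__(self, *exc):
        return False  # the "browser" closes when the session exits

    def fetch(self, url: str) -> str:
        return f"page({url})"


def one_off_fetch(url: str) -> str:
    # One-off style: opens the browser for this request, then closes it
    with DemoSession() as session:
        return session.fetch(url)


# Session style: two fetches share a single launch
with DemoSession() as session:
    session.fetch("https://example.com/a")
    session.fetch("https://example.com/b")
launches_after_session = DemoSession.launches  # 1

# One-off style: each fetch pays for its own launch
one_off_fetch("https://example.com/a")
one_off_fetch("https://example.com/b")
launches_total = DemoSession.launches  # 3
```

Under this model the session form amortizes the start-up cost, which is why the snippets pair each one-off example with a session equivalent.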
docs/fetching/dynamic.md CHANGED
@@ -62,7 +62,7 @@ DynamicFetcher.fetch('https://example.com', cdp_url='ws://localhost:9222')
 Instead of launching a browser locally (Chromium/Google Chrome), you can connect to a remote browser through the [Chrome DevTools Protocol](https://chromedevtools.github.io/devtools-protocol/).
 
 ## Full list of arguments
-Scrapling provides many options with this fetcher. To make it as simple as possible, we will list the options here and give examples of using most of them.
+Scrapling provides many options with this fetcher and its session classes. To make it as simple as possible, we will list the options here and give examples of using most of them.
 
 | Argument | Description | Optional |
 |:-------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------:|
@@ -90,6 +90,9 @@ Scrapling provides many options with this fetcher. To make it as simple as possi
 | cdp_url | Instead of launching a new browser instance, connect to this CDP URL to control real browsers through CDP. | ✔️ |
 | selector_config | A dictionary of custom parsing arguments to be used when creating the final `Selector`/`Response` class. | ✔️ |
 
+In the session classes, all these arguments can be set globally for the session. You can still configure each request individually by passing any of the arguments that can be set at the browser-tab level: `google_search`, `timeout`, `wait`, `page_action`, `extra_headers`, `disable_resources`, `wait_selector`, `wait_selector_state`, `network_idle`, `load_dom`, and `selector_config`.
+
+
 ## Examples
 It's easier to understand with examples, so let's take a look.
 
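The note added to this page describes a precedence rule: session-wide arguments act as defaults, and tab-level arguments passed to a single request override them for that request only. A minimal sketch of that rule, with hypothetical names (`TAB_LEVEL_ARGS` and `resolve_request_config` are not Scrapling's internals):

```python
# Sketch of session-global defaults vs. per-request overrides.
# The set below mirrors the tab-level arguments listed in the note above.
TAB_LEVEL_ARGS = {
    "google_search", "timeout", "wait", "page_action", "extra_headers",
    "disable_resources", "wait_selector", "wait_selector_state",
    "network_idle", "load_dom", "selector_config",
}


def resolve_request_config(session_config: dict, per_request: dict) -> dict:
    """Merge per-request overrides into session defaults (overrides win)."""
    unknown = set(per_request) - TAB_LEVEL_ARGS
    if unknown:
        # Arguments like headless can only be set when the session is created
        raise ValueError(f"not configurable per request: {sorted(unknown)}")
    return {**session_config, **per_request}


session_config = {"timeout": 30000, "network_idle": True, "load_dom": True}
config = resolve_request_config(session_config, {"load_dom": False, "timeout": 5000})
# config keeps network_idle=True but uses the per-request load_dom and timeout
```

The dict-unpacking merge (`{**defaults, **overrides}`) is the whole rule: later keys win, and untouched session settings pass through unchanged.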
docs/fetching/stealthy.md CHANGED
@@ -15,7 +15,7 @@ Check out how to configure the parsing options [here](choosing.md#parser-configu
 > Note: The async version of the `fetch` method is the `async_fetch` method, of course.
 
 ## Full list of arguments
-Before jumping to [examples](#examples), here's the full list of arguments
+Scrapling provides many options with this fetcher and its session classes. Before jumping to the [examples](#examples), here's the full list of arguments.
 
 
 | Argument | Description | Optional |
@@ -47,6 +47,7 @@ Before jumping to [examples](#examples), here's the full list of arguments
 | additional_args | Additional arguments to be passed to Camoufox as additional settings, and they take higher priority than Scrapling's settings. | ✔️ |
 | selector_config | A dictionary of custom parsing arguments to be used when creating the final `Selector`/`Response` class. | ✔️ |
 
+In the session classes, all these arguments can be set globally for the session. You can still configure each request individually by passing any of the arguments that can be set at the browser-tab level: `google_search`, `timeout`, `wait`, `page_action`, `extra_headers`, `disable_resources`, `wait_selector`, `wait_selector_state`, `network_idle`, `load_dom`, `solve_cloudflare`, and `selector_config`.
 
 ## Examples
 It's easier to understand with examples, so we will now review most of the arguments individually with examples.
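The stealthy fetcher's page notes that the async counterpart of `fetch` is `async_fetch`. The shape of that call can be sketched with a toy async session (a hypothetical `DemoAsyncSession`, not Scrapling's class):

```python
import asyncio

# Toy async session mirroring the sync context-manager pattern:
# `async with` replaces `with`, and `async_fetch` is awaited.

class DemoAsyncSession:
    async def __aenter__(self):
        return self  # a real session would launch the browser here

    async def __aexit__(self, *exc):
        return False  # ...and close it here

    async def async_fetch(self, url: str) -> str:
        await asyncio.sleep(0)  # placeholder for the network round-trip
        return f"page({url})"


async def main() -> str:
    async with DemoAsyncSession() as session:
        return await session.async_fetch("https://example.com")


result = asyncio.run(main())  # "page(https://example.com)"
```

Everything else (argument names, per-request overrides) keeps the same shape as the sync session; only the awaiting differs.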
docs/index.md CHANGED
@@ -114,29 +114,33 @@ Scrapling requires Python 3.10 or higher:
 pip install scrapling
 ```
 
-#### Fetchers Setup
-
-If you are going to use any of the fetchers or their session classes, then install browser dependencies with
-```bash
-scrapling install
-```
-
-This downloads all browsers with their system dependencies and fingerprint manipulation dependencies.
+Starting with v0.3.2, this installation only includes the parser engine and its dependencies, without any fetchers.
 
 ### Optional Dependencies
 
-- Install the MCP server feature:
-```bash
-pip install "scrapling[ai]"
-```
-- Install shell features (Web Scraping shell and the `extract` command):
-```bash
-pip install "scrapling[shell]"
-```
-- Install everything:
-```bash
-pip install "scrapling[all]"
-```
+1. If you are going to use any of the fetchers, their session classes, or the extra features below, install the fetchers' dependencies first, then install the browser dependencies with
+   ```bash
+   pip install "scrapling[fetchers]"
+
+   scrapling install
+   ```
+
+   This downloads all browsers with their system dependencies and fingerprint manipulation dependencies.
+
+2. Extra features:
+   - Install the MCP server feature:
+     ```bash
+     pip install "scrapling[ai]"
+     ```
+   - Install shell features (Web Scraping shell and the `extract` command):
+     ```bash
+     pip install "scrapling[shell]"
+     ```
+   - Install everything:
+     ```bash
+     pip install "scrapling[all]"
+     ```
+   Don't forget that you still need to install the browser dependencies with `scrapling install` after any of these extras (if you haven't already).
 
 ## How the documentation is organized
 Scrapling has a lot of documentation, so we try to follow a guideline called the [Diátaxis documentation framework](https://diataxis.fr/).