Karim shoair committed on
Commit · d8893a8
1 Parent(s): 2784aa7
docs: Update docs accordingly and some adjustments
docs/fetching/choosing.md
CHANGED
@@ -1,7 +1,7 @@
## Introduction
- Fetchers are classes that can do requests or fetch pages for you easily in a single-line fashion with many features and then return a [Response](#response-object) object. Starting with v0.3, all fetchers have

- This feature was introduced because, before v0.2, Scrapling was only a parsing engine

> Fetchers are not wrappers built on top of other libraries. However, they utilize these libraries as an engine to request/fetch pages easily for you, while fully leveraging that engine and adding features for you. Some fetchers don't even use the official library for requests; instead, they use their own custom version. For example, `StealthyFetcher` utilizes `Camoufox` browser directly, without relying on its Python library for anything except launch options. This last part might change soon as well.

@@ -38,13 +38,13 @@ Then you use it right away without initializing like this, and it will use the d
If you want to configure the parser ([Selector class](../parsing/main_classes.md#selector)) that will be used on the response before returning it for you, then do this first:
```python
>>> from scrapling.fetchers import Fetcher
- >>> Fetcher.configure(adaptive=True, encoding="
```
or
```python
>>> from scrapling.fetchers import Fetcher
>>> Fetcher.adaptive=True
- >>> Fetcher.encoding="
>>> Fetcher.keep_comments=False
>>> Fetcher.keep_cdata=False # and the rest
```

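The two configuration styles in this hunk (a single `configure(...)` call versus assigning class attributes one by one) can be illustrated with a small stand-in class. This is only a sketch of the general pattern: `ToyFetcher`, `parser_options`, and the rejection of unknown option names are hypothetical and are not taken from Scrapling's implementation. The option names and values are the ones shown in the diff.

```python
class ToyFetcher:
    """Hypothetical stand-in for a fetcher class; NOT Scrapling's Fetcher."""

    # Class-level parser defaults, shared by every call made through the class
    adaptive = False
    encoding = "utf-8"
    keep_comments = False
    keep_cdata = False

    _parser_keywords = ("adaptive", "encoding", "keep_comments", "keep_cdata")

    @classmethod
    def configure(cls, **kwargs):
        """Set several parser options at once on the class itself."""
        for name, value in kwargs.items():
            # Assumption for this sketch: unknown option names are rejected
            if name not in cls._parser_keywords:
                raise ValueError(f"Unknown parser option: {name}")
            setattr(cls, name, value)

    @classmethod
    def parser_options(cls):
        """Collect the options that would be handed to the parser."""
        return {name: getattr(cls, name) for name in cls._parser_keywords}


class ViaConfigure(ToyFetcher):
    pass


class ViaAttributes(ToyFetcher):
    pass


# Style 1: one call
ViaConfigure.configure(adaptive=True, encoding="utf-8", keep_comments=False, keep_cdata=False)

# Style 2: attribute by attribute
ViaAttributes.adaptive = True
ViaAttributes.encoding = "utf-8"
ViaAttributes.keep_comments = False
ViaAttributes.keep_cdata = False

same_result = ViaConfigure.parser_options() == ViaAttributes.parser_options()
```

In this sketch both styles end in the same class-level state, which is why the documentation can present them as alternatives. The `configure` form has the practical advantage that a mistyped option name can be caught in one place.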
@@ -71,7 +71,8 @@ The `Response` object is the same as the [Selector](../parsing/main_classes.md#s
>>> page.headers # Response headers
>>> page.request_headers # Request headers
>>> page.history # Response history of redirections, if any
- >>> page.body # Raw response body
>>> page.encoding # Response encoding
```
All fetchers return the `Response` object.

## Introduction
+ Fetchers are classes that can do requests or fetch pages for you easily in a single-line fashion with many features and then return a [Response](#response-object) object. Starting with v0.3, all fetchers have separate classes to keep the session running, so for example, a fetcher that uses a browser will keep the browser open till you finish all your requests through it instead of opening multiple browsers. So it depends on your use case.

+ This feature was introduced because, before v0.2, Scrapling was only a parsing engine. The target here is to gradually become the one-stop shop for all Web Scraping needs.

> Fetchers are not wrappers built on top of other libraries. However, they utilize these libraries as an engine to request/fetch pages easily for you, while fully leveraging that engine and adding features for you. Some fetchers don't even use the official library for requests; instead, they use their own custom version. For example, `StealthyFetcher` utilizes `Camoufox` browser directly, without relying on its Python library for anything except launch options. This last part might change soon as well.

If you want to configure the parser ([Selector class](../parsing/main_classes.md#selector)) that will be used on the response before returning it for you, then do this first:
```python
>>> from scrapling.fetchers import Fetcher
+ >>> Fetcher.configure(adaptive=True, encoding="utf-8", keep_comments=False, keep_cdata=False) # and the rest
```
or
```python
>>> from scrapling.fetchers import Fetcher
>>> Fetcher.adaptive=True
+ >>> Fetcher.encoding="utf-8"
>>> Fetcher.keep_comments=False
>>> Fetcher.keep_cdata=False # and the rest
```

>>> page.headers # Response headers
>>> page.request_headers # Request headers
>>> page.history # Response history of redirections, if any
+ >>> page.body # Raw HTML response body without any processing
+ >>> page.raw_response # Raw response of the last request made by the browser, if any (Useful for downloading binary files and text/json files)
>>> page.encoding # Response encoding
```
All fetchers return the `Response` object.
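The introduction added in this commit contrasts one-off fetches with session classes that keep a browser open across requests. A rough sketch of that trade-off follows. It uses toy classes only: `ToyBrowser`, `fetch_once`, and `ToySession` are hypothetical names invented for illustration and do not reflect Scrapling's API or internals.

```python
class ToyBrowser:
    """Hypothetical stand-in for an expensive resource such as a real browser."""

    launches = 0  # counts how many times a "browser" was started

    def __init__(self):
        type(self).launches += 1
        self.is_open = True

    def get(self, url):
        if not self.is_open:
            raise RuntimeError("browser is closed")
        return f"<html>{url}</html>"

    def close(self):
        self.is_open = False


def fetch_once(url):
    """One-shot style: start a browser, fetch a single page, close it."""
    browser = ToyBrowser()
    try:
        return browser.get(url)
    finally:
        browser.close()


class ToySession:
    """Session style: one browser stays open for every fetch inside the block."""

    def __enter__(self):
        self._browser = ToyBrowser()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._browser.close()
        return False  # do not swallow exceptions

    def fetch(self, url):
        return self._browser.get(url)


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

# One-shot: a new browser per URL
one_shot_pages = [fetch_once(url) for url in urls]
one_shot_launches = ToyBrowser.launches

# Session: a single browser reused for all URLs
ToyBrowser.launches = 0
with ToySession() as session:
    session_pages = [session.fetch(url) for url in urls]
session_launches = ToyBrowser.launches
```

Both approaches return the same pages here; the difference is how often the expensive resource is started. That is the sense in which the choice "depends on your use case": a single request gains little from a session, while many requests to the same site avoid repeated startup cost.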