Karim Shoair committed
Commit · 1651a74 · Parent(s): 960e783
docs: updating `adaptive` page and some corrections

Files changed: docs/parsing/adaptive.md (+9 −10)
docs/parsing/adaptive.md (CHANGED)

Before:

@@ -1,10 +1,9 @@
  1   ## Introduction
  2
  3 -
  4 -
  5 -
  6 -
  7 - > <br><br>
  8
  9   Adaptive scraping (previously known as automatch) is one of Scrapling's most powerful features. It allows your scraper to survive website changes by intelligently tracking and relocating elements.
 10

@@ -84,11 +83,11 @@ Now, let's test the same selector in both versions
 84   >> Fetcher.configure(adaptive = True, adaptive_domain='stackoverflow.com')
 85   >>
 86   >> page = Fetcher.get(old_url, timeout=30)
 87 - >> element1 = page.
 88   >>
 89   >> # Same selector but used in the updated website
 90   >> page = Fetcher.get(new_url)
 91 - >> element2 = page.
 92   >>
 93   >> if element1.text == element2.text:
 94   ...     print('Scrapling found the same element in the old and new designs!')

@@ -157,7 +156,7 @@ Now that you've enabled the `adaptive` feature globally, you have two main ways
157   ### The CSS/XPath Selection way
158   As you have seen in the example above, first, you have to use the `auto_save` argument while selecting an element that exists on the page, like below
159   ```python
160 - element = page.css('#p1' auto_save=True)
161   ```
162   And when the element doesn't exist, you can use the same selector and the `adaptive` argument, and the library will find it for you
163   ```python

@@ -165,7 +164,7 @@ element = page.css('#p1', adaptive=True)
165   ```
166   Pretty simple, eh?
167
168 - Well, a lot happened under the hood here. Remember the identifier we mentioned before that you need to set to retrieve the element you want? Here, with the `css`/`
169
170   Additionally, for all these methods, you can pass the `identifier` argument to set it yourself. This is useful in some instances, or you can use it to save properties with the `auto_save` argument.
171

@@ -185,7 +184,7 @@ Now, later, when you want to retrieve it and relocate it inside the page with `a
185   >>> element_dict = page.retrieve('my_special_element')
186   >>> page.relocate(element_dict, selector_type=True)
187   [<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>]
188 - >>> page.relocate(element_dict, selector_type=True).css('::text')
189   ['Tipping the Velvet']
190   ```
191   Hence, the `retrieve` and `relocate` methods are used.
After:

  1   ## Introduction
  2
  3 + !!! success "Prerequisites"
  4 +
  5 +     1. You've completed or read the [Querying elements](../parsing/selection.md) page to understand how to find/extract elements from the [Selector](../parsing/main_classes.md#selector) object.
  6 +     2. You've completed or read the [Main classes](../parsing/main_classes.md) page to understand the [Selector](../parsing/main_classes.md#selector) class.
  7
  8   Adaptive scraping (previously known as automatch) is one of Scrapling's most powerful features. It allows your scraper to survive website changes by intelligently tracking and relocating elements.
  9
 83   >> Fetcher.configure(adaptive = True, adaptive_domain='stackoverflow.com')
 84   >>
 85   >> page = Fetcher.get(old_url, timeout=30)
 86 + >> element1 = page.css(selector, auto_save=True)[0]
 87   >>
 88   >> # Same selector but used in the updated website
 89   >> page = Fetcher.get(new_url)
 90 + >> element2 = page.css(selector, adaptive=True)[0]
 91   >>
 92   >> if element1.text == element2.text:
 93   ...     print('Scrapling found the same element in the old and new designs!')
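The corrected example above is a two-pass flow: on the first run, `auto_save=True` stores the matched element's properties for the current domain, and on a later run against the redesigned page, `adaptive=True` relocates the element by similarity when the selector alone would fail. A minimal, library-free sketch of that idea (the `Fingerprint` class and the scoring weights below are illustrative assumptions, not Scrapling's actual internals):

```python
from dataclasses import dataclass, field

@dataclass
class Fingerprint:
    """Properties saved for an element when auto_save=True (illustrative)."""
    tag: str
    text: str
    attrs: dict = field(default_factory=dict)

def similarity(saved: Fingerprint, candidate: Fingerprint) -> float:
    """Score a candidate element on the new page against the saved fingerprint."""
    score = 0.0
    if saved.tag == candidate.tag:
        score += 1.0
    if saved.text == candidate.text:
        score += 2.0  # matching text content is the strongest signal here
    shared = set(saved.attrs.items()) & set(candidate.attrs.items())
    score += len(shared) * 0.5
    return score

# First visit: '#price' matches, so a fingerprint is stored under an
# identifier (as noted later, the identifier defaults to the selector).
storage = {}
storage['#price'] = Fingerprint('span', '$51.77', {'id': 'price', 'class': 'price_color'})

# Later visit: the site changed and '#price' no longer matches, so every
# candidate element is scored against the saved fingerprint instead.
new_page = [
    Fingerprint('a', 'Tipping the Velvet', {'href': 'catalogue/...'}),
    Fingerprint('span', '$51.77', {'class': 'price_color new-style'}),
]
saved = storage['#price']
best = max(new_page, key=lambda el: similarity(saved, el))
print(best.text)  # the price element wins despite its changed id/class
```

The real feature persists these fingerprints per `adaptive_domain`, which is why the example configures the domain before fetching.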
156   ### The CSS/XPath Selection way
157   As you have seen in the example above, first, you have to use the `auto_save` argument while selecting an element that exists on the page, like below
158   ```python
159 + element = page.css('#p1', auto_save=True)
160   ```
161   And when the element doesn't exist, you can use the same selector and the `adaptive` argument, and the library will find it for you
162   ```python
164   ```
165   Pretty simple, eh?
166
167 + Well, a lot happened under the hood here. Remember the identifier we mentioned before that you need to set to retrieve the element you want? Here, with the `css`/`xpath` methods, the identifier is set automatically to the selector you passed, to make things easier :)
168
169   Additionally, for all these methods, you can pass the `identifier` argument to set it yourself. This is useful in some instances, or you can use it to save properties with the `auto_save` argument.
170
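The defaulting rule described above (the identifier falls back to the selector string unless you pass `identifier` yourself) is easy to sketch. The `save_element` helper below is a hypothetical stand-in for illustration, not Scrapling's API:

```python
saved_elements = {}

def save_element(element_properties, selector, identifier=None):
    """Store element properties under `identifier`, falling back to the
    selector used to find the element, mirroring the documented rule."""
    key = identifier if identifier is not None else selector
    saved_elements[key] = element_properties
    return key

# Like css('#p1', auto_save=True): no identifier given, the selector is the key
print(save_element({'tag': 'p'}, '#p1'))                      # '#p1'
# Like css('#p1', auto_save=True, identifier='intro'): explicit identifier wins
print(save_element({'tag': 'p'}, '#p1', identifier='intro'))  # 'intro'
```

An explicit identifier is handy when the selector is long or likely to change, since it is also the key you pass to `retrieve` later.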
184   >>> element_dict = page.retrieve('my_special_element')
185   >>> page.relocate(element_dict, selector_type=True)
186   [<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>]
187 + >>> page.relocate(element_dict, selector_type=True).css('::text').getall()
188   ['Tipping the Velvet']
189   ```
190   That's how the `retrieve` and `relocate` methods are used together.
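The `retrieve`/`relocate` pair in the last hunk splits adaptive matching into two explicit steps: `retrieve` returns the stored properties as a plain dict, and `relocate` searches the current page for the best matches, which you can then query further (as the corrected `.css('::text').getall()` call shows). A simplified stand-in for that two-step flow (the storage format and the matching rule below are illustrative assumptions, not the library's internals):

```python
# Illustrative stand-ins for page.retrieve() / page.relocate().
storage = {'my_special_element': {'tag': 'a', 'text': 'Tipping the Velvet'}}

def retrieve(identifier):
    """Return the saved properties dict for `identifier`, or None."""
    return storage.get(identifier)

def relocate(element_dict, candidates):
    """Return every candidate on the current page with the saved tag,
    ranked so exact text matches come first."""
    same_tag = [c for c in candidates if c['tag'] == element_dict['tag']]
    return sorted(same_tag, key=lambda c: c['text'] != element_dict['text'])

current_page = [
    {'tag': 'h3', 'text': 'Classics'},
    {'tag': 'a', 'text': 'Tipping the Velvet'},
]
element_dict = retrieve('my_special_element')
matches = relocate(element_dict, current_page)
print([m['text'] for m in matches])  # ['Tipping the Velvet']
```

Keeping the two steps separate is what lets you inspect or store `element_dict` yourself before asking the page to find it again.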
|