Karim shoair committed on
Commit
8d5cc87
·
1 Parent(s): 92b1671

docs: adding more data to the `adaptive` feature page

Files changed (1)
  1. docs/parsing/adaptive.md +5 -3
docs/parsing/adaptive.md CHANGED
@@ -50,7 +50,7 @@ When website owners implement structural changes like
50
  ```
51
  The selector will no longer function, and your code needs maintenance. That's where Scrapling's `adaptive` feature comes into play.
52
 
53
- With Scrapling, you can enable the `adaptive` feature the first time you select an element, and the next time you select that element and it doesn't exist, Scrapling will remember its properties and search on the website for the element with the highest percentage of similarity to that element and without AI :)
54
 
55
  ```python
56
  from scrapling import Selector, Fetcher
@@ -100,6 +100,8 @@ The code will be the same in a real-world scenario, except it will use the same
100
 
101
  Hence, in the two examples above, I used both the `Selector` and `Fetcher` classes to show that the adaptive logic is the same.
102
 
 
 
103
  ## How the adaptive scraping feature works
104
  Adaptive scraping works in two phases:
105
 
@@ -113,7 +115,7 @@ With as few technical details as possible, the general logic goes as follows:
113
  1. You tell Scrapling to save that element's unique properties in one of the ways we will show below.
114
  2. Scrapling uses its configured database (SQLite by default) and saves each element's unique properties.
115
  3. Now, because everything about the element can be changed or removed by the website's owner(s), nothing from the element can be used as a unique identifier for the database. To solve this issue, I made the storage system rely on two things:
116
- 1. The domain of the current website. If you are using the `Selector` class, pass it when initializing the class; if you are using one of the fetchers, the domain will be taken from the URL automatically.
117
  2. An `identifier` to query that element's properties from the database. You don't always have to set the identifier yourself; we'll discuss this later.
118
 
119
  Together, they will later be used to retrieve the element's unique properties from the database.
@@ -148,7 +150,7 @@ If you are using the [Selector](main_classes.md#selector) class, you need to pas
148
 
149
  If you didn't pass a URL, the word `default` will be used in place of the URL field while saving the element's unique properties. So, this will only be an issue if you use the same identifier later for a different website and don't pass the URL parameter when initializing it. The save process overwrites previous data, and the `adaptive` feature uses only the latest saved properties.
150
 
151
- Besides those arguments, we have `storage` and `storage_args`. Both are for the class to connect to the database; by default, it's set to the SQLite class the library uses. Those arguments shouldn't matter unless you want to write your own storage system, which we will cover on a [separate page in the development section](../development/adaptive_storage_system.md).
152
 
153
  Now that you've enabled the `adaptive` feature globally, you have two main ways to use it.
154
 
 
50
  ```
51
  The selector will no longer function, and your code needs maintenance. That's where Scrapling's `adaptive` feature comes into play.
52
 
53
+ With Scrapling, you can enable the `adaptive` feature the first time you select an element, and the next time you select that element and it doesn't exist, Scrapling will remember its properties and search on the website for the element with the highest percentage of similarity to that element, and without AI :)
54
 
55
  ```python
56
  from scrapling import Selector, Fetcher
 
100
 
101
  Hence, in the two examples above, I used both the `Selector` and `Fetcher` classes to show that the adaptive logic is the same.
102
 
103
+ > Note: the `adaptive_domain` argument was created mainly to handle the case where a website changes its URL along with its design/structure. In that case, you can use it to keep using the previously stored adaptive data for the new URL. Otherwise, Scrapling will treat it as a new website and discard the old data.
104
+
105
  ## How the adaptive scraping feature works
106
  Adaptive scraping works in two phases:
107
 
 
115
  1. You tell Scrapling to save that element's unique properties in one of the ways we will show below.
116
  2. Scrapling uses its configured database (SQLite by default) and saves each element's unique properties.
117
  3. Now, because everything about the element can be changed or removed by the website's owner(s), nothing from the element can be used as a unique identifier for the database. To solve this issue, I made the storage system rely on two things:
118
+ 1. The domain of the current website. If you are using the `Selector` class, pass it when initializing; if you are using a fetcher, the domain will be automatically taken from the URL.
119
  2. An `identifier` to query that element's properties from the database. You don't always have to set the identifier yourself; we'll discuss this later.
120
 
121
  Together, they will later be used to retrieve the element's unique properties from the database.
 
150
 
151
  If you didn't pass a URL, the word `default` will be used in place of the URL field while saving the element's unique properties. So, this will only be an issue if you use the same identifier later for a different website and don't pass the URL parameter when initializing it. The save process overwrites previous data, and the `adaptive` feature uses only the latest saved properties.
152
 
153
+ Besides those arguments, we have `storage` and `storage_args`. Both are for the class to connect to the database; by default, it uses the SQLite class provided by the library. Those arguments shouldn't matter unless you want to write your own storage system, which we will cover on a [separate page in the development section](../development/adaptive_storage_system.md).
154
 
155
  Now that you've enabled the `adaptive` feature globally, you have two main ways to use it.
156
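
The storage rules touched by this diff (properties keyed by domain plus `identifier`, the word `default` used when no URL is passed, saves overwriting earlier data, and `adaptive_domain` pointing a new URL at previously stored data) can be illustrated with a small standalone sketch. This is not Scrapling's actual storage class or schema: `storage_domain`, `ElementStore`, the table layout, and the example URLs are all hypothetical names chosen for illustration, and the way `adaptive_domain` overrides the URL's domain here is an assumption based on the note above.

```python
import json
import sqlite3
from urllib.parse import urlparse


def storage_domain(url=None, adaptive_domain=None):
    """Hypothetical helper: choose the domain half of the storage key."""
    if adaptive_domain:
        # Assumed behaviour: an explicit domain wins, e.g. the site's old domain.
        return adaptive_domain
    if url:
        return urlparse(url).netloc or "default"
    # The docs state that `default` is used when no URL is passed.
    return "default"


class ElementStore:
    """Hypothetical SQLite-backed store keyed by (domain, identifier)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS elements ("
            "domain TEXT, identifier TEXT, properties TEXT, "
            "PRIMARY KEY (domain, identifier))"
        )

    def save(self, domain, identifier, properties):
        # INSERT OR REPLACE overwrites earlier data for the same key,
        # so only the latest saved properties remain.
        self.conn.execute(
            "INSERT OR REPLACE INTO elements VALUES (?, ?, ?)",
            (domain, identifier, json.dumps(properties)),
        )
        self.conn.commit()

    def retrieve(self, domain, identifier):
        row = self.conn.execute(
            "SELECT properties FROM elements WHERE domain = ? AND identifier = ?",
            (domain, identifier),
        ).fetchone()
        return json.loads(row[0]) if row else None


store = ElementStore()
old_key = storage_domain("https://old.example.com/products")
store.save(old_key, "#p1", {"tag": "div", "class": "product"})
store.save(old_key, "#p1", {"tag": "div", "class": "product-card"})  # overwrites

# The site moved to a new URL: without an override the lookup misses,
# with the old domain passed explicitly the stored data is found again.
new_url = "https://new.example.com/products"
missed = store.retrieve(storage_domain(new_url), "#p1")
found = store.retrieve(storage_domain(new_url, adaptive_domain="old.example.com"), "#p1")
```

In this sketch `missed` is `None` because the new domain has no saved entry, while `found` holds the most recently saved properties. It also shows why reusing one identifier across different sites without passing a URL is risky: both would share the `default` key and overwrite each other.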