Karim shoair committed on
Commit 4e09bfe · 1 Parent(s): 5ba380b

docs: update dev articles

docs/development/adaptive_storage_system.md CHANGED
@@ -1,6 +1,6 @@
1
- Scrapling uses SQLite by default, but this tutorial covers writing your storage system to store element properties there for `adaptive` feature.
2
 
3
- You might want to use FireBase, for example, and share the database between multiple spiders on different machines. It's a great idea to use an online database like that because the spiders will share with each other.
4
 
5
  So first, to make your storage class work, it must do the big 3:
6
 
@@ -8,7 +8,7 @@ So first, to make your storage class work, it must do the big 3:
8
  2. Use the decorator `functools.lru_cache` on top of the class to follow the Singleton design pattern as other classes.
9
  3. Implement methods `save` and `retrieve`, as you see from the type hints:
10
  - The method `save` returns nothing and will get two arguments from the library
11
- * The first one is of type `lxml.html.HtmlElement`, which is the element itself. It must be converted to a dictionary using the function `element_to_dict` in submodule `scrapling.core.utils._StorageTools` to keep the same format and save it to your database as you wish.
12
  * The second one is a string, the identifier used for retrieval. The combination result of this identifier and the `url` argument from initialization must be unique for each row, or the `adaptive` data will be messed up.
13
  - The method `retrieve` takes a string, which is the identifier; using it with the `url` passed on initialization, the element's dictionary is retrieved from the database and returned if it exists; otherwise, it returns `None`.
14
 
 
1
+ Scrapling uses SQLite by default, but this tutorial shows how to write your own storage system to store element properties for the `adaptive` feature.
2
 
3
+ You might want to use Firebase, for example, and share the database between multiple spiders on different machines. It's a great idea to use an online database like that because spiders can share adaptive data with each other.
4
 
5
  So first, to make your storage class work, it must do the big 3:
6
 
 
8
  2. Use the decorator `functools.lru_cache` on top of the class to follow the Singleton design pattern as other classes.
9
  3. Implement methods `save` and `retrieve`, as you see from the type hints:
10
  - The method `save` returns nothing and will get two arguments from the library
11
+ * The first one is of type `lxml.html.HtmlElement`, which is the element itself. It must be converted to a dictionary using the `element_to_dict` function in the submodule `scrapling.core.utils._StorageTools` to maintain the same format, and then saved to your database as you wish.
12
  * The second one is a string, the identifier used for retrieval. The combination result of this identifier and the `url` argument from initialization must be unique for each row, or the `adaptive` data will be messed up.
13
  - The method `retrieve` takes a string, which is the identifier; using it with the `url` passed on initialization, the element's dictionary is retrieved from the database and returned if it exists; otherwise, it returns `None`.
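
As a rough illustration of points 2 and 3 above, here is a minimal, self-contained sketch. It is not Scrapling's own code: `InMemoryStorage` is a hypothetical class that keeps rows in a plain dictionary instead of a real database, and the local `element_to_dict` is only a stand-in for the library helper of the same name, so the snippet runs without Scrapling installed. It also skips the first requirement, which this excerpt does not show.

```python
from functools import lru_cache
from types import SimpleNamespace
from typing import Dict, Optional, Tuple


def element_to_dict(element) -> Dict:
    """Hypothetical stand-in for the library's `element_to_dict` helper."""
    return {"tag": getattr(element, "tag", None), "text": getattr(element, "text", None)}


@lru_cache(maxsize=None)  # same arguments -> same cached instance (Singleton-like)
class InMemoryStorage:
    """Illustrative storage keyed by (url, identifier); an assumption, not the real class."""

    def __init__(self, url: Optional[str] = None):
        self.url = url
        self._rows: Dict[Tuple[Optional[str], str], Dict] = {}

    def save(self, element, identifier: str) -> None:
        # Convert the element to a dictionary, then store it under (url, identifier)
        self._rows[(self.url, identifier)] = element_to_dict(element)

    def retrieve(self, identifier: str) -> Optional[Dict]:
        # Return the stored dictionary, or None when nothing was saved
        return self._rows.get((self.url, identifier))


# A fake element with only the attributes the stand-in helper reads
storage = InMemoryStorage(url="https://example.com")
storage.save(SimpleNamespace(tag="div", text="hello"), "main-title")
found = storage.retrieve("main-title")
missing = storage.retrieve("does-not-exist")
```

Keying each row by the `(url, identifier)` pair is what keeps the combination unique, and decorating the class with `lru_cache` means a second `InMemoryStorage(url="https://example.com")` call hands back the same object instead of building a new one.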
14
 
docs/development/scrapling_custom_types.md CHANGED
@@ -1,6 +1,6 @@
1
  > You can take advantage of the custom-made types for Scrapling and use them outside the library if you want. It's better than copying their code, after all :)
2
 
3
- ### All current types can be imported alone like below
4
  ```python
5
  >>> from scrapling.core.custom_types import TextHandler, AttributesHandler
6
 
@@ -11,11 +11,11 @@
11
  >>> somedict_2 = AttributesHandler(a=1)
12
  ```
13
 
14
- Note that `TextHandler` is a subclass of Python's `str`, so all normal operations/methods that work with Python strings will work.
15
- If you want to check for the type in your code, it's better to depend on Python's built-in function `issubclass`.
16
 
17
  The class `AttributesHandler` is a subclass of `collections.abc.Mapping`, so it's immutable (read-only), and all operations are inherited from it. The data passed can be accessed later through the `_data` property, but be careful; it's of type `types.MappingProxyType`, so it's immutable (read-only) as well (faster than `collections.abc.Mapping` by fractions of seconds).
18
 
19
- So, to make it simple for you if you are new to Python, the same operations and methods from the Python standard `dict` type will all work with class `AttributesHandler` except the ones that try to modify the actual data.
20
 
21
- If you want to modify the data inside `AttributesHandler,` you have to convert it to a dictionary first, like using the `dict` function, and then modify it outside.
 
1
  > You can take advantage of the custom-made types for Scrapling and use them outside the library if you want. It's better than copying their code, after all :)
2
 
3
+ ### All current types can be imported alone, like below
4
  ```python
5
  >>> from scrapling.core.custom_types import TextHandler, AttributesHandler
6
 
 
11
  >>> somedict_2 = AttributesHandler(a=1)
12
  ```
13
 
14
+ Note that `TextHandler` is a subclass of Python's `str`, so all standard operations/methods that work with Python strings will work.
15
+ If you want to check the type in your code, it's better to use Python's built-in `issubclass` function.
16
 
17
  The class `AttributesHandler` is a subclass of `collections.abc.Mapping`, so it's immutable (read-only), and all operations are inherited from it. The data passed can be accessed later through the `_data` property, but be careful; it's of type `types.MappingProxyType`, so it's immutable (read-only) as well (faster than `collections.abc.Mapping` by fractions of seconds).
18
 
19
+ So, to make it simple for you, if you are new to Python, the same operations and methods from the Python standard `dict` type will all work with the class `AttributesHandler` except for the ones that try to modify the actual data.
20
 
21
+ If you want to modify the data inside `AttributesHandler`, you have to convert it to a dictionary first, e.g., using the `dict` function, and then change it outside.
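
To make the read-only behavior concrete without depending on the library, the sketch below uses a hypothetical `ReadOnlyAttributes` class. It is a stand-in built on `collections.abc.Mapping` and `types.MappingProxyType`, as the text describes, and is not the actual `AttributesHandler` implementation.

```python
from collections.abc import Mapping
from types import MappingProxyType


class ReadOnlyAttributes(Mapping):
    """Hypothetical stand-in that mimics the read-only behavior described above."""

    def __init__(self, **kwargs):
        # Wrap a private copy of the data in a read-only view
        self._data = MappingProxyType(dict(kwargs))

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


attrs = ReadOnlyAttributes(a=1)

try:
    attrs["b"] = 2  # Mapping defines no __setitem__, so this raises TypeError
except TypeError:
    modified_in_place = False
else:
    modified_in_place = True

editable = dict(attrs)  # a plain, mutable copy of the data
editable["b"] = 2       # changes the copy only; `attrs` stays as it was
```

Reading works just like a `dict` (`attrs["a"]`, `len(attrs)`, iteration, `.get()`, `.items()`), while any change has to happen on the converted copy.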