Karim Shoair committed
Commit 4e09bfe · Parent(s): 5ba380b
docs: update dev articles
docs/development/adaptive_storage_system.md
CHANGED
@@ -1,6 +1,6 @@

Scrapling uses SQLite by default, but this tutorial shows how to write your own storage system to store element properties for the `adaptive` feature.

You might want to use Firebase, for example, and share the database between multiple spiders on different machines. An online database like that is a great idea because the spiders can then share adaptive data with each other.

So first, to make your storage class work, it must do the big 3:

@@ -8,7 +8,7 @@

2. Use the decorator `functools.lru_cache` on top of the class so it follows the Singleton design pattern, like the other classes.
3. Implement the methods `save` and `retrieve`, as you can see from the type hints:
    - The method `save` returns nothing and receives two arguments from the library:
        * The first one is of type `lxml.html.HtmlElement`, which is the element itself. It must be converted to a dictionary using the `element_to_dict` function in the submodule `scrapling.core.utils._StorageTools` to keep the same format, and then saved to your database as you wish.
        * The second one is a string, the identifier used for retrieval. The combination of this identifier and the `url` argument from initialization must be unique for each row, or the `adaptive` data will get mixed up.
    - The method `retrieve` takes a string, which is the identifier; using it together with the `url` passed on initialization, the element's dictionary is retrieved from the database and returned if it exists; otherwise, `None` is returned.
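The contract above can be sketched with a minimal in-memory backend. This is an illustration only, not Scrapling's API: the class name `InMemoryStorage`, the `_table` dictionary, and the `url` keyword are assumptions, and a real implementation must convert elements with `element_to_dict` from `scrapling.core.utils._StorageTools` rather than receive plain dictionaries:

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # caching the constructor call gives Singleton behavior per URL
class InMemoryStorage:
    """Illustrative storage backend; the names here are assumptions, not Scrapling's API."""

    def __init__(self, url=None):
        self.url = url
        self._table = {}  # maps (url, identifier) -> element dictionary

    def save(self, element, identifier):
        # Real code receives an `lxml.html.HtmlElement` and must convert it
        # with `element_to_dict` from `scrapling.core.utils._StorageTools`;
        # this sketch assumes `element` is already that dictionary.
        self._table[(self.url, identifier)] = element

    def retrieve(self, identifier):
        # Return the stored dictionary for (url, identifier), or None if absent.
        return self._table.get((self.url, identifier))
```

Because `lru_cache` caches the class call, constructing the storage twice with the same `url` returns the same instance, which is the Singleton behavior point 2 asks for.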
docs/development/scrapling_custom_types.md
CHANGED
@@ -1,6 +1,6 @@

> You can take advantage of the custom-made types for Scrapling and use them outside the library if you want. It's better than copying their code, after all :)

### All current types can be imported alone, like below
```python
>>> from scrapling.core.custom_types import TextHandler, AttributesHandler

@@ -11,11 +11,11 @@

>>> somedict_2 = AttributesHandler(a=1)
```

Note that `TextHandler` is a subclass of Python's `str`, so all standard operations/methods that work with Python strings will work with it.

If you want to check the type in your code, it's better to use Python's built-in `isinstance` (or `issubclass`) function, since these types subclass the standard ones.
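For example, since any `str` subclass behaves this way, the point can be demonstrated with a stand-in class, so the snippet runs without Scrapling installed (the real `TextHandler` lives in `scrapling.core.custom_types`):

```python
# Stand-in: any `str` subclass shows the behavior described above;
# it is not Scrapling's actual TextHandler implementation.
class TextHandler(str):
    pass


text = TextHandler("Hello, Scrapling!")

# Every standard string method still works (and usually returns a plain `str`).
assert text.upper() == "HELLO, SCRAPLING!"
assert text.split(", ")[0] == "Hello"

# Exact type comparison would miss the subclass...
assert type(text) is not str
# ...which is why isinstance/issubclass checks are the safe way.
assert isinstance(text, str)
assert issubclass(TextHandler, str)
```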

The class `AttributesHandler` is a subclass of `collections.abc.Mapping`, so it's immutable (read-only), and all its operations are inherited from that base. The data passed in can be accessed later through the `_data` property, but be careful: it's of type `types.MappingProxyType`, so it's immutable (read-only) as well (and slightly faster to access than going through the `Mapping` interface).

To put it simply, if you are new to Python: the same operations and methods of the standard `dict` type will all work with the class `AttributesHandler`, except the ones that try to modify the actual data.

If you want to modify the data inside `AttributesHandler`, you have to convert it to a dictionary first (e.g., using the `dict` function) and then change it outside.