Karim Shoair committed
Commit 4e09bfe · Parent(s): 5ba380b
docs: update dev articles
docs/development/adaptive_storage_system.md
CHANGED
@@ -1,6 +1,6 @@

Scrapling uses SQLite by default, but this tutorial shows how to write your own storage system to store element properties for the `adaptive` feature.

You might want to use Firebase, for example, and share the database between multiple spiders on different machines. An online database like that is a great idea because the spiders can then share adaptive data with each other.

So first, to make your storage class work, it must do the big 3:

@@ -8,7 +8,7 @@

2. Use the decorator `functools.lru_cache` on top of the class so it follows the Singleton design pattern, like the other classes.
3. Implement the methods `save` and `retrieve`, as you can see from the type hints:
    - The method `save` returns nothing and receives two arguments from the library:
        * The first one is of type `lxml.html.HtmlElement`, which is the element itself. It must be converted to a dictionary using the `element_to_dict` function in the submodule `scrapling.core.utils._StorageTools` to keep the same format, and then saved to your database as you wish.
        * The second one is a string, the identifier used for retrieval. The combination of this identifier and the `url` argument from initialization must be unique for each row, or the `adaptive` data will get mixed up.
    - The method `retrieve` takes a string, which is the identifier; using it together with the `url` passed on initialization, the element's dictionary is retrieved from the database and returned if it exists; otherwise, `None` is returned.
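The contract above can be sketched with a minimal in-memory backend. This is an illustration only, not Scrapling's API: the class name `InMemoryStorage`, the `_table` dictionary, and the `url` keyword are assumptions, and a real implementation must convert elements with `element_to_dict` from `scrapling.core.utils._StorageTools` rather than receive plain dictionaries:

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # caching the constructor call gives Singleton behavior per URL
class InMemoryStorage:
    """Illustrative storage backend; the names here are assumptions, not Scrapling's API."""

    def __init__(self, url=None):
        self.url = url
        self._table = {}  # maps (url, identifier) -> element dictionary

    def save(self, element, identifier):
        # Real code receives an `lxml.html.HtmlElement` and must convert it
        # with `element_to_dict` from `scrapling.core.utils._StorageTools`;
        # this sketch assumes `element` is already that dictionary.
        self._table[(self.url, identifier)] = element

    def retrieve(self, identifier):
        # Return the stored dictionary for (url, identifier), or None if absent.
        return self._table.get((self.url, identifier))
```

Because `lru_cache` caches the class call, constructing the storage twice with the same `url` returns the same instance, which is the Singleton behavior point 2 asks for.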
docs/development/scrapling_custom_types.md
CHANGED
@@ -1,6 +1,6 @@

> You can take advantage of the custom-made types for Scrapling and use them outside the library if you want. It's better than copying their code, after all :)

### All current types can be imported alone, like below
```python
>>> from scrapling.core.custom_types import TextHandler, AttributesHandler

@@ -11,11 +11,11 @@

>>> somedict_2 = AttributesHandler(a=1)
```

Note that `TextHandler` is a subclass of Python's `str`, so all standard operations/methods that work with Python strings will work with it.

If you want to check the type in your code, it's better to use Python's built-in `isinstance` (or `issubclass`) function, since these types subclass the standard ones.
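For example, since any `str` subclass behaves this way, the point can be demonstrated with a stand-in class, so the snippet runs without Scrapling installed (the real `TextHandler` lives in `scrapling.core.custom_types`):

```python
# Stand-in: any `str` subclass shows the behavior described above;
# it is not Scrapling's actual TextHandler implementation.
class TextHandler(str):
    pass


text = TextHandler("Hello, Scrapling!")

# Every standard string method still works (and usually returns a plain `str`).
assert text.upper() == "HELLO, SCRAPLING!"
assert text.split(", ")[0] == "Hello"

# Exact type comparison would miss the subclass...
assert type(text) is not str
# ...which is why isinstance/issubclass checks are the safe way.
assert isinstance(text, str)
assert issubclass(TextHandler, str)
```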

The class `AttributesHandler` is a subclass of `collections.abc.Mapping`, so it's immutable (read-only), and all its operations are inherited from that base. The data passed in can be accessed later through the `_data` property, but be careful: it's of type `types.MappingProxyType`, so it's immutable (read-only) as well (and slightly faster to access than going through the `Mapping` interface).

To put it simply, if you are new to Python: the same operations and methods of the standard `dict` type will all work with the class `AttributesHandler`, except the ones that try to modify the actual data.

If you want to modify the data inside `AttributesHandler`, you have to convert it to a dictionary first (e.g., using the `dict` function) and then change it outside.