Karim shoair committed
Commit 7b427e7 · 1 Parent(s): 4893321

docs: update storage dev tutorial
docs/development/{automatch_storage_system.md → adaptive_storage_system.md} RENAMED
@@ -1,22 +1,22 @@
- Scrapling uses SQLite by default, but this tutorial covers writing your own storage system to store element properties for auto-matching.

  You might want to use Firebase, for example, and share the database between multiple spiders on different machines. It's a great idea to use an online database like that because the spiders will share data with each other.

  So first, to make your storage class work, it must do the big 3:

- 1. Inherit from the abstract class `scrapling.core.storage_adaptors.StorageSystemMixin` and accept a string argument, which will be the `url` argument, to maintain the library logic.
  2. Use the decorator `functools.lru_cache` on top of the class to follow the Singleton design pattern like the other classes.
  3. Implement the methods `save` and `retrieve`, as you can see from the type hints:
  - The method `save` returns nothing and will get two arguments from the library:
  * The first one is of type `lxml.html.HtmlElement`, which is the element itself. It must be converted to a dictionary using the function `element_to_dict` in `scrapling.core.utils._StorageTools` to keep the same format, then saved to your database as you wish.
- * The second one is a string, the identifier used for retrieval. The combination of this identifier and the `url` argument from initialization must be unique for each row, or the auto-match will be messed up.
  - The method `retrieve` takes a string, which is the identifier; using it with the `url` passed on initialization, the element's dictionary is retrieved from the database and returned if it exists; otherwise, it returns `None`.

- > If the instructions weren't clear enough for you, you can check my implementation using SQLite3 in the [storage_adaptors](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/core/storage_adaptors.py) file.

- If your class meets these criteria, the rest is easy. If you plan to use the library in a threaded application, ensure your class supports it. The default class is thread-safe.

- Some helper functions are added to the abstract class if you want to use them. It's easier to see for yourself in the [code](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/core/storage_adaptors.py); it's heavily commented :)


  ## Real-World Example: Redis Storage
@@ -27,7 +27,7 @@ Here's a more practical example generated by AI using Redis:
  import redis
  import orjson
  from functools import lru_cache
- from scrapling.core.storage_adaptors import StorageSystemMixin
  from scrapling.core.utils import _StorageTools

  @lru_cache(None)
 
+ Scrapling uses SQLite by default, but this tutorial covers writing your own storage system to store element properties for the `adaptive` feature.

  You might want to use Firebase, for example, and share the database between multiple spiders on different machines. It's a great idea to use an online database like that because the spiders will share data with each other.

  So first, to make your storage class work, it must do the big 3:

+ 1. Inherit from the abstract class `scrapling.core.storage.StorageSystemMixin` and accept a string argument, which will be the `url` argument, to maintain the library logic.
  2. Use the decorator `functools.lru_cache` on top of the class to follow the Singleton design pattern like the other classes.
  3. Implement the methods `save` and `retrieve`, as you can see from the type hints:
  - The method `save` returns nothing and will get two arguments from the library:
  * The first one is of type `lxml.html.HtmlElement`, which is the element itself. It must be converted to a dictionary using the function `element_to_dict` in `scrapling.core.utils._StorageTools` to keep the same format, then saved to your database as you wish.
+ * The second one is a string, the identifier used for retrieval. The combination of this identifier and the `url` argument from initialization must be unique for each row, or the `adaptive` data will be messed up.
  - The method `retrieve` takes a string, which is the identifier; using it with the `url` passed on initialization, the element's dictionary is retrieved from the database and returned if it exists; otherwise, it returns `None`.

+ > If the instructions weren't clear enough for you, you can check my implementation using SQLite3 in the [storage_adaptors](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/core/storage.py) file.

+ If your class meets these criteria, the rest is straightforward. If you plan to use the library in a threaded application, ensure your class supports it. The default class is thread-safe.

+ Some helper functions are added to the abstract class if you want to use them. It's easier to see for yourself in the [code](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/core/storage.py); it's heavily commented :)


  ## Real-World Example: Redis Storage
 
  import redis
  import orjson
  from functools import lru_cache
+ from scrapling.core.storage import StorageSystemMixin
  from scrapling.core.utils import _StorageTools

  @lru_cache(None)
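
The big 3 requirements from the updated tutorial can be sketched as a minimal in-memory storage class. This is a standalone sketch, not Scrapling's API: the `StorageSystemMixin` base and the `element_to_dict` conversion are stubbed out here so the example runs on its own; in real code you would import them from `scrapling.core.storage` and `scrapling.core.utils._StorageTools` instead.

```python
from functools import lru_cache
from typing import Dict, Optional


class StorageSystemMixin:
    """Stand-in for `scrapling.core.storage.StorageSystemMixin` (assumption:
    the real mixin stores the `url` passed at initialization)."""

    def __init__(self, url: str):
        self.url = url


@lru_cache(None)  # requirement 2: one cached (Singleton) instance per `url`
class InMemoryStorageSystem(StorageSystemMixin):  # requirement 1: inherit, accept `url`
    def __init__(self, url: str):
        super().__init__(url)
        self._rows: Dict[str, dict] = {}

    def _key(self, identifier: str) -> str:
        # The (url, identifier) combination must be unique per row.
        return f"{self.url}::{identifier}"

    # requirement 3: implement `save` and `retrieve`
    def save(self, element, identifier: str) -> None:
        # Real code would call `_StorageTools.element_to_dict(element)` here;
        # this sketch just copies an already-dict-shaped element.
        self._rows[self._key(identifier)] = dict(element)

    def retrieve(self, identifier: str) -> Optional[dict]:
        # Return the stored dictionary, or None if the row does not exist.
        return self._rows.get(self._key(identifier))


storage = InMemoryStorageSystem("https://example.com")
storage.save({"tag": "div", "attributes": {"class": "item"}}, "products>div.item")
print(storage.retrieve("products>div.item"))  # the saved dictionary
print(storage.retrieve("missing"))            # None
```

Because of `lru_cache`, calling `InMemoryStorageSystem("https://example.com")` again returns the same instance, matching the Singleton behavior the tutorial asks for.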