
Karim Shoair committed on Aug 25, 2025

Commit 7b427e7, 1 parent: 4893321

docs: update storage dev tutorial


Files changed (1)

docs/development/{automatch_storage_system.md → adaptive_storage_system.md} +7 -7


@@ -1,22 +1,22 @@
-Scrapling uses SQLite by default, but this tutorial covers writing your storage system to store element properties there for auto-matching.
 You might want to use FireBase, for example, and share the database between multiple spiders on different machines. It's a great idea to use an online database like that because the spiders will share with each other.
 So first, to make your storage class work, it must do the big 3:
-1. Inherit from the abstract class `scrapling.core.storage_adaptors.StorageSystemMixin` and accept a string argument, which will be the `url` argument to maintain the library logic.
 2. Use the decorator `functools.lru_cache` on top of the class to follow the Singleton design pattern as other classes.
 3. Implement methods `save` and `retrieve`, as you see from the type hints:
     - The method `save` returns nothing and will get two arguments from the library
         * The first one is of type `lxml.html.HtmlElement`, which is the element itself. It must be converted to a dictionary using the function `element_to_dict` in submodule `scrapling.core.utils._StorageTools` to keep the same format and save it to your database as you wish.
-        * The second one is a string, the identifier used for retrieval. The combination result of this identifier and the `url` argument from initialization must be unique for each row, or the auto-match will be messed up.
     - The method `retrieve` takes a string, which is the identifier; using it with the `url` passed on initialization, the element's dictionary is retrieved from the database and returned if it exists; otherwise, it returns `None`.
-> If the instructions weren't clear enough for you, you can check my implementation using SQLite3 in [storage_adaptors](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/core/storage_adaptors.py) file
-If your class meets these criteria, the rest is easy. If you plan to use the library in a threaded application, ensure your class supports it. The default used class is thread-safe.
-Some helper functions are added to the abstract class if you want to use them. It's easier to see it for yourself in the [code](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/core/storage_adaptors.py); it's heavily commented :)
 ## Real-World Example: Redis Storage
@@ -27,7 +27,7 @@ Here's a more practical example generated by AI using Redis:
 import redis
 import orjson
 from functools import lru_cache
-from scrapling.core.storage_adaptors import StorageSystemMixin
 from scrapling.core.utils import _StorageTools
 @lru_cache(None)

+Scrapling uses SQLite by default, but this tutorial covers writing your storage system to store element properties there for `adaptive` feature.
 You might want to use FireBase, for example, and share the database between multiple spiders on different machines. It's a great idea to use an online database like that because the spiders will share with each other.
 So first, to make your storage class work, it must do the big 3:
+1. Inherit from the abstract class `scrapling.core.storage.StorageSystemMixin` and accept a string argument, which will be the `url` argument to maintain the library logic.
 2. Use the decorator `functools.lru_cache` on top of the class to follow the Singleton design pattern as other classes.
 3. Implement methods `save` and `retrieve`, as you see from the type hints:
     - The method `save` returns nothing and will get two arguments from the library
         * The first one is of type `lxml.html.HtmlElement`, which is the element itself. It must be converted to a dictionary using the function `element_to_dict` in submodule `scrapling.core.utils._StorageTools` to keep the same format and save it to your database as you wish.
+        * The second one is a string, the identifier used for retrieval. The combination result of this identifier and the `url` argument from initialization must be unique for each row, or the `adaptive` data will be messed up.
     - The method `retrieve` takes a string, which is the identifier; using it with the `url` passed on initialization, the element's dictionary is retrieved from the database and returned if it exists; otherwise, it returns `None`.
+> If the instructions weren't clear enough for you, you can check my implementation using SQLite3 in [storage_adaptors](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/core/storage.py) file
+If your class meets these criteria, the rest is straightforward. If you plan to use the library in a threaded application, ensure your class supports it. The default used class is thread-safe.
+Some helper functions are added to the abstract class if you want to use them. It's easier to see it for yourself in the [code](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/core/storage.py); it's heavily commented :)
 ## Real-World Example: Redis Storage
 import redis
 import orjson
 from functools import lru_cache
+from scrapling.core.storage import StorageSystemMixin
 from scrapling.core.utils import _StorageTools
 @lru_cache(None)
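
Reviewer note: the three requirements listed in the tutorial text (inherit from the base class and accept `url`, wrap the class in `functools.lru_cache`, implement `save` and `retrieve`) can be sketched without any external dependency. The block below is only an illustration under stated assumptions: `StorageContract` is a hypothetical stand-in for `StorageSystemMixin`, not the real class; `_ROWS` and `InMemoryStorage` are made-up names; and the `element` argument is a plain dictionary standing in for the output of `element_to_dict`, whereas the library passes an `lxml.html.HtmlElement`.

```python
from abc import ABC, abstractmethod
from functools import lru_cache
from typing import Dict, Optional, Tuple


class StorageContract(ABC):
    """Hypothetical stand-in for the library's abstract base class."""

    def __init__(self, url: Optional[str] = None):
        # The tutorial asks the storage class to accept a string `url` argument.
        self.url = url

    @abstractmethod
    def save(self, element: Dict, identifier: str) -> None:
        """Store the element's dictionary under (url, identifier)."""

    @abstractmethod
    def retrieve(self, identifier: str) -> Optional[Dict]:
        """Return the stored dictionary, or None if nothing was saved."""


# Module-level dict standing in for a shared database (assumption for the sketch).
_ROWS: Dict[Tuple[str, str], Dict] = {}


@lru_cache(None)  # equal constructor arguments return the same cached instance
class InMemoryStorage(StorageContract):
    def save(self, element: Dict, identifier: str) -> None:
        # The (url, identifier) pair is the row key, so it must be unique.
        _ROWS[(self.url or "", identifier)] = dict(element)

    def retrieve(self, identifier: str) -> Optional[Dict]:
        return _ROWS.get((self.url or "", identifier))


if __name__ == "__main__":
    first = InMemoryStorage("https://example.com")
    second = InMemoryStorage("https://example.com")
    print(first is second)  # the decorator caches one instance per url
    first.save({"tag": "div"}, "product-title")
    print(second.retrieve("product-title"))
```

Keying rows on the `(url, identifier)` pair is what keeps data saved for one site from colliding with another site's data when the same identifier is reused.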