Karim Shoair committed on
Commit · c5df5f6
Parent(s): 72d73e1
docs: removing old documentation
Check out the website at https://scrapling.readthedocs.io

docs/Core/using scrapling custom types.md
DELETED
@@ -1,21 +0,0 @@
> You can take advantage of the custom-made types in Scrapling and use them outside the library if you want. It's better than copying their code, after all :)

### All current types can be imported alone like below
```python
>>> from scrapling.core.custom_types import TextHandler, AttributesHandler

>>> somestring = TextHandler('{}')
>>> somestring.json()
'{}'
>>> somedict_1 = AttributesHandler({'a': 1})
>>> somedict_2 = AttributesHandler(a=1)
```

Note that `TextHandler` is a sub-class of Python's `str`, so all the normal operations/methods that work with Python strings will work with it.
If you want to check for this type in your code, it's better to rely on Python's built-in function `issubclass`.
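To illustrate the point, here is a plain `str` subclass standing in for `TextHandler` (a hypothetical stand-in for demonstration, not Scrapling's actual class):

```python
# Hypothetical stand-in for TextHandler: any subclass of `str`
class TextLike(str):
    pass

s = TextLike("hello")

# Normal string operations still work on the subclass
print(s.upper())                 # HELLO

# Type checks against `str` still pass for the subclass
print(issubclass(type(s), str))  # True
print(isinstance(s, str))        # True
```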

The class `AttributesHandler` is a sub-class of `collections.abc.Mapping`, so it's immutable (read-only) and inherits all its operations from it. The data passed in can be accessed later through the `._data` attribute, but be careful: it's of type `types.MappingProxyType`, so it's read-only as well (and slightly faster than going through `collections.abc.Mapping`).

So, to put it simply if you are new to Python: the same operations and methods from Python's standard `dict` type will all work with the `AttributesHandler` class, except the ones that try to modify the actual data.

If you want to modify the data inside an `AttributesHandler`, you have to convert it to a dictionary first (e.g. using the `dict` function) and modify the copy outside.
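The read-only behavior described above can be sketched with standard-library pieces alone. This is an illustrative stand-in (the class name and internals here are made up), not Scrapling's actual implementation:

```python
from collections.abc import Mapping
from types import MappingProxyType

class ReadOnlyAttrs(Mapping):
    """Hypothetical stand-in for AttributesHandler: an immutable mapping."""
    def __init__(self, mapping=None, **kwargs):
        # MappingProxyType makes the underlying dict read-only
        self._data = MappingProxyType(dict(mapping or {}, **kwargs))

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

attrs = ReadOnlyAttrs({'class': 'title'}, id='main')

# Read operations work like a normal dict
print(attrs['class'])
print('id' in attrs, len(attrs))

# Writing raises TypeError because MappingProxyType is read-only
try:
    attrs._data['class'] = 'new'
except TypeError as exc:
    print('read-only:', exc)

# To modify, convert to a plain dict first and edit the copy
editable = dict(attrs)
editable['class'] = 'new'
print(editable)
```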
docs/Examples/selectorless_stackoverflow.py
DELETED
@@ -1,25 +0,0 @@
"""
I only made this example to show how Scrapling's features can be used to scrape a website without writing any selectors,
so this script doesn't depend on the website's structure.
"""

import requests

from scrapling import Adaptor

response = requests.get('https://stackoverflow.com/questions/tagged/web-scraping?sort=MostVotes&filters=NoAcceptedAnswer&edited=true&pagesize=50&page=2')
page = Adaptor(response.text, url=response.url)
# First, we extract the first question's title and its author based on their text content
first_question_title = page.find_by_text('Run Selenium Python Script on Remote Server')
first_question_author = page.find_by_text('Ryan')
# Check that both were found, because this page changes a lot
if first_question_title and first_question_author:
    # If you want, you can extract the questions' containers like below
    first_question = first_question_title.find_ancestor(
        lambda ancestor: ancestor.attrib.get('id') and 'question-summary' in ancestor.attrib.get('id')
    )
    rest_of_questions = first_question.find_similar()
    # But since there is nothing to rely on to extract the other titles/authors from these elements without CSS/XPath selectors,
    # we get the rest of the titles/authors on the page using the first title and the first author above as starting points
    for i, (title, author) in enumerate(zip(first_question_title.find_similar(), first_question_author.find_similar()), start=1):
        print(i, title.text, author.text)
docs/Extending Scrapling/writing storage system.md
DELETED
@@ -1,17 +0,0 @@
Scrapling uses SQLite by default, but in case you want to write your own storage system to store elements' properties for the auto-matching, this tutorial has you covered.

You might want to use Firebase, for example, and share the database between multiple spiders on different machines. Using an online database like that is a great idea, because this way the spiders share data with each other.

So first, for your storage class to work, it must do the big three:
1. Inherit from the abstract class `scrapling.storage_adaptors.StorageSystemMixin` and accept a string argument which will be the `url` argument, to maintain the library's logic.
2. Use the decorator `functools.lru_cache` on top of the class itself, to follow the Singleton design pattern like the other classes do.
3. Implement the methods `save` and `retrieve`, as you see from the type hints:
    - The method `save` returns nothing and gets two arguments from the library:
        * The first one is of type `lxml.html.HtmlElement`, which is the element itself, of course. It must be converted to a dictionary using the function `scrapling.utils._StorageTools.element_to_dict` so we keep the same format, then saved to your database as you wish.
        * The second one is a string, the identifier used for retrieval. The combination of this identifier and the `url` argument from initialization must be unique for each row, or the auto-matching will break.
    - The method `retrieve` takes the identifier string; using it together with the `url` passed on initialization, the element's dictionary is retrieved from the database and returned if it exists, otherwise `None` is returned.

> If the instructions weren't clear enough for you, you can check my SQLite3 implementation in the [storage_adaptors](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/storage_adaptors.py) file.
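Putting the three requirements together, a minimal sketch might look like the following. Note that the stub base class and the in-memory dict are stand-ins for illustration only; in real code you would inherit the actual `scrapling.storage_adaptors.StorageSystemMixin`, convert elements with `scrapling.utils._StorageTools.element_to_dict`, and write to a real database:

```python
from functools import lru_cache

# Stub standing in for scrapling.storage_adaptors.StorageSystemMixin;
# the real abstract class is imported from Scrapling instead.
class StorageSystemMixin:
    def __init__(self, url):
        self.url = url

@lru_cache(None)  # one cached instance per `url` -> Singleton pattern
class InMemoryStorageSystem(StorageSystemMixin):
    """Hypothetical storage backend keeping rows in a plain dict."""
    def __init__(self, url):
        super().__init__(url)
        self._db = {}

    def save(self, element_dict, identifier):
        # In a real adaptor, the element is an lxml.html.HtmlElement that
        # must be converted with _StorageTools.element_to_dict first.
        # (url, identifier) must be unique per row for auto-matching.
        self._db[(self.url, identifier)] = element_dict

    def retrieve(self, identifier):
        # Returns the stored dictionary, or None if it doesn't exist
        return self._db.get((self.url, identifier))

storage = InMemoryStorageSystem('https://example.com')
storage.save({'tag': 'div', 'attributes': {'id': 'post'}}, 'post-1')
print(storage.retrieve('post-1'))
print(storage.retrieve('missing'))  # None

# Thanks to lru_cache, the same URL yields the same instance
assert storage is InMemoryStorageSystem('https://example.com')
```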

If your class satisfies all of this, the rest is easy. If you are planning to use the library in a threaded application, make sure your class supports that; the default class is thread-safe.

There are some helper functions on the abstract class if you want to use them. It's easier to see them for yourself in the [code](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/storage_adaptors.py); it's heavily commented :)
docs/index.md
DELETED
@@ -1,2 +0,0 @@
# This section is still under work, but any help is highly appreciated
## I will try to make full, detailed documentation with Sphinx ASAP.