## Pick Your Path
Not sure where to start? Pick the path that matches what you're trying to do:
| I want to... | Start here |
|:---|:---|
| **Parse HTML** I already have | [Querying elements](parsing/selection.md) — CSS, XPath, and text-based selection |
| **Quickly scrape a page** and prototype | Pick a [fetcher](fetching/choosing.md) and test right away, or launch the [interactive shell](cli/interactive-shell.md) |
| **Build a crawler** that scales | [Spiders](spiders/getting-started.md) — concurrent, multi-session crawls with pause/resume |
| **Scrape without writing code** | [CLI extract commands](cli/extract-commands.md) or hook up the [MCP server](ai/mcp-server.md) to your favourite AI tool |
| **Migrate** from another library | [From BeautifulSoup](tutorials/migrating_from_beautifulsoup.md) or [Scrapy comparison](spiders/architecture.md#comparison-with-scrapy) |
---
We'll start with a quick tour of the parsing capabilities; later sections cover fetching websites with custom browsers, making requests, and parsing the responses.
Here's an HTML document (generated by ChatGPT) that we'll use as the running example throughout this page:
```html
<!DOCTYPE html>
<html>
  <head>
    <title>Complex Web Page</title>
  </head>
  <body>
    <nav>
      <a href="#home">Home</a>
      <a href="#about">About</a>
      <a href="#contact">Contact</a>
    </nav>
    <section id="products" schema='{"jsonable": "data"}'>
      <h2>Products</h2>
      <div class="product-list">
        <article data-id="1">
          <h3>Product 1</h3>
          <p>This is product 1</p>
          <p class="price">$10.99</p>
          <p class="stock">In stock: 5</p>
        </article>
        <article data-id="2">
          <h3>Product 2</h3>
          <p>This is product 2</p>
          <p class="price">$20.99</p>
          <p class="stock">In stock: 3</p>
        </article>
        <article data-id="3">
          <h3>Product 3</h3>
          <p>This is product 3</p>
          <p class="price">$15.99</p>
          <p class="stock">Out of stock</p>
        </article>
      </div>
    </section>
    <section id="reviews">
      <h2>Customer Reviews</h2>
      <div class="review">
        <p>Great product!</p>
        <span class="reviewer">John Doe</span>
      </div>
      <div class="review">
        <p>Good value for money.</p>
        <span class="reviewer">Jane Smith</span>
      </div>
    </section>
  </body>
</html>
```
Start by loading the raw HTML above:
```python
from scrapling.parser import Selector

page = Selector(html_doc)
page
# <data='<html><head><title>Complex Web Page</t...'>
```
Get all text content on the page recursively
```python
page.get_all_text(ignore_tags=('script', 'style'))
# 'Complex Web Page\nHome\nAbout\nContact\nProducts\nProduct 1\nThis is product 1\n$10.99\nIn stock: 5\nProduct 2\nThis is product 2\n$20.99\nIn stock: 3\nProduct 3\nThis is product 3\n$15.99\nOut of stock\nCustomer Reviews\nGreat product!\nJohn Doe\nGood value for money.\nJane Smith'
```
## Finding elements
If there's an element you want to find on the page, you will find it. Your creativity is the only limit!
Find the first HTML `section` element
```python
section_element = page.find('section')
# <data='<section id="products" schema=...' parent='<body>...'>
```
Find all `section` elements
```python
section_elements = page.find_all('section')
# [<data='<section id="products" schema=...' parent='<body>...'>, <data='<section id="reviews"><h2>Customer Revie...' parent='<body>...'>]
```
Find all `section` elements whose `id` attribute value is `products`.
```python
section_elements = page.find_all('section', {'id':"products"})
# Same as
section_elements = page.find_all('section', id="products")
# [<data='<section id="products" schema=...' parent='<body>...'>]
```
Find all `section` elements whose `id` attribute value contains `product`.
```python
section_elements = page.find_all('section', {'id*':"product"})
```
Find all `h3` elements whose text content matches this regex `Product \d`
```python
import re

page.find_all('h3', re.compile(r'Product \d'))
# [<data='<h3>Product 1</h3>' parent='<article data-id="1">...'>, <data='<h3>Product 2</h3>' parent='<article data-id="2">...'>, <data='<h3>Product 3</h3>' parent='<article data-id="3">...'>]
```
Find all `h3` and `h2` elements whose text content matches the regex `Product`
```python
page.find_all(['h3', 'h2'], re.compile(r'Product'))
# [<data='<h3>Product 1</h3>' parent='<article data-id="1">...'>, <data='<h3>Product 2</h3>' parent='<article data-id="2">...'>, <data='<h3>Product 3</h3>' parent='<article data-id="3">...'>, <data='<h2>Products</h2>' parent='<section id="products">...'>]
```
Find all elements whose text content is exactly `Products` (surrounding whitespace is ignored)
```python
page.find_by_text('Products', first_match=False)
# [<data='<h2>Products</h2>' parent='<section id="products">...'>]
```
Or find all elements whose text content matches regex `Product \d`
```python
page.find_by_regex(r'Product \d', first_match=False)
# [<data='<h3>Product 1</h3>' parent='<article data-id="1">...'>, <data='<h3>Product 2</h3>' parent='<article data-id="2">...'>, <data='<h3>Product 3</h3>' parent='<article data-id="3">...'>]
```
Find all elements that are similar to the element you want
```python
target_element = page.find_by_regex(r'Product \d', first_match=True)
# <data='<h3>Product 1</h3>' parent='<article data-id="1">...'>
target_element.find_similar()
# [<data='<h3>Product 2</h3>' parent='<article data-id="2">...'>, <data='<h3>Product 3</h3>' parent='<article data-id="3">...'>]
```
Find the first element that matches a CSS selector
```python
page.css('.product-list [data-id="1"]')[0]
# <data='<article data-id="1"><h3>Product 1</h3>...' parent='<div class="product-list">...'>
```
Find all elements that match a CSS selector
```python
page.css('.product-list article')
# [<data='<article data-id="1">...'>, <data='<article data-id="2">...'>, <data='<article data-id="3">...'>]
```
Find the first element that matches an XPath selector
```python
page.xpath("//*[@id='products']/div/article")[0]
# <data='<article data-id="1"><h3>Product 1</h3>...' parent='<div class="product-list">...'>
```
Find all elements that match an XPath selector
```python
page.xpath("//*[@id='products']/div/article")
# [<data='<article data-id="1">...'>, <data='<article data-id="2">...'>, <data='<article data-id="3">...'>]
```
With this, we've only scratched the surface; more advanced options for each of these selection methods are shown later.
## Accessing elements' data
It's as simple as
```python
>>> section_element.tag
'section'
>>> print(section_element.attrib)
{'id': 'products', 'schema': '{"jsonable": "data"}'}
>>> section_element.attrib['schema'].json() # Attribute values that hold valid JSON can be parsed with `.json()`
{'jsonable': 'data'}
>>> section_element.text # Direct text content; empty here because all text lives in child elements
''
>>> section_element.get_all_text() # All text content recursively
'Products\nProduct 1\nThis is product 1\n$10.99\nIn stock: 5\nProduct 2\nThis is product 2\n$20.99\nIn stock: 3\nProduct 3\nThis is product 3\n$15.99\nOut of stock'
>>> section_element.html_content # The HTML content of the element (truncated here for brevity)
'<section id="products" schema=\'{"jsonable": "data"}\'>\n      <h2>Products</h2>\n      <div class="product-list">\n        <article data-id="1">\n          <h3>Product 1</h3>\n          <p>This is product 1</p>\n          <p class="price">$10.99</p>\n          <p class="stock">In stock: 5</p>\n        </article>\n        ...'
>>> print(section_element.prettify()) # The prettified version
<section id="products" schema='{"jsonable": "data"}'>
  <h2>Products</h2>
  <div class="product-list">
    <article data-id="1">
      <h3>Product 1</h3>
      <p>This is product 1</p>
      <p class="price">$10.99</p>
      <p class="stock">In stock: 5</p>
    </article>
    <article data-id="2">
      <h3>Product 2</h3>
      <p>This is product 2</p>
      <p class="price">$20.99</p>
      <p class="stock">In stock: 3</p>
    </article>
    <article data-id="3">
      <h3>Product 3</h3>
      <p>This is product 3</p>
      <p class="price">$15.99</p>
      <p class="stock">Out of stock</p>
    </article>
  </div>
</section>
>>> section_element.path # All the ancestors in the DOM tree of this element
[<data='<html><head><title>Complex Web Page</t...'>, <data='<body><nav><a href="#home">Home</a><a ...' parent='<html>...'>]