Karim Shoair committed
Commit be5aa87 · Parent(s): c0cf681
docs: update the cli section

Files changed:
- docs/cli/extract-commands.md (+17 −20)
- docs/cli/interactive-shell.md (+4 −2)
- docs/cli/overview.md (+1 −1)

docs/cli/extract-commands.md (CHANGED)
@@ -2,7 +2,7 @@
 
 **Web Scraping through the terminal without requiring any programming!**
 
-The `scrapling extract` …
+The `scrapling extract` command lets you download and extract content from websites directly from your terminal, without writing any code. It's ideal for beginners, researchers, and anyone who needs rapid web data extraction.
 
 > 💡 **Prerequisites:**
 >

@@ -30,7 +30,7 @@ The extract command is a set of simple terminal tools that:
 ```bash
 scrapling extract get "https://example.com" page_content.txt
 ```
-This …
+This makes an HTTP GET request and saves the webpage's text content to `page_content.txt`.
 
 - **Save as Different Formats**
 

@@ -73,13 +73,13 @@ Commands:
   stealthy-fetch  Use StealthyFetcher to fetch content with advanced...
 ```
 
-We will go through each …
+We will go through each command in detail below.
 
 ### HTTP Requests
 
 1. **GET Request**
 
-The most common …
+The most common command for downloading website content:
 
 ```bash
 scrapling extract get [URL] [OUTPUT_FILE] [OPTIONS]

@@ -105,7 +105,7 @@ We will go through each Command in detail below.
 # Add multiple headers
 scrapling extract get "https://site.com" page.html -H "Accept: text/html" -H "Accept-Language: en-US"
 ```
-Get the available options for the …
+Get the available options for the command with `scrapling extract get --help`, as follows:
 ```bash
 Usage: scrapling extract get [OPTIONS] URL OUTPUT_FILE
 

@@ -143,7 +143,7 @@ We will go through each Command in detail below.
 # Send JSON data
 scrapling extract post "https://api.site.com" response.json --json '{"username": "test", "action": "search"}'
 ```
-Get the available options for the …
+Get the available options for the command with `scrapling extract post --help`, as follows:
 ```bash
 Usage: scrapling extract post [OPTIONS] URL OUTPUT_FILE
 

@@ -182,7 +182,7 @@ We will go through each Command in detail below.
 # Send JSON data
 scrapling extract put "https://scrapling.requestcatcher.com/put" response.json --json '{"username": "test", "action": "search"}'
 ```
-Get the available options for the …
+Get the available options for the command with `scrapling extract put --help`, as follows:
 ```bash
 Usage: scrapling extract put [OPTIONS] URL OUTPUT_FILE
 

@@ -220,7 +220,7 @@ We will go through each Command in detail below.
 # Send a DELETE request while impersonating Chrome
 scrapling extract delete "https://scrapling.requestcatcher.com/" response.txt --impersonate "chrome"
 ```
-Get the available options for the …
+Get the available options for the command with `scrapling extract delete --help`, as follows:
 ```bash
 Usage: scrapling extract delete [OPTIONS] URL OUTPUT_FILE
 

@@ -263,7 +263,7 @@ We will go through each Command in detail below.
 # Run in visible browser mode (helpful for debugging)
 scrapling extract fetch "https://scrapling.requestcatcher.com/" page.html --no-headless --disable-resources
 ```
-Get the available options for the …
+Get the available options for the command with `scrapling extract fetch --help`, as follows:
 ```bash
 Usage: scrapling extract fetch [OPTIONS] URL OUTPUT_FILE
 

@@ -279,10 +279,8 @@ We will go through each Command in detail below.
 --wait INTEGER                  Additional wait time in milliseconds after page load (default: 0)
 -s, --css-selector TEXT         CSS selector to extract specific content from the page. It returns all matches.
 --wait-selector TEXT            CSS selector to wait for before proceeding
---locale TEXT …
--- …
---hide-canvas / --show-canvas   Add noise to canvas operations (default: False)
---disable-webgl / --enable-webgl  Disable WebGL support (default: False)
+--locale TEXT                   Specify user locale. Defaults to the system default locale.
+--real-chrome / --no-real-chrome  If you have a Chrome browser installed on your device, enable this, and the Fetcher will launch an instance of your browser and use it. (default: False)
 --proxy TEXT                    Proxy URL in format "http://username:password@host:port"
 -H, --extra-headers TEXT        Extra headers in format "Key: Value" (can be used multiple times)
 --help                          Show this message and exit.

@@ -304,10 +302,10 @@ We will go through each Command in detail below.
 # Solve Cloudflare challenges
 scrapling extract stealthy-fetch "https://nopecha.com/demo/cloudflare" data.txt --solve-cloudflare --css-selector "#padded_content a"
 
-# Use proxy for anonymity
+# Use a proxy for anonymity
 scrapling extract stealthy-fetch "https://site.com" content.md --proxy "http://proxy-server:8080"
 ```
-Get the available options for the …
+Get the available options for the command with `scrapling extract stealthy-fetch --help`, as follows:
 ```bash
 Usage: scrapling extract stealthy-fetch [OPTIONS] URL OUTPUT_FILE
 

@@ -317,25 +315,23 @@ We will go through each Command in detail below.
 
 Options:
 --headless / --no-headless      Run browser in headless mode (default: True)
---block-images / --allow-images  Block image loading (default: False)
 --disable-resources / --enable-resources  Drop unnecessary resources for speed boost (default: False)
 --block-webrtc / --allow-webrtc  Block WebRTC entirely (default: False)
---humanize / --no-humanize      Humanize cursor movement (default: False)
 --solve-cloudflare / --no-solve-cloudflare  Solve Cloudflare challenges (default: False)
 --allow-webgl / --block-webgl   Allow WebGL (default: True)
 --network-idle / --no-network-idle  Wait for network idle (default: False)
--- …
+--real-chrome / --no-real-chrome  If you have a Chrome browser installed on your device, enable this, and the Fetcher will launch an instance of your browser and use it. (default: False)
+--hide-canvas / --show-canvas   Add noise to canvas operations (default: False)
 --timeout INTEGER               Timeout in milliseconds (default: 30000)
 --wait INTEGER                  Additional wait time in milliseconds after page load (default: 0)
 -s, --css-selector TEXT         CSS selector to extract specific content from the page. It returns all matches.
 --wait-selector TEXT            CSS selector to wait for before proceeding
--- …
 --proxy TEXT                    Proxy URL in format "http://username:password@host:port"
 -H, --extra-headers TEXT        Extra headers in format "Key: Value" (can be used multiple times)
 --help                          Show this message and exit.
 ```
 
-## When to use each
+## When to use each command
 
 If you are not a Web Scraping expert and can't decide what to choose, you can use the following formula to help you decide:
 
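Several commands above accept repeated `-H "Key: Value"` options for extra headers. As a sketch of how that documented format can be turned into a header mapping (`parse_extra_headers` is a hypothetical illustration, not Scrapling's actual code):

```python
def parse_extra_headers(raw_headers: list[str]) -> dict[str, str]:
    """Convert repeated -H "Key: Value" strings into a headers dict.

    Hypothetical helper illustrating the documented -H format;
    not Scrapling's implementation.
    """
    headers = {}
    for raw in raw_headers:
        # Split on the first colon only, so values may contain colons.
        key, sep, value = raw.partition(":")
        if not sep:
            raise ValueError(f"Invalid header (expected 'Key: Value'): {raw!r}")
        headers[key.strip()] = value.strip()
    return headers
```

Splitting on only the first colon keeps values like `Referer: https://site.com` intact.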
docs/cli/interactive-shell.md (CHANGED)

@@ -4,7 +4,7 @@
 
 **Powerful Web Scraping REPL for Developers and Data Scientists**
 
-The Scrapling Interactive Shell is an enhanced IPython-based environment designed specifically for Web Scraping tasks. It provides instant access to all Scrapling features, clever shortcuts, automatic page management, and advanced tools …
+The Scrapling Interactive Shell is an enhanced IPython-based environment designed specifically for Web Scraping tasks. It provides instant access to all Scrapling features, clever shortcuts, automatic page management, and advanced tools such as curl command conversion.
 
 > 💡 **Prerequisites:**
 >

@@ -129,7 +129,9 @@ View scraped pages in your browser:
 
 ### Curl Command Integration
 
-The shell provides a few functions to help you convert curl commands from the browser DevTools to `Fetcher` requests …
+The shell provides a few functions to help you convert curl commands from the browser DevTools to `Fetcher` requests: `uncurl` and `curl2fetcher`.
+
+First, you need to copy a request as a curl command, like the following:
 
 <img src="../../assets/scrapling_shell_curl.png" title="Copying a request as a curl command from Chrome" alt="Copying a request as a curl command from Chrome" style="width: 70%;"/>
 
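The `uncurl` and `curl2fetcher` helpers mentioned above turn a copied curl command into request parameters. A minimal standard-library sketch of that idea (`parse_curl` is a hypothetical illustration, not Scrapling's implementation, and handles only `-H`, `-X`, and `--data*` flags):

```python
import shlex


def parse_curl(curl_cmd: str) -> dict:
    """Parse a 'Copy as cURL' command into url, method, headers, and body.

    Simplified sketch of what curl-conversion helpers do; not Scrapling's code.
    """
    tokens = shlex.split(curl_cmd)
    if not tokens or tokens[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")
    result = {"url": None, "method": "GET", "headers": {}, "data": None}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-H":
            # Headers arrive as "Key: Value" strings.
            key, _, value = tokens[i + 1].partition(":")
            result["headers"][key.strip()] = value.strip()
            i += 2
        elif tok in ("-X", "--request"):
            result["method"] = tokens[i + 1]
            i += 2
        elif tok in ("-d", "--data", "--data-raw"):
            # A body implies POST unless a method was set explicitly.
            result["data"] = tokens[i + 1]
            if result["method"] == "GET":
                result["method"] = "POST"
            i += 2
        elif not tok.startswith("-"):
            result["url"] = tok
            i += 1
        else:
            i += 1  # skip flags this sketch does not model
    return result
```

`shlex.split` handles the shell quoting that DevTools emits, which is the tricky part of parsing copied curl commands.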
docs/cli/overview.md (CHANGED)

@@ -27,4 +27,4 @@ and the installation of the fetchers' dependencies with the following command
 ```bash
 scrapling install
 ```
-This downloads all browsers with their system dependencies and fingerprint manipulation dependencies.
+This downloads all browsers, along with their system dependencies and fingerprint manipulation dependencies.
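The fetch commands documented above take proxy URLs in the form `http://username:password@host:port`. That form splits cleanly with the standard library, as this sketch shows (`parse_proxy` is a hypothetical helper, not part of Scrapling):

```python
from urllib.parse import urlsplit


def parse_proxy(proxy_url: str) -> dict:
    """Split a proxy URL of the documented form into its parts."""
    parts = urlsplit(proxy_url)
    return {
        "scheme": parts.scheme,      # e.g. "http"
        "username": parts.username,  # None when no credentials are given
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,          # int, or None when omitted
    }
```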