Karim shoair committed on
Commit 23f6dfd · 1 Parent(s): c6d4e9e

docs: Update all benchmarks

Files changed (1)
  1. docs/benchmarks.md +16 -33
docs/benchmarks.md CHANGED
@@ -1,44 +1,27 @@
- Scrapling isn't just powerful - it's also blazing fast. It applies many best practices, design patterns, and optimizations to save fractions of a second, all while focusing exclusively on parsing HTML documents.
 
- Here are benchmarks comparing Scrapling's parsing speed to popular Python libraries in two tests.
 
- ### Text Extraction Speed Test
-
- This test consists of extracting the text content of 5000 nested div elements.
-
- Here are the results comparing Scrapling to other well-known parsing libraries:
 
 
  | # | Library | Time (ms) | vs Scrapling |
  |---|:-----------------:|:---------:|:------------:|
- | 1 | Scrapling | 5.44 | 1.0x |
- | 2 | Parsel/Scrapy | 5.53 | 1.017x |
- | 3 | Raw Lxml | 6.76 | 1.243x |
- | 4 | PyQuery | 21.96 | 4.037x |
- | 5 | Selectolax | 67.12 | 12.338x |
- | 6 | BS4 with Lxml | 1307.03 | 240.263x |
- | 7 | MechanicalSoup | 1322.64 | 243.132x |
- | 8 | BS4 with html5lib | 3373.75 | 620.175x |
-
- As you can see, Scrapling is on par with Scrapy and slightly faster than Lxml, the library both are built on top of. These are the closest results to Scrapling. PyQuery is also built on top of Lxml, but Scrapling is four times faster.
-
- ### Extraction By Text Speed Test
-
- Scrapling can find elements based on their text content and find elements similar to them. The only other known library with both features is AutoScraper.
 
- So, we compared the two to see how fast Scrapling is at these tasks relative to AutoScraper.
 
- Here are the results:
 
  | Library | Time (ms) | vs Scrapling |
  |-------------|:---------:|:------------:|
- | Scrapling | 2.51 | 1.0x |
- | AutoScraper | 11.41 | 4.546x |
-
- Scrapling can find elements with more methods and returns each element's full `Adaptor` object, not only its text as AutoScraper does. So, to make this test fair, both libraries extract an element by its text, find similar elements, and then extract the text content of all of them.
-
- As you can see, Scrapling is still 4.5 times faster at the same task.
-
- If we made Scrapling extract only the elements, without stopping to extract each element's text, it would be about twice as fast as this, but as I said, we wanted to keep the comparison fair :smile:
-
- > All benchmark results are an average of 100 runs. See our [benchmarks.py](https://github.com/D4Vinci/Scrapling/blob/main/benchmarks.py) for methodology and to run your own comparisons.
 
+ # Performance Benchmarks
 
+ Scrapling isn't just powerful - it's also blazing fast, and version 0.3 delivers exceptional performance improvements across all operations!
 
+ ## Benchmark Results
 
+ ### Text Extraction Speed Test (5000 nested elements)
 
  | # | Library | Time (ms) | vs Scrapling |
  |---|:-----------------:|:---------:|:------------:|
+ | 1 | Scrapling | 1.88 | 1.0x |
+ | 2 | Parsel/Scrapy | 1.96 | 1.043x |
+ | 3 | Raw Lxml | 2.32 | 1.234x |
+ | 4 | PyQuery | 20.2 | ~11x |
+ | 5 | Selectolax | 85.2 | ~45x |
+ | 6 | MechanicalSoup | 1305.84 | ~695x |
+ | 7 | BS4 with Lxml | 1307.92 | ~696x |
+ | 8 | BS4 with html5lib | 3336.28 | ~1775x |
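
The "vs Scrapling" column appears to be each library's time divided by Scrapling's time. The sketch below checks that reading against the timings in the updated table; `relative_slowdown` is a hypothetical helper written for this illustration, not something taken from `benchmarks.py`.

```python
# Timings in milliseconds, copied from the updated text-extraction table.
timings_ms = {
    "Scrapling": 1.88,
    "Parsel/Scrapy": 1.96,
    "Raw Lxml": 2.32,
    "PyQuery": 20.2,
    "Selectolax": 85.2,
    "MechanicalSoup": 1305.84,
    "BS4 with Lxml": 1307.92,
    "BS4 with html5lib": 3336.28,
}


def relative_slowdown(timings, baseline="Scrapling"):
    """Hypothetical helper: divide every timing by the baseline library's timing."""
    base = timings[baseline]
    return {name: value / base for name, value in timings.items()}


ratios = relative_slowdown(timings_ms)
for name, ratio in ratios.items():
    print(f"{name:<18} {ratio:9.3f}x")
```

Rounded to the precision used in the table, these ratios agree with the published column, which suggests the larger factors were simply rounded to whole numbers.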
 
 
 
 
 
 
 
+ ### Element Similarity & Text Search Performance
 
+ Scrapling's adaptive element-finding capabilities are about five times faster than AutoScraper's:
 
  | Library | Time (ms) | vs Scrapling |
  |-------------|:---------:|:------------:|
+ | Scrapling | 2.02 | 1.0x |
+ | AutoScraper | 10.26 | 5.08x |
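
For readers who want a feel for the methodology (5000 nested elements, results averaged over 100 runs), here is a minimal sketch that uses only the standard library. It is not the project's `benchmarks.py`: the document layout, the `TextCollector` stand-in parser, and the `average_ms` helper are all assumptions made for illustration, so the timing it prints is not comparable to the figures in the tables.

```python
import time
from html.parser import HTMLParser


def build_nested_html(depth=5000):
    """Build a page with `depth` nested <div> elements, each holding a short text node.

    The exact markup is an assumption; the real benchmark page may differ.
    """
    opening = "".join(f'<div class="item">item {i}' for i in range(depth))
    return "<html><body>" + opening + "</div>" * depth + "</body></html>"


class TextCollector(HTMLParser):
    """Event-based stand-in parser that records every text node it sees."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        self.texts.append(data)


def extract_texts(html):
    """Parse `html` and return the list of text nodes in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


def average_ms(func, *args, runs=100):
    """Average wall-clock time of func(*args) over `runs` calls, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        func(*args)
    return (time.perf_counter() - start) * 1000 / runs


html = build_nested_html()
texts = extract_texts(html)
print(f"extracted {len(texts)} text nodes")
print(f"average over 100 runs: {average_ms(extract_texts, html):.2f} ms")
```

To compare real libraries, one would replace `extract_texts` with the equivalent call for each library and keep the input document and run count identical across them.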