Karim shoair committed on
Commit · 23f6dfd
1 Parent(s): c6d4e9e
docs: Update all benchmarks
docs/benchmarks.md +16 -33
docs/benchmarks.md CHANGED
@@ -1,44 +1,27 @@
-
+# Performance Benchmarks
 
-
+Scrapling isn't just powerful—it's also blazing fast, and version 0.3 delivers exceptional performance improvements across all operations!
 
-##
-
-This test consists of extracting the text content of 5000 nested div elements.
-
-Here are the results comparing Scrapling to all well-known parsing libraries:
+## Benchmark Results
 
+### Text Extraction Speed Test (5000 nested elements)
 
 | # | Library | Time (ms) | vs Scrapling |
 |---|:-----------------:|:---------:|:------------:|
-| 1 | Scrapling |
-| 2 | Parsel/Scrapy |
-| 3 | Raw Lxml |
-| 4 | PyQuery |
-| 5 | Selectolax |
-| 6 |
-| 7 |
-| 8 | BS4 with html5lib |
-
-As you see, Scrapling is on par with Scrapy and slightly faster than Lxml, which both libraries are built on top of. These are the closest results to Scrapling. PyQuery is also built on top of Lxml, but Scrapling is four times faster.
-
-### Extraction By Text Speed Test
-
-Scrapling can find elements based on its text content and find elements similar to these elements. The only known library with these two features, too, is AutoScraper.
+| 1 | Scrapling | 1.88 | 1.0x |
+| 2 | Parsel/Scrapy | 1.96 | 1.043x |
+| 3 | Raw Lxml | 2.32 | 1.234x |
+| 4 | PyQuery | 20.2 | ~11x |
+| 5 | Selectolax | 85.2 | ~45x |
+| 6 | MechanicalSoup | 1305.84 | ~695x |
+| 7 | BS4 with Lxml | 1307.92 | ~696x |
+| 8 | BS4 with html5lib | 3336.28 | ~1775x |
 
-
+### Element Similarity & Text Search Performance
 
-
+Scrapling's adaptive element finding capabilities significantly outperform alternatives:
 
 | Library | Time (ms) | vs Scrapling |
 |-------------|:---------:|:------------:|
-| Scrapling | 2.
-| AutoScraper |
-
-Scrapling can find elements with more methods and returns the entire element's `Adaptor` object, not only text like AutoScraper. So, to make this test fair, both libraries will extract an element with text, find similar elements, and then extract the text content for all of them.
-
-As you see, Scrapling is still 4.5 times faster at the same task.
-
-If we made Scrapling extract the elements only without stopping to extract each element's text, we would get speed twice as fast as this, but as I said, to make it fair comparison a bit :smile:
-
-> All benchmarks' results are an average of 100 runs. See our [benchmarks.py](https://github.com/D4Vinci/Scrapling/blob/main/benchmarks.py) for methodology and to run your comparisons.
+| Scrapling | 2.02 | 1.0x |
+| AutoScraper | 10.26 | 5.08x |
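The "vs Scrapling" column in the added rows appears to be each library's time divided by Scrapling's time. The following is a minimal Python sketch under that assumption, using only the times from the added table rows. The helper name `relative_slowdown` is hypothetical and is not taken from benchmarks.py.

```python
# Times in milliseconds, copied from the rows added in this commit.
text_extraction_ms = {
    "Scrapling": 1.88,
    "Parsel/Scrapy": 1.96,
    "Raw Lxml": 2.32,
    "PyQuery": 20.2,
    "Selectolax": 85.2,
    "MechanicalSoup": 1305.84,
    "BS4 with Lxml": 1307.92,
    "BS4 with html5lib": 3336.28,
}
similarity_search_ms = {
    "Scrapling": 2.02,
    "AutoScraper": 10.26,
}


def relative_slowdown(times_ms, baseline="Scrapling"):
    """Return each library's time divided by the baseline's time.

    Hypothetical helper: assumes the "vs Scrapling" column is a plain ratio.
    """
    base = times_ms[baseline]
    return {name: t / base for name, t in times_ms.items()}


text_ratios = relative_slowdown(text_extraction_ms)
search_ratios = relative_slowdown(similarity_search_ms)

for name, ratio in text_ratios.items():
    print(f"{name:<18} {ratio:.3f}x")
for name, ratio in search_ratios.items():
    print(f"{name:<18} {ratio:.2f}x")
```

Rounded the same way as the tables (three decimals for the small ratios, whole numbers for the "~" entries, two decimals for AutoScraper), these ratios match the published column, which suggests the column is a simple quotient of the listed times.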