Update README.md
README.md
CHANGED
|
@@ -24,6 +24,7 @@ better input = better output<br>
|
|
| 24 |
* docling_by_sevenof9_v1.py - Python, you need nvidia RTX to run it fast<br>
|
| 25 |
all other older versions<br><br>
|
| 26 |
<b>⇨</b> give me a ❤️, if you like ;)<br><br>
|
|
|
|
| 27 |
|
| 28 |
on github
|
| 29 |
https://github.com/kalle07/parsing
|
|
@@ -57,13 +58,12 @@ I work with "<b>pdfplumber/pdfminer</b>" none OCR, so its very fast!<br>
|
|
| 57 |
<li>tested on 300 PDF files ~30000 pages</li>
|
| 58 |
</ul>
|
| 59 |
<br>
|
| 60 |
-
This I have created with my brain and the help of
|
| 61 |
Building the GUI and the functionality, and then compiling it on top of that, is really hard for me.<br>
|
| 62 |
For the Python file you need to install the missing libraries.<br>
|
| 63 |
Of course there is a lot of room for optimization (saving, error handling) or for using other parser libraries, but it's a start.
|
| 64 |
<br><br>
|
| 65 |
-
|
| 66 |
-
Give me a hand if you can ;)<br>
|
| 67 |
...
|
| 68 |
<br>
|
| 69 |
I also have a "<b>docling</b>" parser with OCR (a GPU is needed for fast processing); it is only a Python file, not compiled.<br>
|
|
|
|
| 24 |
* docling_by_sevenof9_v1.py - Python, you need nvidia RTX to run it fast<br>
|
| 25 |
all other older versions<br><br>
|
| 26 |
<b>⇨</b> give me a ❤️, if you like ;)<br><br>
|
| 27 |
+
Check the PDF before converting it to text: open a page, ideally one near the beginning and one near the end, select some text with the mouse and paste it into an editor. If you cannot see the text you copied, this parser will not work and neither will any other program! In that case either the PDF is copy-protected and you must remove the protection first, or the page is just an image and you must run OCR first.<br>
|
| 28 |
|
| 29 |
on github
|
| 30 |
https://github.com/kalle07/parsing
|
|
|
|
| 58 |
<li>tested on 300 PDF files ~30000 pages</li>
|
| 59 |
</ul>
|
| 60 |
<br>
|
| 61 |
+
I created this with my own brain and the help of AI. I am not a coder, sorry, so I will not fulfill any wishes unless there are real errors.<br>
|
| 62 |
Building the GUI and the functionality, and then compiling it on top of that, is really hard for me.<br>
|
| 63 |
For the Python file you need to install the missing libraries.<br>
|
| 64 |
Of course there is a lot of room for optimization (saving, error handling) or for using other parser libraries, but it's a start.
|
| 65 |
<br><br>
|
| 66 |
+
Give me a hand if you can ;) with more features or better stability.<br>
|
|
|
|
| 67 |
...
|
| 68 |
<br>
|
| 69 |
I also have a "<b>docling</b>" parser with OCR (a GPU is needed for fast processing); it is only a Python file, not compiled.<br>
|
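The manual check the README now recommends (copy text from a page near the start and one near the end, then see whether anything readable comes out) could also be done in a script. The following is only a rough sketch, not part of the parser in this repository: the function names `has_text_layer` and `pdf_is_parseable` and the `min_chars` threshold of 20 are made up for illustration, and it assumes `pdfplumber` is installed. Encrypted or copy-protected files may raise an exception on opening instead of returning `False`.

```python
def has_text_layer(text, min_chars=20):
    """Return True if the extracted page text looks like real, selectable text.

    min_chars is an arbitrary illustrative threshold, not a value from the README.
    """
    if not text:
        return False
    # Count only non-whitespace characters, so blank pages do not pass.
    visible = [c for c in text if not c.isspace()]
    return len(visible) >= min_chars


def pdf_is_parseable(path, min_chars=20):
    """Check the first and the last page of a PDF for extractable text."""
    # Third-party library, imported here so has_text_layer works without it.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        if not pdf.pages:
            return False
        # Same idea as the manual check: one page at the start, one at the end.
        sample = [pdf.pages[0], pdf.pages[-1]]
        return all(has_text_layer(page.extract_text(), min_chars) for page in sample)
```

If `pdf_is_parseable("some_file.pdf")` returns `False`, the pages are probably scanned images and need OCR first, which matches the README's advice.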