KevinHuSh committed
Commit e319829 · 1 Parent(s): 2d09c38
fix exception in pdf parser (#584)
### What problem does this PR solve?
#451
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
deepdoc/parser/pdf_parser.py
CHANGED

    @@ -470,7 +470,8 @@ class RAGFlowPdfParser:
             continue
     
         if re.match(r"[0-9]{2,3}/[0-9]{3}$", up["text"]) \
    -            or re.match(r"[0-9]{2,3}/[0-9]{3}$", down["text"]):
    +            or re.match(r"[0-9]{2,3}/[0-9]{3}$", down["text"]) \
    +            or not down["text"].strip():
             i += 1
             continue
     
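
To make the effect of the added clause easier to see, here is a minimal sketch of the patched condition pulled out into a standalone function. The helper name `should_skip` and the sample boxes are hypothetical and do not appear in `pdf_parser.py`; only the regular expression and the `strip()` check are taken from the diff.

```python
import re

# Same pattern as in the diff: two or three digits, a slash, then three
# digits at the end of the string (for example "12/345").
NUM_SLASH_NUM = re.compile(r"[0-9]{2,3}/[0-9]{3}$")


def should_skip(up: dict, down: dict) -> bool:
    """Hypothetical helper that mirrors the patched `if` condition."""
    return bool(
        NUM_SLASH_NUM.match(up["text"])
        or NUM_SLASH_NUM.match(down["text"])
        or not down["text"].strip()  # clause added by this commit
    )


# Hypothetical sample boxes; real boxes carry more keys than "text".
print(should_skip({"text": "Intro"}, {"text": "12/345"}))     # True
print(should_skip({"text": "Intro"}, {"text": "   "}))        # True
print(should_skip({"text": "Intro"}, {"text": ""}))           # True
print(should_skip({"text": "Intro"}, {"text": "Body text"}))  # False
```

Before the change, the third and fourth clauses of this sketch would not exist, so a `down` box whose text is empty or whitespace-only would fall through to the code that follows the `if`.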