ESGToolKit / pdf_parser.py
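import fitz  # PyMuPDF


def extract_text(file):
    """Return the text of every page in a PDF given as a binary file-like object."""
    # Open from bytes in memory; the context manager closes the document afterwards.
    with fitz.open(stream=file.read(), filetype="pdf") as doc:
        # Join once instead of growing a string inside the loop.
        return "".join(page.get_text() for page in doc)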