streamlit
transformers==4.40.1
torch
pytesseract
pillow
newspaper3k
lxml_html_clean