streamlit
PyMuPDF
pandas
nltk
regex
sentence-transformers
torch