pytesseract
textacy
regex
nltk
scipy==1.12.0
gensim
networkx
headline-gen==2.6
opencv-python