# ===============================
# Embedding + Vector Search
# ===============================
chromadb
sentence-transformers  # Compatible with huggingface-hub 0.30.1
torch  # Backend required by sentence-transformers
# ===============================
# LLM-Based QA
# ===============================
transformers  # Works well with huggingface-hub 0.30.1
accelerate
huggingface-hub  # Compatible with transformers 4.37.2
# ===============================
# PDF Parsing
# ===============================
pymupdf  # PyMuPDF for full-page text extraction
pdfminer.six  # Optional: structured layout extraction
# ===============================
# OCR + Image Handling
# ===============================
pytesseract  # Requires the Tesseract binary to be installed separately
Pillow
# ===============================
# UI
# ===============================
gradio  # Gradio 4+ for modern UI
requests
# ===============================
# Utilities and Fixes
# ===============================
beautifulsoup4  # Parsing for HTML-in-PDFs (e.g., diagrams/tables)
pydantic  # Older chromadb releases were not compatible with pydantic 2.x (left unpinned here)
numpy  # Shared dependency of chromadb and transformers
tqdm  # Progress bar (used in embedding scripts)
# ===============================
# Natural Language Toolkit
# ===============================
nltk
docx2txt  # Plain-text extraction from .docx files
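# ===============================
# Optional: pinned variant (sketch)
# ===============================
# The comments above name specific versions, but no requirement in this file
# is pinned, so pip will install whatever is newest. A minimal sketch of how
# those versions could be pinned, assuming the version numbers in the comments
# are the intended ones (they are copied from the comments and have not been
# verified against the other, unpinned packages):
#
# transformers==4.37.2
# huggingface-hub==0.30.1
# gradio>=4
#
# To use these, uncomment them and remove the matching unpinned lines above.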