---
title: Docurizzer
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.32.0"
app_file: app.py
pinned: false
---

# 📄🖼 Docurizzer - Document Summarizer

A Streamlit application that extracts text from PDFs and images, then summarizes it using AI.

## Features

- **PDF Text Extraction**: Extract text from PDF documents using pdfplumber
- **Image OCR**: Extract text from images (PNG, JPG, JPEG) using Tesseract OCR
- **AI Summarization**: Summarize extracted text using the T5-small model

## How to Use

1. Upload a PDF or image file
2. Preview the extracted text
3. Click "Summarize" to generate a summary

## Local Development

```bash
# Install dependencies
pip install -r requirements.txt

# Install Tesseract OCR (macOS)
brew install tesseract

# Install Tesseract OCR (Ubuntu/Debian)
sudo apt-get install tesseract-ocr

# Run the app
streamlit run app.py
```

## Hugging Face Spaces Deployment

This app is configured for deployment on Hugging Face Spaces:

- `requirements.txt` - Python dependencies
- `packages.txt` - System packages (Tesseract OCR)

## Tech Stack

- **Streamlit** - Web interface
- **Transformers** - T5-small for summarization
- **pdfplumber** - PDF text extraction
- **pytesseract** - Image OCR
- **Pillow** - Image processing
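
## Example: Extraction and Summarization Sketch

The extract-then-summarize flow listed under Features can be sketched roughly as follows. This is a minimal illustration, not the contents of `app.py`: the function names (`detect_kind`, `extract_text`, `summarize`), the extension mapping, and the generation settings (`max_length`, `min_length`, `num_beams`) are all assumptions made for the example. It also assumes the `t5-small` checkpoint from the Hugging Face Hub with a PyTorch backend, and uses the `summarize: ` task prefix that T5 expects.

```python
from pathlib import Path

# File types named in the Features list; this mapping is an assumption.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def detect_kind(filename: str) -> str:
    """Return "pdf" or "image" based on the file extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError(f"Unsupported file type: {filename}")


def extract_text(path: str) -> str:
    """Extract text from a PDF with pdfplumber or from an image with Tesseract."""
    if detect_kind(path) == "pdf":
        # Imported lazily so detect_kind() works without third-party packages.
        import pdfplumber

        with pdfplumber.open(path) as pdf:
            # extract_text() may return None for pages without a text layer.
            return "\n".join(page.extract_text() or "" for page in pdf.pages)

    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image)


def summarize(text: str, max_length: int = 150, min_length: int = 30) -> str:
    """Summarize text with T5-small (hypothetical generation settings)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
    # Input is truncated to 512 tokens, so long documents lose their tail.
    inputs = tokenizer(
        "summarize: " + text,
        return_tensors="pt",
        max_length=512,
        truncation=True,
    )
    output_ids = model.generate(
        **inputs, max_length=max_length, min_length=min_length, num_beams=4
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(extract_text("report.pdf"))` would chain the two steps, where `report.pdf` is a placeholder path. Because the input is truncated to 512 tokens, a real app would likely need to split long documents into chunks and summarize each one; the source does not say how Docurizzer handles this.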