| title: Docurizzer | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: streamlit | |
| sdk_version: "1.32.0" | |
| app_file: app.py | |
| pinned: false | |
| # Docurizzer - Document Summarizer | |
| A Streamlit application that extracts text from PDFs and images, then summarizes the extracted text with the T5-small model. | |
| ## Features | |
| - **PDF Text Extraction**: Extract text from PDF documents using pdfplumber | |
| - **Image OCR**: Extract text from images (PNG, JPG, JPEG) using Tesseract OCR | |
| - **AI Summarization**: Summarize extracted text using the T5-small model | |
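The two extraction features can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the function names (`detect_kind`, `extract_pdf_text`, `extract_image_text`, `extract_text`) are hypothetical, and it assumes files are routed by extension. The library calls used (`pdfplumber.open`, `page.extract_text`, `pytesseract.image_to_string`) are the standard entry points of those packages.

```python
from pathlib import Path

# Extensions listed in the Features section; the routing itself is an assumption.
PDF_SUFFIXES = {".pdf"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def detect_kind(filename: str) -> str:
    """Classify an upload as 'pdf' or 'image' from its file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in PDF_SUFFIXES:
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported file type: {filename!r}")


def extract_pdf_text(path: str) -> str:
    """Join the text of every page in a PDF."""
    import pdfplumber  # imported lazily so detect_kind works without it

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_image_text(path: str) -> str:
    """Run Tesseract OCR on an image file."""
    import pytesseract  # requires the tesseract binary to be installed
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image)


def extract_text(path: str) -> str:
    """Route a file to the matching extractor."""
    if detect_kind(path) == "pdf":
        return extract_pdf_text(path)
    return extract_image_text(path)
```

Scanned PDFs without a text layer would come back empty from `extract_pdf_text`; handling them would need an OCR pass over rendered pages, which this sketch does not attempt.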
| ## How to Use | |
| 1. Upload a PDF or image file | |
| 2. Preview the extracted text | |
| 3. Click "Summarize" to generate a summary | |
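What the "Summarize" step might do behind the scenes can be sketched as below. This is an assumption-laden illustration rather than the app's actual code: `chunk_words` and `summarize` are hypothetical names, the 300-word chunk size and the `max_length`/`min_length` values are arbitrary choices, and it assumes an installed Transformers version that provides the `summarization` pipeline. Chunking is included because T5-small accepts a limited input length (about 512 tokens), so long documents generally need to be split first.

```python
def chunk_words(text: str, max_words: int = 300) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize(text: str, max_words: int = 300) -> str:
    """Summarize each chunk with T5-small and join the partial summaries."""
    from transformers import pipeline  # imported lazily; downloads the model on first use

    summarizer = pipeline("summarization", model="t5-small")
    parts = []
    for chunk in chunk_words(text, max_words):
        # The pipeline returns a list with one dict per input.
        result = summarizer(chunk, max_length=120, min_length=20, do_sample=False)
        parts.append(result[0]["summary_text"])
    return " ".join(parts)
```

In a Streamlit app the pipeline would normally be created once and cached (for example with `st.cache_resource`) instead of being rebuilt on every click.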
| ## Local Development | |
| ```bash | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Install Tesseract OCR (macOS) | |
| brew install tesseract | |
| # Install Tesseract OCR (Ubuntu/Debian) | |
| sudo apt-get install tesseract-ocr | |
| # Run the app | |
| streamlit run app.py | |
| ``` | |
| ## Hugging Face Spaces Deployment | |
| This app is configured for deployment on Hugging Face Spaces: | |
| - `requirements.txt` - Python dependencies | |
| - `packages.txt` - System packages (Tesseract OCR) | |
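`pytesseract` is only a wrapper around the Tesseract executable, which is why the system package has to be installed separately from the Python dependencies. A small startup check along these lines can make a missing binary easier to diagnose; the function name `tesseract_status` and the message wording are hypothetical, not part of the app.

```python
import shutil


def tesseract_status(binary: str = "tesseract") -> str:
    """Report whether the given executable can be found on PATH."""
    path = shutil.which(binary)
    if path is None:
        return f"'{binary}' not found on PATH; install it via packages.txt or your OS package manager"
    return f"'{binary}' found at {path}"


print(tesseract_status())
```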
| ## Tech Stack | |
| - **Streamlit** - Web interface | |
| - **Transformers** - T5-small for summarization | |
| - **pdfplumber** - PDF text extraction | |
| - **pytesseract** - Image OCR | |
| - **Pillow** - Image processing | |