---
title: Docurizzer
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.32.0"
app_file: app.py
pinned: false
---
# 📄🖼 Docurizzer - Document Summarizer
A Streamlit application that extracts text from PDFs and images, then summarizes them using AI.
## Features
- **PDF Text Extraction**: Extract text from PDF documents using pdfplumber
- **Image OCR**: Extract text from images (PNG, JPG, JPEG) using Tesseract OCR
- **AI Summarization**: Summarize extracted text using the T5-small model
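
The two extraction paths above could be sketched roughly as follows. This is an illustrative outline, not the code in `app.py`: the names `detect_kind` and `extract_text` are hypothetical, and choosing the extractor by file extension is an assumption.

```python
from pathlib import Path

# Image types listed in the Features section
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def detect_kind(filename: str) -> str:
    """Classify an upload as 'pdf' or 'image' from its file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported file type: {suffix or filename}")


def extract_text(path: str) -> str:
    """Return the text of a PDF (pdfplumber) or an image (Tesseract OCR)."""
    kind = detect_kind(path)
    if kind == "pdf":
        import pdfplumber  # imported lazily so detect_kind works without it

        with pdfplumber.open(path) as pdf:
            # extract_text() may return None for pages without a text layer
            pages = [page.extract_text() or "" for page in pdf.pages]
        return "\n".join(pages).strip()

    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image).strip()
```

Scanned PDFs with no text layer would come back empty from the PDF branch, since that branch does not run OCR.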
## How to Use
1. Upload a PDF or image file
2. Preview the extracted text
3. Click "Summarize" to generate a summary
## Local Development
```bash
# Install dependencies
pip install -r requirements.txt
# Install Tesseract OCR (macOS)
brew install tesseract
# Install Tesseract OCR (Ubuntu/Debian)
sudo apt-get install tesseract-ocr
# Run the app
streamlit run app.py
```
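
`pytesseract` is a wrapper around the Tesseract executable, so OCR fails if the binary is not on `PATH`. A small check like the one below can confirm that the install step worked; the helper name `find_tesseract` is made up for this example and is not part of the app.

```python
import shutil
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    print(f"Tesseract found at {path}" if path else "Tesseract not found on PATH")
```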
## Hugging Face Spaces Deployment
This app is configured for deployment on Hugging Face Spaces:
- `requirements.txt` - Python dependencies
- `packages.txt` - System packages (Tesseract OCR)
## Tech Stack
- **Streamlit** - Web interface
- **Transformers** - T5-small for summarization
- **pdfplumber** - PDF text extraction
- **pytesseract** - Image OCR
- **Pillow** - Image processing
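
As a rough sketch of how the summarization step might use Transformers: T5-small accepts a limited input length (512 tokens by default), so long documents are commonly split before summarizing. The helpers `chunk_words` and `summarize`, the 300-word chunk size, and the length limits are all assumptions for illustration; the app may handle long text differently. It also assumes a Transformers version that provides the `summarization` pipeline.

```python
def chunk_words(text: str, max_words: int = 300) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize(text: str, max_words: int = 300) -> str:
    """Summarize each chunk with T5-small and join the partial summaries."""
    from transformers import pipeline  # lazy import: heavy dependency

    summarizer = pipeline("summarization", model="t5-small")
    parts = []
    for chunk in chunk_words(text, max_words):
        # truncation=True guards against chunks that still exceed the model limit
        result = summarizer(chunk, max_length=80, min_length=10, truncation=True)
        parts.append(result[0]["summary_text"])
    return " ".join(parts)
```

Word counts only approximate token counts, so the chunk size is kept well below the model limit and truncation stays enabled as a fallback.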