---
title: Docurizzer
emoji: π
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.32.0
app_file: app.py
pinned: false
---
Docurizzer - Document Summarizer
A Streamlit application that extracts text from PDFs and images, then summarizes that text with the T5-small model from Hugging Face Transformers.
Features
- PDF Text Extraction: Extract text from PDF documents using pdfplumber
- Image OCR: Extract text from images (PNG, JPG, JPEG) using Tesseract OCR
- AI Summarization: Summarize the extracted text using the T5-small model
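The two extraction features above can be sketched in Python roughly as follows. This is a minimal illustration, not the app's actual code: the helper names (`detect_kind`, `extract_pdf_text`, `extract_image_text`, `extract_text`) are hypothetical, and routing uploads by file extension is an assumption.

```python
from pathlib import Path
from typing import BinaryIO

# Hypothetical routing tables built from the file types listed above;
# matching is case-insensitive.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def detect_kind(filename: str) -> str:
    """Classify an upload as 'pdf' or 'image' from its file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError(f"Unsupported file type: {filename!r}")


def extract_pdf_text(fileobj: BinaryIO) -> str:
    """Join the embedded text layer of every PDF page using pdfplumber."""
    import pdfplumber  # imported lazily so this module loads without it

    with pdfplumber.open(fileobj) as pdf:
        # extract_text() may return None for pages without a text layer
        pages = [page.extract_text() or "" for page in pdf.pages]
    return "\n".join(pages).strip()


def extract_image_text(fileobj: BinaryIO) -> str:
    """Run Tesseract OCR on an image through pytesseract."""
    import pytesseract  # needs the tesseract executable installed
    from PIL import Image

    with Image.open(fileobj) as image:
        return pytesseract.image_to_string(image).strip()


def extract_text(filename: str, fileobj: BinaryIO) -> str:
    """Send an uploaded file to the matching extractor."""
    if detect_kind(filename) == "pdf":
        return extract_pdf_text(fileobj)
    return extract_image_text(fileobj)
```

Note that pdfplumber only reads a PDF's embedded text layer, so a scanned PDF with no text layer would come back empty in this sketch; only image uploads go through OCR.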
How to Use
- Upload a PDF or image file
- Preview the extracted text
- Click "Summarize" to generate a summary
Local Development
```bash
# Install dependencies
pip install -r requirements.txt
# Install Tesseract OCR (macOS)
brew install tesseract
# Install Tesseract OCR (Ubuntu/Debian)
sudo apt-get install tesseract-ocr
# Run the app
streamlit run app.py
```
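pytesseract is only a Python wrapper around the `tesseract` executable, so installing the Python dependencies alone is not enough for OCR. A small check like the one below (a hypothetical helper, not part of the app) can confirm the binary is on PATH before you start Streamlit.

```python
import shutil
from typing import Callable, Optional


def check_tesseract(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Report whether the tesseract executable can be found on PATH.

    `which` defaults to shutil.which and can be replaced with a stub in tests.
    """
    path = which("tesseract")
    if path is None:
        return "tesseract not found on PATH: install it with brew or apt-get"
    return f"tesseract found at {path}"


print(check_tesseract())
```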
Hugging Face Spaces Deployment
This app is configured for deployment on Hugging Face Spaces:
- requirements.txt - Python dependencies
- packages.txt - System packages (Tesseract OCR)
Tech Stack
- Streamlit - Web interface
- Transformers - T5-small for summarization
- pdfplumber - PDF text extraction
- pytesseract - Image OCR
- Pillow - Image processing
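The README does not say how long documents are passed to T5-small. The sketch below assumes one common approach: split the text into word chunks, summarize each chunk with the Transformers `summarization` pipeline, and join the partial summaries. The 300-word chunk size and the helper names are hypothetical; the word budget is only a rough stand-in for the 512-token input length T5 is usually run with.

```python
from typing import List

# Hypothetical budget: kept well under ~512 tokens, since one English word
# often maps to more than one subword token.
MAX_WORDS_PER_CHUNK = 300


def chunk_words(text: str, max_words: int = MAX_WORDS_PER_CHUNK) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize(text: str, max_length: int = 120, min_length: int = 30) -> str:
    """Summarize each chunk with T5-small and join the partial summaries."""
    from transformers import pipeline  # heavy import, deferred until needed

    summarizer = pipeline("summarization", model="t5-small")
    parts = []
    for chunk in chunk_words(text):
        result = summarizer(chunk, max_length=max_length, min_length=min_length, do_sample=False)
        parts.append(result[0]["summary_text"])
    return " ".join(parts)
```

In a Streamlit app the pipeline would normally be built once and reused (for example behind `st.cache_resource`) rather than on every call as in this sketch, since loading the model dominates the run time.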