
title: Docurizzer
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.32.0
app_file: app.py
pinned: false

📄🖼 Docurizzer - Document Summarizer

A Streamlit application that extracts text from PDFs and images, then summarizes the extracted text with the T5-small model.

Features

PDF Text Extraction: Extract text from PDF documents using pdfplumber
Image OCR: Extract text from images (PNG, JPG, JPEG) using Tesseract OCR
AI Summarization: Summarize extracted text using the T5-small model
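The two extraction features could be wired together roughly as follows. This is a minimal sketch, not the code in app.py: the function names (`file_kind`, `extract_pdf_text`, `extract_image_text`, `extract_text`) are hypothetical, and it assumes pdfplumber, pytesseract, Pillow, and the Tesseract binary are installed.

```python
from pathlib import Path

# Image extensions listed in the Features section
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def file_kind(filename: str) -> str:
    """Classify an upload as 'pdf' or 'image' from its extension (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported file type: {suffix or filename!r}")


def extract_pdf_text(source) -> str:
    """Join the text layer of every page; `source` is a path or a binary file object."""
    import pdfplumber  # imported lazily so file_kind() has no heavy dependencies

    with pdfplumber.open(source) as pdf:
        # extract_text() can return None for pages without a text layer
        pages = [page.extract_text() or "" for page in pdf.pages]
    return "\n".join(pages).strip()


def extract_image_text(source) -> str:
    """Run Tesseract OCR on an image; `source` is a path or a binary file object."""
    import pytesseract
    from PIL import Image

    with Image.open(source) as image:
        return pytesseract.image_to_string(image).strip()


def extract_text(source, filename: str) -> str:
    """Dispatch to the PDF or OCR extractor based on the file name."""
    if file_kind(filename) == "pdf":
        return extract_pdf_text(source)
    return extract_image_text(source)
```

Scanned PDFs with no text layer would come back empty from this sketch, since only images are sent through OCR here.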

How to Use

Upload a PDF or image file
Preview the extracted text
Click "Summarize" to generate a summary
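The three steps above map onto a short Streamlit script. The sketch below is an illustration under assumptions, not the actual app.py: `preview_text` and `main` are hypothetical names, and `extract` and `summarize` stand in for whatever extraction and summarization functions the app uses.

```python
def preview_text(text: str, limit: int = 2000) -> str:
    """Trim long extractions so the preview box stays readable (hypothetical helper)."""
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + " [...]"


def main(extract, summarize) -> None:
    """Render the upload, preview, summarize flow.

    `extract(file, filename)` and `summarize(text)` are assumed callables
    supplied by the caller; they are not defined in this sketch.
    """
    import streamlit as st  # imported lazily so preview_text() works without Streamlit

    st.title("Docurizzer - Document Summarizer")

    # Step 1: upload a PDF or image file
    uploaded = st.file_uploader(
        "Upload a PDF or image", type=["pdf", "png", "jpg", "jpeg"]
    )
    if uploaded is None:
        return

    # Step 2: preview the extracted text
    text = extract(uploaded, uploaded.name)
    st.text_area("Extracted text", preview_text(text), height=300)

    # Step 3: summarize on demand
    if st.button("Summarize"):
        with st.spinner("Summarizing..."):
            st.write(summarize(text))
```

Calling `main(...)` at the bottom of app.py and launching with `streamlit run app.py` would render the page. Streamlit reruns the script on every interaction, so the extraction step runs again when the button is clicked; caching it is a possible refinement.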

Local Development

# Install dependencies
pip install -r requirements.txt

# Install Tesseract OCR (macOS)
brew install tesseract

# Install Tesseract OCR (Ubuntu/Debian)
sudo apt-get install tesseract-ocr

# Run the app
streamlit run app.py
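
Before running the app, a quick preflight check can confirm that the Python packages and the Tesseract binary are visible. This is an optional sketch using only the standard library; the script and its function names are hypothetical, and the module list is inferred from the Tech Stack section rather than from requirements.txt.

```python
import importlib.util
import shutil

# Import names assumed from the Tech Stack section (Pillow is imported as PIL)
REQUIRED_MODULES = ["streamlit", "transformers", "pdfplumber", "pytesseract", "PIL"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def tesseract_available() -> bool:
    """Check whether a `tesseract` executable is on PATH."""
    return shutil.which("tesseract") is not None


def report() -> str:
    """Build a short human-readable status message."""
    lines = []
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        lines.append("Missing Python packages: " + ", ".join(missing))
    if not tesseract_available():
        lines.append("Tesseract binary not found on PATH")
    return "\n".join(lines) or "Environment looks ready"


print(report())
```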

Hugging Face Spaces Deployment

This app is configured for deployment on Hugging Face Spaces using these files:

requirements.txt - Python dependencies
packages.txt - System packages (Tesseract OCR)

Tech Stack

Streamlit - Web interface
Transformers - T5-small for summarization
pdfplumber - PDF text extraction
pytesseract - Image OCR
Pillow - Image processing
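
As an illustration of the summarization step, T5-small can be called through Transformers roughly as shown below. This is a sketch under assumptions, not the code in app.py: `build_prompt` and `summarize` are hypothetical names, the generation settings are arbitrary, and it assumes PyTorch and the tokenizer's dependencies are installed alongside Transformers.

```python
import re

MODEL_NAME = "t5-small"


def build_prompt(text: str) -> str:
    """Collapse whitespace and add the 'summarize: ' task prefix used with T5."""
    return "summarize: " + re.sub(r"\s+", " ", text).strip()


def summarize(text: str, max_input_tokens: int = 512, max_new_tokens: int = 120) -> str:
    """Generate a summary with T5-small; input beyond max_input_tokens is truncated."""
    # Imported lazily so build_prompt() can be used without loading Transformers
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    inputs = tokenizer(
        build_prompt(text),
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam search settings are illustrative, not tuned values
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

This sketch reloads the model on every call. In a Streamlit app, wrapping the loading step in a function decorated with `st.cache_resource` is one way to load it only once. Long documents are cut off at the token limit, so splitting the text into chunks and summarizing each is a possible extension.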