metadata
title: Docurizzer
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.32.0
app_file: app.py
pinned: false

📄🖼 Docurizzer - Document Summarizer

A Streamlit application that extracts text from PDFs and images, then summarizes the extracted text with the T5-small model.

Features

  • PDF Text Extraction: Extract text from PDF documents using pdfplumber
  • Image OCR: Extract text from images (PNG, JPG, JPEG) using Tesseract OCR
  • AI Summarization: Summarize extracted text using the T5-small model
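
The two extraction features can be sketched as a single dispatch on the file extension. This is a minimal illustration, not the code in app.py: the helper names `detect_kind` and `extract_text` are hypothetical, and the sketch assumes pdfplumber, pytesseract, and Pillow are installed and that the Tesseract binary is on the PATH.

```python
from pathlib import Path

# File types listed in the Features section.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def detect_kind(filename: str) -> str:
    """Return "pdf" or "image" based on the file extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported file type: {suffix or filename}")


def extract_text(path: str) -> str:
    """Extract text from a PDF (pdfplumber) or an image (Tesseract OCR)."""
    kind = detect_kind(path)
    if kind == "pdf":
        import pdfplumber  # imported lazily so detect_kind works without it

        with pdfplumber.open(path) as pdf:
            # extract_text() can return None for pages without a text layer.
            pages = [page.extract_text() or "" for page in pdf.pages]
        return "\n".join(pages).strip()

    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image).strip()
```

Scanned PDFs without a text layer would come back empty from the PDF branch; handling them would require rendering pages to images first, which this sketch does not attempt.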

How to Use

  1. Upload a PDF or image file
  2. Preview the extracted text
  3. Click "Summarize" to generate a summary
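
The three steps above map onto a handful of Streamlit widgets. The following is a rough sketch of that flow under stated assumptions, not the actual app.py: `preview_text` and `main` are hypothetical names, and the extraction and summarization calls are left as placeholders.

```python
def preview_text(text: str, max_chars: int = 2000) -> str:
    """Trim long extracted text so the preview stays readable."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip() + " [...]"


def main() -> None:
    # Imported here so the helper above can be used without Streamlit installed.
    import streamlit as st

    st.title("Docurizzer - Document Summarizer")

    # Step 1: upload a PDF or image file.
    uploaded = st.file_uploader(
        "Upload a PDF or image", type=["pdf", "png", "jpg", "jpeg"]
    )
    if uploaded is None:
        return

    # Step 2: preview the extracted text.
    # Placeholder: a real app would run pdfplumber or Tesseract on the upload.
    text = f"(extracted text of {uploaded.name} would appear here)"
    st.text_area("Extracted text", preview_text(text), height=300)

    # Step 3: summarize on demand.
    if st.button("Summarize"):
        # Placeholder: a real app would call the T5-small model here.
        st.write("Summary goes here.")
```

To try the sketch, add a final `main()` call to the script and launch it with `streamlit run`.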

Local Development

# Install dependencies
pip install -r requirements.txt

# Install Tesseract OCR (macOS)
brew install tesseract

# Install Tesseract OCR (Ubuntu/Debian)
sudo apt-get install tesseract-ocr

# Run the app
streamlit run app.py
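
Because pytesseract only wraps the Tesseract command-line program, image OCR fails if the binary is missing even when `pip install` succeeded. A small check like the one below can confirm the installation before running the app; `tesseract_available` is a hypothetical helper name, and the check only looks for the executable on the PATH.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("Tesseract found:", shutil.which("tesseract"))
    else:
        print("Tesseract not found; install it with brew or apt-get first.")
```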

Hugging Face Spaces Deployment

This app is configured for deployment on Hugging Face Spaces, which installs dependencies from two files at build time:

  • requirements.txt - Python dependencies
  • packages.txt - System packages (Tesseract OCR)

Tech Stack

  • Streamlit - Web interface
  • Transformers - T5-small for summarization
  • pdfplumber - PDF text extraction
  • pytesseract - Image OCR
  • Pillow - Image processing
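
The summarization step could look roughly like the sketch below. It is an assumption about how T5-small might be called, not the code in app.py: it assumes a Transformers release that provides the `summarization` pipeline task and that the checkpoint id `t5-small` resolves on the Hugging Face Hub. Because T5-small accepts a limited input length, the hypothetical helper `chunk_words` splits long text into word-based chunks first; the chunk size and length limits are illustrative values.

```python
from typing import List


def chunk_words(text: str, max_words: int = 300) -> List[str]:
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize(text: str, max_words: int = 300) -> str:
    """Summarize each chunk with T5-small and join the partial summaries."""
    # Imported lazily: loading Transformers is slow and downloads the model.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="t5-small")
    parts = []
    for chunk in chunk_words(text, max_words):
        # truncation=True guards against chunks that still exceed the limit.
        result = summarizer(chunk, max_length=80, min_length=10, truncation=True)
        parts.append(result[0]["summary_text"])
    return " ".join(parts)
```

In a Streamlit app the pipeline would normally be created once and cached rather than rebuilt on every call; the sketch keeps it inline for brevity.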