---
title: Docurizzer
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.32.0"
app_file: app.py
pinned: false
---

# 📄🖼 Docurizzer - Document Summarizer

A Streamlit application that extracts text from PDFs and images, then summarizes the extracted text with the T5-small model.

## Features

- **PDF Text Extraction**: Extract text from PDF documents using pdfplumber
- **Image OCR**: Extract text from images (PNG, JPG, JPEG) using Tesseract OCR
- **AI Summarization**: Summarize extracted text using the T5-small model
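
The two extraction features above can be sketched as a single dispatch on the file extension. This is a minimal illustration, not the code in `app.py`: the names `detect_kind` and `extract_text` are hypothetical, and it assumes `pdfplumber`, `pytesseract`, and `Pillow` are installed along with the Tesseract binary.

```python
from pathlib import Path

# Extensions listed in the Features section (assumed to be the full set).
PDF_SUFFIXES = {".pdf"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def detect_kind(filename: str) -> str:
    """Return "pdf" or "image" based on the extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    if suffix in PDF_SUFFIXES:
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported file type: {filename!r}")


def extract_text(path: str) -> str:
    """Hypothetical helper: extract text from a PDF or an image file."""
    kind = detect_kind(path)
    if kind == "pdf":
        import pdfplumber  # imported lazily so detect_kind works without it

        with pdfplumber.open(path) as pdf:
            # extract_text() may return None for pages without a text layer
            pages = [page.extract_text() or "" for page in pdf.pages]
        return "\n".join(pages).strip()

    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image).strip()
```

Scanned PDFs without a text layer would come back empty from this sketch, since it does not run OCR on PDF pages.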

## How to Use

1. Upload a PDF or image file
2. Preview the extracted text
3. Click "Summarize" to generate a summary

## Local Development

```bash
# Install dependencies
pip install -r requirements.txt

# Install Tesseract OCR (macOS)
brew install tesseract

# Install Tesseract OCR (Ubuntu/Debian)
sudo apt-get install tesseract-ocr

# Run the app
streamlit run app.py
```
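
If OCR fails locally, a quick check is whether the Tesseract executable is on `PATH`, since `pytesseract` only wraps the system binary. The snippet below is a small sketch using the standard library; `tesseract_path` is a hypothetical helper name.

```python
import shutil
from typing import Optional


def tesseract_path(binary: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract executable, or None if not on PATH."""
    return shutil.which(binary)


if __name__ == "__main__":
    path = tesseract_path()
    print(f"Tesseract found at {path}" if path else "Tesseract not found on PATH")
```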

## Hugging Face Spaces Deployment

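This app is configured for deployment on Hugging Face Spaces. The YAML front matter at the top of this README sets the SDK (Streamlit 1.32.0) and the entry point (`app.py`). Two more files tell the Space what to install: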

- `requirements.txt` - Python dependencies
- `packages.txt` - System packages (Tesseract OCR)

## Tech Stack

- **Streamlit** - Web interface
- **Transformers** - T5-small for summarization
- **pdfplumber** - PDF text extraction
- **pytesseract** - Image OCR
- **Pillow** - Image processing
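
As a rough sketch of how the summarization step could be wired with this stack: T5-small accepts a limited input length (512 tokens by default), so long documents are commonly split into chunks that are summarized separately. The helper names, the 300-word chunk size, and the `max_length`/`min_length` values below are illustrative assumptions, not settings taken from `app.py`, and the sketch assumes a Transformers release that provides the `"summarization"` pipeline.

```python
from typing import List


def chunk_words(text: str, max_words: int = 300) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize(text: str, max_words: int = 300) -> str:
    """Hypothetical helper: summarize each chunk with T5-small and join the results."""
    from transformers import pipeline  # imported lazily; downloads the model on first use

    summarizer = pipeline("summarization", model="t5-small")
    parts = []
    for chunk in chunk_words(text, max_words):
        # truncation=True guards against chunks that still exceed the token limit
        result = summarizer(chunk, max_length=120, min_length=20, truncation=True)
        parts.append(result[0]["summary_text"])
    return " ".join(parts)
```

Splitting on word count is only an approximation of the token limit; a tokenizer-based split would be more precise at the cost of extra code.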