	---
	title: Docurizzer
	emoji: 📄
	colorFrom: blue
	colorTo: purple
	sdk: streamlit
	sdk_version: "1.32.0"
	app_file: app.py
	pinned: false
	---

	# 📄🖼 Docurizzer - Document Summarizer

	A Streamlit application that extracts text from PDFs and images, then summarizes the extracted text with the T5-small model.

	## Features

	- PDF Text Extraction: Extract text from PDF documents using pdfplumber
	- Image OCR: Extract text from images (PNG, JPG, JPEG) using Tesseract OCR
	- AI Summarization: Summarize extracted text using the T5-small model
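
The three features above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the app's actual code: the function names (`detect_kind`, `extract_text`, `summarize`), the extension-based dispatch, and the summary length limits are hypothetical. It assumes a Transformers release that provides the `summarization` pipeline task.

```python
from pathlib import Path

PDF_SUFFIXES = {".pdf"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def detect_kind(filename: str) -> str:
    """Classify an upload as 'pdf' or 'image' by its file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in PDF_SUFFIXES:
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported file type: {suffix or filename!r}")


def extract_text(path: str) -> str:
    """Extract text with pdfplumber (PDF) or Tesseract OCR (image)."""
    if detect_kind(path) == "pdf":
        import pdfplumber  # imported lazily so detect_kind has no heavy dependencies

        with pdfplumber.open(path) as pdf:
            # extract_text() can return None for pages without a text layer
            pages = [page.extract_text() or "" for page in pdf.pages]
        return "\n".join(pages).strip()

    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image).strip()


def summarize(text: str, max_length: int = 150, min_length: int = 30) -> str:
    """Summarize text with T5-small (length limits are illustrative assumptions)."""
    from transformers import pipeline

    summarizer = pipeline("summarization", model="t5-small")
    # truncation=True trims input that exceeds the model's maximum input length
    result = summarizer(
        text, max_length=max_length, min_length=min_length, truncation=True
    )
    return result[0]["summary_text"]
```

With the dependencies installed, `summarize(extract_text("report.pdf"))` would run the whole chain on a hypothetical `report.pdf`.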

	## How to Use

	1. Upload a PDF or image file
	2. Preview the extracted text
	3. Click "Summarize" to generate a summary
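
The three steps above map onto a short Streamlit script. The sketch below is one possible layout, not the contents of the real `app.py`: `make_preview`, `extract_from_upload`, `main`, the widget labels, and the 1000-character preview limit are all assumptions for illustration.

```python
def make_preview(text: str, limit: int = 1000) -> str:
    """Shorten long extracted text for the on-screen preview."""
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + " ..."


def extract_from_upload(uploaded) -> str:
    """Extract text from a Streamlit UploadedFile, which is a file-like object."""
    if uploaded.name.lower().endswith(".pdf"):
        import pdfplumber

        with pdfplumber.open(uploaded) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)

    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(uploaded))


def main() -> None:
    import streamlit as st  # lazy import: the helpers above work without Streamlit

    st.title("Docurizzer - Document Summarizer")

    # Step 1: upload a PDF or image file
    uploaded = st.file_uploader(
        "Upload a PDF or image", type=["pdf", "png", "jpg", "jpeg"]
    )
    if uploaded is None:
        return

    # Step 2: preview the extracted text
    text = extract_from_upload(uploaded)
    if not text.strip():
        st.warning("No text could be extracted from this file.")
        return
    st.text_area("Extracted text", make_preview(text), height=300)

    # Step 3: summarize on demand
    if st.button("Summarize"):
        from transformers import pipeline

        with st.spinner("Summarizing..."):
            summarizer = pipeline("summarization", model="t5-small")
            result = summarizer(text, truncation=True)
        st.subheader("Summary")
        st.write(result[0]["summary_text"])
```

In this layout the script would end with a call to `main()`, so that `streamlit run app.py` renders the page. The sketch reloads the model on every click for simplicity; caching it, for example with `st.cache_resource`, would avoid that cost.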

	## Local Development

	```bash
	# Install dependencies
	pip install -r requirements.txt

	# Install Tesseract OCR (macOS)
	brew install tesseract

	# Install Tesseract OCR (Ubuntu/Debian)
	sudo apt-get install tesseract-ocr

	# Run the app
	streamlit run app.py
	```
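
Because `pytesseract` calls the Tesseract executable, it can help to confirm that the binary is on `PATH` before starting the app. The following is a small sketch of such a check; the helper names `find_tesseract` and `tesseract_version` are hypothetical and not part of this project.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract(binary: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract executable, or None if not on PATH."""
    return shutil.which(binary)


def tesseract_version(binary: str = "tesseract") -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if unavailable."""
    path = find_tesseract(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some older Tesseract releases print the version to stderr instead of stdout
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version or "Tesseract not found; install it before using image OCR.")
```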

	## Hugging Face Spaces Deployment

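	This app is configured for deployment on Hugging Face Spaces. The metadata block at the top of this README selects the Streamlit SDK and `app.py` as the entry point, and two files declare the dependencies: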

	- `requirements.txt` - Python dependencies
	- `packages.txt` - System packages (Tesseract OCR)

	## Tech Stack

	- Streamlit - Web interface
	- Transformers - T5-small for summarization
	- pdfplumber - PDF text extraction
	- pytesseract - Image OCR
	- Pillow - Image processing