Spaces:

JatinAutonomousLabs
/

PDF_analyst

Paused

App Files Files Community

PDF_analyst / README.md

JatsTheAIGen

Shorten description further

cbd78de 4 months ago

preview code

raw

history blame contribute delete

5.47 kB

	---
	title: PDF Analysis & Orchestrator
	emoji: 📄
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 4.44.0
	app_file: app.py
	pinned: false
	license: mit
	short_description: AI-powered PDF analysis with advanced features
	---

	# 📄 PDF Analysis & Orchestrator

	A powerful, intelligent PDF analysis tool that provides comprehensive document processing through AI-powered agents. This application offers advanced features including document chunking, caching, streaming responses, batch processing, and custom prompt management.

	## 🚀 Features

	### Core Analysis
	- AI-Powered Analysis: GPT-4 powered document analysis with context-aware responses
	- Audience Adaptation: Automatically adapts explanations for different audiences
	- Document Segmentation: Identifies and segments documents by themes and topics
	- Multi-Agent Orchestration: Specialized AI agents for different analysis aspects

	### Performance Optimizations
	- Document Chunking: Smart processing of large documents (>15k chars) with sentence boundary detection
	- Caching System: PDF text extraction caching for improved performance
	- Streaming Responses: Real-time progress updates and status indicators
	- Configurable Parameters: Adjustable chunk sizes and processing options

	### Enhanced Features
	- Batch Processing: Handle multiple PDFs simultaneously with comprehensive reporting
	- Result Export: Export analysis results in TXT, JSON, and PDF formats
	- Custom Prompts: Save, manage, and reuse custom analysis prompts
	- Progress Indicators: Real-time feedback during long-running analyses
	- Session Management: Per-user session isolation with persistent storage

	## 🎯 Use Cases

	- Document Summarization: Create concise summaries of complex documents
	- Technical Explanation: Explain technical content for general audiences
	- Executive Summaries: Generate high-level overviews for decision makers
	- Content Analysis: Extract key findings and insights from documents
	- Batch Processing: Analyze multiple documents with consistent instructions
	- Research Assistance: Process and analyze research papers and reports

	## 🛠️ Setup

	### Prerequisites
	- Python 3.10+
	- OpenAI API key

	### Installation

	1. Clone the repository:
	```bash
	git clone https://huggingface.co/spaces/your-username/pdf-analysis-orchestrator
	cd pdf-analysis-orchestrator
	```

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	3. Set up environment variables:
	```bash
	export OPENAI_API_KEY="sk-your-api-key-here"
	```

	4. Run the application:
	```bash
	python app.py
	```

	## 📖 Usage

	### Single Document Analysis
	1. Upload a PDF document
	2. Enter your analysis instructions
	3. Choose analysis options (streaming, chunk size)
	4. Click "Analyze & Orchestrate"
	5. View results and export if needed

	### Batch Processing
	1. Upload multiple PDF files
	2. Enter batch analysis instructions
	3. Click "Process Batch"
	4. Review comprehensive batch results

	### Custom Prompts
	1. Go to "Manage Prompts" tab
	2. Create custom prompt templates
	3. Organize by categories
	4. Reuse prompts across analyses

	## 🏗️ Architecture

	### Core Components
	- AnalysisAgent: Primary analysis engine using GPT-4
	- CollaborationAgent: Provides reviewer-style feedback
	- ConversationAgent: Handles user interaction
	- MasterOrchestrator: Coordinates agent interactions

	### Key Files
	- `app.py`: Main application with Gradio interface
	- `agents.py`: AI agent implementations with streaming support
	- `config.py`: Centralized configuration management
	- `utils/`: Utility functions for PDF processing, caching, and export

	## 🔧 Configuration

	### Environment Variables
	- `OPENAI_API_KEY`: Required OpenAI API key
	- `OPENAI_MODEL`: Model to use (default: gpt-4)
	- `CHUNK_SIZE`: Document chunk size (default: 15000)
	- `CACHE_ENABLED`: Enable caching (default: true)
	- `ANALYSIS_MAX_UPLOAD_MB`: Max upload size in MB (default: 50)

	### Model Configuration
	- Temperature: 0.2 (consistent, focused responses)
	- Max tokens: 1000 (concise but comprehensive)
	- System prompts: Designed for high-quality output

	## 📊 Performance

	- Response Time: Typically 2-5 seconds for analysis
	- File Size Limit: 50MB (configurable)
	- Concurrent Users: Supports multiple simultaneous sessions
	- Memory Usage: Optimized for efficient processing
	- Caching: Reduces processing time for repeated documents

	## 🔒 Security

	- File size validation
	- Session isolation
	- Secure file handling
	- No persistent storage of sensitive data
	- Environment-based configuration

	## 🤝 Contributing

	1. Fork the repository
	2. Create a feature branch
	3. Make your changes
	4. Add tests if applicable
	5. Submit a pull request

	## 📝 License

	This project is licensed under the MIT License - see the LICENSE file for details.

	## 🙏 Acknowledgments

	- Built on the successful Analysis & Orchestrate feature from Sharmaji ka PDF Blaster V1
	- Powered by OpenAI's GPT-4 model
	- UI framework: Gradio
	- PDF processing: pdfplumber

	## 📞 Support

	For issues and questions:
	1. Check the documentation
	2. Review existing issues
	3. Create a new issue with detailed information

	---

	Note: This project focuses exclusively on the Analysis & Orchestrate functionality, providing the same high-quality results in a streamlined, focused package with enhanced performance and user experience.