File size: 5,472 Bytes
88d2f36 2c5e855 88d2f36 2c5e855 88d2f36 2c5e855 cbd78de 88d2f36 2c5e855 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 |
---
title: PDF Analysis & Orchestrator
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
short_description: AI-powered PDF analysis with advanced features
---
# π PDF Analysis & Orchestrator
A powerful, intelligent PDF analysis tool that provides comprehensive document processing through AI-powered agents. This application offers advanced features including document chunking, caching, streaming responses, batch processing, and custom prompt management.
## π Features
### Core Analysis
- **AI-Powered Analysis**: GPT-4 powered document analysis with context-aware responses
- **Audience Adaptation**: Automatically adapts explanations for different audiences
- **Document Segmentation**: Identifies and segments documents by themes and topics
- **Multi-Agent Orchestration**: Specialized AI agents for different analysis aspects
### Performance Optimizations
- **Document Chunking**: Smart processing of large documents (>15k chars) with sentence boundary detection
- **Caching System**: PDF text extraction caching for improved performance
- **Streaming Responses**: Real-time progress updates and status indicators
- **Configurable Parameters**: Adjustable chunk sizes and processing options
### Enhanced Features
- **Batch Processing**: Handle multiple PDFs simultaneously with comprehensive reporting
- **Result Export**: Export analysis results in TXT, JSON, and PDF formats
- **Custom Prompts**: Save, manage, and reuse custom analysis prompts
- **Progress Indicators**: Real-time feedback during long-running analyses
- **Session Management**: Per-user session isolation with persistent storage
## π― Use Cases
- **Document Summarization**: Create concise summaries of complex documents
- **Technical Explanation**: Explain technical content for general audiences
- **Executive Summaries**: Generate high-level overviews for decision makers
- **Content Analysis**: Extract key findings and insights from documents
- **Batch Processing**: Analyze multiple documents with consistent instructions
- **Research Assistance**: Process and analyze research papers and reports
## π οΈ Setup
### Prerequisites
- Python 3.10+
- OpenAI API key
### Installation
1. **Clone the repository:**
```bash
git clone https://huggingface.co/spaces/your-username/pdf-analysis-orchestrator
cd pdf-analysis-orchestrator
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Set up environment variables:**
```bash
export OPENAI_API_KEY="sk-your-api-key-here"
```
4. **Run the application:**
```bash
python app.py
```
## π Usage
### Single Document Analysis
1. Upload a PDF document
2. Enter your analysis instructions
3. Choose analysis options (streaming, chunk size)
4. Click "Analyze & Orchestrate"
5. View results and export if needed
### Batch Processing
1. Upload multiple PDF files
2. Enter batch analysis instructions
3. Click "Process Batch"
4. Review comprehensive batch results
### Custom Prompts
1. Go to "Manage Prompts" tab
2. Create custom prompt templates
3. Organize by categories
4. Reuse prompts across analyses
## ποΈ Architecture
### Core Components
- **AnalysisAgent**: Primary analysis engine using GPT-4
- **CollaborationAgent**: Provides reviewer-style feedback
- **ConversationAgent**: Handles user interaction
- **MasterOrchestrator**: Coordinates agent interactions
### Key Files
- `app.py`: Main application with Gradio interface
- `agents.py`: AI agent implementations with streaming support
- `config.py`: Centralized configuration management
- `utils/`: Utility functions for PDF processing, caching, and export
## π§ Configuration
### Environment Variables
- `OPENAI_API_KEY`: Required OpenAI API key
- `OPENAI_MODEL`: Model to use (default: gpt-4)
- `CHUNK_SIZE`: Document chunk size (default: 15000)
- `CACHE_ENABLED`: Enable caching (default: true)
- `ANALYSIS_MAX_UPLOAD_MB`: Max upload size in MB (default: 50)
### Model Configuration
- **Temperature**: 0.2 (consistent, focused responses)
- **Max tokens**: 1000 (concise but comprehensive)
- **System prompts**: Designed for high-quality output
## π Performance
- **Response Time**: Typically 2-5 seconds for analysis
- **File Size Limit**: 50MB (configurable)
- **Concurrent Users**: Supports multiple simultaneous sessions
- **Memory Usage**: Optimized for efficient processing
- **Caching**: Reduces processing time for repeated documents
## π Security
- File size validation
- Session isolation
- Secure file handling
- No persistent storage of sensitive data
- Environment-based configuration
## π€ Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## π License
This project is licensed under the MIT License - see the LICENSE file for details.
## π Acknowledgments
- Built on the successful Analysis & Orchestrate feature from Sharmaji ka PDF Blaster V1
- Powered by OpenAI's GPT-4 model
- UI framework: Gradio
- PDF processing: pdfplumber
## π Support
For issues and questions:
1. Check the documentation
2. Review existing issues
3. Create a new issue with detailed information
---
**Note**: This project focuses exclusively on the Analysis & Orchestrate functionality, providing the same high-quality results in a streamlined, focused package with enhanced performance and user experience. |