File size: 5,472 Bytes
88d2f36
2c5e855
 
 
 
88d2f36
2c5e855
88d2f36
 
2c5e855
cbd78de
88d2f36
 
2c5e855
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
---
title: PDF Analysis & Orchestrator
emoji: πŸ“„
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
short_description: AI-powered PDF analysis with advanced features
---

# πŸ“„ PDF Analysis & Orchestrator

A powerful, intelligent PDF analysis tool that provides comprehensive document processing through AI-powered agents. This application offers advanced features including document chunking, caching, streaming responses, batch processing, and custom prompt management.

## πŸš€ Features

### Core Analysis
- **AI-Powered Analysis**: GPT-4 powered document analysis with context-aware responses
- **Audience Adaptation**: Automatically adapts explanations for different audiences
- **Document Segmentation**: Identifies and segments documents by themes and topics
- **Multi-Agent Orchestration**: Specialized AI agents for different analysis aspects

### Performance Optimizations
- **Document Chunking**: Smart processing of large documents (>15k chars) with sentence boundary detection
- **Caching System**: PDF text extraction caching for improved performance
- **Streaming Responses**: Real-time progress updates and status indicators
- **Configurable Parameters**: Adjustable chunk sizes and processing options

### Enhanced Features
- **Batch Processing**: Handle multiple PDFs simultaneously with comprehensive reporting
- **Result Export**: Export analysis results in TXT, JSON, and PDF formats
- **Custom Prompts**: Save, manage, and reuse custom analysis prompts
- **Progress Indicators**: Real-time feedback during long-running analyses
- **Session Management**: Per-user session isolation with persistent storage

## 🎯 Use Cases

- **Document Summarization**: Create concise summaries of complex documents
- **Technical Explanation**: Explain technical content for general audiences
- **Executive Summaries**: Generate high-level overviews for decision makers
- **Content Analysis**: Extract key findings and insights from documents
- **Batch Processing**: Analyze multiple documents with consistent instructions
- **Research Assistance**: Process and analyze research papers and reports

## πŸ› οΈ Setup

### Prerequisites
- Python 3.10+
- OpenAI API key

### Installation

1. **Clone the repository:**
   ```bash
   git clone https://huggingface.co/spaces/your-username/pdf-analysis-orchestrator
   cd pdf-analysis-orchestrator
   ```

2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables:**
   ```bash
   export OPENAI_API_KEY="sk-your-api-key-here"
   ```

4. **Run the application:**
   ```bash
   python app.py
   ```

## πŸ“– Usage

### Single Document Analysis
1. Upload a PDF document
2. Enter your analysis instructions
3. Choose analysis options (streaming, chunk size)
4. Click "Analyze & Orchestrate"
5. View results and export if needed

### Batch Processing
1. Upload multiple PDF files
2. Enter batch analysis instructions
3. Click "Process Batch"
4. Review comprehensive batch results

### Custom Prompts
1. Go to "Manage Prompts" tab
2. Create custom prompt templates
3. Organize by categories
4. Reuse prompts across analyses

## πŸ—οΈ Architecture

### Core Components
- **AnalysisAgent**: Primary analysis engine using GPT-4
- **CollaborationAgent**: Provides reviewer-style feedback
- **ConversationAgent**: Handles user interaction
- **MasterOrchestrator**: Coordinates agent interactions

### Key Files
- `app.py`: Main application with Gradio interface
- `agents.py`: AI agent implementations with streaming support
- `config.py`: Centralized configuration management
- `utils/`: Utility functions for PDF processing, caching, and export

## πŸ”§ Configuration

### Environment Variables
- `OPENAI_API_KEY`: Required OpenAI API key
- `OPENAI_MODEL`: Model to use (default: gpt-4)
- `CHUNK_SIZE`: Document chunk size (default: 15000)
- `CACHE_ENABLED`: Enable caching (default: true)
- `ANALYSIS_MAX_UPLOAD_MB`: Max upload size in MB (default: 50)

### Model Configuration
- **Temperature**: 0.2 (consistent, focused responses)
- **Max tokens**: 1000 (concise but comprehensive)
- **System prompts**: Designed for high-quality output

## πŸ“Š Performance

- **Response Time**: Typically 2-5 seconds for analysis
- **File Size Limit**: 50MB (configurable)
- **Concurrent Users**: Supports multiple simultaneous sessions
- **Memory Usage**: Optimized for efficient processing
- **Caching**: Reduces processing time for repeated documents

## πŸ”’ Security

- File size validation
- Session isolation
- Secure file handling
- No persistent storage of sensitive data
- Environment-based configuration

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

## πŸ™ Acknowledgments

- Built on the successful Analysis & Orchestrate feature from Sharmaji ka PDF Blaster V1
- Powered by OpenAI's GPT-4 model
- UI framework: Gradio
- PDF processing: pdfplumber

## πŸ“ž Support

For issues and questions:
1. Check the documentation
2. Review existing issues
3. Create a new issue with detailed information

---

**Note**: This project focuses exclusively on the Analysis & Orchestrate functionality, providing the same high-quality results in a streamlined, focused package with enhanced performance and user experience.