Spaces:

JatinAutonomousLabs
/

PDF_analyst

Paused

File size: 5,472 Bytes

88d2f36
2c5e855
 
 
 
88d2f36
2c5e855
88d2f36
 
2c5e855
cbd78de
88d2f36
 
2c5e855

---
title: PDF Analysis & Orchestrator
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
short_description: AI-powered PDF analysis with advanced features
---

# 📄 PDF Analysis & Orchestrator

A powerful, intelligent PDF analysis tool that provides comprehensive document processing through AI-powered agents. This application offers advanced features including document chunking, caching, streaming responses, batch processing, and custom prompt management.

## 🚀 Features

### Core Analysis
- **AI-Powered Analysis**: GPT-4 powered document analysis with context-aware responses
- **Audience Adaptation**: Automatically adapts explanations for different audiences
- **Document Segmentation**: Identifies and segments documents by themes and topics
- **Multi-Agent Orchestration**: Specialized AI agents for different analysis aspects

### Performance Optimizations
- **Document Chunking**: Smart processing of large documents (>15k chars) with sentence boundary detection
- **Caching System**: PDF text extraction caching for improved performance
- **Streaming Responses**: Real-time progress updates and status indicators
- **Configurable Parameters**: Adjustable chunk sizes and processing options

### Enhanced Features
- **Batch Processing**: Handle multiple PDFs simultaneously with comprehensive reporting
- **Result Export**: Export analysis results in TXT, JSON, and PDF formats
- **Custom Prompts**: Save, manage, and reuse custom analysis prompts
- **Progress Indicators**: Real-time feedback during long-running analyses
- **Session Management**: Per-user session isolation with persistent storage

## 🎯 Use Cases

- **Document Summarization**: Create concise summaries of complex documents
- **Technical Explanation**: Explain technical content for general audiences
- **Executive Summaries**: Generate high-level overviews for decision makers
- **Content Analysis**: Extract key findings and insights from documents
- **Batch Processing**: Analyze multiple documents with consistent instructions
- **Research Assistance**: Process and analyze research papers and reports

## 🛠️ Setup

### Prerequisites
- Python 3.10+
- OpenAI API key

### Installation

1. **Clone the repository:**
   ```bash
   git clone https://huggingface.co/spaces/your-username/pdf-analysis-orchestrator
   cd pdf-analysis-orchestrator
   ```

2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables:**
   ```bash
   export OPENAI_API_KEY="sk-your-api-key-here"
   ```

4. **Run the application:**
   ```bash
   python app.py
   ```

## 📖 Usage

### Single Document Analysis
1. Upload a PDF document
2. Enter your analysis instructions
3. Choose analysis options (streaming, chunk size)
4. Click "Analyze & Orchestrate"
5. View results and export if needed

### Batch Processing
1. Upload multiple PDF files
2. Enter batch analysis instructions
3. Click "Process Batch"
4. Review comprehensive batch results

### Custom Prompts
1. Go to "Manage Prompts" tab
2. Create custom prompt templates
3. Organize by categories
4. Reuse prompts across analyses

## 🏗️ Architecture

### Core Components
- **AnalysisAgent**: Primary analysis engine using GPT-4
- **CollaborationAgent**: Provides reviewer-style feedback
- **ConversationAgent**: Handles user interaction
- **MasterOrchestrator**: Coordinates agent interactions

### Key Files
- `app.py`: Main application with Gradio interface
- `agents.py`: AI agent implementations with streaming support
- `config.py`: Centralized configuration management
- `utils/`: Utility functions for PDF processing, caching, and export

## 🔧 Configuration

### Environment Variables
- `OPENAI_API_KEY`: Required OpenAI API key
- `OPENAI_MODEL`: Model to use (default: gpt-4)
- `CHUNK_SIZE`: Document chunk size (default: 15000)
- `CACHE_ENABLED`: Enable caching (default: true)
- `ANALYSIS_MAX_UPLOAD_MB`: Max upload size in MB (default: 50)

### Model Configuration
- **Temperature**: 0.2 (consistent, focused responses)
- **Max tokens**: 1000 (concise but comprehensive)
- **System prompts**: Designed for high-quality output

## 📊 Performance

- **Response Time**: Typically 2-5 seconds for analysis
- **File Size Limit**: 50MB (configurable)
- **Concurrent Users**: Supports multiple simultaneous sessions
- **Memory Usage**: Optimized for efficient processing
- **Caching**: Reduces processing time for repeated documents

## 🔒 Security

- File size validation
- Session isolation
- Secure file handling
- No persistent storage of sensitive data
- Environment-based configuration

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- Built on the successful Analysis & Orchestrate feature from Sharmaji ka PDF Blaster V1
- Powered by OpenAI's GPT-4 model
- UI framework: Gradio
- PDF processing: pdfplumber

## 📞 Support

For issues and questions:
1. Check the documentation
2. Review existing issues
3. Create a new issue with detailed information

---

**Note**: This project focuses exclusively on the Analysis & Orchestrate functionality, providing the same high-quality results in a streamlined, focused package with enhanced performance and user experience.