# Unified AI Services

A comprehensive AI platform that integrates Named Entity Recognition (NER), Optical Character Recognition (OCR), and Retrieval-Augmented Generation (RAG) services into a unified application.

## 🌟 Features

### Core Services

- **NER Service** (Port 8500): Advanced named entity recognition with relationship extraction
- **OCR Service** (Port 8400): Document processing with Azure Document Intelligence
- **RAG Service** (Port 8401): Vector search and document retrieval
- **Unified App** (Port 8000): Coordinated workflows and service management

### Key Capabilities

- ✅ Multi-language support (Thai + English)
- ✅ Complex relationship extraction
- ✅ Entity deduplication
- ✅ Graph database exports (Neo4j, GraphML, GEXF)
- ✅ Vector search with semantic similarity
- ✅ Document processing (PDF, images, text)
- ✅ Real-time service health monitoring
- ✅ Unified workflows combining all services
- ✅ Comprehensive API documentation

## 🚀 Quick Start

### Prerequisites

- Python 3.8 or higher
- PostgreSQL with vector extension support
- Azure OpenAI account
- Azure Document Intelligence account
- DeepSeek API account (for advanced NER)

### Automated Setup

1. **Clone and navigate to the project directory**

   ```bash
   cd unified-ai-services
   ```

2. **Run the automated setup**

   ```bash
   python setup.py
   ```

   This will:
   - Check your Python environment
   - Create necessary directories
   - Help you configure the `.env` file
   - Install dependencies
   - Validate configuration
   - Create startup scripts

3. **Start the unified application**

   ```bash
   python app.py
   ```

   Or use the generated scripts:
   - Windows: `start_services.bat`
   - Unix/Linux/Mac: `./start_services.sh`

4. **Run comprehensive tests**

   ```bash
   python test_unified.py
   ```

   Or use the generated scripts:
   - Windows: `run_tests.bat`
   - Unix/Linux/Mac: `./run_tests.sh`

### Manual Setup

If you prefer manual setup:

1. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

2. **Create the `.env` file** (copy from `.env.example`)

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

3. **Set up directories**

   ```bash
   mkdir -p services exports logs temp tests data
   ```

4. **Place service files in the services directory**

   ```
   services/
   ├── ner_service.py
   ├── ocr_service.py
   └── rag_service.py
   ```

## 📁 Project Structure

```
unified-ai-services/
├── app.py               # Main unified application
├── configs.py           # Centralized configuration
├── setup.py             # Automated setup script
├── requirements.txt     # Python dependencies
├── test_unified.py      # Comprehensive test suite
├── .env                 # Environment configuration
├── services/            # Individual service files
│   ├── ner_service.py   # NER service implementation
│   ├── ocr_service.py   # OCR service implementation
│   └── rag_service.py   # RAG service implementation
├── exports/             # Generated export files
├── logs/                # Application logs
├── temp/                # Temporary files
├── tests/               # Additional test files
└── data/                # Data files
```

## ⚙️ Configuration

### Environment Variables

The system uses a `.env` file for configuration.
Key variables include:

#### Server Configuration

```bash
HOST=0.0.0.0
DEBUG=True
MAIN_PORT=8000
NER_PORT=8500
OCR_PORT=8400
RAG_PORT=8401
```

#### Database Configuration

```bash
POSTGRES_HOST=your-postgres-server.com
POSTGRES_PORT=5432
POSTGRES_USER=your-username
POSTGRES_PASSWORD=your-password
POSTGRES_DATABASE=postgres
```

#### Azure OpenAI Configuration

```bash
AZURE_OPENAI_ENDPOINT=https://your-openai.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key
EMBEDDING_MODEL=text-embedding-3-large
```

#### DeepSeek Configuration

```bash
DEEPSEEK_ENDPOINT=https://your-deepseek-endpoint/
DEEPSEEK_API_KEY=your-deepseek-key
DEEPSEEK_MODEL=DeepSeek-R1-0528
```

#### Azure Document Intelligence Configuration

```bash
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-di.cognitiveservices.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_KEY=your-di-key
```

#### Azure Storage Configuration

```bash
AZURE_STORAGE_ACCOUNT_URL=https://yourstorage.blob.core.windows.net/
AZURE_BLOB_SAS_TOKEN=your-sas-token
BLOB_CONTAINER=historylog
```

## 🔧 API Documentation

Once the services are running, the interactive API documentation is available at:

- **Unified API**: http://localhost:8000/docs
- **NER Service**: http://localhost:8500/docs
- **OCR Service**: http://localhost:8400/docs
- **RAG Service**: http://localhost:8401/docs

## 🎯 API Usage Examples

### 1. Unified Analysis (Text + RAG Indexing)

```python
import httpx

async def unified_analysis():
    data = {
        "text": "Your text content here...",
        "extract_relationships": True,
        "include_embeddings": False,
        "generate_graph_files": True,
        "export_formats": ["neo4j", "json"],
        "enable_rag_indexing": True,
        "rag_title": "My Document",
        "rag_keywords": ["keyword1", "keyword2"]
    }

    async with httpx.AsyncClient() as client:
        response = await client.post("http://localhost:8000/analyze/unified", json=data)
        return response.json()
```

### 2. Combined Search with NER Analysis

```python
import httpx

async def combined_search():
    data = {
        "query": "search query here",
        "limit": 10,
        "similarity_threshold": 0.2,
        "include_ner_analysis": True
    }

    async with httpx.AsyncClient() as client:
        response = await client.post("http://localhost:8000/search/combined", json=data)
        return response.json()
```

### 3. File Upload Analysis

```python
import httpx

async def analyze_file():
    # Form fields are sent as strings alongside the uploaded file
    data = {
        "extract_relationships": "true",
        "generate_graph_files": "true",
        "export_formats": "neo4j,json"
    }

    # Open the file in a context manager so the handle is closed after the upload
    with open("document.pdf", "rb") as f:
        files = {"file": ("document.pdf", f, "application/pdf")}
        async with httpx.AsyncClient() as client:
            response = await client.post("http://localhost:8000/ner/analyze/file", files=files, data=data)
            return response.json()
```

## 🧪 Testing

### Comprehensive Test Suite

The project includes comprehensive tests covering:

- ✅ Service health checks
- ✅ Individual service functionality
- ✅ Unified workflow testing
- ✅ Service proxy functionality
- ✅ Error handling and resilience
- ✅ Performance testing
- ✅ File upload/download testing

Run tests with:

```bash
python test_unified.py
```

### Individual Service Tests

Test individual services:

```bash
# Test NER service
python test_ner.py

# Test RAG service
python test_rag.py
```

### Quick Health Check

```bash
curl http://localhost:8000/health
```

## 🔍 Monitoring and Health Checks

### Health Endpoints

- **Unified System**: `GET /health`
- **Individual Services**: `GET /ner/health`, `GET /ocr/health`, `GET /rag/health`
- **Detailed Status**: `GET /status`
- **Service Discovery**: `GET /services`

### Monitoring Features

- Real-time service health monitoring
- Response time tracking
- Service uptime monitoring
- Error rate tracking
- Resource usage monitoring

## 📊 Service Architecture

```mermaid
graph TB
    Client[Client Applications]

    subgraph "Unified AI Services (Port 8000)"
        UA[Unified App]
        Proxy[Service Proxies]
        Health[Health Monitor]
    end

    subgraph "Core Services"
        NER[NER Service<br/>Port 8500]
        OCR[OCR Service<br/>Port 8400]
        RAG[RAG Service<br/>Port 8401]
    end

    subgraph "External Services"
        Azure[Azure Services]
        DeepSeek[DeepSeek API]
        DB[(PostgreSQL)]
    end

    Client --> UA
    UA --> Proxy
    Proxy --> NER
    Proxy --> OCR
    Proxy --> RAG

    NER --> Azure
    NER --> DeepSeek
    NER --> DB
    OCR --> Azure
    RAG --> Azure
    RAG --> DB
    RAG --> OCR
```

## 🛠️ Development

### Adding New Features

1. **Service Modifications**: Update individual service files in `services/`
2. **Unified Workflows**: Modify `app.py` for new combined workflows
3. **Configuration**: Update `configs.py` for new settings
4. **Tests**: Add tests to `test_unified.py`

### Debugging

1. **Check Service Logs**: Services log to console
2. **Health Checks**: Use `/health` endpoints
3. **Configuration**: Run `python configs.py` to validate
4. **Database**: Check PostgreSQL connectivity
5. **Azure Services**: Verify API keys and endpoints

### Service Management

Start individual services for development:

```bash
# Start NER service only
cd services && python ner_service.py

# Start OCR service only
cd services && python ocr_service.py

# Start RAG service only
cd services && python rag_service.py
```

## 🚨 Troubleshooting

### Common Issues

#### 1. Services Won't Start

- Check port availability: `netstat -an | grep :8000` (on Windows, use `findstr` in place of `grep`)
- Verify Python dependencies: `pip list`
- Check .env configuration: `python configs.py`

#### 2. Database Connection Issues

- Verify PostgreSQL is running
- Check connection string in .env
- Test connectivity: `python -c "import asyncio, asyncpg; asyncio.run(asyncpg.connect('your-connection-string'))"`

#### 3. Azure Service Issues

- Verify API keys and endpoints
- Check Azure service status
- Review rate limits and quotas

#### 4. Performance Issues

- Monitor resource usage: `top` or Task Manager
- Check database performance
- Review log files for errors

### Error Codes

- **500**: Internal service error
- **503**: Service unavailable
- **400**: Bad request (check input data)
- **422**: Validation error
- **404**: Endpoint not found

## 📈 Performance Optimization

### Recommended Settings

#### Production Configuration

```bash
DEBUG=False
MAX_FILE_SIZE=50
REQUEST_TIMEOUT=300
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
```

#### Database Optimization

- Use connection pooling
- Configure appropriate indexes
- Monitor query performance
- Perform regular maintenance

#### Service Optimization

- Enable caching where appropriate
- Use async operations
- Optimize batch processing
- Monitor memory usage

## 🔐 Security Considerations

### API Security

- Implement authentication/authorization as needed
- Use HTTPS in production
- Validate all input data
- Apply rate limiting

### Data Security

- Secure database connections (SSL)
- Encrypt sensitive data
- Apply security updates regularly
- Monitor access logs

### Azure Security

- Rotate API keys regularly
- Use managed identities where possible
- Monitor usage and costs
- Follow Azure security best practices

## 📝 License

This project is licensed under the MIT License. See the LICENSE file for details.

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Run the test suite
6. Submit a pull request

## 📞 Support

For support and questions:

1. Check this README for common issues
2. Review the test suite for usage examples
3. Check service logs for error details
4. Verify configuration with `python configs.py`

## 🎯 Roadmap

### Current Version (1.0.0)

- ✅ Unified service integration
- ✅ Comprehensive testing
- ✅ Multi-language support
- ✅ Graph database exports

### Future Enhancements

- 🔄 Advanced caching mechanisms
- 🔄 Enhanced monitoring and analytics
- 🔄 Additional export formats
- 🔄 Improved error recovery
- 🔄 Performance optimizations
- 🔄 Additional language support
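
As a companion to the Environment Variables section, here is a minimal sketch of a pre-flight check that reports which settings are unset. It is not the project's `configs.py`, whose contents are not shown in this README. The `REQUIRED_VARS` tuple and the `missing_vars` helper are hypothetical names chosen for illustration, and the choice of which variables count as required is an assumption. The sketch reads the process environment, so a `.env` file would need to be loaded into the environment first.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical subset of the variables named in the Environment Variables
# section; which ones are truly mandatory is an assumption.
REQUIRED_VARS = (
    "POSTGRES_HOST",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "DEEPSEEK_API_KEY",
    "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT",
    "AZURE_DOCUMENT_INTELLIGENCE_KEY",
)


def missing_vars(
    env: Optional[Mapping[str, str]] = None,
    required: Iterable[str] = REQUIRED_VARS,
) -> List[str]:
    """Return the required variable names that are unset or blank."""
    # Default to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing configuration: " + ", ".join(missing))
    else:
        print("All required variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to exercise in tests without touching the real environment.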