| # Unified AI Services | |
| A comprehensive AI platform that integrates Named Entity Recognition (NER), Optical Character Recognition (OCR), and Retrieval-Augmented Generation (RAG) services into a unified application. | |
| ## Features | |
| ### Core Services | |
| - **NER Service** (Port 8500): Advanced named entity recognition with relationship extraction | |
| - **OCR Service** (Port 8400): Document processing with Azure Document Intelligence | |
| - **RAG Service** (Port 8401): Vector search and document retrieval | |
| - **Unified App** (Port 8000): Coordinated workflows and service management | |
| ### Key Capabilities | |
| - Multi-language support (Thai + English) | |
| - Complex relationship extraction | |
| - Entity deduplication | |
| - Graph database exports (Neo4j, GraphML, GEXF) | |
| - Vector search with semantic similarity | |
| - Document processing (PDF, images, text) | |
| - Real-time service health monitoring | |
| - Unified workflows combining all services | |
| - Comprehensive API documentation | |
| ## Quick Start | |
| ### Prerequisites | |
| - Python 3.8 or higher | |
| - PostgreSQL with vector extension support | |
| - Azure OpenAI account | |
| - Azure Document Intelligence account | |
| - DeepSeek API account (for advanced NER) | |
| ### Automated Setup | |
| 1. **Clone and navigate to the project directory** | |
| ```bash | |
| cd unified-ai-services | |
| ``` | |
| 2. **Run the automated setup** | |
| ```bash | |
| python setup.py | |
| ``` | |
| This will: | |
| - Check your Python environment | |
| - Create necessary directories | |
| - Help you configure the `.env` file | |
| - Install dependencies | |
| - Validate configuration | |
| - Create startup scripts | |
| 3. **Start the unified application** | |
| ```bash | |
| python app.py | |
| ``` | |
| Or use the generated scripts: | |
| - Windows: `start_services.bat` | |
| - Unix/Linux/Mac: `./start_services.sh` | |
| 4. **Run comprehensive tests** | |
| ```bash | |
| python test_unified.py | |
| ``` | |
| Or use the generated scripts: | |
| - Windows: `run_tests.bat` | |
| - Unix/Linux/Mac: `./run_tests.sh` | |
| ### Manual Setup | |
| If you prefer manual setup: | |
| 1. **Install dependencies** | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 2. **Create .env file** (copy from .env.example) | |
| ```bash | |
| cp .env.example .env | |
| # Edit .env with your configuration | |
| ``` | |
| 3. **Set up directories** | |
| ```bash | |
| mkdir -p services exports logs temp tests data | |
| ``` | |
| 4. **Place service files in the services directory** | |
| ``` | |
| services/ | |
| ├── ner_service.py | |
| ├── ocr_service.py | |
| └── rag_service.py | |
| ``` | |
| ## Project Structure | |
| ``` | |
| unified-ai-services/ | |
| ├── app.py               # Main unified application | |
| ├── configs.py           # Centralized configuration | |
| ├── setup.py             # Automated setup script | |
| ├── requirements.txt     # Python dependencies | |
| ├── test_unified.py      # Comprehensive test suite | |
| ├── .env                 # Environment configuration | |
| ├── services/            # Individual service files | |
| │   ├── ner_service.py   # NER service implementation | |
| │   ├── ocr_service.py   # OCR service implementation | |
| │   └── rag_service.py   # RAG service implementation | |
| ├── exports/             # Generated export files | |
| ├── logs/                # Application logs | |
| ├── temp/                # Temporary files | |
| ├── tests/               # Additional test files | |
| └── data/                # Data files | |
| ``` | |
| ## Configuration | |
| ### Environment Variables | |
| The system uses a `.env` file for configuration. Key variables include: | |
| #### Server Configuration | |
| ```bash | |
| HOST=0.0.0.0 | |
| DEBUG=True | |
| MAIN_PORT=8000 | |
| NER_PORT=8500 | |
| OCR_PORT=8400 | |
| RAG_PORT=8401 | |
| ``` | |
| #### Database Configuration | |
| ```bash | |
| POSTGRES_HOST=your-postgres-server.com | |
| POSTGRES_PORT=5432 | |
| POSTGRES_USER=your-username | |
| POSTGRES_PASSWORD=your-password | |
| POSTGRES_DATABASE=postgres | |
| ``` | |
| #### Azure OpenAI Configuration | |
| ```bash | |
| AZURE_OPENAI_ENDPOINT=https://your-openai.openai.azure.com/ | |
| AZURE_OPENAI_API_KEY=your-api-key | |
| EMBEDDING_MODEL=text-embedding-3-large | |
| ``` | |
| #### DeepSeek Configuration | |
| ```bash | |
| DEEPSEEK_ENDPOINT=https://your-deepseek-endpoint/ | |
| DEEPSEEK_API_KEY=your-deepseek-key | |
| DEEPSEEK_MODEL=DeepSeek-R1-0528 | |
| ``` | |
| #### Azure Document Intelligence Configuration | |
| ```bash | |
| AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-di.cognitiveservices.azure.com/ | |
| AZURE_DOCUMENT_INTELLIGENCE_KEY=your-di-key | |
| ``` | |
| #### Azure Storage Configuration | |
| ```bash | |
| AZURE_STORAGE_ACCOUNT_URL=https://yourstorage.blob.core.windows.net/ | |
| AZURE_BLOB_SAS_TOKEN=your-sas-token | |
| BLOB_CONTAINER=historylog | |
| ``` | |
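As an illustration of how these variables might be consumed, the sketch below reads the server settings from the environment and falls back to the values shown above. `ServerConfig` and `load_server_config` are hypothetical names chosen for this example; they are not taken from `configs.py`, whose actual structure is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical container for the server settings listed above."""
    host: str
    debug: bool
    main_port: int
    ner_port: int
    ocr_port: int
    rag_port: int


def load_server_config(env: Optional[Mapping[str, str]] = None) -> ServerConfig:
    """Build a ServerConfig from environment variables, using the README values as defaults."""
    env = os.environ if env is None else env
    return ServerConfig(
        host=env.get("HOST", "0.0.0.0"),
        # Treat "true", "1", and "yes" (in any letter case) as enabled.
        debug=env.get("DEBUG", "True").strip().lower() in {"true", "1", "yes"},
        main_port=int(env.get("MAIN_PORT", "8000")),
        ner_port=int(env.get("NER_PORT", "8500")),
        ocr_port=int(env.get("OCR_PORT", "8400")),
        rag_port=int(env.get("RAG_PORT", "8401")),
    )
```

This sketch reads only variables that are already in the process environment; the values in `.env` would need to be loaded first, for example with a loader such as `python-dotenv`, if the project uses one.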
| ## API Documentation | |
| Once running, access the interactive API documentation: | |
| - **Unified API**: http://localhost:8000/docs | |
| - **NER Service**: http://localhost:8500/docs | |
| - **OCR Service**: http://localhost:8400/docs | |
| - **RAG Service**: http://localhost:8401/docs | |
| ## API Usage Examples | |
| ### 1. Unified Analysis (Text + RAG Indexing) | |
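| ```python | |
| import asyncio | |
| import httpx | |
| async def unified_analysis(): | |
|     # Request body: analyze the text, export graph files, and index it for RAG | |
|     data = { | |
|         "text": "Your text content here...", | |
|         "extract_relationships": True, | |
|         "include_embeddings": False, | |
|         "generate_graph_files": True, | |
|         "export_formats": ["neo4j", "json"], | |
|         "enable_rag_indexing": True, | |
|         "rag_title": "My Document", | |
|         "rag_keywords": ["keyword1", "keyword2"] | |
|     } | |
|     async with httpx.AsyncClient() as client: | |
|         response = await client.post("http://localhost:8000/analyze/unified", json=data) | |
|         response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body | |
|         return response.json() | |
| if __name__ == "__main__": | |
|     # The function is a coroutine, so it has to be run in an event loop | |
|     print(asyncio.run(unified_analysis())) | |
| ``` | |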
| ### 2. Combined Search with NER Analysis | |
| ```python | |
| import httpx | |
| async def combined_search(): | |
|     # Search request: up to 10 hits above the similarity threshold, with NER on the results | |
|     data = { | |
|         "query": "search query here", | |
|         "limit": 10, | |
|         "similarity_threshold": 0.2, | |
|         "include_ner_analysis": True | |
|     } | |
|     async with httpx.AsyncClient() as client: | |
|         response = await client.post("http://localhost:8000/search/combined", json=data) | |
|         return response.json() | |
| ``` | |
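To make the `limit` and `similarity_threshold` parameters concrete, here is a minimal sketch of threshold-based filtering over embedding vectors. It assumes cosine similarity as the metric, which this README does not confirm, and `cosine_similarity` and `filter_hits` are hypothetical helpers rather than functions from the RAG service.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_hits(
    query_vec: Sequence[float],
    docs: Dict[str, Sequence[float]],
    limit: int = 10,
    similarity_threshold: float = 0.2,
) -> List[Tuple[str, float]]:
    """Score every document, drop those below the threshold, return the best `limit` hits."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    kept = [hit for hit in scored if hit[1] >= similarity_threshold]
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:limit]
```

Under this reading, a lower threshold returns more, loosely related documents, while a higher one keeps only close matches.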
| ### 3. File Upload Analysis | |
| ```python | |
| import httpx | |
| async def analyze_file(): | |
|     # Form fields are sent as strings alongside the uploaded file | |
|     data = { | |
|         "extract_relationships": "true", | |
|         "generate_graph_files": "true", | |
|         "export_formats": "neo4j,json" | |
|     } | |
|     # Open the file in a context manager so the handle is closed after the upload | |
|     with open("document.pdf", "rb") as pdf: | |
|         files = {"file": ("document.pdf", pdf, "application/pdf")} | |
|         async with httpx.AsyncClient() as client: | |
|             response = await client.post("http://localhost:8000/ner/analyze/file", files=files, data=data) | |
|             return response.json() | |
| ``` | |
| ## Testing | |
| ### Comprehensive Test Suite | |
| The project includes comprehensive tests covering: | |
| - Service health checks | |
| - Individual service functionality | |
| - Unified workflow testing | |
| - Service proxy functionality | |
| - Error handling and resilience | |
| - Performance testing | |
| - File upload/download testing | |
| Run tests with: | |
| ```bash | |
| python test_unified.py | |
| ``` | |
| ### Individual Service Tests | |
| Test individual services: | |
| ```bash | |
| # Test NER service | |
| python test_ner.py | |
| # Test RAG service | |
| python test_rag.py | |
| ``` | |
| ### Quick Health Check | |
| ```bash | |
| curl http://localhost:8000/health | |
| ``` | |
| ## Monitoring and Health Checks | |
| ### Health Endpoints | |
| - **Unified System**: `GET /health` | |
| - **Individual Services**: `GET /ner/health`, `GET /ocr/health`, `GET /rag/health` | |
| - **Detailed Status**: `GET /status` | |
| - **Service Discovery**: `GET /services` | |
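The endpoints above can be polled from a small script. The sketch below assumes that all four health paths are served by the unified app on port 8000 and that a healthy service answers with HTTP 200; neither detail is confirmed by this README. `health_urls` and `is_healthy` are hypothetical helper names, and only the standard library is used.

```python
import urllib.error
import urllib.request
from typing import Dict

# Paths listed under "Health Endpoints" (assumed to be exposed by the unified app).
HEALTH_PATHS = {
    "unified": "/health",
    "ner": "/ner/health",
    "ocr": "/ocr/health",
    "rag": "/rag/health",
}


def health_urls(base_url: str = "http://localhost:8000") -> Dict[str, str]:
    """Map each service name to its full health-check URL."""
    base = base_url.rstrip("/")
    return {name: base + path for name, path in HEALTH_PATHS.items()}


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False on any error or timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Looping over `health_urls().items()` and calling `is_healthy(url)` for each entry gives a quick per-service status report.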
| ### Monitoring Features | |
| - Real-time service health monitoring | |
| - Response time tracking | |
| - Service uptime monitoring | |
| - Error rate tracking | |
| - Resource usage monitoring | |
| ## Service Architecture | |
| ```mermaid | |
| graph TB | |
| Client[Client Applications] | |
| subgraph "Unified AI Services (Port 8000)" | |
| UA[Unified App] | |
| Proxy[Service Proxies] | |
| Health[Health Monitor] | |
| end | |
| subgraph "Core Services" | |
| NER[NER Service<br/>Port 8500] | |
| OCR[OCR Service<br/>Port 8400] | |
| RAG[RAG Service<br/>Port 8401] | |
| end | |
| subgraph "External Services" | |
| Azure[Azure Services] | |
| DeepSeek[DeepSeek API] | |
| DB[(PostgreSQL)] | |
| end | |
| Client --> UA | |
| UA --> Proxy | |
| Proxy --> NER | |
| Proxy --> OCR | |
| Proxy --> RAG | |
| NER --> Azure | |
| NER --> DeepSeek | |
| NER --> DB | |
| OCR --> Azure | |
| RAG --> Azure | |
| RAG --> DB | |
| RAG --> OCR | |
| ``` | |
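One way to read the diagram is as a dependency graph between the local services: the unified app proxies to all three core services, and the RAG service calls the OCR service. The sketch below derives a possible startup order from those edges. Whether the real `app.py` or the generated scripts start services in this order is an assumption, `startup_order` is a hypothetical helper, and `graphlib` requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Each service maps to the local services it depends on, as drawn in the diagram.
DEPENDS_ON: Dict[str, Set[str]] = {
    "unified": {"ner", "ocr", "rag"},
    "rag": {"ocr"},
    "ner": set(),
    "ocr": set(),
}


def startup_order(graph: Dict[str, Set[str]] = DEPENDS_ON) -> List[str]:
    """Return the services ordered so that every dependency comes before its dependents."""
    return list(TopologicalSorter(graph).static_order())
```

With these edges, the OCR service always precedes the RAG service and the unified app comes last; the relative position of the NER service is not fixed.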
| ## Development | |
| ### Adding New Features | |
| 1. **Service Modifications**: Update individual service files in `services/` | |
| 2. **Unified Workflows**: Modify `app.py` for new combined workflows | |
| 3. **Configuration**: Update `configs.py` for new settings | |
| 4. **Tests**: Add tests to `test_unified.py` | |
| ### Debugging | |
| 1. **Check Service Logs**: Services log to console | |
| 2. **Health Checks**: Use `/health` endpoints | |
| 3. **Configuration**: Run `python configs.py` to validate | |
| 4. **Database**: Check PostgreSQL connectivity | |
| 5. **Azure Services**: Verify API keys and endpoints | |
| ### Service Management | |
| Start individual services for development: | |
| ```bash | |
| # Start NER service only | |
| cd services && python ner_service.py | |
| # Start OCR service only | |
| cd services && python ocr_service.py | |
| # Start RAG service only | |
| cd services && python rag_service.py | |
| ``` | |
| ## Troubleshooting | |
| ### Common Issues | |
| #### 1. Services Won't Start | |
| - Check port availability: `netstat -an | grep :8000` | |
| - Verify Python dependencies: `pip list` | |
| - Check .env configuration: `python configs.py` | |
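On systems without `netstat` or `grep`, the port check can also be done from Python. This is a minimal sketch using only the standard library; `port_in_use` is a hypothetical helper, not part of the project.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Calling `port_in_use(8000)` before starting the unified app shows whether another process already holds the port; the same check applies to ports 8500, 8400, and 8401.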
| #### 2. Database Connection Issues | |
| - Verify PostgreSQL is running | |
| - Check the `POSTGRES_*` connection settings in `.env` | |
| - Test connectivity: `python -c "import asyncio, asyncpg; asyncio.run(asyncpg.connect('your-connection-string'))"` | |
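The connection string for the connectivity test can be assembled from the `POSTGRES_*` variables in `.env`. The sketch below shows one way to do that; `build_postgres_dsn` is a hypothetical helper, and the defaults for port and database simply mirror the values in the configuration section.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def build_postgres_dsn(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a postgresql:// URL from the POSTGRES_* variables."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters such as "@" or "/" do not break the URL.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env["POSTGRES_HOST"]
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("POSTGRES_DATABASE", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

A missing user, password, or host raises `KeyError`, which points directly at the variable that still needs to be set.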
| #### 3. Azure Service Issues | |
| - Verify API keys and endpoints | |
| - Check Azure service status | |
| - Review rate limits and quotas | |
| #### 4. Performance Issues | |
| - Monitor resource usage: `top` or Task Manager | |
| - Check database performance | |
| - Review log files for errors | |
| ### Error Codes | |
| - **500**: Internal service error | |
| - **503**: Service unavailable | |
| - **400**: Bad request (check input data) | |
| - **422**: Validation error | |
| - **404**: Endpoint not found | |
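A client script can translate these status codes into the next troubleshooting step. The mapping below is a sketch that restates the list above together with hints from the Debugging and Support sections; `explain_status` is a hypothetical helper.

```python
# Status codes from the list above, each paired with a first thing to check.
TROUBLESHOOTING_HINTS = {
    400: "Bad request: check the input data.",
    404: "Endpoint not found: check the URL path against /docs.",
    422: "Validation error: compare the request fields with the schema in /docs.",
    500: "Internal service error: check the service logs.",
    503: "Service unavailable: check the /health endpoints.",
}


def explain_status(status_code: int) -> str:
    """Return a troubleshooting hint for a status code, or a generic message if unknown."""
    return TROUBLESHOOTING_HINTS.get(status_code, f"Unexpected status code {status_code}.")
```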
| ## Performance Optimization | |
| ### Recommended Settings | |
| #### Production Configuration | |
| ```bash | |
| DEBUG=False | |
| MAX_FILE_SIZE=50 | |
| REQUEST_TIMEOUT=300 | |
| CHUNK_SIZE=1000 | |
| CHUNK_OVERLAP=200 | |
| ``` | |
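To show how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a minimal sketch of overlapping chunking. It assumes both values are measured in characters, which this README does not state, and `chunk_text` is a hypothetical function rather than the RAG service's actual splitter.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into chunks of at most chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

With the values above, each chunk starts 800 characters after the previous one, so a 2,500-character document yields three chunks of 1,000, 1,000, and 900 characters.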
| #### Database Optimization | |
| - Use connection pooling | |
| - Configure appropriate indexes | |
| - Monitor query performance | |
| - Regular maintenance | |
| #### Service Optimization | |
| - Enable caching where appropriate | |
| - Use async operations | |
| - Optimize batch processing | |
| - Monitor memory usage | |
| ## Security Considerations | |
| ### API Security | |
| - Implement authentication/authorization as needed | |
| - Use HTTPS in production | |
| - Validate all input data | |
| - Rate limiting | |
| ### Data Security | |
| - Secure database connections (SSL) | |
| - Encrypt sensitive data | |
| - Regular security updates | |
| - Monitor access logs | |
| ### Azure Security | |
| - Rotate API keys regularly | |
| - Use managed identities where possible | |
| - Monitor usage and costs | |
| - Follow Azure security best practices | |
| ## License | |
| This project is licensed under the MIT License - see the LICENSE file for details. | |
| ## Contributing | |
| 1. Fork the repository | |
| 2. Create a feature branch | |
| 3. Make your changes | |
| 4. Add tests for new functionality | |
| 5. Run the test suite | |
| 6. Submit a pull request | |
| ## Support | |
| For support and questions: | |
| 1. Check this README for common issues | |
| 2. Review the test suite for usage examples | |
| 3. Check service logs for error details | |
| 4. Verify configuration with `python configs.py` | |
| ## Roadmap | |
| ### Current Version (1.0.0) | |
| - Unified service integration | |
| - Comprehensive testing | |
| - Multi-language support | |
| - Graph database exports | |
| ### Future Enhancements | |
| - Advanced caching mechanisms | |
| - Enhanced monitoring and analytics | |
| - Additional export formats | |
| - Improved error recovery | |
| - Performance optimizations | |
| - Additional language support | |