# Luna OCR Backend

Real OCR processing backend using Gemini AI for intelligent text extraction and formatting.

## 🚀 Quick Start

### 1. Install Dependencies

```bash
cd server
npm install
```

### 2. Start the Server

```bash
npm start
# or for development with auto-reload:
npm run dev
```

### 3. Test the API

```bash
curl http://localhost:3001/api/health
```

## 📡 API Endpoints

### Health Check

```
GET /api/health
```

### OCR Processing

```
POST /api/ocr
Content-Type: multipart/form-data

Parameters:
- file: Image file (PNG, JPG, WebP) or PDF
- apiKey: Google Gemini API key
- mode: "standard" or "structured"
```

## 🔧 Configuration

### Environment Variables

Create a `.env` file (optional):

```
PORT=3001
MAX_FILE_SIZE=10485760
```

`MAX_FILE_SIZE` is given in bytes; 10485760 bytes is 10MB.

### Supported File Types

- **Images**: PNG, JPG, JPEG, WebP
- **Documents**: PDF (converted to images)
- **Max Size**: 10MB per file

## 🎯 Processing Modes

### Standard Mode

- Uses Gemini 1.5 Flash (faster)
- Returns clean plain text
- Good for simple text extraction

### Structured Mode

- Uses Gemini 1.5 Pro (more intelligent)
- Returns formatted Markdown
- Creates tables, headers, and lists automatically
- Well suited to complex documents

## 📊 Response Format

```json
{
  "success": true,
  "data": {
    "fileName": "document.png",
    "fileSize": 1234567,
    "processingMode": "structured",
    "extractedText": "# Document Title\n\n...",
    "formats": {
      "txt": "plain text version",
      "md": "markdown version",
      "json": {
        "metadata": {...},
        "content": {...}
      }
    },
    "metadata": {
      "characterCount": 1500,
      "wordCount": 250,
      "lineCount": 45,
      "processedAt": "2024-01-01T12:00:00.000Z"
    }
  }
}
```

## 🛠️ Development

### Project Structure

```
server/
├── server.js       # Main server file
├── package.json    # Dependencies
├── uploads/        # Temporary file storage
└── README.md       # This file
```

### Key Features

- **Image Enhancement**: Automatic image preprocessing for better OCR
- **Smart Formatting**: Gemini AI produces formatted Markdown output
- **Multiple Formats**: Returns TXT, MD, and JSON formats
- **Error Handling**: Comprehensive error handling and cleanup
- **File Cleanup**: Automatic temporary file cleanup

## 🔑 Getting Gemini API Key

1. Go to [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Create a new API key
3. Copy the key and use it in the frontend

## 🚨 Troubleshooting

### Common Issues

**"Cannot connect to OCR backend"**

- Make sure the server is running: `npm start`
- Check that port 3001 is not in use by another process
- Verify that no firewall is blocking the port

**"Invalid API key"**

- Check that your Gemini API key is correct
- Ensure the API key has the proper permissions
- Try creating a new API key

**"File too large"**

- Maximum file size is 10MB
- Compress images before uploading
- For PDFs, try splitting into smaller files

**"Processing failed"**

- Check image quality (not too blurry)
- Ensure the text is clearly visible
- Try the other processing mode

### Debug Mode

Set `NODE_ENV=development` for detailed logging:

```bash
NODE_ENV=development npm start
```

## 📝 Notes

- Server runs on port 3001 by default
- Temporary files are automatically cleaned up
- CORS is enabled for frontend integration
- Image enhancement improves OCR accuracy
- Gemini AI provides intelligent text formatting

## 🔗 Integration

The backend integrates with the Luna OCR React frontend. Make sure both are running:

1. **Backend**: `cd server && npm start` (port 3001)
2. **Frontend**: `npm start` (port 3000)

The frontend will automatically call the backend API for real OCR processing.
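
For anyone calling the backend without the React frontend, the request to `POST /api/ocr` can be sketched roughly as below. This is a minimal illustration, not the frontend's actual code: the helper names `validateUpload`, `buildOcrForm`, and `runOcr` are hypothetical, the checks simply mirror the limits listed under Supported File Types, and the error branch assumes a failed request returns `"success": false`, which this README does not confirm. It relies on the global `fetch`, `FormData`, and `Blob` available in Node.js 18+ and in browsers.

```javascript
// Limits copied from this README; the constant and function names are hypothetical.
const MAX_FILE_SIZE = 10485760; // 10MB, same value as MAX_FILE_SIZE in the .env example
const ALLOWED_EXTENSIONS = ["png", "jpg", "jpeg", "webp", "pdf"];
const MODES = ["standard", "structured"];

// Returns a list of problems; an empty list means the upload looks acceptable.
function validateUpload(fileName, fileSize, mode) {
  const problems = [];
  const ext = fileName.includes(".") ? fileName.split(".").pop().toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    problems.push(`unsupported file type: ${ext || "(none)"}`);
  }
  if (fileSize > MAX_FILE_SIZE) {
    problems.push("file too large (max 10MB)");
  }
  if (!MODES.includes(mode)) {
    problems.push(`unknown mode: ${mode}`);
  }
  return problems;
}

// Builds the multipart body with the three documented fields: file, apiKey, mode.
function buildOcrForm(blob, fileName, apiKey, mode) {
  const form = new FormData();
  form.append("file", blob, fileName);
  form.append("apiKey", apiKey);
  form.append("mode", mode);
  return form;
}

// Sends the request and returns the `data` object from the response.
// Assumption: a failed request still returns JSON, with `success` set to false.
async function runOcr(blob, fileName, apiKey, mode = "standard", baseUrl = "http://localhost:3001") {
  const problems = validateUpload(fileName, blob.size, mode);
  if (problems.length > 0) {
    throw new Error(problems.join("; "));
  }
  const response = await fetch(`${baseUrl}/api/ocr`, {
    method: "POST",
    body: buildOcrForm(blob, fileName, apiKey, mode),
  });
  const payload = await response.json();
  if (!payload.success) {
    throw new Error("OCR request failed");
  }
  return payload.data;
}
```

With the server running, `runOcr(blob, "scan.png", apiKey, "structured")` would resolve to the `data` object shown under Response Format, so the extracted Markdown is available as `data.extractedText`. Validating on the client first avoids uploading a file the server would reject for size or type.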