# Luna OCR Backend
An OCR backend that uses Google Gemini to extract text from images and PDFs and format it as plain text or Markdown.
## Quick Start
### 1. Install Dependencies
```bash
cd server
npm install
```
### 2. Start the Server
```bash
npm start
# or for development with auto-reload:
npm run dev
```
### 3. Test the API
```bash
curl http://localhost:3001/api/health
```
## API Endpoints
### Health Check
```
GET /api/health
```
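A minimal TypeScript sketch of calling the health endpoint from a script or frontend before sending work. It assumes the default base URL `http://localhost:3001` and treats any 2xx status as healthy, because the response body of `/api/health` is not documented here. The `isBackendHealthy` name and the 3-second timeout are hypothetical choices.

```typescript
// Hypothetical helper: true if GET /api/health answers with a 2xx status.
// The base URL assumes the default port 3001; the response body is not
// inspected because its shape is not documented in this README.
async function isBackendHealthy(
  baseUrl: string = "http://localhost:3001",
  timeoutMs: number = 3000,
): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/api/health`, {
      // Abort instead of hanging if the server never responds.
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok;
  } catch {
    // Connection refused, DNS failure, or timeout: treat as unhealthy.
    return false;
  }
}
```

For example, `isBackendHealthy().then((ok) => console.log(ok))` logs `false` when the server is not running, which matches the "Cannot connect to OCR backend" case under Troubleshooting.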
### OCR Processing
```
POST /api/ocr
Content-Type: multipart/form-data
Parameters:
- file: Image file (PNG, JPG, WebP) or PDF
- apiKey: Google Gemini API key
- mode: "standard" or "structured"
```
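Because the endpoint takes `multipart/form-data`, a client builds a `FormData` body with the three documented fields. This TypeScript sketch uses the standard `fetch`, `FormData`, and `Blob` APIs (available in browsers and Node.js 18+). The helper names `buildOcrForm` and `requestOcr`, and the `http://localhost:3001` base URL, are assumptions for illustration, not part of the backend.

```typescript
// Hypothetical client for POST /api/ocr. The field names (file, apiKey, mode)
// come from this README; BASE_URL assumes the default port 3001.
type OcrMode = "standard" | "structured";

const BASE_URL = "http://localhost:3001";

// Build the multipart body with the three documented form fields.
function buildOcrForm(
  file: Blob,
  fileName: string,
  apiKey: string,
  mode: OcrMode,
): FormData {
  const form = new FormData();
  form.append("file", file, fileName);
  form.append("apiKey", apiKey);
  form.append("mode", mode);
  return form;
}

// Send the file for OCR and return the parsed JSON response.
async function requestOcr(
  file: Blob,
  fileName: string,
  apiKey: string,
  mode: OcrMode,
): Promise<unknown> {
  const res = await fetch(`${BASE_URL}/api/ocr`, {
    method: "POST",
    // No explicit Content-Type: fetch sets it, including the multipart boundary.
    body: buildOcrForm(file, fileName, apiKey, mode),
  });
  if (!res.ok) {
    throw new Error(`OCR request failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Leave the `Content-Type` header unset when sending `FormData`; setting it by hand drops the boundary parameter that the server needs to parse the body.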
## Configuration
### Environment Variables
Create a `.env` file (optional). `MAX_FILE_SIZE` is given in bytes; 10485760 bytes is 10 MB:
```
PORT=3001
MAX_FILE_SIZE=10485760
```
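A minimal TypeScript sketch of how these variables could be read with the defaults listed in this README (port 3001, 10 MB). The environment is passed in as a plain object so the logic is easy to test; in Node.js you would call `loadConfig(process.env)`. The function names are hypothetical, and `server.js` may parse these values differently.

```typescript
// Hypothetical config loader using the README's documented defaults.
interface ServerConfig {
  port: number;
  maxFileSize: number; // bytes
}

const DEFAULT_PORT = 3001;
const DEFAULT_MAX_FILE_SIZE = 10 * 1024 * 1024; // 10485760 bytes = 10 MB

// Unset or blank values fall back to the default; malformed values fail loudly.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`Expected a positive integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  return {
    port: parsePositiveInt(env["PORT"], DEFAULT_PORT),
    maxFileSize: parsePositiveInt(env["MAX_FILE_SIZE"], DEFAULT_MAX_FILE_SIZE),
  };
}
```

Failing on a value such as `PORT=abc` surfaces a typo at startup instead of letting the server listen on an unexpected port.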
### Supported File Types
- **Images**: PNG, JPG, JPEG, WebP
- **Documents**: PDF (converted to images)
- **Max Size**: 10MB per file
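A client can reject unsupported or oversized files before uploading them. This TypeScript sketch mirrors the limits above and checks only the file extension, which is a simplification; `validateUpload` is a hypothetical helper, and the server's own checks (for example by MIME type) may differ.

```typescript
// Hypothetical pre-upload check mirroring the documented types and 10 MB limit.
const ALLOWED_EXTENSIONS = new Set(["png", "jpg", "jpeg", "webp", "pdf"]);
const MAX_FILE_SIZE = 10 * 1024 * 1024; // 10 MB in bytes

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateUpload(fileName: string, sizeInBytes: number): ValidationResult {
  // Take the text after the last dot, case-insensitively ("scan.PNG" -> "png").
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    return {
      ok: false,
      reason: ext ? `Unsupported file type: .${ext}` : "File has no extension",
    };
  }
  if (sizeInBytes > MAX_FILE_SIZE) {
    return { ok: false, reason: "File too large (maximum is 10 MB)" };
  }
  return { ok: true };
}
```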
## Processing Modes
### Standard Mode
- Uses Gemini 1.5 Flash (faster)
- Returns clean plain text
- Good for simple text extraction
### Structured Mode
- Uses Gemini 1.5 Pro (more capable)
- Returns formatted Markdown
- Automatically formats tables, headings, and lists
- Best suited to complex documents
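The choice between the two modes amounts to a lookup from mode to model and output format. This TypeScript sketch assumes the Gemini model identifiers `gemini-1.5-flash` and `gemini-1.5-pro`; the exact strings and prompts used in `server.js` are not documented here, so the `instruction` text and the `settingsForMode` helper are hypothetical.

```typescript
// Hypothetical mode table; model IDs and prompt wording are assumptions.
type OcrMode = "standard" | "structured";

interface ModeSettings {
  model: string;
  output: "plain text" | "markdown";
  instruction: string;
}

const MODE_SETTINGS: Record<OcrMode, ModeSettings> = {
  standard: {
    model: "gemini-1.5-flash", // faster
    output: "plain text",
    instruction: "Extract all text from this image as clean plain text.",
  },
  structured: {
    model: "gemini-1.5-pro", // more capable
    output: "markdown",
    instruction:
      "Extract all text and format it as Markdown, using headings, lists, and tables where the layout calls for them.",
  },
};

// Type guard so an arbitrary form value can be narrowed to a known mode.
function isOcrMode(value: string): value is OcrMode {
  return value === "standard" || value === "structured";
}

function settingsForMode(mode: string): ModeSettings {
  if (!isOcrMode(mode)) {
    throw new Error(`Unknown mode "${mode}"; expected "standard" or "structured"`);
  }
  return MODE_SETTINGS[mode];
}
```

Keeping both modes in one table means adding a mode is a single entry, and the `Record<OcrMode, …>` type makes the compiler flag a mode that is declared but not configured.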
## Response Format
```json
{
  "success": true,
  "data": {
    "fileName": "document.png",
    "fileSize": 1234567,
    "processingMode": "structured",
    "extractedText": "# Document Title\n\n...",
    "formats": {
      "txt": "plain text version",
      "md": "markdown version",
      "json": { "metadata": {...}, "content": {...} }
    },
    "metadata": {
      "characterCount": 1500,
      "wordCount": 250,
      "lineCount": 45,
      "processedAt": "2024-01-01T12:00:00.000Z"
    }
  }
}
```
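The same response shape as TypeScript types, plus a hypothetical `countTextMetadata` helper showing one way `characterCount`, `wordCount`, and `lineCount` could be derived from the extracted text. The counting rules (words split on whitespace, lines split on `\n`) and the assumption that `fileSize` is in bytes are illustrative; the backend may count differently. The `json` format is typed as `unknown` because its contents are elided in the example.

```typescript
// Types mirroring the example response; `json` is left loose on purpose.
interface OcrMetadata {
  characterCount: number;
  wordCount: number;
  lineCount: number;
  processedAt: string; // ISO 8601 timestamp
}

interface OcrResponse {
  success: boolean;
  data: {
    fileName: string;
    fileSize: number; // assumed to be bytes
    processingMode: "standard" | "structured";
    extractedText: string;
    formats: {
      txt: string;
      md: string;
      json: unknown; // contents elided in the README example
    };
    metadata: OcrMetadata;
  };
}

// Hypothetical helper: derive the metadata counts from extracted text.
function countTextMetadata(text: string, processedAt: Date = new Date()): OcrMetadata {
  const trimmed = text.trim();
  return {
    characterCount: text.length,
    // Words are runs of non-whitespace; empty text has zero words.
    wordCount: trimmed === "" ? 0 : trimmed.split(/\s+/).length,
    // Lines are separated by "\n"; empty text has zero lines.
    lineCount: text === "" ? 0 : text.split("\n").length,
    processedAt: processedAt.toISOString(),
  };
}
```

Under these rules, `countTextMetadata("# Title\n\nHello world")` reports 20 characters, 4 words (the `#` counts as one), and 3 lines.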
## Development
### Project Structure
```
server/
├── server.js       # Main server file
├── package.json    # Dependencies
├── uploads/        # Temporary file storage
└── README.md       # This file
```
### Key Features
- **Image Enhancement**: Images are preprocessed automatically to improve OCR accuracy
- **Smart Formatting**: Gemini formats the extracted text as clean Markdown
- **Multiple Formats**: Results are returned in TXT, MD, and JSON formats
- **Error Handling**: Comprehensive error handling, including cleanup after failures
- **File Cleanup**: Temporary uploaded files are deleted automatically
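Automatic cleanup of temporary files can follow a `try`/`finally` pattern, so the upload is deleted whether processing succeeds or throws. This TypeScript sketch takes the delete function as a parameter (in Node.js it could wrap `fs.promises.unlink`); `withTempFile` is a hypothetical name, not taken from `server.js`.

```typescript
// Hypothetical sketch: run `work` on an uploaded temp file, then always delete it.
async function withTempFile<T>(
  path: string,
  work: (path: string) => Promise<T>,
  removeFile: (path: string) => Promise<void>,
): Promise<T> {
  try {
    // `return await` keeps the finally block from running before `work` settles.
    return await work(path);
  } finally {
    try {
      await removeFile(path);
    } catch (err) {
      // A failed cleanup is logged, not rethrown, so it cannot mask the OCR result or error.
      console.error(`Failed to remove temporary file ${path}:`, err);
    }
  }
}
```

If `work` throws, the original error still reaches the caller after the file has been removed.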
## Getting a Gemini API Key
1. Go to [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Create a new API key
3. Copy the key and use it in the frontend
## Troubleshooting
### Common Issues
**"Cannot connect to OCR backend"**
- Make sure the server is running: `npm start`
- Check that port 3001 is not already in use
- Verify that no firewall is blocking the connection

**"Invalid API key"**
- Check that your Gemini API key is correct
- Ensure the API key has the required permissions
- Try creating a new API key

**"File too large"**
- The maximum file size is 10 MB
- Compress images before uploading
- For PDFs, try splitting them into smaller files

**"Processing failed"**
- Check the image quality (the image should not be blurry)
- Make sure the text is clearly visible
- Try the other processing mode
### Debug Mode
Set `NODE_ENV=development` for detailed logging:
```bash
NODE_ENV=development npm start
```
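Gating extra output on `NODE_ENV` can be as simple as this TypeScript sketch. `createDebugLogger` and the `[debug]` prefix are hypothetical; in Node.js you would pass `process.env.NODE_ENV`, and the actual logging in `server.js` may work differently.

```typescript
// Hypothetical helper: emit debug lines only when NODE_ENV is "development".
type LogWriter = (line: string) => void;

function createDebugLogger(
  nodeEnv: string | undefined,
  write: LogWriter = (line) => console.log(line),
): (message: string) => void {
  const enabled = nodeEnv === "development";
  return (message) => {
    if (enabled) {
      write(`[debug] ${message}`);
    }
  };
}
```

Injecting `write` keeps the helper testable and makes it easy to redirect output to a file or a logging library later.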
## Notes
- Server runs on port 3001 by default
- Temporary files are automatically cleaned up
- CORS is enabled for frontend integration
- Image enhancement improves OCR accuracy
- Gemini AI provides intelligent text formatting
## Integration
The backend is designed to work with the Luna OCR React frontend. Make sure both are running:
1. **Backend**: `cd server && npm start` (port 3001)
2. **Frontend**: `npm start` (port 3000)

Once both are running, the frontend sends files to the backend API for OCR processing.