# Luna OCR Backend

An OCR backend that uses Google Gemini to extract text from images and PDFs and format the result.

## 🚀 Quick Start

### 1. Install Dependencies
```bash
cd server
npm install
```

### 2. Start the Server
```bash
npm start
# or for development with auto-reload:
npm run dev
```

### 3. Test the API
```bash
curl http://localhost:3001/api/health
```

## 📡 API Endpoints

### Health Check
```
GET /api/health
```

### OCR Processing
```
POST /api/ocr
Content-Type: multipart/form-data

Parameters:
- file: Image file (PNG, JPG, WebP) or PDF
- apiKey: Google Gemini API key
- mode: "standard" or "structured"
```
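
The request above can be sketched from a Node client as follows. This is a minimal example, assuming Node 18+ (for the global `fetch`, `FormData`, and `Blob`); `buildOcrForm` and `submitOcr` are hypothetical helper names, and the assumption that the endpoint always answers with a JSON body is not confirmed by this README.

```javascript
// Hypothetical client helpers for POST /api/ocr (not part of the backend itself).
const VALID_MODES = ["standard", "structured"];

// Build the multipart body using the three documented field names.
function buildOcrForm(fileBytes, fileName, apiKey, mode = "standard") {
  if (!VALID_MODES.includes(mode)) {
    throw new Error(`mode must be one of: ${VALID_MODES.join(", ")}`);
  }
  const form = new FormData();
  form.append("file", new Blob([fileBytes]), fileName);
  form.append("apiKey", apiKey);
  form.append("mode", mode);
  return form;
}

// Send the form. fetch sets the multipart Content-Type header (with boundary) itself.
async function submitOcr(form, baseUrl = "http://localhost:3001") {
  const response = await fetch(`${baseUrl}/api/ocr`, { method: "POST", body: form });
  // Assumption: the server replies with JSON for both success and failure.
  return response.json();
}
```

With the server running, `submitOcr(buildOcrForm(bytes, "scan.png", key, "structured"))` resolves to the parsed response body.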

## 🔧 Configuration

### Environment Variables
Create a `.env` file (optional):
```
PORT=3001
MAX_FILE_SIZE=10485760
```
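
As an illustration of how these two variables might be consumed, the sketch below reads them from an environment object and falls back to the values shown above. `loadConfig` is a hypothetical name, and the fallback behaviour is an assumption; the actual logic in `server.js` may differ. Note that Node does not read a `.env` file on its own, so a loader such as the `dotenv` package is typically involved.

```javascript
// Hypothetical config reader; falls back to the README's example values.
function loadConfig(env = process.env) {
  // Accept only positive integers; anything else uses the fallback.
  const toPositiveInt = (value, fallback) => {
    const parsed = Number.parseInt(value ?? "", 10);
    return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
  };
  return {
    port: toPositiveInt(env.PORT, 3001),
    maxFileSize: toPositiveInt(env.MAX_FILE_SIZE, 10 * 1024 * 1024), // 10485760 bytes
  };
}
```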

### Supported File Types
- **Images**: PNG, JPG, JPEG, WebP
- **Documents**: PDF (converted to images)
- **Max Size**: 10 MB per file (10485760 bytes)
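
A client can check these limits before uploading. The sketch below is one way to do it, assuming the type is judged by file extension alone; `validateUpload` is a hypothetical helper, and the server may apply different or stricter checks (for example on MIME type).

```javascript
// Hypothetical pre-upload check based on the limits listed above.
const ALLOWED_EXTENSIONS = new Set(["png", "jpg", "jpeg", "webp", "pdf"]);
const MAX_BYTES = 10 * 1024 * 1024; // 10485760 bytes

function validateUpload(fileName, sizeBytes) {
  // Take the text after the last dot, case-insensitively.
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `Unsupported file type: ${ext || "(none)"}` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: "File too large" };
  }
  return { ok: true };
}
```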

## 🎯 Processing Modes

### Standard Mode
- Uses Gemini 1.5 Flash (faster)
- Returns clean plain text
- Good for simple text extraction

### Structured Mode
- Uses Gemini 1.5 Pro (more capable)
- Returns formatted Markdown
- Creates tables, headers, and lists automatically
- Suited to complex documents
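
The mode-to-model mapping described above could be expressed as a small lookup. This is a sketch only: `selectModel` is a hypothetical name, and the identifiers `gemini-1.5-flash` and `gemini-1.5-pro` are the public Gemini API names for these models, assumed (not confirmed) to be the exact strings the server uses.

```javascript
// Hypothetical lookup from processing mode to model and output format.
const MODEL_BY_MODE = {
  standard: { model: "gemini-1.5-flash", output: "text" },
  structured: { model: "gemini-1.5-pro", output: "markdown" },
};

function selectModel(mode) {
  // Object.hasOwn avoids matching inherited keys such as "toString".
  if (!Object.hasOwn(MODEL_BY_MODE, mode)) {
    throw new Error(`Unknown processing mode: ${mode}`);
  }
  return MODEL_BY_MODE[mode];
}
```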

## 📊 Response Format

```json
{
  "success": true,
  "data": {
    "fileName": "document.png",
    "fileSize": 1234567,
    "processingMode": "structured",
    "extractedText": "# Document Title\n\n...",
    "formats": {
      "txt": "plain text version",
      "md": "markdown version",
      "json": { "metadata": {...}, "content": {...} }
    },
    "metadata": {
      "characterCount": 1500,
      "wordCount": 250,
      "lineCount": 45,
      "processedAt": "2024-01-01T12:00:00.000Z"
    }
  }
}
```
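
The `metadata` fields can be derived from the extracted text alone. The sketch below shows one plausible way to compute them, assuming words are split on whitespace and lines on `\n`; `buildMetadata` is a hypothetical name, and the server's own counting rules may differ.

```javascript
// Hypothetical derivation of the metadata block from extracted text.
function buildMetadata(text, now = new Date()) {
  const trimmed = text.trim();
  return {
    characterCount: text.length,
    // Whitespace-separated tokens; an empty or blank string has zero words.
    wordCount: trimmed === "" ? 0 : trimmed.split(/\s+/).length,
    // Newline-separated lines; an empty string has zero lines.
    lineCount: text === "" ? 0 : text.split("\n").length,
    processedAt: now.toISOString(),
  };
}
```

For example, `buildMetadata("# Title\n\nHello world")` reports 20 characters, 4 words, and 3 lines.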

## 🛠️ Development

### Project Structure
```
server/
├── server.js          # Main server file
├── package.json       # Dependencies
├── uploads/           # Temporary file storage
└── README.md          # This file
```

### Key Features
- **Image Enhancement**: Automatic image preprocessing for better OCR
- **Smart Formatting**: Gemini formats the extracted text as Markdown
- **Multiple Formats**: Returns TXT, MD, and JSON formats
- **Error Handling**: Comprehensive error handling and cleanup
- **File Cleanup**: Automatic temporary file cleanup

## 🔑 Getting Gemini API Key

1. Go to [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Create a new API key
3. Copy the key and enter it in the frontend, which sends it to the backend as the `apiKey` parameter

## 🚨 Troubleshooting

### Common Issues

**"Cannot connect to OCR backend"**
- Make sure the server is running: `npm start`
- Check that port 3001 is not already in use by another process
- Verify that no firewall is blocking the connection

**"Invalid API key"**
- Check that your Gemini API key is correct
- Ensure the API key has the proper permissions
- Try creating a new API key

**"File too large"**
- Maximum file size is 10 MB
- Compress images before uploading
- For PDFs, try splitting into smaller files

**"Processing failed"**
- Check the image quality (it should not be too blurry)
- Ensure the text is clearly visible
- Try a different processing mode

### Debug Mode
Set `NODE_ENV=development` for detailed logging:
```bash
NODE_ENV=development npm start
```

## 📝 Notes

- Server runs on port 3001 by default
- Temporary files are automatically cleaned up
- CORS is enabled for frontend integration
- Image enhancement improves OCR accuracy
- Gemini handles text extraction and formatting

## 🔗 Integration

The backend is designed to work with the Luna OCR React frontend. Make sure both are running:

1. **Backend**: `cd server && npm start` (port 3001)
2. **Frontend**: `npm start` (port 3000)

The frontend then calls the backend API automatically for OCR processing.