# Luna OCR Backend

An OCR backend that uses Google's Gemini models to extract text from images and PDFs and format the result.

## 🚀 Quick Start

### 1. Install Dependencies
```bash
cd server
npm install
```

### 2. Start the Server
```bash
npm start
# or for development with auto-reload:
npm run dev
```

### 3. Test the API
```bash
curl http://localhost:3001/api/health
```

## 📡 API Endpoints

### Health Check
```
GET /api/health
```

### OCR Processing
```
POST /api/ocr
Content-Type: multipart/form-data

Parameters:
- file: Image file (PNG, JPG, WebP) or PDF
- apiKey: Google Gemini API key
- mode: "standard" or "structured"
```
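
As a rough illustration of the request described above, the following sketch builds the multipart body with the three documented fields and posts it to the server on the default port. It assumes Node 18 or later (for the global `fetch`, `FormData`, and `Blob`); `buildOcrForm` and `submitOcr` are hypothetical helper names, not part of the backend.

```javascript
// Sketch of a client for POST /api/ocr (assumes Node 18+ globals).
// buildOcrForm and submitOcr are hypothetical names used for illustration.

const BASE_URL = "http://localhost:3001"; // default port from this README

// Assemble the multipart body with the documented fields: file, apiKey, mode.
function buildOcrForm(fileBytes, fileName, mimeType, apiKey, mode = "standard") {
  if (mode !== "standard" && mode !== "structured") {
    throw new Error(`Unsupported mode: ${mode}`);
  }
  const form = new FormData();
  form.append("file", new Blob([fileBytes], { type: mimeType }), fileName);
  form.append("apiKey", apiKey);
  form.append("mode", mode);
  return form;
}

// Send the form. fetch sets the multipart Content-Type and boundary itself,
// so no Content-Type header is set by hand here.
async function submitOcr(form) {
  const response = await fetch(`${BASE_URL}/api/ocr`, { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`OCR request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

A caller would read the file into a buffer, pass it to `buildOcrForm` with its MIME type (for example `image/png`), and await `submitOcr(form)`.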

## 🔧 Configuration

### Environment Variables
Create a `.env` file (optional):
```
PORT=3001
MAX_FILE_SIZE=10485760
```
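
One way the server could read these two variables is sketched below, falling back to the values shown above when they are unset. `loadConfig` is a hypothetical helper, and the real `server.js` may read its settings differently; loading the `.env` file itself (for example with a package such as `dotenv`) is also an assumption and is left out.

```javascript
// Sketch: read PORT and MAX_FILE_SIZE with the README values as fallbacks.
// loadConfig is a hypothetical name; server.js may do this differently.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? "3001", 10);
  const maxFileSize = Number.parseInt(env.MAX_FILE_SIZE ?? "10485760", 10);

  // Reject values that are not usable rather than starting with a bad config.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  if (!Number.isInteger(maxFileSize) || maxFileSize <= 0) {
    throw new Error(`Invalid MAX_FILE_SIZE: ${env.MAX_FILE_SIZE}`);
  }
  return { port, maxFileSize };
}
```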

### Supported File Types
- **Images**: PNG, JPG, JPEG, WebP
- **Documents**: PDF (converted to images)
- **Max Size**: 10 MB per file (10485760 bytes, the `MAX_FILE_SIZE` value shown above)
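
The limits above can be checked before any OCR work starts. The sketch below assumes the upload arrives as an object with `mimetype` and `size` fields (the shape used by common multipart middleware); `validateUpload` and the MIME type list are illustrative assumptions, not the backend's actual code.

```javascript
// Sketch: validate an upload against the documented types and size limit.
// The { mimetype, size } shape and validateUpload are assumptions.
const ALLOWED_MIME_TYPES = new Set([
  "image/png",
  "image/jpeg", // covers both .jpg and .jpeg
  "image/webp",
  "application/pdf",
]);
const MAX_FILE_SIZE = 10 * 1024 * 1024; // 10485760 bytes

function validateUpload(file) {
  if (!ALLOWED_MIME_TYPES.has(file.mimetype)) {
    return { ok: false, reason: "Unsupported file type" };
  }
  if (file.size > MAX_FILE_SIZE) {
    return { ok: false, reason: "File too large" };
  }
  return { ok: true };
}
```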

## 🎯 Processing Modes

### Standard Mode
- Uses Gemini 1.5 Flash (faster)
- Returns clean plain text
- Good for simple text extraction

### Structured Mode
- Uses Gemini 1.5 Pro (more capable)
- Returns formatted Markdown
- Creates tables, headers, and lists automatically
- Best suited to complex documents
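
The mode-to-model choice described above could be expressed as a small lookup. This is only a sketch: `modelForMode` is a hypothetical helper, and the identifiers `gemini-1.5-flash` and `gemini-1.5-pro` are assumed to be the model names the server passes to the Gemini API.

```javascript
// Sketch: map each processing mode to a Gemini model identifier.
// The identifiers are assumptions about what server.js sends to the API.
const MODEL_BY_MODE = new Map([
  ["standard", "gemini-1.5-flash"], // faster, plain-text output
  ["structured", "gemini-1.5-pro"], // Markdown output with tables and headers
]);

function modelForMode(mode) {
  const model = MODEL_BY_MODE.get(mode);
  if (!model) {
    throw new Error(`Unknown processing mode: ${mode}`);
  }
  return model;
}
```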

## 📊 Response Format

```json
{
  "success": true,
  "data": {
    "fileName": "document.png",
    "fileSize": 1234567,
    "processingMode": "structured",
    "extractedText": "# Document Title\n\n...",
    "formats": {
      "txt": "plain text version",
      "md": "markdown version",
      "json": { "metadata": {...}, "content": {...} }
    },
    "metadata": {
      "characterCount": 1500,
      "wordCount": 250,
      "lineCount": 45,
      "processedAt": "2024-01-01T12:00:00.000Z"
    }
  }
}
```
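
A client consuming this response might select one of the returned formats as in the sketch below. `pickFormat` is a hypothetical helper, and it assumes that a failed request sets `success` to something other than `true`; the shape of error responses is not documented here.

```javascript
// Sketch: pull one of the returned formats out of a parsed /api/ocr response.
// pickFormat is a hypothetical name; the failure shape is an assumption.
function pickFormat(payload, format = "md") {
  if (!payload || payload.success !== true || !payload.data) {
    throw new Error("OCR response did not report success");
  }
  const formats = payload.data.formats ?? {};
  // Own-property check so inherited keys such as "toString" are not accepted.
  if (!Object.hasOwn(formats, format)) {
    throw new Error(`Format not present in response: ${format}`);
  }
  return formats[format];
}
```

For example, `pickFormat(response, "txt")` would return the plain text version, while the default returns the Markdown version.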

## 🛠️ Development

### Project Structure
```
server/
├── server.js          # Main server file
├── package.json       # Dependencies
├── uploads/           # Temporary file storage
└── README.md          # This file
```

### Key Features
- **Image Enhancement**: Automatic image preprocessing for better OCR
- **Smart Formatting**: Gemini AI creates beautiful Markdown output
- **Multiple Formats**: Returns TXT, MD, and JSON formats
- **Error Handling**: Comprehensive error handling and cleanup
- **File Cleanup**: Automatic temporary file cleanup
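
The error handling and file cleanup features listed above are commonly implemented with a `try`/`finally` wrapper, so the temporary upload is removed whether processing succeeds or fails. The sketch below shows that pattern under stated assumptions: `withCleanup` is a hypothetical helper, and the actual cleanup logic in `server.js` may differ.

```javascript
// Sketch: run a processing step and always delete the temporary file afterwards.
// withCleanup is a hypothetical name; server.js may clean up differently.
const fsPromises = require("node:fs/promises");

async function withCleanup(filePath, work) {
  try {
    return await work(filePath);
  } finally {
    // force: true makes rm a no-op if the file has already been removed.
    await fsPromises.rm(filePath, { force: true });
  }
}
```

Because the removal sits in `finally`, an error thrown by `work` still propagates to the caller after the file is deleted.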

## 🔑 Getting Gemini API Key

1. Go to [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Create a new API key
3. Copy the key and use it in the frontend

## 🚨 Troubleshooting

### Common Issues

**"Cannot connect to OCR backend"**
- Make sure the server is running: `npm start`
- Check that port 3001 is not in use by another process
- Verify that no firewall is blocking the connection

**"Invalid API key"**
- Check that your Gemini API key is correct
- Ensure the API key has the proper permissions
- Try creating a new API key

**"File too large"**
- Maximum file size is 10 MB
- Compress images before uploading
- For PDFs, try splitting into smaller files

**"Processing failed"**
- Check image quality (not too blurry)
- Ensure text is clearly visible
- Try the other processing mode

### Debug Mode
Set `NODE_ENV=development` for detailed logging:
```bash
NODE_ENV=development npm start
```
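
A typical way to gate detailed logging on `NODE_ENV` is sketched below. `createLogger` is a hypothetical helper; what the server actually logs in development mode is not specified in this README.

```javascript
// Sketch: emit debug output only when NODE_ENV is "development".
// createLogger is a hypothetical name, not the backend's actual logger.
function createLogger(nodeEnv = process.env.NODE_ENV) {
  const verbose = nodeEnv === "development";
  return {
    verbose,
    info: (...args) => console.log("[info]", ...args),
    debug: (...args) => {
      if (verbose) console.log("[debug]", ...args);
    },
  };
}
```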

## 📝 Notes

- Server runs on port 3001 by default
- Temporary files are automatically cleaned up
- CORS is enabled for frontend integration
- Image enhancement improves OCR accuracy
- Gemini AI provides intelligent text formatting

## 🔗 Integration

The backend integrates seamlessly with the Luna OCR React frontend. Make sure both are running:

1. **Backend**: `cd server && npm start` (port 3001)
2. **Frontend**: `npm start` (port 3000)

Once both are running, the frontend calls the backend API automatically for OCR processing.