# Luna OCR Backend

Real OCR processing backend using Gemini AI for intelligent text extraction and formatting.

## 🚀 Quick Start

### 1. Install Dependencies
```bash
cd server
npm install
```

### 2. Start the Server
```bash
npm start
# or for development with auto-reload:
npm run dev
```

### 3. Test the API
```bash
curl http://localhost:3001/api/health
```

## 📡 API Endpoints

### Health Check
```
GET /api/health
```

### OCR Processing
```
POST /api/ocr
Content-Type: multipart/form-data

Parameters:
- file: Image file (PNG, JPG, WebP) or PDF
- apiKey: Google Gemini API key
- mode: "standard" or "structured"
```
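
As a rough illustration of the request described above, the following sketch builds the multipart form and posts it to the endpoint. It assumes Node.js 18 or later (for the global `fetch`, `FormData`, and `Blob`); the helper names `buildOcrForm` and `requestOcr` and the `baseUrl` default are hypothetical, and only the field names `file`, `apiKey`, and `mode` come from the parameter list.

```javascript
// Build the multipart form for POST /api/ocr.
// Field names (file, apiKey, mode) come from the parameter list above;
// everything else in this sketch is an assumption.
function buildOcrForm(fileBytes, fileName, mimeType, apiKey, mode = "standard") {
  if (mode !== "standard" && mode !== "structured") {
    throw new Error(`Unsupported mode: ${mode}`);
  }
  const form = new FormData();
  form.append("file", new Blob([fileBytes], { type: mimeType }), fileName);
  form.append("apiKey", apiKey);
  form.append("mode", mode);
  return form;
}

// Send the form to the backend and parse the JSON reply.
// baseUrl is assumed from the default port (3001).
async function requestOcr(form, baseUrl = "http://localhost:3001") {
  const response = await fetch(`${baseUrl}/api/ocr`, {
    method: "POST",
    body: form, // fetch sets the multipart boundary header itself
  });
  if (!response.ok) {
    throw new Error(`OCR request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

A caller might read a file with `fs.readFileSync("scan.png")`, pass the bytes to `buildOcrForm`, and then await `requestOcr(form)`.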

## 🔧 Configuration

### Environment Variables
Create a `.env` file (optional):
```
PORT=3001
MAX_FILE_SIZE=10485760
```
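
One possible way to read these two variables with fallbacks is sketched below. How `server.js` actually parses them is not shown in this README, so `loadConfig` and `parsePositiveInt` are hypothetical helpers; the defaults simply mirror the values in the example above.

```javascript
// Defaults mirror the values shown in the .env example above.
const DEFAULT_PORT = 3001;
const DEFAULT_MAX_FILE_SIZE = 10 * 1024 * 1024; // 10485760 bytes

// Parse a positive integer, falling back when the value is missing or invalid.
function parsePositiveInt(value, fallback) {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Hypothetical helper: reads PORT and MAX_FILE_SIZE from an env-like object.
function loadConfig(env = process.env) {
  return {
    port: parsePositiveInt(env.PORT, DEFAULT_PORT),
    maxFileSize: parsePositiveInt(env.MAX_FILE_SIZE, DEFAULT_MAX_FILE_SIZE),
  };
}
```

Passing the environment in as an argument keeps the helper easy to exercise without touching the real process environment.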

### Supported File Types
- **Images**: PNG, JPG, JPEG, WebP
- **Documents**: PDF (converted to images)
- **Max Size**: 10MB per file
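
The limits above could be enforced with a check along these lines. This is a minimal sketch, not the server's actual validation: the `validateUpload` name, the MIME-type list, and the `{ mimetype, size }` object shape (similar to what common upload middleware provides) are all assumptions.

```javascript
// MIME types assumed to correspond to the supported extensions listed above.
const ALLOWED_MIME_TYPES = new Set([
  "image/png",
  "image/jpeg",
  "image/webp",
  "application/pdf",
]);
const MAX_FILE_SIZE = 10 * 1024 * 1024; // 10MB, matching the documented limit

// Hypothetical validator: returns { ok: true } or { ok: false, reason }.
function validateUpload(file) {
  if (!ALLOWED_MIME_TYPES.has(file.mimetype)) {
    return { ok: false, reason: "Unsupported file type" };
  }
  if (file.size > MAX_FILE_SIZE) {
    return { ok: false, reason: "File too large" };
  }
  return { ok: true };
}
```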

## 🎯 Processing Modes

### Standard Mode
- Uses Gemini 1.5 Flash (faster)
- Returns clean plain text
- Good for simple text extraction

### Structured Mode  
- Uses Gemini 1.5 Pro (more intelligent)
- Returns formatted Markdown
- Creates tables, headers, lists automatically
- Perfect for complex documents
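
The mode-to-model mapping described above might look like the sketch below. The model ID strings `gemini-1.5-flash` and `gemini-1.5-pro` are assumed from the model names given here, and `selectModel` is a hypothetical helper rather than a function taken from `server.js`.

```javascript
// Assumed model IDs for the two documented modes.
const MODEL_BY_MODE = new Map([
  ["standard", "gemini-1.5-flash"], // faster, plain text
  ["structured", "gemini-1.5-pro"], // formatted Markdown
]);

// Hypothetical helper: resolve a mode string to a model ID, rejecting unknown modes.
function selectModel(mode) {
  const model = MODEL_BY_MODE.get(mode);
  if (model === undefined) {
    throw new Error(`Unknown processing mode: ${mode}`);
  }
  return model;
}
```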

## 📊 Response Format

```json
{
  "success": true,
  "data": {
    "fileName": "document.png",
    "fileSize": 1234567,
    "processingMode": "structured",
    "extractedText": "# Document Title\n\n...",
    "formats": {
      "txt": "plain text version",
      "md": "markdown version",
      "json": { "metadata": {...}, "content": {...} }
    },
    "metadata": {
      "characterCount": 1500,
      "wordCount": 250,
      "lineCount": 45,
      "processedAt": "2024-01-01T12:00:00.000Z"
    }
  }
}
```
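
To make the `metadata` fields concrete, here is one way they could be derived from the extracted text. The counting rules (whitespace-separated words, newline-separated lines) are assumptions, since the README does not say how the server counts, and `buildMetadata` is a hypothetical helper.

```javascript
// Hypothetical helper: derive the metadata fields shown above from extracted text.
// Counting rules are assumptions: words split on whitespace, lines split on newlines.
function buildMetadata(text, now = new Date()) {
  const trimmed = text.trim();
  return {
    characterCount: text.length,
    wordCount: trimmed === "" ? 0 : trimmed.split(/\s+/).length,
    lineCount: text === "" ? 0 : text.split(/\r?\n/).length,
    processedAt: now.toISOString(),
  };
}
```

Under these rules, Markdown markers such as `#` count as words and blank lines count as lines, so the real server's numbers may differ.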

## 🛠️ Development

### Project Structure
```
server/
├── server.js          # Main server file
├── package.json       # Dependencies
├── uploads/           # Temporary file storage
└── README.md          # This file
```

### Key Features
- **Image Enhancement**: Automatic image preprocessing for better OCR
- **Smart Formatting**: Gemini AI creates beautiful Markdown output
- **Multiple Formats**: Returns TXT, MD, and JSON formats
- **Error Handling**: Comprehensive error handling and cleanup
- **File Cleanup**: Automatic temporary file cleanup
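
The cleanup behavior listed above can be sketched with a `try`/`finally` wrapper, so the temporary upload is removed whether processing succeeds or throws. This illustrates the general pattern only; `withTempFile` is a hypothetical name, and the actual cleanup logic in `server.js` may differ.

```javascript
const fs = require("node:fs/promises");

// Hypothetical wrapper: run a handler against a temporary file, then always delete it.
async function withTempFile(filePath, handler) {
  try {
    return await handler(filePath);
  } finally {
    // force: true makes removal a no-op if the file is already gone.
    await fs.rm(filePath, { force: true });
  }
}
```

Because the removal sits in `finally`, an error thrown by the handler still propagates to the caller after the file has been deleted.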

## 🔑 Getting Gemini API Key

1. Go to [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Create a new API key
3. Copy the key and use it in the frontend

## 🚨 Troubleshooting

### Common Issues

**"Cannot connect to OCR backend"**
- Make sure the server is running: `npm start`
- Check that no other process is using port 3001
- Verify that no firewall is blocking the port

**"Invalid API key"**
- Check that your Gemini API key is correct
- Ensure the API key has the proper permissions
- Try creating a new API key

**"File too large"**
- Maximum file size is 10MB
- Compress images before uploading
- For PDFs, try splitting into smaller files

**"Processing failed"**
- Check image quality (not too blurry)
- Ensure text is clearly visible
- Try a different processing mode

### Debug Mode
Set `NODE_ENV=development` for detailed logging:
```bash
NODE_ENV=development npm start
```

## 📝 Notes

- Server runs on port 3001 by default
- Temporary files are automatically cleaned up
- CORS is enabled for frontend integration
- Image enhancement improves OCR accuracy
- Gemini AI provides intelligent text formatting

## 🔗 Integration

The backend integrates seamlessly with the Luna OCR React frontend. Make sure both are running:

1. **Backend**: `cd server && npm start` (port 3001)
2. **Frontend**: `npm start` (port 3000)

The frontend will automatically call the backend API for real OCR processing!