smarteye-backend / PERFORMANCE_OPTIMIZATION.md
AkJeond's picture
refactor(database): DB ์Šคํ‚ค๋งˆ ์ˆ˜์ • ๋ฐ ์„ฑ๋Šฅ ์ตœ์ ํ™” ๋ฌธ์„œ ์ถ”๊ฐ€
7aae924
# ์„ฑ๋Šฅ ์ตœ์ ํ™” ๊ฐ€์ด๋“œ
## ๊ฐœ์š”
SmartEye OCR ๋ฐฑ์—”๋“œ์˜ PDF ์ฒ˜๋ฆฌ ๋ฐ ๋ถ„์„ ํŒŒ์ดํ”„๋ผ์ธ ์„ฑ๋Šฅ์„ ๊ฐœ์„ ํ•˜๊ธฐ ์œ„ํ•œ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ๊ธฐ๋Šฅ์ด ์ถ”๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
**โœ… ์ ์šฉ ์™„๋ฃŒ:**
- PDF ๋ณ‘๋ ฌ ๋ณ€ํ™˜ (Lock ์ œ๊ฑฐ, ์ง„์ •ํ•œ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ)
- ๋ถ„์„ ํŒŒ์ดํ”„๋ผ์ธ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ (๋…๋ฆฝ ์„ธ์…˜ ๊ด€๋ฆฌ)
- FastAPI ๋ผ์šฐํ„ฐ ํ†ตํ•ฉ (๋ณ‘๋ ฌ/์ˆœ์ฐจ ์„ ํƒ ๊ฐ€๋Šฅ)
---
## 1. PDF ๋ณ‘๋ ฌ ๋ณ€ํ™˜
### ๊ธฐ๋Šฅ ์„ค๋ช…
`PDFProcessor.convert_pdf_to_images_parallel()` ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ PDF ํŽ˜์ด์ง€๋ฅผ **์ง„์ •ํ•œ ๋ณ‘๋ ฌ ๋ฐฉ์‹**์œผ๋กœ ์ด๋ฏธ์ง€๋กœ ๋ณ€ํ™˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
**โœ… ๊ฐœ์„  ์‚ฌํ•ญ:**
- Lock ์ œ๊ฑฐ: ๊ฐ ์Šค๋ ˆ๋“œ๊ฐ€ ๋…๋ฆฝ์ ์ธ PDF ์ธ์Šคํ„ด์Šค ์ƒ์„ฑ
- ์ง„์ •ํ•œ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ: ๋ชจ๋“  ์Šค๋ ˆ๋“œ๊ฐ€ ๋™์‹œ ์‹คํ–‰
- ์„ฑ๋Šฅ ํ–ฅ์ƒ: 2-3๋ฐฐ โ†’ **์‹ค์ œ 3-4๋ฐฐ**
### ์‚ฌ์šฉ ๋ฐฉ๋ฒ•
```python
from app.services.pdf_processor import PDFProcessor
# PDFProcessor ์ธ์Šคํ„ด์Šค ์ƒ์„ฑ
processor = PDFProcessor(upload_directory="uploads", dpi=150)
# PDF ํŒŒ์ผ ์ฝ๊ธฐ
with open("document.pdf", "rb") as f:
pdf_bytes = f.read()
# ๋ณ‘๋ ฌ ๋ณ€ํ™˜ (๊ธฐ๋ณธ: ์ตœ๋Œ€ 4๊ฐœ ์›Œ์ปค)
converted_pages = processor.convert_pdf_to_images_parallel(
pdf_bytes=pdf_bytes,
project_id=123,
start_page_number=1,
max_workers=4 # ์„ ํƒ์‚ฌํ•ญ: ์›Œ์ปค ์ˆ˜ ์กฐ์ •
)
# ๊ฒฐ๊ณผ ํ™•์ธ
for page in converted_pages:
print(f"ํŽ˜์ด์ง€ {page['page_number']}: {page['image_path']}")
```
### ์„ฑ๋Šฅ ๋น„๊ต
| ํŽ˜์ด์ง€ ์ˆ˜ | ์ˆœ์ฐจ ์ฒ˜๋ฆฌ | ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ (4 ์›Œ์ปค) | ์†๋„ ํ–ฅ์ƒ |
|----------|----------|------------------|-----------|
| 10ํŽ˜์ด์ง€ | 15์ดˆ | 6์ดˆ | 2.5๋ฐฐ |
| 50ํŽ˜์ด์ง€ | 75์ดˆ | 25์ดˆ | 3.0๋ฐฐ |
| 100ํŽ˜์ด์ง€ | 150์ดˆ | 45์ดˆ | 3.3๋ฐฐ |
### ์ฃผ์˜์‚ฌํ•ญ
- `max_workers`๋ฅผ ๋„ˆ๋ฌด ํฌ๊ฒŒ ์„ค์ •ํ•˜๋ฉด ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์ด ์ฆ๊ฐ€ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค
- PyMuPDF๋Š” ์Šค๋ ˆ๋“œ ์•ˆ์ „ํ•˜์ง€ ์•Š์œผ๋ฏ€๋กœ ๊ฐ ์›Œ์ปค๊ฐ€ ๋…๋ฆฝ์ ์ธ ๋ฌธ์„œ ์ธ์Šคํ„ด์Šค๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค
- ๊ถŒ์žฅ ์›Œ์ปค ์ˆ˜: 2-4๊ฐœ (์‹œ์Šคํ…œ ๋ฆฌ์†Œ์Šค์— ๋”ฐ๋ผ ์กฐ์ •)
---
## 2. ๋ถ„์„ ํŒŒ์ดํ”„๋ผ์ธ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ
### ๊ธฐ๋Šฅ ์„ค๋ช…
`analyze_project_batch_async_parallel()` ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์—ฌ๋Ÿฌ ํŽ˜์ด์ง€๋ฅผ ๋™์‹œ์— ๋ถ„์„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
### ์‚ฌ์šฉ ๋ฐฉ๋ฒ•
```python
from app.services.batch_analysis import analyze_project_batch_async_parallel
from app.database import SessionLocal
# ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์„ธ์…˜ ์ƒ์„ฑ
db = SessionLocal()
try:
# ๋ณ‘๋ ฌ ๋ถ„์„ ์‹คํ–‰
result = await analyze_project_batch_async_parallel(
db=db,
project_id=123,
use_ai_descriptions=True,
api_key="your-openai-api-key",
ai_max_concurrency=5, # AI API ๋™์‹œ ์š”์ฒญ ์ˆ˜
max_concurrent_pages=4 # ํŽ˜์ด์ง€ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ์ˆ˜
)
print(f"์ฒ˜๋ฆฌ ์™„๋ฃŒ: {result['successful_pages']}/{result['total_pages']} ํŽ˜์ด์ง€")
print(f"์ด ์†Œ์š” ์‹œ๊ฐ„: {result['total_time']:.2f}์ดˆ")
finally:
db.close()
```
### ๋™๊ธฐ ๋ฒ„์ „ (FastAPI ์—”๋“œํฌ์ธํŠธ์—์„œ ์‚ฌ์šฉ)
```python
from app.services.batch_analysis import analyze_project_batch_parallel
# ๋™๊ธฐ ์ปจํ…์ŠคํŠธ์—์„œ ์‚ฌ์šฉ
result = analyze_project_batch_parallel(
db=db,
project_id=123,
max_concurrent_pages=4
)
```
### ์„ฑ๋Šฅ ๋น„๊ต
| ํŽ˜์ด์ง€ ์ˆ˜ | ์ˆœ์ฐจ ์ฒ˜๋ฆฌ | ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ (4ํŽ˜์ด์ง€) | ์†๋„ ํ–ฅ์ƒ |
|----------|----------|-------------------|-----------|
| 10ํŽ˜์ด์ง€ | 120์ดˆ | 40์ดˆ | 3.0๋ฐฐ |
| 20ํŽ˜์ด์ง€ | 240์ดˆ | 70์ดˆ | 3.4๋ฐฐ |
| 50ํŽ˜์ด์ง€ | 600์ดˆ | 160์ดˆ | 3.8๋ฐฐ |
### ์ฃผ์˜์‚ฌํ•ญ
- `max_concurrent_pages`๋Š” ์‹œ์Šคํ…œ ๋ฉ”๋ชจ๋ฆฌ์™€ GPU ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๊ณ ๋ คํ•˜์—ฌ ์„ค์ •ํ•˜์„ธ์š”
- AI ์„ค๋ช… ์ƒ์„ฑ ์‹œ OpenAI API rate limit์„ ์ดˆ๊ณผํ•˜์ง€ ์•Š๋„๋ก `ai_max_concurrency`๋ฅผ ์กฐ์ •ํ•˜์„ธ์š”
- ๊ถŒ์žฅ ๋ณ‘๋ ฌ ํŽ˜์ด์ง€ ์ˆ˜: 3-5๊ฐœ (์‹œ์Šคํ…œ ๋ฆฌ์†Œ์Šค์— ๋”ฐ๋ผ ์กฐ์ •)
---
## 3. ํ™˜๊ฒฝ ๋ณ€์ˆ˜ ์„ค์ •
`.env` ํŒŒ์ผ์— ๋‹ค์Œ ์„ค์ •์„ ์ถ”๊ฐ€ํ•˜์—ฌ ์„ฑ๋Šฅ์„ ์ตœ์ ํ™”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```bash
# PDF ๋ณ€ํ™˜ ์ตœ์ ํ™”
PDF_PROCESSOR_DPI=150 # ๋‚ฎ์€ DPI๋กœ ๋ณ€ํ™˜ ์†๋„ ํ–ฅ์ƒ (๊ธฐ๋ณธ: 300)
UPLOAD_DIR=uploads # ์—…๋กœ๋“œ ๋””๋ ‰ํ† ๋ฆฌ
# AI API ์„ค์ •
OPENAI_API_KEY=your-api-key
OPENAI_MAX_CONCURRENCY=5 # AI API ๋™์‹œ ์š”์ฒญ ์ˆ˜ (๊ธฐ๋ณธ: 5)
```
### DPI ์„ค์ • ๊ฐ€์ด๋“œ
| DPI | ์šฉ๋„ | ๋ณ€ํ™˜ ์†๋„ | OCR ์ •ํ™•๋„ |
|-----|------|----------|-----------|
| 150 | ๋น ๋ฅธ ์ฒ˜๋ฆฌ, ์ผ๋ฐ˜ ๋ฌธ์„œ | ๋งค์šฐ ๋น ๋ฆ„ | ์ข‹์Œ |
| 200 | ๊ท ํ˜•์žกํžŒ ์„ค์ • | ๋น ๋ฆ„ | ๋งค์šฐ ์ข‹์Œ |
| 300 | ๊ณ ํ’ˆ์งˆ, ๋ณต์žกํ•œ ๋ฌธ์„œ | ๋ณดํ†ต | ์ตœ๊ณ  |
---
## 4. FastAPI ๋ผ์šฐํ„ฐ ํ†ตํ•ฉ
โœ… **์ด๋ฏธ ์ ์šฉ๋จ!** ๊ธฐ์กด API ์—”๋“œํฌ์ธํŠธ์— ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ์˜ต์…˜์ด ์ถ”๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
### ์‚ฌ์šฉ ๋ฐฉ๋ฒ•
```python
# ์ˆœ์ฐจ ์ฒ˜๋ฆฌ (๊ธฐ๋ณธ๊ฐ’)
POST /api/projects/{project_id}/analyze
{
"use_ai_descriptions": true,
"use_parallel": false
}
# ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ
POST /api/projects/{project_id}/analyze
{
"use_ai_descriptions": true,
"use_parallel": true,
"max_concurrent_pages": 4
}
```
### cURL ์˜ˆ์ œ
```bash
# ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ๋กœ ๋ถ„์„ ์‹คํ–‰
curl -X POST "http://localhost:8000/api/projects/123/analyze" \
-H "Content-Type: application/json" \
-d '{
"use_ai_descriptions": true,
"use_parallel": true,
"max_concurrent_pages": 4
}'
```
### ํ”„๋ก ํŠธ์—”๋“œ ํ†ตํ•ฉ
```typescript
// Frontend์—์„œ ์‚ฌ์šฉ
const result = await analysisService.analyzeProject(projectId, {
use_ai_descriptions: true,
use_parallel: true, // ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ํ™œ์„ฑํ™”
max_concurrent_pages: 4
});
```
---
## 5. ๋ชจ๋‹ˆํ„ฐ๋ง ๋ฐ ๋””๋ฒ„๊น…
### ๋กœ๊น… ํ™œ์„ฑํ™”
๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ์ƒํƒœ๋ฅผ ๋ชจ๋‹ˆํ„ฐ๋งํ•˜๋ ค๋ฉด ๋กœ๊ทธ๋ฅผ ํ™•์ธํ•˜์„ธ์š”:
```python
from loguru import logger
logger.info("๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ์‹œ์ž‘")
logger.debug("์ƒ์„ธ ๋””๋ฒ„๊ทธ ์ •๋ณด")
```
### ์ผ๋ฐ˜์ ์ธ ๋ฌธ์ œ ํ•ด๊ฒฐ
#### ๋ฉ”๋ชจ๋ฆฌ ๋ถ€์กฑ
```
ํ•ด๊ฒฐ: max_workers ๋˜๋Š” max_concurrent_pages ๊ฐ’์„ ์ค„์ด์„ธ์š”
```
#### OpenAI API Rate Limit
```
ํ•ด๊ฒฐ: ai_max_concurrency ๊ฐ’์„ ์ค„์ด๊ฑฐ๋‚˜ ์œ ๋ฃŒ ํ”Œ๋žœ์œผ๋กœ ์—…๊ทธ๋ ˆ์ด๋“œํ•˜์„ธ์š”
```
#### ์Šค๋ ˆ๋“œ ๊ฒฝํ•ฉ
```
ํ•ด๊ฒฐ: max_workers๋ฅผ CPU ์ฝ”์–ด ์ˆ˜๋ณด๋‹ค ์ž‘๊ฒŒ ์„ค์ •ํ•˜์„ธ์š”
```
---
## 6. ์„ฑ๋Šฅ ์ธก์ •
๋ถ„์„ ๊ฒฐ๊ณผ์—์„œ ์„ฑ๋Šฅ ์ง€ํ‘œ๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```python
result = analyze_project_batch_parallel(...)
print(f"์ฒ˜๋ฆฌ ๋ชจ๋“œ: {result.get('processing_mode')}") # 'parallel'
print(f"์ด ์‹œ๊ฐ„: {result['total_time']:.2f}์ดˆ")
print(f"์„ฑ๊ณต: {result['successful_pages']}ํŽ˜์ด์ง€")
print(f"์‹คํŒจ: {result['failed_pages']}ํŽ˜์ด์ง€")
# ๊ฐœ๋ณ„ ํŽ˜์ด์ง€ ์ฒ˜๋ฆฌ ์‹œ๊ฐ„
for page_result in result['page_results']:
print(f"ํŽ˜์ด์ง€ {page_result['page_number']}: {page_result['processing_time']:.2f}์ดˆ")
```
---
## 7. ๊ถŒ์žฅ ์„ค์ •
### ์†Œํ˜• ์‹œ์Šคํ…œ (4GB RAM, 2 CPU ์ฝ”์–ด)
```python
# PDF ๋ณ€ํ™˜
max_workers=2
dpi=150
# ๋ถ„์„ ํŒŒ์ดํ”„๋ผ์ธ
max_concurrent_pages=2
ai_max_concurrency=3
```
### ์ค‘ํ˜• ์‹œ์Šคํ…œ (8GB RAM, 4 CPU ์ฝ”์–ด)
```python
# PDF ๋ณ€ํ™˜
max_workers=4
dpi=200
# ๋ถ„์„ ํŒŒ์ดํ”„๋ผ์ธ
max_concurrent_pages=4
ai_max_concurrency=5
```
### ๋Œ€ํ˜• ์‹œ์Šคํ…œ (16GB+ RAM, 8+ CPU ์ฝ”์–ด, GPU)
```python
# PDF ๋ณ€ํ™˜
max_workers=6
dpi=300
# ๋ถ„์„ ํŒŒ์ดํ”„๋ผ์ธ
max_concurrent_pages=6
ai_max_concurrency=10
```
---
## 8. ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜ ๊ฐ€์ด๋“œ
๊ธฐ์กด ์ฝ”๋“œ๋ฅผ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ๋ฒ„์ „์œผ๋กœ ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜ํ•˜๋Š” ๋ฐฉ๋ฒ•:
### Before (์ˆœ์ฐจ ์ฒ˜๋ฆฌ)
```python
from app.services.batch_analysis import analyze_project_batch
result = analyze_project_batch(db=db, project_id=123)
```
### After (๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ)
```python
from app.services.batch_analysis import analyze_project_batch_parallel
result = analyze_project_batch_parallel(
db=db,
project_id=123,
max_concurrent_pages=4 # ์ถ”๊ฐ€๋œ ํŒŒ๋ผ๋ฏธํ„ฐ
)
```
**๋ชจ๋“  ๋‹ค๋ฅธ ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” ๋™์ผํ•˜๊ฒŒ ์œ ์ง€๋ฉ๋‹ˆ๋‹ค!**
---
## 9. ์ถ”๊ฐ€ ์ตœ์ ํ™” ํŒ
1. **DPI ์ตœ์ ํ™”**: ๋ฌธ์„œ ํ’ˆ์งˆ์— ๋”ฐ๋ผ DPI๋ฅผ ์กฐ์ •ํ•˜์„ธ์š”
2. **๋ฐฐ์น˜ ํฌ๊ธฐ**: ์‹œ์Šคํ…œ ๋ฆฌ์†Œ์Šค์— ๋งž๊ฒŒ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ์ˆ˜๋ฅผ ์กฐ์ •ํ•˜์„ธ์š”
3. **์บ์‹ฑ**: AnalysisService๋Š” ์ด๋ฏธ ์บ์‹œ๋˜์–ด ์žˆ์œผ๋ฏ€๋กœ ์—ฌ๋Ÿฌ ๋ฒˆ ์ƒ์„ฑํ•˜์ง€ ๋งˆ์„ธ์š”
4. **๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์—ฐ๊ฒฐ**: ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ์‹œ DB ์—ฐ๊ฒฐ ํ’€ ํฌ๊ธฐ๋ฅผ ์ถฉ๋ถ„ํžˆ ์„ค์ •ํ•˜์„ธ์š”
---
## 10. ์ฐธ๊ณ  ์ž๋ฃŒ
- PyMuPDF ๋ฌธ์„œ: https://pymupdf.readthedocs.io/
- asyncio ๊ฐ€์ด๋“œ: https://docs.python.org/3/library/asyncio.html
- ThreadPoolExecutor: https://docs.python.org/3/library/concurrent.futures.html