
Performance Optimization Guide

Overview

Parallel processing has been added to the SmartEye OCR backend to speed up its PDF processing and analysis pipeline.

✅ Applied:

  • Parallel PDF conversion (lock removed, true parallel processing)
  • Parallel analysis pipeline (independent session management)
  • FastAPI router integration (parallel or sequential, selectable per request)

1. Parallel PDF Conversion

Feature description

The PDFProcessor.convert_pdf_to_images_parallel() method converts PDF pages to images in true parallel fashion.

✅ Improvements:

  • Lock removed: each thread creates its own PDF instance
  • True parallel processing: all threads run concurrently
  • Speedup: from 2-3x to 3-4x in practice

Usage

from app.services.pdf_processor import PDFProcessor

# Create a PDFProcessor instance
processor = PDFProcessor(upload_directory="uploads", dpi=150)

# Read the PDF file
with open("document.pdf", "rb") as f:
    pdf_bytes = f.read()

# Convert in parallel (default: up to 4 workers)
converted_pages = processor.convert_pdf_to_images_parallel(
    pdf_bytes=pdf_bytes,
    project_id=123,
    start_page_number=1,
    max_workers=4  # optional: adjust the number of workers
)

# Inspect the results
for page in converted_pages:
    print(f"Page {page['page_number']}: {page['image_path']}")

Performance comparison

| Pages | Sequential | Parallel (4 workers) | Speedup |
| --- | --- | --- | --- |
| 10 | 15 s | 6 s | 2.5x |
| 50 | 75 s | 25 s | 3.0x |
| 100 | 150 s | 45 s | 3.3x |
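
The speedup column is the sequential time divided by the parallel time. The sketch below recomputes it from the figures in the table above. It also derives parallel efficiency (speedup divided by worker count), a metric the original does not report and which is included here only as an illustration.

```python
# Timings copied from the table above: pages -> (sequential seconds, parallel seconds)
TIMINGS = {10: (15, 6), 50: (75, 25), 100: (150, 45)}
WORKERS = 4


def speedup(sequential_s: float, parallel_s: float) -> float:
    """Return how many times faster the parallel run is."""
    return sequential_s / parallel_s


def efficiency(sequential_s: float, parallel_s: float, workers: int) -> float:
    """Return the fraction of ideal linear scaling that was achieved."""
    return speedup(sequential_s, parallel_s) / workers


for pages, (seq, par) in TIMINGS.items():
    print(f"{pages} pages: {speedup(seq, par):.1f}x, "
          f"efficiency {efficiency(seq, par, WORKERS):.0%}")
```

An efficiency well below 100% on small documents is expected, since fixed startup costs are spread over fewer pages.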

Caveats

  • Setting max_workers too high can increase memory usage
  • PyMuPDF is not thread-safe, so each worker creates its own document instance
  • Recommended number of workers: 2-4 (adjust to the available system resources)
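
The "one document instance per worker" rule can be sketched as follows. This is not the actual PDFProcessor implementation: convert_parallel and render_with_pymupdf are hypothetical names, and for simplicity each task (rather than each worker) opens its own document, which still guarantees that no document object is shared between threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def render_with_pymupdf(pdf_bytes: bytes, index: int, dpi: int = 150) -> bytes:
    """Open a private document for this call and render one page to PNG bytes."""
    import fitz  # PyMuPDF, imported lazily so the helper below works without it

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        return doc.load_page(index).get_pixmap(dpi=dpi).tobytes("png")


def convert_parallel(
    pdf_bytes: bytes,
    page_count: int,
    render: Callable[[bytes, int], bytes],
    max_workers: int = 4,
) -> List[bytes]:
    """Render every page concurrently; results come back in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda i: render(pdf_bytes, i), range(page_count)))
```

Passing the renderer as an argument keeps the threading logic testable without PyMuPDF installed. With it installed, the call would be convert_parallel(pdf_bytes, page_count, render_with_pymupdf).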

2. Parallel Analysis Pipeline

Feature description

The analyze_project_batch_async_parallel() function analyzes multiple pages concurrently.

Usage

import asyncio

from app.services.batch_analysis import analyze_project_batch_async_parallel
from app.database import SessionLocal


async def main():
    # Create a database session
    db = SessionLocal()

    try:
        # Run the parallel analysis
        result = await analyze_project_batch_async_parallel(
            db=db,
            project_id=123,
            use_ai_descriptions=True,
            api_key="your-openai-api-key",
            ai_max_concurrency=5,        # concurrent AI API requests
            max_concurrent_pages=4       # pages processed in parallel
        )

        print(f"Completed: {result['successful_pages']}/{result['total_pages']} pages")
        print(f"Total time: {result['total_time']:.2f}s")

    finally:
        db.close()


# await is only valid inside a coroutine, so drive it from an event loop
asyncio.run(main())

Synchronous version (for use in FastAPI endpoints)

from app.services.batch_analysis import analyze_project_batch_parallel

# Use from a synchronous context
result = analyze_project_batch_parallel(
    db=db,
    project_id=123,
    max_concurrent_pages=4
)
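
One common way to offer a synchronous entry point over an async function is to drive the coroutine with asyncio.run(). The sketch below shows that pattern only. analyze_async and analyze_sync are hypothetical stand-ins, and the real analyze_project_batch_parallel may be implemented differently.

```python
import asyncio


async def analyze_async(project_id: int, max_concurrent_pages: int = 4) -> dict:
    """Hypothetical stand-in for the async analysis function."""
    await asyncio.sleep(0)  # placeholder for real awaitable work
    return {"project_id": project_id, "max_concurrent_pages": max_concurrent_pages}


def analyze_sync(project_id: int, max_concurrent_pages: int = 4) -> dict:
    """Run the coroutine to completion on a fresh event loop.

    asyncio.run() raises RuntimeError if this thread already has a running
    loop, so call this only from synchronous code.
    """
    return asyncio.run(analyze_async(project_id, max_concurrent_pages))
```

FastAPI runs plain def endpoints in a thread pool, so a wrapper like this works there. Inside an async def endpoint, await the async function directly.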

Performance comparison

| Pages | Sequential | Parallel (4 pages) | Speedup |
| --- | --- | --- | --- |
| 10 | 120 s | 40 s | 3.0x |
| 20 | 240 s | 70 s | 3.4x |
| 50 | 600 s | 160 s | 3.8x |

Caveats

  • Set max_concurrent_pages with both system memory and GPU memory in mind
  • When generating AI descriptions, tune ai_max_concurrency so that the OpenAI API rate limit is not exceeded
  • Recommended number of parallel pages: 3-5 (adjust to the available system resources)
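
The two limits above can be thought of as two independent semaphores: one bounding how many pages are in flight, the other bounding how many AI requests are in flight. The following is a minimal sketch of that idea, assuming an asyncio-based pipeline. analyze_pages and its inner helpers are hypothetical, and the sleeps stand in for real work.

```python
import asyncio
from typing import List


async def analyze_pages(
    page_ids: List[int],
    max_concurrent_pages: int = 4,
    ai_max_concurrency: int = 5,
) -> List[str]:
    """Analyze pages with separate caps for page tasks and AI requests."""
    page_slots = asyncio.Semaphore(max_concurrent_pages)
    ai_slots = asyncio.Semaphore(ai_max_concurrency)

    async def describe(page_id: int) -> str:
        async with ai_slots:  # at most ai_max_concurrency AI calls in flight
            await asyncio.sleep(0.01)  # placeholder for the AI API request
            return f"description for page {page_id}"

    async def analyze(page_id: int) -> str:
        async with page_slots:  # at most max_concurrent_pages pages in flight
            await asyncio.sleep(0.01)  # placeholder for layout analysis and OCR
            return await describe(page_id)

    # gather() preserves input order regardless of completion order
    return list(await asyncio.gather(*(analyze(p) for p in page_ids)))
```

Lowering either argument trades throughput for lower memory use or fewer rate-limit errors, without changing the results.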

3. Environment Variables

Add the following settings to the .env file to tune performance:

# PDF conversion tuning
PDF_PROCESSOR_DPI=150          # lower DPI speeds up conversion (default: 300)
UPLOAD_DIR=uploads             # upload directory

# AI API settings
OPENAI_API_KEY=your-api-key
OPENAI_MAX_CONCURRENCY=5       # concurrent AI API requests (default: 5)
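
As an illustration of how the variables listed above might be consumed, here is a small sketch that reads them with the documented defaults. PerfSettings and load_settings are hypothetical names, the "uploads" fallback for UPLOAD_DIR is an assumption taken from the example value, and the backend's actual settings loader may differ. Note that os.environ does not read a .env file by itself; something must load the file into the environment first.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PerfSettings:
    pdf_processor_dpi: int
    upload_dir: str
    openai_max_concurrency: int


def load_settings(env: Mapping[str, str] = os.environ) -> PerfSettings:
    """Read the tuning variables, falling back to the documented defaults."""
    return PerfSettings(
        pdf_processor_dpi=int(env.get("PDF_PROCESSOR_DPI", "300")),
        upload_dir=env.get("UPLOAD_DIR", "uploads"),  # assumed default
        openai_max_concurrency=int(env.get("OPENAI_MAX_CONCURRENCY", "5")),
    )
```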

DPI guide

| DPI | Use case | Conversion speed | OCR accuracy |
| --- | --- | --- | --- |
| 150 | Fast processing, general documents | Very fast | Good |
| 200 | Balanced setting | Fast | Very good |
| 300 | High quality, complex documents | Moderate | Best |
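
The speed difference between DPI settings follows from simple arithmetic: pixel count grows with the square of the DPI, so a 300 DPI render has four times the pixels of a 150 DPI render. A small sketch, assuming a US Letter page and uncompressed 8-bit RGB output (both are illustrative assumptions, not properties of the backend):

```python
from typing import Tuple


def page_pixels(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Return (width_px, height_px) for a page rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def rgb_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Approximate uncompressed size of an 8-bit RGB render, in bytes."""
    w, h = page_pixels(width_in, height_in, dpi)
    return w * h * 3


# US Letter is 8.5 x 11 inches.
for dpi in (150, 200, 300):
    w, h = page_pixels(8.5, 11, dpi)
    print(f"{dpi} DPI: {w} x {h} px, ~{rgb_bytes(8.5, 11, dpi) / 1e6:.1f} MB raw RGB")
```

Under these assumptions one page takes roughly 6.3 MB at 150 DPI and 25.2 MB at 300 DPI, which is also why high DPI combined with many workers raises memory pressure.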

4. FastAPI Router Integration

✅ Already applied! A parallel processing option has been added to the existing API endpoint.

Usage

# Sequential processing (default)
POST /api/projects/{project_id}/analyze
{
  "use_ai_descriptions": true,
  "use_parallel": false
}

# Parallel processing
POST /api/projects/{project_id}/analyze
{
  "use_ai_descriptions": true,
  "use_parallel": true,
  "max_concurrent_pages": 4
}

cURL example

# Run the analysis with parallel processing
curl -X POST "http://localhost:8000/api/projects/123/analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "use_ai_descriptions": true,
    "use_parallel": true,
    "max_concurrent_pages": 4
  }'
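
The same request can be built from Python using only the standard library. This is a sketch: build_analyze_request is a hypothetical helper, and the URL and body fields simply mirror the cURL example above.

```python
import json
import urllib.request


def build_analyze_request(
    base_url: str,
    project_id: int,
    use_parallel: bool = True,
    max_concurrent_pages: int = 4,
    use_ai_descriptions: bool = True,
) -> urllib.request.Request:
    """Build the POST request for the analyze endpoint without sending it."""
    body = {"use_ai_descriptions": use_ai_descriptions, "use_parallel": use_parallel}
    if use_parallel:
        body["max_concurrent_pages"] = max_concurrent_pages
    return urllib.request.Request(
        url=f"{base_url}/api/projects/{project_id}/analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is left to the caller, for example:
#   req = build_analyze_request("http://localhost:8000", 123)
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```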

Frontend integration

// Usage from the frontend
const result = await analysisService.analyzeProject(projectId, {
  use_ai_descriptions: true,
  use_parallel: true,  // enable parallel processing
  max_concurrent_pages: 4
});

5. Monitoring and Debugging

Enabling logging

To monitor the state of parallel processing, check the logs:

from loguru import logger

logger.info("Parallel processing started")
logger.debug("Detailed debug information")

Troubleshooting common problems

Out of memory

Fix: reduce max_workers or max_concurrent_pages

OpenAI API Rate Limit

Fix: reduce ai_max_concurrency or upgrade to a paid plan

Thread contention

Fix: set max_workers below the number of CPU cores
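
A worker count that follows this rule can be derived at runtime rather than hard-coded. A minimal sketch, where pick_max_workers is a hypothetical helper and the cap of 4 is taken from the recommended range of 2-4 workers in section 1:

```python
import os
from typing import Optional


def pick_max_workers(cap: int = 4, cpu_count: Optional[int] = None) -> int:
    """Choose a worker count below the CPU core count, within [1, cap]."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Leave one core free for the rest of the process, but never go below 1.
    return max(1, min(cap, cores - 1))
```

This is stricter than the presets in section 7, which use one worker per core, so treat it as a starting point for machines that show contention.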

6. Measuring Performance

The analysis result includes performance metrics:

result = analyze_project_batch_parallel(...)

print(f"Processing mode: {result.get('processing_mode')}")  # 'parallel'
print(f"Total time: {result['total_time']:.2f}s")
print(f"Succeeded: {result['successful_pages']} pages")
print(f"Failed: {result['failed_pages']} pages")

# Per-page processing time
for page_result in result['page_results']:
    print(f"Page {page_result['page_number']}: {page_result['processing_time']:.2f}s")
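
The fields shown above are enough to derive a few summary figures. The sketch below assumes the result dictionary has exactly the keys used in the preceding example. summarize is a hypothetical helper and the sample values are made up, not measurements.

```python
from statistics import mean


def summarize(result: dict) -> dict:
    """Derive average and slowest page time, throughput, and overlap from a result."""
    times = {p["page_number"]: p["processing_time"] for p in result["page_results"]}
    return {
        "avg_page_time": mean(times.values()),
        "slowest_page": max(times, key=times.get),
        "pages_per_second": result["successful_pages"] / result["total_time"],
        # Summed page time divided by wall time approximates how many pages overlapped.
        "effective_concurrency": sum(times.values()) / result["total_time"],
    }


# Made-up sample shaped like the result above (values are not measurements).
sample = {
    "total_time": 10.0,
    "successful_pages": 4,
    "failed_pages": 0,
    "page_results": [
        {"page_number": 1, "processing_time": 8.0},
        {"page_number": 2, "processing_time": 9.0},
        {"page_number": 3, "processing_time": 10.0},
        {"page_number": 4, "processing_time": 9.0},
    ],
}
print(summarize(sample))
```

An effective concurrency far below max_concurrent_pages suggests that something other than the page limit, such as the AI request cap or the database pool, is the bottleneck.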

7. Recommended Settings

Small system (4GB RAM, 2 CPU cores)

# PDF conversion
max_workers=2
dpi=150

# Analysis pipeline
max_concurrent_pages=2
ai_max_concurrency=3

Medium system (8GB RAM, 4 CPU cores)

# PDF conversion
max_workers=4
dpi=200

# Analysis pipeline
max_concurrent_pages=4
ai_max_concurrency=5

Large system (16GB+ RAM, 8+ CPU cores, GPU)

# PDF conversion
max_workers=6
dpi=300

# Analysis pipeline
max_concurrent_pages=6
ai_max_concurrency=10
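
The three profiles above can be encoded as data and selected from the machine's specifications. A minimal sketch: Preset and choose_preset are hypothetical names, and the thresholds are read directly from the profile headings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    max_workers: int
    dpi: int
    max_concurrent_pages: int
    ai_max_concurrency: int


# Values copied from the three profiles above.
SMALL = Preset(max_workers=2, dpi=150, max_concurrent_pages=2, ai_max_concurrency=3)
MEDIUM = Preset(max_workers=4, dpi=200, max_concurrent_pages=4, ai_max_concurrency=5)
LARGE = Preset(max_workers=6, dpi=300, max_concurrent_pages=6, ai_max_concurrency=10)


def choose_preset(ram_gb: float, cpu_cores: int, has_gpu: bool = False) -> Preset:
    """Pick the largest profile whose stated hardware the machine meets."""
    if ram_gb >= 16 and cpu_cores >= 8 and has_gpu:
        return LARGE
    if ram_gb >= 8 and cpu_cores >= 4:
        return MEDIUM
    return SMALL
```

For example, choose_preset(8, 4) selects the medium profile, and a 32GB machine without a GPU also falls back to medium because the large profile lists a GPU.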

8. Migration Guide

How to migrate existing code to the parallel version:

Before (sequential)

from app.services.batch_analysis import analyze_project_batch

result = analyze_project_batch(db=db, project_id=123)

After (parallel)

from app.services.batch_analysis import analyze_project_batch_parallel

result = analyze_project_batch_parallel(
    db=db,
    project_id=123,
    max_concurrent_pages=4  # newly added parameter
)

All other parameters stay the same!
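
Because only one parameter differs, both versions can sit behind a single switch during migration. A sketch of such a dispatcher follows. run_analysis is a hypothetical helper, and the two callables are passed in so the example does not depend on the backend package.

```python
from typing import Any, Callable, Dict


def run_analysis(
    sequential: Callable[..., Dict[str, Any]],
    parallel: Callable[..., Dict[str, Any]],
    use_parallel: bool = False,
    max_concurrent_pages: int = 4,
    **common: Any,
) -> Dict[str, Any]:
    """Forward the shared arguments to whichever implementation is selected."""
    if use_parallel:
        return parallel(max_concurrent_pages=max_concurrent_pages, **common)
    return sequential(**common)


# In the backend this might be called as:
#   run_analysis(analyze_project_batch, analyze_project_batch_parallel,
#                use_parallel=True, db=db, project_id=123)
```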


9. Additional Optimization Tips

  1. DPI optimization: adjust the DPI to the quality of the document
  2. Batch size: adjust the degree of parallelism to the system's resources
  3. Caching: AnalysisService is already cached, so do not instantiate it repeatedly
  4. Database connections: make the DB connection pool large enough for parallel processing
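
For tip 4, a rough sizing rule is one connection per page in flight plus some headroom. The sketch below assumes the backend uses SQLAlchemy (suggested by the SessionLocal import, but not confirmed by this document). pool_settings is a hypothetical helper and the sizing rule is a heuristic, not a measured recommendation.

```python
def pool_settings(max_concurrent_pages: int, headroom: int = 2) -> dict:
    """Return pool keyword arguments sized for one session per in-flight page."""
    if max_concurrent_pages < 1:
        raise ValueError("max_concurrent_pages must be at least 1")
    return {
        "pool_size": max_concurrent_pages + headroom,  # persistent connections
        "max_overflow": max_concurrent_pages,          # temporary extras under bursts
        "pool_pre_ping": True,                         # check a connection before reuse
    }


# With SQLAlchemy installed, these could be passed as (DATABASE_URL is a placeholder):
#   engine = create_engine(DATABASE_URL, **pool_settings(4))
```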

10. References