# Performance Optimization Guide

## Overview

Parallel processing has been added to the SmartEye OCR backend to improve the performance of PDF processing and the analysis pipeline.

**✅ Applied:**

- Parallel PDF conversion (lock removed, true parallel processing)
- Parallel analysis pipeline (independent session management)
- FastAPI router integration (parallel or sequential mode selectable)

---

## 1. Parallel PDF Conversion

### Description

The `PDFProcessor.convert_pdf_to_images_parallel()` method converts PDF pages to images **in true parallel**.

**✅ Improvements:**

- Lock removed: each thread creates its own independent PDF instance
- True parallel processing: all threads run concurrently
- Speedup: up from 2-3x to **3-4x in practice**

### Usage

```python
from app.services.pdf_processor import PDFProcessor

# Create a PDFProcessor instance
processor = PDFProcessor(upload_directory="uploads", dpi=150)

# Read the PDF file
with open("document.pdf", "rb") as f:
    pdf_bytes = f.read()

# Convert in parallel (default: up to 4 workers)
converted_pages = processor.convert_pdf_to_images_parallel(
    pdf_bytes=pdf_bytes,
    project_id=123,
    start_page_number=1,
    max_workers=4,  # optional: adjust the number of workers
)

# Inspect the results
for page in converted_pages:
    print(f"Page {page['page_number']}: {page['image_path']}")
```

### Performance Comparison

| Pages | Sequential | Parallel (4 workers) | Speedup |
|-------|------------|----------------------|---------|
| 10 | 15 s | 6 s | 2.5x |
| 50 | 75 s | 25 s | 3.0x |
| 100 | 150 s | 45 s | 3.3x |

### Notes

- Setting `max_workers` too high can increase memory usage
- PyMuPDF is not thread-safe, so each worker creates its own independent document instance
- Recommended worker count: 2-4 (adjust to the available system resources)
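
The "one document instance per worker" idea can be sketched with the standard-library `ThreadPoolExecutor`. This is a minimal illustration, not the actual `PDFProcessor` code: `render_pages_parallel` and the `render_page` callback are hypothetical names. A real callback would open its own document from the raw bytes (for example with PyMuPDF) and rasterize a single page; the `fake_render` stand-in only returns a file name so that the example runs without any PDF library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def render_pages_parallel(
    pdf_bytes: bytes,
    page_count: int,
    render_page: Callable[[bytes, int], str],
    max_workers: int = 4,
) -> List[str]:
    """Render every page in a thread pool and return results in page order."""

    def task(index: int) -> str:
        # Each task receives the raw bytes and is expected to open its own
        # document object, so no document handle is shared between threads.
        return render_page(pdf_bytes, index)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of which
        # worker finishes first.
        return list(pool.map(task, range(page_count)))


def fake_render(pdf_bytes: bytes, index: int) -> str:
    """Hypothetical stand-in renderer; a real one would rasterize the page."""
    return f"page_{index + 1}.png"


if __name__ == "__main__":
    print(render_pages_parallel(b"%PDF-", 3, fake_render, max_workers=2))
```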

---

## 2. Parallel Analysis Pipeline

### Description

The `analyze_project_batch_async_parallel()` function analyzes multiple pages concurrently.

### Usage

```python
import asyncio

from app.database import SessionLocal
from app.services.batch_analysis import analyze_project_batch_async_parallel


async def main() -> None:
    # Create a database session
    db = SessionLocal()
    try:
        # Run the parallel analysis
        result = await analyze_project_batch_async_parallel(
            db=db,
            project_id=123,
            use_ai_descriptions=True,
            api_key="your-openai-api-key",
            ai_max_concurrency=5,    # concurrent AI API requests
            max_concurrent_pages=4,  # pages processed in parallel
        )
        print(f"Processed: {result['successful_pages']}/{result['total_pages']} pages")
        print(f"Total time: {result['total_time']:.2f}s")
    finally:
        db.close()


# `await` is only valid inside a coroutine, so run it through an event loop
asyncio.run(main())
```

### Synchronous Version (for use in FastAPI endpoints)

```python
from app.services.batch_analysis import analyze_project_batch_parallel

# Use from a synchronous context; `db` is an already-open database session
result = analyze_project_batch_parallel(
    db=db,
    project_id=123,
    max_concurrent_pages=4,
)
```

### Performance Comparison

| Pages | Sequential | Parallel (4 pages) | Speedup |
|-------|------------|--------------------|---------|
| 10 | 120 s | 40 s | 3.0x |
| 20 | 240 s | 70 s | 3.4x |
| 50 | 600 s | 160 s | 3.8x |

### Notes

- Set `max_concurrent_pages` with both system memory and GPU memory in mind
- When generating AI descriptions, tune `ai_max_concurrency` so that you stay within the OpenAI API rate limit
- Recommended number of parallel pages: 3-5 (adjust to the available system resources)
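
One common way to enforce two independent limits like `max_concurrent_pages` and `ai_max_concurrency` is a pair of `asyncio.Semaphore` objects. The sketch below is an assumption about how such limits could be applied, not the backend's actual implementation. `analyze_pages`, `analyze_one`, and `describe` are hypothetical names, and the `asyncio.sleep` calls stand in for the real OCR and AI API work.

```python
import asyncio
from typing import List


async def analyze_pages(
    page_numbers: List[int],
    max_concurrent_pages: int = 4,
    ai_max_concurrency: int = 5,
) -> List[dict]:
    """Analyze pages concurrently under two separate concurrency caps."""
    page_slots = asyncio.Semaphore(max_concurrent_pages)
    ai_slots = asyncio.Semaphore(ai_max_concurrency)

    async def describe(page_number: int) -> str:
        # Placeholder for an AI API call; the semaphore caps in-flight requests.
        async with ai_slots:
            await asyncio.sleep(0.01)
            return f"description for page {page_number}"

    async def analyze_one(page_number: int) -> dict:
        # At most max_concurrent_pages pages are inside this block at once.
        async with page_slots:
            await asyncio.sleep(0.01)  # placeholder for layout analysis / OCR
            description = await describe(page_number)
            return {"page_number": page_number, "description": description}

    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(analyze_one(n) for n in page_numbers))


if __name__ == "__main__":
    for item in asyncio.run(analyze_pages([1, 2, 3, 4, 5, 6])):
        print(item)
```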

---

## 3. Environment Variables

Add the following settings to your `.env` file to tune performance:

```bash
# PDF conversion tuning
PDF_PROCESSOR_DPI=150      # lower DPI speeds up conversion (default: 300)
UPLOAD_DIR=uploads         # upload directory
# AI API settings
OPENAI_API_KEY=your-api-key
OPENAI_MAX_CONCURRENCY=5   # concurrent AI API requests (default: 5)
```
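
As a rough illustration of how these variables might be consumed, the sketch below reads them into a small settings object. It assumes the values are already present in the process environment (a `.env` file is not loaded automatically; a loader such as python-dotenv is one way to do that). `PerformanceSettings` and `load_settings` are hypothetical names. The DPI and concurrency defaults come from the comments above, while the `uploads` default for `UPLOAD_DIR` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PerformanceSettings:
    pdf_dpi: int
    upload_dir: str
    openai_max_concurrency: int


def load_settings(env: Mapping[str, str] = os.environ) -> PerformanceSettings:
    """Read tuning values from the environment, falling back to defaults."""
    return PerformanceSettings(
        pdf_dpi=int(env.get("PDF_PROCESSOR_DPI", "300")),
        upload_dir=env.get("UPLOAD_DIR", "uploads"),  # assumed default
        openai_max_concurrency=int(env.get("OPENAI_MAX_CONCURRENCY", "5")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test.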

### DPI Guide

| DPI | Use case | Conversion speed | OCR accuracy |
|-----|----------|------------------|--------------|
| 150 | Fast processing, ordinary documents | Very fast | Good |
| 200 | Balanced setting | Fast | Very good |
| 300 | High quality, complex documents | Moderate | Best |

---

## 4. FastAPI Router Integration

✅ **Already applied!** A parallel processing option has been added to the existing API endpoint.

### Usage

```python
# Sequential processing (default)
POST /api/projects/{project_id}/analyze
{
    "use_ai_descriptions": true,
    "use_parallel": false
}

# Parallel processing
POST /api/projects/{project_id}/analyze
{
    "use_ai_descriptions": true,
    "use_parallel": true,
    "max_concurrent_pages": 4
}
```

### cURL Example

```bash
# Run the analysis with parallel processing
curl -X POST "http://localhost:8000/api/projects/123/analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "use_ai_descriptions": true,
    "use_parallel": true,
    "max_concurrent_pages": 4
  }'
```
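
For scripts that call the endpoint from Python, the same request can be built with the standard library alone. The sketch below mirrors the cURL call above; `build_analyze_request` is a hypothetical helper name, and the base URL is the same local address used in the cURL example.

```python
import json
import urllib.request


def build_analyze_request(
    base_url: str,
    project_id: int,
    use_parallel: bool = True,
    max_concurrent_pages: int = 4,
    use_ai_descriptions: bool = True,
) -> urllib.request.Request:
    """Build the POST request for the analyze endpoint without sending it."""
    payload = {
        "use_ai_descriptions": use_ai_descriptions,
        "use_parallel": use_parallel,
    }
    if use_parallel:
        # Only meaningful when parallel mode is requested.
        payload["max_concurrent_pages"] = max_concurrent_pages
    return urllib.request.Request(
        url=f"{base_url}/api/projects/{project_id}/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("http://localhost:8000", 123)
print(request.get_method(), request.full_url)
```

Sending it needs a running server: pass the request to `urllib.request.urlopen` and parse the response body with `json.load`.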

### Frontend Integration

```typescript
// Usage from the frontend
const result = await analysisService.analyzeProject(projectId, {
  use_ai_descriptions: true,
  use_parallel: true, // enable parallel processing
  max_concurrent_pages: 4
});
```

---

## 5. Monitoring and Debugging

### Enabling Logging

To monitor the state of parallel processing, check the logs:

```python
from loguru import logger

logger.info("Parallel processing started")
logger.debug("Detailed debug information")
```

### Troubleshooting Common Issues

#### Out of Memory

```
Fix: reduce max_workers or max_concurrent_pages
```

#### OpenAI API Rate Limit

```
Fix: reduce ai_max_concurrency or upgrade to a paid plan
```

#### Thread Contention

```
Fix: set max_workers lower than the number of CPU cores
```
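
The thread-contention advice can be turned into a small helper that derives a worker count from the machine it runs on. This is only a sketch of one possible rule: `pick_max_workers` is a hypothetical name, leaving exactly one core free is an assumption, and the cap of 4 simply reuses the upper end of the recommended worker range from section 1.

```python
import os
from typing import Optional


def pick_max_workers(cap: int = 4, cpu_count: Optional[int] = None) -> int:
    """Return a worker count below the core count, never less than 1."""
    # os.cpu_count() can return None when the count cannot be determined.
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(cap, cores - 1))


if __name__ == "__main__":
    print(f"max_workers={pick_max_workers()}")
```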

---

## 6. Measuring Performance

Performance metrics are available in the analysis result:

```python
result = analyze_project_batch_parallel(...)
print(f"Processing mode: {result.get('processing_mode')}")  # 'parallel'
print(f"Total time: {result['total_time']:.2f}s")
print(f"Succeeded: {result['successful_pages']} pages")
print(f"Failed: {result['failed_pages']} pages")

# Per-page processing time
for page_result in result['page_results']:
    print(f"Page {page_result['page_number']}: {page_result['processing_time']:.2f}s")
```
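
The per-page timings can be condensed into a few summary figures. The sketch below assumes a non-empty `page_results` list with the `page_number` and `processing_time` keys shown above; `summarize_page_times` is a hypothetical helper, and the sample values are invented purely for illustration. Dividing the sum of per-page times by the wall-clock `total_time` gives a rough indication of how much overlap the parallel run achieved.

```python
from statistics import mean


def summarize_page_times(result: dict) -> dict:
    """Summarize per-page timings from an analysis result dictionary."""
    pages = result["page_results"]
    times = [page["processing_time"] for page in pages]
    slowest = max(pages, key=lambda page: page["processing_time"])
    return {
        "average_page_time": mean(times),
        "slowest_page": slowest["page_number"],
        "sum_of_page_times": sum(times),
        # Above 1.0 means pages overlapped in time.
        "overlap_factor": sum(times) / result["total_time"],
    }


# Invented sample data with the same shape as the result shown above.
sample_result = {
    "total_time": 3.0,
    "page_results": [
        {"page_number": 1, "processing_time": 2.0},
        {"page_number": 2, "processing_time": 2.5},
        {"page_number": 3, "processing_time": 1.5},
    ],
}
print(summarize_page_times(sample_result))
```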

---

## 7. Recommended Settings

### Small System (4GB RAM, 2 CPU cores)

```python
# PDF conversion
max_workers=2
dpi=150
# Analysis pipeline
max_concurrent_pages=2
ai_max_concurrency=3
```

### Medium System (8GB RAM, 4 CPU cores)

```python
# PDF conversion
max_workers=4
dpi=200
# Analysis pipeline
max_concurrent_pages=4
ai_max_concurrency=5
```

### Large System (16GB+ RAM, 8+ CPU cores, GPU)

```python
# PDF conversion
max_workers=6
dpi=300
# Analysis pipeline
max_concurrent_pages=6
ai_max_concurrency=10
```
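
The three profiles above can be collected into a lookup table so that a deployment script picks one automatically. The preset values are copied from this section. The selection rule in `pick_preset` (a hypothetical helper) is an assumption that reads the headings as minimum requirements and falls back to the smaller profile whenever a requirement is not met.

```python
PRESETS = {
    "small": {"max_workers": 2, "dpi": 150,
              "max_concurrent_pages": 2, "ai_max_concurrency": 3},
    "medium": {"max_workers": 4, "dpi": 200,
               "max_concurrent_pages": 4, "ai_max_concurrency": 5},
    "large": {"max_workers": 6, "dpi": 300,
              "max_concurrent_pages": 6, "ai_max_concurrency": 10},
}


def pick_preset(ram_gb: float, cpu_cores: int, has_gpu: bool = False) -> str:
    """Return the name of the largest profile whose requirements are all met."""
    if ram_gb >= 16 and cpu_cores >= 8 and has_gpu:
        return "large"
    if ram_gb >= 8 and cpu_cores >= 4:
        return "medium"
    return "small"


if __name__ == "__main__":
    name = pick_preset(ram_gb=8, cpu_cores=4)
    print(name, PRESETS[name])
```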

---

## 8. Migration Guide

How to migrate existing code to the parallel version:

### Before (sequential)

```python
from app.services.batch_analysis import analyze_project_batch

result = analyze_project_batch(db=db, project_id=123)
```

### After (parallel)

```python
from app.services.batch_analysis import analyze_project_batch_parallel

result = analyze_project_batch_parallel(
    db=db,
    project_id=123,
    max_concurrent_pages=4,  # newly added parameter
)
```

**All other parameters stay the same!**

---

## 9. Additional Optimization Tips

1. **DPI tuning**: adjust the DPI to match the quality of the source documents
2. **Batch size**: adjust the degree of parallelism to fit the available system resources
3. **Caching**: `AnalysisService` is already cached, so do not create it more than once
4. **Database connections**: make the DB connection pool large enough for parallel processing
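
The caching tip amounts to building the service once and reusing that instance everywhere. How the project actually caches `AnalysisService` is not shown in this guide, so the sketch below only illustrates the general pattern with `functools.lru_cache`. The `AnalysisService` class here is a stand-in that counts constructions, and `get_analysis_service` is a hypothetical accessor name.

```python
from functools import lru_cache


class AnalysisService:
    """Stand-in for the real service; counts how often it is constructed."""

    instances_created = 0

    def __init__(self) -> None:
        # In a real service this is where expensive setup would happen.
        AnalysisService.instances_created += 1


@lru_cache(maxsize=1)
def get_analysis_service() -> AnalysisService:
    """Return one shared instance; the constructor runs only on the first call."""
    return AnalysisService()


first = get_analysis_service()
second = get_analysis_service()
print(first is second, AnalysisService.instances_created)
```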

---

## 10. References

- PyMuPDF documentation: https://pymupdf.readthedocs.io/
- asyncio guide: https://docs.python.org/3/library/asyncio.html
- ThreadPoolExecutor: https://docs.python.org/3/library/concurrent.futures.html