---
title: pdf2html
app_file: web_interface.py
sdk: gradio
sdk_version: 5.20.1
---
# pdf2html

A Python package that converts PDF files into single-column HTML.

## Installation

```bash
# If you use Poetry
poetry install

# Or use pip
pip install pdf2html
```

## Required libraries

```bash
pip install PyMuPDF beautifulsoup4 langchain gradio gradio-pdf
```

## Usage

### Command-line interface

```bash
# Run directly
poetry run python -m pdf2html path/to/file.pdf
poetry run python -m pdf2html path/to/file.pdf --output output_directory

# Run after installation
pdf2html path/to/file.pdf
pdf2html path/to/file.pdf --output output_directory
```

### Web interface

```bash
# Run directly
poetry run python -m web_interface

# Run after installation
pdf2html-web
```

### Using from Python code

```python
from pdf2html import PDFToHTMLConverter

# Create a converter for one PDF file, then run the conversion.
converter = PDFToHTMLConverter("path/to/file.pdf")
output_path = converter.convert()
print(f"Conversion complete: {output_path}")
```
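
To convert several files at once, the same two calls can be wrapped in a small loop. The sketch below is an illustration, not part of the package: `convert_directory` and its `converter_cls` parameter are hypothetical names, and it relies only on the `PDFToHTMLConverter(path).convert()` usage shown above.

```python
from pathlib import Path


def convert_directory(pdf_dir, converter_cls=None):
    """Convert every *.pdf directly inside pdf_dir.

    Returns a dict mapping each PDF path to whatever convert() returned,
    or to the exception raised for that file.
    """
    if converter_cls is None:
        # Imported lazily so the helper itself has no hard dependency.
        from pdf2html import PDFToHTMLConverter
        converter_cls = PDFToHTMLConverter

    results = {}
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        try:
            results[str(pdf_path)] = converter_cls(str(pdf_path)).convert()
        except Exception as exc:  # keep going if a single file fails
            results[str(pdf_path)] = exc
    return results
```

Calling `convert_directory("path/to/pdfs")` would then process each PDF in that folder in alphabetical order; a failed file is recorded in the result instead of stopping the run.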

## Key features

- Extracts text, images, and tables from PDF documents
- Reflows content into a single-column vertical layout
- Preserves paragraph structure and formatting
- Automatically extracts and embeds images
- Detects table structure and converts it to HTML tables
- Provides a Gradio-based web interface
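
To give a feel for what reflowing into a single column involves, here is a minimal sketch of one possible ordering heuristic. It is not taken from the pdf2html source: the `Block` tuple layout, the `to_single_column` name, and the assumption of at most two columns split at the page midline are all illustrative.

```python
from typing import List, Tuple

# (x0, y0, x1, y1, text): bounding box with a top-left origin, plus the text.
Block = Tuple[float, float, float, float, str]


def to_single_column(blocks: List[Block], page_width: float) -> List[str]:
    """Return block texts in reading order for a page with up to two columns."""
    mid = page_width / 2
    ordered: List[Block] = []
    left: List[Block] = []
    right: List[Block] = []

    def flush() -> None:
        # Emit the whole left column, then the whole right column.
        ordered.extend(left + right)
        left.clear()
        right.clear()

    for block in sorted(blocks, key=lambda b: b[1]):  # top to bottom
        x0, _, x1, _, _ = block
        if x0 < mid < x1:
            # A block crossing the midline (e.g. a title) ends the current band.
            flush()
            ordered.append(block)
        elif x1 <= mid:
            left.append(block)
        else:
            right.append(block)
    flush()
    return [block[4] for block in ordered]
```

Full-width blocks act as separators, so a title stays above both columns and text below it is read left column first, then right column.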