---
title: pdf2html
app_file: web_interface.py
sdk: gradio
sdk_version: 5.20.1
---

# pdf2html

A Python package that converts PDF files into single-column HTML.

## Installation

```bash
# If you use Poetry
poetry install

# Or with pip
pip install pdf2html
```

## Required libraries

```bash
pip install PyMuPDF beautifulsoup4 langchain gradio gradio-pdf
```

## Usage

### Command-line interface

```bash
# Run directly
poetry run python -m pdf2html path/filename.pdf
poetry run python -m pdf2html path/filename.pdf --output output_directory

# Run after installation
pdf2html path/filename.pdf
pdf2html path/filename.pdf --output output_directory
```

### Web interface

```bash
# Run directly
poetry run python -m web_interface

# Run after installation
pdf2html-web
```

### Using it from Python code

```python
from pdf2html import PDFToHTMLConverter

# Point the converter at the PDF to convert
converter = PDFToHTMLConverter("path/filename.pdf")

# Run the conversion and report the resulting output path
output_path = converter.convert()
print(f"Conversion complete: {output_path}")
```
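
To convert several PDFs in one go, the same two calls can be wrapped in a loop. The following is a minimal sketch and not part of the package: `convert_directory` and its `convert_one` parameter are hypothetical names introduced here. The converter is passed in as a callable, so the loop assumes nothing about `PDFToHTMLConverter` beyond the `convert()` call shown above.

```python
from pathlib import Path
from typing import Callable, Dict, Union

PathLike = Union[str, Path]


def convert_directory(
    pdf_dir: PathLike,
    convert_one: Callable[[Path], object],
) -> Dict[Path, object]:
    """Run convert_one on every *.pdf directly inside pdf_dir.

    Returns a mapping from each PDF path to whatever convert_one returned.
    A failure on one file is reported and does not stop the others.
    """
    results: Dict[Path, object] = {}
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        try:
            results[pdf_path] = convert_one(pdf_path)
        except Exception as exc:  # keep going if a single PDF fails
            print(f"Skipped {pdf_path.name}: {exc}")
    return results
```

With the package installed, `convert_one` could be something like `lambda p: PDFToHTMLConverter(str(p)).convert()`.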

## Key features

- Extracts text, images, and tables from PDF documents
- Reflows content into a single-column vertical layout
- Preserves paragraph structure and order
- Automatically extracts and embeds images
- Detects table structure and converts it to HTML tables
- Provides a Gradio-based web interface
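
To illustrate what the single-column reflow and paragraph-ordering items involve, here is a minimal sketch. It is not the package's implementation: `to_single_column` is a hypothetical helper, and splitting two columns at the page midline is a simplifying assumption (a full-width heading, for example, would be treated as part of the left column). The input tuples follow the shape PyMuPDF returns from `page.get_text("blocks")`, namely `(x0, y0, x1, y1, text, block_no, block_type)`, where `block_type` 0 marks a text block.

```python
import html
from typing import List, Sequence, Tuple

Block = Tuple[float, float, float, float, str, int, int]


def to_single_column(blocks: Sequence[Block], page_width: float) -> str:
    """Reorder two-column text blocks into one column of <p> elements."""
    midline = page_width / 2  # assumption: columns split at the page centre
    # Keep only non-empty text blocks (block_type 0); skip image blocks.
    text_blocks = [b for b in blocks if b[6] == 0 and b[4].strip()]

    def reading_order(block: Block) -> Tuple[int, float, float]:
        x0, y0 = block[0], block[1]
        column = 0 if x0 < midline else 1  # left column is read first
        return (column, y0, x0)

    paragraphs: List[str] = []
    for block in sorted(text_blocks, key=reading_order):
        # Join the block's wrapped lines into one paragraph and escape it.
        text = " ".join(block[4].split())
        paragraphs.append(f"<p>{html.escape(text)}</p>")
    return "\n".join(paragraphs)
```

With PyMuPDF installed, the inputs could come from `page.get_text("blocks")` and `page.rect.width` for each page of the document.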