bluewhale2025's picture
Initial commit: Add ParseAI document processor application
3022fd1
---
title: ParseAI Document Processor
emoji: πŸ“Š
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
---
# ParseAI - Document Processing and Analysis
ParseAIλŠ” PDF λ¬Έμ„œλ₯Ό μ²˜λ¦¬ν•˜κ³  λΆ„μ„ν•˜κΈ° μœ„ν•œ κ°•λ ₯ν•œ λ„κ΅¬μž…λ‹ˆλ‹€. λ¬Έμ„œμ—μ„œ ν…μŠ€νŠΈλ₯Ό μΆ”μΆœν•˜κ³ , μš”μ•½ν•˜λ©°, 벑터 검색을 톡해 κ΄€λ ¨ λ¬Έμ„œλ₯Ό 찾을 수 μžˆμŠ΅λ‹ˆλ‹€.
## πŸš€ μ£Όμš” κΈ°λŠ₯
- PDF λ¬Έμ„œ μ—…λ‘œλ“œ 및 ν…μŠ€νŠΈ μΆ”μΆœ
- λ¬Έμ„œ λ‚΄μš© μš”μ•½
- 벑터 기반 λ¬Έμ„œ 검색
- Gradio 기반의 μ‚¬μš©μž μΉœν™”μ μΈ μ›Ή μΈν„°νŽ˜μ΄μŠ€
## πŸ› οΈ 기술 μŠ€νƒ
- **Backend**: FastAPI
- **Frontend**: Gradio
- **NLP**: NLTK, Hugging Face Transformers
- **Vector Store**: Sentence Transformers
- **Container**: Docker
## πŸš€ λ‘œμ»¬μ—μ„œ μ‹€ν–‰ν•˜κΈ°
### 사전 μš”κ΅¬μ‚¬ν•­
- Docker 및 Docker Compose
- Python 3.9+
### ν™˜κ²½ λ³€μˆ˜ μ„€μ •
`.env` νŒŒμΌμ„ μƒμ„±ν•˜κ³  λ‹€μŒ λ³€μˆ˜λ“€μ„ μ„€μ •ν•˜μ„Έμš”:
```bash
# Hugging Face Hub configuration
HUGGINGFACE_HUB_TOKEN=your_hf_token_here
# Application configuration
UPLOAD_FOLDER=/app/data/uploads
NLTK_DATA=/app/nltk_data
```
### Dockerλ₯Ό μ‚¬μš©ν•œ μ‹€ν–‰
1. Docker 이미지 λΉŒλ“œ:
```bash
docker build -t parseai .
```
2. μ»¨ν…Œμ΄λ„ˆ μ‹€ν–‰:
```bash
docker run -d -p 7860:7860 --env-file .env parseai
```
3. μ›Ή λΈŒλΌμš°μ €μ—μ„œ 접속:
```
http://localhost:7860
```
## 🌐 Hugging Face Spaces에 λ°°ν¬ν•˜κΈ°
1. 이 μ €μž₯μ†Œλ₯Ό Hugging Face Spaces에 ν‘Έμ‹œν•©λ‹ˆλ‹€.
2. μ €μž₯μ†Œ μ„€μ •μ—μ„œ λ‹€μŒ ν™˜κ²½ λ³€μˆ˜λ₯Ό μ„€μ •ν•˜μ„Έμš”:
- `HUGGINGFACE_HUB_TOKEN`: Hugging Face API 토큰
- `UPLOAD_FOLDER`: `/app/data/uploads`
- `NLTK_DATA`: `/app/nltk_data`
## πŸ“ μ‚¬μš© 방법
1. **λ¬Έμ„œ μ—…λ‘œλ“œ** νƒ­μ—μ„œ PDF νŒŒμΌμ„ μ—…λ‘œλ“œν•˜μ„Έμš”.
2. **λ¬Έμ„œ 검색** νƒ­μ—μ„œ ν‚€μ›Œλ“œλ₯Ό μž…λ ₯ν•˜μ—¬ κ΄€λ ¨ λ¬Έμ„œλ₯Ό κ²€μƒ‰ν•˜μ„Έμš”.
## πŸ“Š μƒνƒœ 확인
μ• ν”Œλ¦¬μΌ€μ΄μ…˜ μƒνƒœλŠ” λ‹€μŒ μ—”λ“œν¬μΈνŠΈμ—μ„œ 확인할 수 μžˆμŠ΅λ‹ˆλ‹€:
```
GET /health
```
## πŸ“„ λΌμ΄μ„ μŠ€
이 ν”„λ‘œμ νŠΈλŠ” [MIT λΌμ΄μ„ μŠ€](LICENSE) ν•˜μ— λ°°ν¬λ©λ‹ˆλ‹€.