---
title: ParseAI Document Processor
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
---

# ParseAI - Document Processing and Analysis

ParseAI is a tool for processing and analyzing PDF documents. It extracts text from documents, summarizes them, and finds related documents through vector search.

## 🚀 Key Features

- PDF upload and text extraction
- Document summarization
- Vector-based document search
- User-friendly web interface built on Gradio
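
To illustrate what vector-based document search means in practice, here is a minimal sketch that ranks documents by cosine similarity to a query vector. It is not taken from the ParseAI code: the `search` and `cosine_similarity` helpers, the file names, and the toy three-dimensional vectors are all hypothetical. In a real pipeline the vectors would come from an embedding model (the tech stack lists Sentence Transformers).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (document id, score) pairs, best match first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical embeddings; real ones would have hundreds of dimensions.
    docs = {
        "invoice.pdf": [0.9, 0.1, 0.0],
        "manual.pdf": [0.1, 0.8, 0.3],
        "report.pdf": [0.7, 0.3, 0.1],
    }
    for doc_id, score in search([1.0, 0.0, 0.0], docs, top_k=2):
        print(f"{doc_id}: {score:.3f}")
```

A linear scan like this is adequate for a small collection; larger collections typically use an approximate nearest-neighbor index instead.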

## 🛠️ Tech Stack

- **Backend**: FastAPI
- **Frontend**: Gradio
- **NLP**: NLTK, Hugging Face Transformers
- **Vector Store**: Sentence Transformers
- **Container**: Docker

## 🚀 Running Locally

### Prerequisites

- Docker and Docker Compose
- Python 3.9+

### Environment Variables

Create a `.env` file and set the following variables:

```bash
# Hugging Face Hub configuration
HUGGINGFACE_HUB_TOKEN=your_hf_token_here

# Application configuration
UPLOAD_FOLDER=/app/data/uploads
NLTK_DATA=/app/nltk_data
```
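
One way an application could pick up these settings at startup is sketched below. The `load_settings` helper is hypothetical, and falling back to the example values when a variable is unset is an assumption for illustration, not documented ParseAI behavior.

```python
import os
from typing import Dict, Mapping, Optional

# Fallback values mirror the example .env; treating them as defaults
# is an assumption, not documented behavior of ParseAI.
DEFAULTS = {
    "UPLOAD_FOLDER": "/app/data/uploads",
    "NLTK_DATA": "/app/nltk_data",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Collect the three documented variables from the environment."""
    env = os.environ if env is None else env
    settings: Dict[str, Optional[str]] = {
        name: env.get(name, default) for name, default in DEFAULTS.items()
    }
    # The token has no sensible default; None means "not configured".
    settings["HUGGINGFACE_HUB_TOKEN"] = env.get("HUGGINGFACE_HUB_TOKEN")
    return settings


if __name__ == "__main__":
    cfg = load_settings()
    print("upload folder:", cfg["UPLOAD_FOLDER"])
    print("token configured:", cfg["HUGGINGFACE_HUB_TOKEN"] is not None)
```

Passing a plain dictionary instead of relying on `os.environ` makes the helper easy to exercise in tests without touching the real environment.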

### Running with Docker

1. Build the Docker image:
   ```bash
   docker build -t parseai .
   ```

2. Run the container:
   ```bash
   docker run -d -p 7860:7860 --env-file .env parseai
   ```

3. Open the app in a web browser:
   ```
   http://localhost:7860
   ```

## 🌐 Deploying to Hugging Face Spaces

1. Push this repository to Hugging Face Spaces.
2. In the repository settings, set the following environment variables:
   - `HUGGINGFACE_HUB_TOKEN`: your Hugging Face API token
   - `UPLOAD_FOLDER`: `/app/data/uploads`
   - `NLTK_DATA`: `/app/nltk_data`

## 📝 Usage

1. Upload a PDF file in the **Document Upload** tab.
2. Enter keywords in the **Document Search** tab to find related documents.

## 📊 Health Check

The application status is available at the following endpoint:

```
GET /health
```
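
A small script can poll this endpoint, for example from a deployment check. The sketch below uses only the Python standard library. The `is_healthy` helper is hypothetical, the default base URL assumes the container is published on port 7860 as in the Docker instructions, and because the response body format is not documented here, only the HTTP status code is inspected.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:7860", timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "unreachable or unhealthy")
```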

## 📄 License

This project is distributed under the [MIT License](LICENSE).