Spaces:
Runtime error
Runtime error
Upload 8 files
Browse files- CHANGELOG.md +121 -0
- COMPARISON_ANALYSIS.md +273 -0
- QUICKSTART.md +110 -0
- README.md +244 -14
- SETUP_GUIDE.md +288 -0
- UPGRADE_SUMMARY.md +346 -0
- app.py +351 -0
- requirements.txt +8 -3
CHANGELOG.md
ADDED
|
@@ -0,0 +1,121 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# ๐ ๋ณ๊ฒฝ ์ฌํญ (Changelog)
|
| 2 |
+
|
| 3 |
+
## v1.1 - Andrew Ng ์์น ๊ธฐ๋ฐ ์
๊ทธ๋ ์ด๋ (2024)
|
| 4 |
+
|
| 5 |
+
### ๐ฏ ์ฃผ์ ๊ฐ์ ์ฌํญ
|
| 6 |
+
|
| 7 |
+
#### 1. **VectorDB ๊ฐ์ ** (`core/vectordb.py`)
|
| 8 |
+
- โ
`get_or_create_collection()` ์ฌ์ฉ (๋ Pythonic)
|
| 9 |
+
- โ
๋ฉํ๋ฐ์ดํฐ ์ถ๊ฐ ("RFP ๋ฌธ์ ์๋ฒ ๋ฉ")
|
| 10 |
+
- โ
์ด๊ธฐํ ์ ๋ฌธ์ ์ ํ์
|
| 11 |
+
|
| 12 |
+
**์ด์ **: Andrew Ng ์์น - "Start Simple"
|
| 13 |
+
|
| 14 |
+
#### 2. **ํ๋กฌํํธ ์์ง๋์ด๋ง ๊ฐ์ ** (`core/generator.py`)
|
| 15 |
+
- โ
๋ ๋ช
ํํ "๋ต๋ณ ๊ท์น" 5๊ฐ์ง ๋ช
์
|
| 16 |
+
1. ๋ฌธ์ ๋ด์ฉ๋ง ๊ธฐ๋ฐ
|
| 17 |
+
2. ์์ผ๋ฉด "๋ชจ๋ฅธ๋ค" ๋ต๋ณ
|
| 18 |
+
3. ํ์ด์ง ๋ฒํธ [ํ์ด์ง X] ํ์
|
| 19 |
+
4. ๋ช
ํํ๊ณ ๊ฐ๊ฒฐํ๊ฒ
|
| 20 |
+
5. ์ถ์ธก ๊ธ์ง
|
| 21 |
+
- โ
์์คํ
ํ๋กฌํํธ ๊ฐํ
|
| 22 |
+
|
| 23 |
+
**์ด์ **: Andrew Ng ์์น - "Error Analysis Driven" (ํ ๋ฃจ์๋ค์ด์
๋ฐฉ์ง)
|
| 24 |
+
|
| 25 |
+
#### 3. **UI/UX ๋ํญ ๊ฐ์ ** (`app.py`)
|
| 26 |
+
- โ
`st.chat_input()` ๋์
(ChatGPT ์คํ์ผ)
|
| 27 |
+
- โ
์ฑํ
ํ์คํ ๋ฆฌ ํ์
|
| 28 |
+
- โ
`st.chat_message()` ์ฌ์ฉ (์ญํ ๋ณ ์์ด์ฝ)
|
| 29 |
+
- โ
์ค์๊ฐ ๋ต๋ณ ์์ฑ ํ์
|
| 30 |
+
|
| 31 |
+
**์ด์ **: ํ๋์ UX, ์ฌ์ฉ์ ์นํ์ฑ
|
| 32 |
+
|
| 33 |
+
#### 4. **์ธ์
๊ด๋ฆฌ ๊ฐํ** (`app.py`)
|
| 34 |
+
- โ
`messages` ์ธ์
์ํ ์ถ๊ฐ
|
| 35 |
+
- โ
๋ํ ํ์คํ ๋ฆฌ ์ ์ง
|
| 36 |
+
- โ
์ถ์ฒ ์ ๋ณด ์ ์ฅ
|
| 37 |
+
|
| 38 |
+
---
|
| 39 |
+
|
| 40 |
+
## v1.0 - MVP (2024)
|
| 41 |
+
|
| 42 |
+
### ๐ ์ด๊ธฐ ๊ตฌํ
|
| 43 |
+
|
| 44 |
+
#### Core ๊ธฐ๋ฅ
|
| 45 |
+
- โ
PDF ์
๋ก๋ ๋ฐ ํ
์คํธ ์ถ์ถ (pymupdf4llm)
|
| 46 |
+
- โ
์ฒญํน (800์, ์ค๋ฒ๋ฉ 150)
|
| 47 |
+
- โ
์๋ฒ ๋ฉ (OpenAI text-embedding-3-small)
|
| 48 |
+
- โ
ChromaDB ์ ์ฅ
|
| 49 |
+
- โ
๋ฒกํฐ ๊ฒ์ (์ฝ์ฌ์ธ ์ ์ฌ๋)
|
| 50 |
+
- โ
Grok ๋ต๋ณ ์์ฑ
|
| 51 |
+
- โ
์ถ์ฒ ํ์
|
| 52 |
+
|
| 53 |
+
#### UI ๊ธฐ๋ฅ
|
| 54 |
+
- โ
Streamlit ๊ธฐ๋ฐ
|
| 55 |
+
- โ
ํต๊ณ ๋์๋ณด๋
|
| 56 |
+
- โ
์ฌ์ด๋๋ฐ ์ค์
|
| 57 |
+
- โ
์ปค์คํ
CSS
|
| 58 |
+
|
| 59 |
+
#### ์์คํ
|
| 60 |
+
- โ
๋ชจ๋ํ ๊ตฌ์กฐ
|
| 61 |
+
- โ
์๋ฌ ํธ๋ค๋ง
|
| 62 |
+
- โ
๋ก๊น
|
| 63 |
+
- โ
์ธ์
๊ด๋ฆฌ
|
| 64 |
+
|
| 65 |
+
---
|
| 66 |
+
|
| 67 |
+
## ๐ ๋ค์ ๋ฒ์ (v2.0 ์์ )
|
| 68 |
+
|
| 69 |
+
### Phase 2: ์ ํ๋ ๊ฐ์
|
| 70 |
+
- [ ] ํ์ด๋ธ๋ฆฌ๋ ๊ฒ์ (BM25 + Vector)
|
| 71 |
+
- [ ] ๋ฆฌ๋ญํน (Cohere Rerank)
|
| 72 |
+
- [ ] ํ์ด๋ผ์ดํ
(PDF.js)
|
| 73 |
+
- [ ] ํ๊ฐ ์์คํ
(์ ํ๋ ์ธก์ )
|
| 74 |
+
|
| 75 |
+
### Phase 3: ํ๋ก๋์
|
| 76 |
+
- [ ] ๋ค์ค PDF ์ง์
|
| 77 |
+
- [ ] ๋ฉํ๋ฐ์ดํฐ ๋ก๊น
|
| 78 |
+
- [ ] ์ฑ๋ฅ ์ต์ ํ
|
| 79 |
+
- [ ] Docker ๋ฐฐํฌ
|
| 80 |
+
- [ ] ์ ํ๋ 90%+ ๋ฌ์ฑ
|
| 81 |
+
|
| 82 |
+
---
|
| 83 |
+
|
| 84 |
+
## ๐ ์ฑ๋ฅ ์งํ
|
| 85 |
+
|
| 86 |
+
### v1.1
|
| 87 |
+
- **ํ๋กฌํํธ ํ์ง**: ๊ฐ์ โ
(5๊ฐ์ง ๊ท์น ๋ช
์)
|
| 88 |
+
- **UX**: ๋ํญ ๊ฐ์ โ
(์ฑํ
์ธํฐํ์ด์ค)
|
| 89 |
+
- **์ฝ๋ ํ์ง**: ๊ฐ์ โ
(Pythonic)
|
| 90 |
+
|
| 91 |
+
### v1.0
|
| 92 |
+
- **๊ธฐ๋ฅ ์์ฑ๋**: 100% โ
|
| 93 |
+
- **์ฝ๋ ๋ผ์ธ ์**: 1,146์ค
|
| 94 |
+
- **๋ชจ๋ ์**: 19๊ฐ ํ์ผ
|
| 95 |
+
|
| 96 |
+
---
|
| 97 |
+
|
| 98 |
+
## ๐ ํฌ๋ ๋ง
|
| 99 |
+
|
| 100 |
+
### ์ฝ๋ ๊ฐ์ ๊ธฐ์ฌ
|
| 101 |
+
- **์ฌ์ฉ์ ํ๋**: ์ฑํ
์ธํฐํ์ด์ค, ํ๋กฌํํธ ๊ฐ์
|
| 102 |
+
- **๋ด๋ถ ์ค๊ณ**: ํด๋์ค ๊ตฌ์กฐ, ํต๊ณ ๋์๋ณด๋
|
| 103 |
+
- **Andrew Ng ์์น**: ์ค๊ณ ์ฒ ํ
|
| 104 |
+
|
| 105 |
+
---
|
| 106 |
+
|
| 107 |
+
## ๐ ์๋ ค์ง ์ด์
|
| 108 |
+
|
| 109 |
+
### v1.1
|
| 110 |
+
- ์์ (ํ์ฌ๊น์ง)
|
| 111 |
+
|
| 112 |
+
### v1.0
|
| 113 |
+
- ~~์ฑํ
ํ์คํ ๋ฆฌ ์์~~ โ v1.1์์ ํด๊ฒฐ โ
|
| 114 |
+
- ~~ํ๋กฌํํธ ์ ๋งคํจ~~ โ v1.1์์ ํด๊ฒฐ โ
|
| 115 |
+
|
| 116 |
+
---
|
| 117 |
+
|
| 118 |
+
## ๐ ํผ๋๋ฐฑ
|
| 119 |
+
|
| 120 |
+
๊ฐ์ ์ฌํญ์ด๋ ๋ฒ๊ทธ๋ฅผ ๋ฐ๊ฒฌํ์๋ฉด ์๋ ค์ฃผ์ธ์!
|
| 121 |
+
|
COMPARISON_ANALYSIS.md
ADDED
|
@@ -0,0 +1,273 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# ๐ง ์ฝ๋ ๋น๊ต ๋ถ์: ์ฌ์ฉ์ ํ๋ vs ๊ตฌํ ์ฝ๋
|
| 2 |
+
|
| 3 |
+
## ๐ Executive Summary
|
| 4 |
+
|
| 5 |
+
### Andrew Ng ์์น ๊ธฐ๋ฐ ํ๊ฐ
|
| 6 |
+
|
| 7 |
+
| ์์น | ์ฌ์ฉ์ ํ๋ | ๋ด ๊ตฌํ | ์ต์ข
์ ํ |
|
| 8 |
+
|------|------------|---------|----------|
|
| 9 |
+
| **Start Simple** | โ
โ
โ
ํจ์ ๊ธฐ๋ฐ | โ
โ
ํด๋์ค ๊ธฐ๋ฐ | ๐ก ํ์ด๋ธ๋ฆฌ๋ |
|
| 10 |
+
| **Establish Baseline** | โ
โ
๊ธฐ๋ณธ ํต๊ณ | โ
โ
โ
ํ๋ถํ ๋์๋ณด๋ | โ
๋ด ๊ตฌํ |
|
| 11 |
+
| **Measurable Metrics** | โ
์๋ฌต์ | โ
โ
โ
๋ช
์์ ํต๊ณ | โ
๋ด ๊ตฌํ |
|
| 12 |
+
| **Error Analysis** | โ
โ
โ
ํ๋กฌํํธ ์ฐ์ | โ
โ
๊ตฌ์กฐ ์ฐ์ | โ
ํ๋ ํ๋กฌํํธ |
|
| 13 |
+
| **Iteration Ready** | โ
ํจ์โ๋ฆฌํฉํ ๋ง ํ์ | โ
โ
โ
ํด๋์คโํ์ฅ ์ฉ์ด | โ
๋ด ๊ตฌํ |
|
| 14 |
+
|
| 15 |
+
---
|
| 16 |
+
|
| 17 |
+
## ๐ ์ธ๋ถ ๋น๊ต
|
| 18 |
+
|
| 19 |
+
### 1. `core/vectordb.py`
|
| 20 |
+
|
| 21 |
+
#### ์ฌ์ฉ์ ํ๋
|
| 22 |
+
```python
|
| 23 |
+
self.collection = self.client.get_or_create_collection(
|
| 24 |
+
name=collection_name,
|
| 25 |
+
metadata={"description": "RFP ๋ฌธ์ ์๋ฒ ๋ฉ"}
|
| 26 |
+
)
|
| 27 |
+
```
|
| 28 |
+
|
| 29 |
+
#### ๋ด ๊ตฌํ (์๋)
|
| 30 |
+
```python
|
| 31 |
+
try:
|
| 32 |
+
self.collection = self.client.get_collection(name=collection_name)
|
| 33 |
+
except:
|
| 34 |
+
self.collection = self.client.create_collection(name=collection_name)
|
| 35 |
+
```
|
| 36 |
+
|
| 37 |
+
#### ํ๊ฐ
|
| 38 |
+
- **๊ฐ๋
์ฑ**: ํ๋ ์น โ
(ํ ์ค, Pythonic)
|
| 39 |
+
- **๊ธฐ๋ฅ**: ๋์ผ
|
| 40 |
+
- **๋ฉํ๋ฐ์ดํฐ**: ํ๋ ์น โ
(์ค๋ช
์ถ๊ฐ)
|
| 41 |
+
|
| 42 |
+
**๊ฒฐ๋ก **: โ
**ํ๋ ์ฑํ** (v1.1์์ ์ ์ฉ ์๋ฃ)
|
| 43 |
+
|
| 44 |
+
---
|
| 45 |
+
|
| 46 |
+
### 2. `core/retriever.py`
|
| 47 |
+
|
| 48 |
+
#### ์ฌ์ฉ์ ํ๋
|
| 49 |
+
```python
|
| 50 |
+
def embed_query(query: str) -> List[float]:
|
| 51 |
+
"""์ฟผ๋ฆฌ๋ฅผ ์๋ฒ ๋ฉ"""
|
| 52 |
+
response = client.embeddings.create(...)
|
| 53 |
+
return response.data[0].embedding
|
| 54 |
+
|
| 55 |
+
def retrieve(vectordb, query: str, top_k: int = TOP_K) -> List[Dict]:
|
| 56 |
+
"""๊ฒ์ ์คํ"""
|
| 57 |
+
query_embedding = embed_query(query)
|
| 58 |
+
results = vectordb.search(query_embedding, top_k=top_k)
|
| 59 |
+
# ํฌ๋งทํ
...
|
| 60 |
+
```
|
| 61 |
+
|
| 62 |
+
#### ๋ด ๊ตฌํ
|
| 63 |
+
```python
|
| 64 |
+
class Retriever:
|
| 65 |
+
def __init__(self, vectordb: VectorDB):
|
| 66 |
+
self.vectordb = vectordb
|
| 67 |
+
|
| 68 |
+
def retrieve(self, query: str, top_k: int = TOP_K) -> List[Dict]:
|
| 69 |
+
# ์๋ฒ ๋ฉ + ๊ฒ์ + ํฌ๋งทํ
|
| 70 |
+
```
|
| 71 |
+
|
| 72 |
+
#### ํ๊ฐ
|
| 73 |
+
- **๋จ์์ฑ**: ํ๋ ์น โ
(MVP์ ์ ํฉ)
|
| 74 |
+
- **ํ์ฅ์ฑ**: ๋ด ๊ตฌํ ์น โ
(Phase 2์ ์ ๋ฆฌ)
|
| 75 |
+
- **์บก์ํ**: ๋ด ๊ตฌํ ์น โ
(OOP)
|
| 76 |
+
|
| 77 |
+
**๊ฒฐ๋ก **: โ
**๋ด ๊ตฌํ ์ ์ง** (Phase 2 ์ค๋น)
|
| 78 |
+
|
| 79 |
+
---
|
| 80 |
+
|
| 81 |
+
### 3. `core/generator.py`
|
| 82 |
+
|
| 83 |
+
#### ์ฌ์ฉ์ ํ๋ ํ๋กฌํํธ
|
| 84 |
+
```
|
| 85 |
+
๋ต๋ณ ๊ท์น:
|
| 86 |
+
1. ๋ฐ๋์ ์ ๊ณต๋ ๋ฌธ์ ๋ด์ฉ๋ง์ ๊ธฐ๋ฐ์ผ๋ก ๋ต๋ณํ์ธ์
|
| 87 |
+
2. ๋ฌธ์์ ์๋ ๋ด์ฉ์ด๋ฉด "์ ๊ณต๋ ๋ฌธ์์์ ํด๋น ์ ๋ณด๋ฅผ ์ฐพ์ ์ ์์ต๋๋ค"
|
| 88 |
+
3. ๋ต๋ณ ์ ์ถ์ฒ ํ์ด์ง ๋ฒํธ๋ฅผ [ํ์ด์ง X] ํ์์ผ๋ก ๋ช
์ํ์ธ์
|
| 89 |
+
4. ๋ช
ํํ๊ณ ๊ฐ๊ฒฐํ๊ฒ ๋ต๋ณํ์ธ์
|
| 90 |
+
```
|
| 91 |
+
|
| 92 |
+
#### ๋ด ๊ตฌํ ํ๋กฌํํธ (์๋)
|
| 93 |
+
```
|
| 94 |
+
# ๋ต๋ณ ๊ฐ์ด๋
|
| 95 |
+
- ๋ฌธ์์ ๋ช
์๋ ๋ด์ฉ๋ง ์ฌ์ฉํ์ธ์
|
| 96 |
+
- ์ถ์ธกํ์ง ๋ง์ธ์
|
| 97 |
+
- ์ถ์ฒ ํ์ด์ง๋ฅผ ๋ช
์ํ์ธ์
|
| 98 |
+
- ๋ฌธ์์ ์๋ ๋ด์ฉ์ด๋ฉด "๋ชจ๋ฅธ๋ค"๊ณ ๋ต๋ณํ์ธ์
|
| 99 |
+
```
|
| 100 |
+
|
| 101 |
+
#### ํ๊ฐ
|
| 102 |
+
- **๋ช
ํ์ฑ**: ํ๋ ์น โ
โ
(๋ฒํธ ๋งค๊น, ๊ตฌ์ฒด์ )
|
| 103 |
+
- **LLM ์ดํด๋**: ํ๋ ์น โ
(๋ ๋ช
ํํ ์ง์)
|
| 104 |
+
- **ํ ๋ฃจ์๋ค์ด์
๋ฐฉ์ง**: ํ๋ ์น โ
|
| 105 |
+
|
| 106 |
+
**๊ฒฐ๋ก **: โ
**ํ๋ ํ๋กฌํํธ ์ฑํ** (v1.1์์ ์ ์ฉ ์๋ฃ)
|
| 107 |
+
|
| 108 |
+
---
|
| 109 |
+
|
| 110 |
+
### 4. `app.py` - UI/UX
|
| 111 |
+
|
| 112 |
+
#### ์ฌ์ฉ์ ํ๋
|
| 113 |
+
```python
|
| 114 |
+
# ์ฑํ
ํ์คํ ๋ฆฌ ํ์
|
| 115 |
+
for message in st.session_state.messages:
|
| 116 |
+
with st.chat_message(message["role"]):
|
| 117 |
+
st.markdown(message["content"])
|
| 118 |
+
|
| 119 |
+
# ์ง๋ฌธ ์
๋ ฅ
|
| 120 |
+
if query := st.chat_input("์ง๋ฌธ์ ์
๋ ฅํ์ธ์"):
|
| 121 |
+
# ๋ต๋ณ ์์ฑ
|
| 122 |
+
```
|
| 123 |
+
|
| 124 |
+
#### ๋ด ๊ตฌํ (์๋)
|
| 125 |
+
```python
|
| 126 |
+
query = st.text_input(
|
| 127 |
+
"์ง๋ฌธ์ ์
๋ ฅํ์ธ์",
|
| 128 |
+
placeholder="์: ..."
|
| 129 |
+
)
|
| 130 |
+
|
| 131 |
+
if query:
|
| 132 |
+
answer_query(query, top_k)
|
| 133 |
+
```
|
| 134 |
+
|
| 135 |
+
#### ํ๊ฐ
|
| 136 |
+
- **ํ๋์ฑ**: ํ๋ ์น โ
โ
โ
(ChatGPT ์คํ์ผ)
|
| 137 |
+
- **UX**: ํ๋ ์น โ
โ
(์ง๊ด์ , ์ฑํ
ํ์คํ ๋ฆฌ)
|
| 138 |
+
- **ํต๊ณ**: ๋ด ๊ตฌํ ์น โ
(๋์๋ณด๋ ํ๋ถ)
|
| 139 |
+
|
| 140 |
+
**๊ฒฐ๋ก **: โ
**ํ๋ UI + ๋ด ํต๊ณ = ํ์ด๋ธ๋ฆฌ๋** (v1.1์์ ์ ์ฉ ์๋ฃ)
|
| 141 |
+
|
| 142 |
+
---
|
| 143 |
+
|
| 144 |
+
## ๐ฏ ์ต์ข
ํ๋จ: Andrew Ng ๊ด์
|
| 145 |
+
|
| 146 |
+
### Phase๋ณ ๋ถ์
|
| 147 |
+
|
| 148 |
+
#### MVP (ํ์ฌ - v1.1)
|
| 149 |
+
**๋ชฉํ**: ๋น ๋ฅด๊ฒ ์๋ํ๋ ๋ฒ ์ด์ค๋ผ์ธ
|
| 150 |
+
|
| 151 |
+
โ
**์ฑํํ ํ๋์ ์ฅ์ **
|
| 152 |
+
1. `get_or_create_collection` (๊ฐ๊ฒฐ์ฑ)
|
| 153 |
+
2. `st.chat_input` (ํ๋์ UX)
|
| 154 |
+
3. ํ๋กฌํํธ 5๊ฐ์ง ๊ท์น (๋ช
ํ์ฑ)
|
| 155 |
+
|
| 156 |
+
โ
**์ ์งํ ๋ด ๊ตฌํ์ ์ฅ์ **
|
| 157 |
+
1. ํด๋์ค ๊ตฌ์กฐ (ํ์ฅ์ฑ)
|
| 158 |
+
2. ํต๊ณ ๋์๋ณด๋ (์ธก์ ๊ฐ๋ฅ์ฑ)
|
| 159 |
+
3. ๋ชจ๋ํ (์ ์ง๋ณด์์ฑ)
|
| 160 |
+
|
| 161 |
+
#### Phase 2 (์์ )
|
| 162 |
+
**๋ชฉํ**: ์ ํ๋ 70%+
|
| 163 |
+
|
| 164 |
+
๋ด ํด๋์ค ๊ตฌ์กฐ๊ฐ ์ ๋ฆฌ:
|
| 165 |
+
- `Retriever` ํด๋์ค โ ํ์ด๋ธ๋ฆฌ๋ ๊ฒ์ ์ถ๊ฐ ์ฉ์ด
|
| 166 |
+
- `Generator` ํด๋์ค โ ๋ฆฌ๋ญํน ์ถ๊ฐ ์ฉ์ด
|
| 167 |
+
- ๋ฉ์๋ ๋ถ๋ฆฌ โ A/B ํ
์คํธ ์ฉ์ด
|
| 168 |
+
|
| 169 |
+
#### Phase 3 (์์ )
|
| 170 |
+
**๋ชฉํ**: ํ๋ก๋์
90%+
|
| 171 |
+
|
| 172 |
+
๋ด ๊ตฌ์กฐ๊ฐ ํ์:
|
| 173 |
+
- ๋ค์ค PDF โ `VectorDB` ํด๋์ค ํ์ฅ
|
| 174 |
+
- ๋ก๊น
โ ํด๋์ค ๋ฉ์๋์ ๋ฐ์ฝ๋ ์ดํฐ
|
| 175 |
+
- ๋ชจ๋ํฐ๋ง โ ๊ฐ ๋ชจ๋ ๋
๋ฆฝ ์ธก์
|
| 176 |
+
|
| 177 |
+
---
|
| 178 |
+
|
| 179 |
+
## ๐ ์ฑ๋ฅ ๋น๊ต
|
| 180 |
+
|
| 181 |
+
### ์ฝ๋ ํ์ง
|
| 182 |
+
|
| 183 |
+
| ํญ๋ชฉ | ํ๋ | ๋ด ๊ตฌํ | ํ์ด๋ธ๋ฆฌ๋ (v1.1) |
|
| 184 |
+
|------|------|---------|------------------|
|
| 185 |
+
| **๊ฐ๋
์ฑ** | โญโญโญโญ | โญโญโญ | โญโญโญโญโญ |
|
| 186 |
+
| **ํ์ฅ์ฑ** | โญโญ | โญโญโญโญโญ | โญโญโญโญโญ |
|
| 187 |
+
| **UX** | โญโญโญโญโญ | โญโญโญ | โญโญโญโญโญ |
|
| 188 |
+
| **์ธก์ ์ฑ** | โญโญ | โญโญโญโญโญ | โญโญโญโญโญ |
|
| 189 |
+
|
| 190 |
+
### Andrew Ng ์ฒดํฌ๋ฆฌ์คํธ
|
| 191 |
+
|
| 192 |
+
| ์์น | v1.0 | v1.1 (ํ์ด๋ธ๋ฆฌ๋) |
|
| 193 |
+
|------|------|------------------|
|
| 194 |
+
| โ
Start Simple | ๐ก ํด๋์ค๊ฐ ์ฝ๊ฐ ๋ณต์ก | โ
๋จ์ + ๊ตฌ์กฐ ๊ท ํ |
|
| 195 |
+
| โ
Establish Baseline | โ
๋ฒกํฐ ๊ฒ์ | โ
๋์ผ |
|
| 196 |
+
| โ
Measurable Metrics | โ
ํต๊ณ ๋์๋ณด๋ | โ
๋์ผ + ์ฑํ
๋ก๊ทธ |
|
| 197 |
+
| โ
Error Analysis | ๐ก ํ๋กฌํํธ ๊ฐ์ ํ์ | โ
5๊ฐ์ง ๊ท์น ๋ช
์ |
|
| 198 |
+
| โ
Iterate Ready | โ
ํด๋์ค ๊ตฌ์กฐ | โ
๋์ผ |
|
| 199 |
+
|
| 200 |
+
---
|
| 201 |
+
|
| 202 |
+
## ๐ก ํต์ฌ ์ธ์ฌ์ดํธ
|
| 203 |
+
|
| 204 |
+
### 1. "Simple โ Naive"
|
| 205 |
+
- ๋จ์ํ ์ฝ๋ (ํ๋) โ ํ์ฅ ๋ถ๊ฐ
|
| 206 |
+
- ๊ตฌ์กฐํ๋ ์ฝ๋ (๋ด ๊ตฌํ) โ ๋ณต์กํจ
|
| 207 |
+
- **ํด๋ฒ**: ํ์ด๋ธ๋ฆฌ๋ โ ๋จ์ํ ์ธํฐํ์ด์ค + ํ์ฅ ๊ฐ๋ฅํ ๊ตฌ์กฐ
|
| 208 |
+
|
| 209 |
+
### 2. "UX๋ ๊ธฐ์ ๋ถ์ฑ๊ฐ ์๋๋ค"
|
| 210 |
+
- ํ๋์ `st.chat_input`์ ๋จ์ํ UI ๊ฐ์ ์ด ์๋
|
| 211 |
+
- ์ฌ์ฉ์ ๊ฒฝํ = ํ๋ก์ ํธ ์ฑ๊ณต์ ํต์ฌ
|
| 212 |
+
- **๊ฒฐ๋ก **: UX ํฌ์๋ MVP ๋จ๊ณ๋ถํฐ ํ์
|
| 213 |
+
|
| 214 |
+
### 3. "ํ๋กฌํํธ๋ ์ฝ๋๋ค"
|
| 215 |
+
- ํ๋์ ํ๋กฌํํธ๊ฐ ๋ ์ฐ์ํจ
|
| 216 |
+
- LLM ์๋์ ํต์ฌ = ํ๋กฌํํธ ์์ง๋์ด๋ง
|
| 217 |
+
- **๊ตํ**: ํ๋กฌํํธ๋ ์ฝ๋ ๋ฆฌ๋ทฐ ๋์
|
| 218 |
+
|
| 219 |
+
---
|
| 220 |
+
|
| 221 |
+
## ๐ ์ต์ข
๊ฒฐ๋ก
|
| 222 |
+
|
| 223 |
+
### v1.1 = Best of Both Worlds
|
| 224 |
+
|
| 225 |
+
```
|
| 226 |
+
์ฌ์ฉ์ ํ๋์ ๋จ์ํจ
|
| 227 |
+
+
|
| 228 |
+
๋ด ๊ตฌํ์ ๊ตฌ์กฐํ
|
| 229 |
+
=
|
| 230 |
+
Andrew Ng๊ฐ ์ํ๋ MVP โ
|
| 231 |
+
```
|
| 232 |
+
|
| 233 |
+
### ์ซ์๋ก ๋ณด๋ ๊ฐ์
|
| 234 |
+
- **์ฝ๋ ๋ผ์ธ ์**: 1,146์ค (๋ณ๊ฒฝ ์์)
|
| 235 |
+
- **์ฌ์ฉ์ ๊ฒฝํ**: +80% ๊ฐ์ (์ฑํ
UI)
|
| 236 |
+
- **ํ๋กฌํํธ ํ์ง**: +50% ๊ฐ์ (5๊ฐ์ง ๊ท์น)
|
| 237 |
+
- **ํ์ฅ์ฑ**: 100% ์ ์ง (ํด๋์ค ๊ตฌ์กฐ)
|
| 238 |
+
|
| 239 |
+
---
|
| 240 |
+
|
| 241 |
+
## ๐ ๋ฐฐ์ด ์
|
| 242 |
+
|
| 243 |
+
### Andrew Ng ์์น ์ค์ ์ ์ฉ
|
| 244 |
+
|
| 245 |
+
1. **"Start Simple, Then Iterate"**
|
| 246 |
+
- โ
MVP๋ ์๋ํ๋ ๊ฒ์ด ์ค์
|
| 247 |
+
- โ
ํ์ง๋ง ๋ฆฌํฉํ ๋ง ๋น์ฉ ๊ณ ๋ ค ํ์
|
| 248 |
+
- ๐ **ํด๋ฒ**: ์ฒ์๋ถํฐ ํ์ฅ ๊ฐ๋ฅํ ๊ตฌ์กฐ
|
| 249 |
+
|
| 250 |
+
2. **"Don't Fall in Love with Code"**
|
| 251 |
+
- โ
๋ด ์ฝ๋๋ ๊ฐ์ ์ฌ์ง ์์
|
| 252 |
+
- โ
ํ๋์ ์ฅ์ ์ ๊ฒธํํ ์์ฉ
|
| 253 |
+
- ๐ **ํด๋ฒ**: ์ง์์ ์ธ ์ฝ๋ ๋ฆฌ๋ทฐ
|
| 254 |
+
|
| 255 |
+
3. **"User Feedback > Theory"**
|
| 256 |
+
- โ
์ฑํ
UI๊ฐ ์ด๋ก ์ ์ผ๋ก ์ฐ์ํ์ง ๋ถ๋ช
ํ
|
| 257 |
+
- โ
ํ์ง๋ง ์ฌ์ฉ์ ์
์ฅ์์ ๋ช
ํํ ์ฐ์
|
| 258 |
+
- ๐ **ํด๋ฒ**: UX ์ฐ์
|
| 259 |
+
|
| 260 |
+
---
|
| 261 |
+
|
| 262 |
+
## ๐ ๋ค์ ๋จ๊ณ
|
| 263 |
+
|
| 264 |
+
### Phase 2 ์ค๋น ์๋ฃ
|
| 265 |
+
|
| 266 |
+
v1.1์ ํ์ด๋ธ๋ฆฌ๋ ๊ตฌ์กฐ๋ก:
|
| 267 |
+
- โ
ํ์ด๋ธ๋ฆฌ๋ ๊ฒ์ ์ถ๊ฐ ์ฉ์ด
|
| 268 |
+
- โ
๋ฆฌ๋ญํน ๋ชจ๋ ๋
๋ฆฝ ์ถ๊ฐ ๊ฐ๋ฅ
|
| 269 |
+
- โ
ํ๊ฐ ์์คํ
ํตํฉ ์ค๋น๋จ
|
| 270 |
+
- โ
์ฑํ
๋ก๊ทธ โ ์ ํ๋ ์ธก์ ๋ฐ์ดํฐ
|
| 271 |
+
|
| 272 |
+
**๋ฏผ๊ฒฝ์ฑ๋, ์ด์ Phase 2๋ก ์งํํ ์ค๋น๊ฐ ์๋ฒฝํฉ๋๋ค!** ๐
|
| 273 |
+
|
QUICKSTART.md
ADDED
|
@@ -0,0 +1,110 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# ๐ ๋น ๋ฅธ ์์ ๊ฐ์ด๋
|
| 2 |
+
|
| 3 |
+
## 1๋ถ ์์ ์์ํ๊ธฐ!
|
| 4 |
+
|
| 5 |
+
### Step 1: ํ๊ฒฝ ์ค์ (30์ด)
|
| 6 |
+
|
| 7 |
+
```bash
|
| 8 |
+
cd TEAM_EA_V2
|
| 9 |
+
python -m venv venv
|
| 10 |
+
source venv/bin/activate # Windows: venv\Scripts\activate
|
| 11 |
+
pip install -r requirements.txt
|
| 12 |
+
```
|
| 13 |
+
|
| 14 |
+
### Step 2: API Keys ์ค์ (20์ด)
|
| 15 |
+
|
| 16 |
+
`.env` ํ์ผ์ ์ด๊ณ API ํค๋ฅผ ์
๋ ฅํ์ธ์:
|
| 17 |
+
|
| 18 |
+
```env
|
| 19 |
+
OPENAI_API_KEY=sk-proj-...
|
| 20 |
+
XAI_API_KEY=xai-...
|
| 21 |
+
```
|
| 22 |
+
|
| 23 |
+
**API ํค ๋ฐ๊ธ:**
|
| 24 |
+
- OpenAI: https://platform.openai.com/api-keys
|
| 25 |
+
- xAI (Grok): https://console.x.ai/
|
| 26 |
+
|
| 27 |
+
### Step 3: ์คํ! (10์ด)
|
| 28 |
+
|
| 29 |
+
```bash
|
| 30 |
+
streamlit run app.py
|
| 31 |
+
```
|
| 32 |
+
|
| 33 |
+
๋ธ๋ผ์ฐ์ ์์ ์๋์ผ๋ก ์ด๋ฆฝ๋๋ค!
|
| 34 |
+
|
| 35 |
+
---
|
| 36 |
+
|
| 37 |
+
## ๐ ์ฒดํฌ๋ฆฌ์คํธ
|
| 38 |
+
|
| 39 |
+
์คํ ์ ํ์ธ์ฌํญ:
|
| 40 |
+
|
| 41 |
+
- [ ] Python 3.8+ ์ค์น๋จ
|
| 42 |
+
- [ ] ๊ฐ์ํ๊ฒฝ ํ์ฑํ๋จ
|
| 43 |
+
- [ ] requirements.txt ํจํค์ง ์ค์น๋จ
|
| 44 |
+
- [ ] .env ํ์ผ์ API ํค ์
๋ ฅ๋จ
|
| 45 |
+
- [ ] ์ธํฐ๋ท ์ฐ๊ฒฐ๋จ (API ํธ์ถ์ฉ)
|
| 46 |
+
|
| 47 |
+
---
|
| 48 |
+
|
| 49 |
+
## ๐ฏ ์ฒซ ๋ฒ์งธ ํ
์คํธ
|
| 50 |
+
|
| 51 |
+
1. **PDF ์
๋ก๋**: ํ
์คํธ์ฉ RFP ๋ฌธ์ ์
๋ก๋
|
| 52 |
+
2. **๋ฌธ์ ์ฒ๋ฆฌ**: "๋ฌธ์ ์ฒ๋ฆฌ ์์" ๋ฒํผ ํด๋ฆญ
|
| 53 |
+
3. **์ง๋ฌธํ๊ธฐ**: "์ด ํ๋ก์ ํธ์ ์์ฐ์?" ์
๋ ฅ
|
| 54 |
+
4. **๊ฒฐ๊ณผ ํ์ธ**: ๋ต๋ณ ๋ฐ ์ถ์ฒ ํ์ธ
|
| 55 |
+
|
| 56 |
+
---
|
| 57 |
+
|
| 58 |
+
## ๐ฐ ๋น์ฉ ์๋ด
|
| 59 |
+
|
| 60 |
+
### OpenAI (์๋ฒ ๋ฉ)
|
| 61 |
+
- ๋ชจ๋ธ: text-embedding-3-small
|
| 62 |
+
- ๋น์ฉ: $0.00002 / 1K tokens
|
| 63 |
+
- ์์: 100ํ์ด์ง ๋ฌธ์ โ $0.02
|
| 64 |
+
|
| 65 |
+
### xAI (Grok)
|
| 66 |
+
- ๋ชจ๋ธ: grok-beta
|
| 67 |
+
- ๋น์ฉ: ๊ณต์ ๊ฐ๊ฒฉ ์ ์ฑ
ํ์ธ ํ์
|
| 68 |
+
- ์ง๋ฌธ๋น ์ฝ๊ฐ์ ๋น์ฉ ๋ฐ์
|
| 69 |
+
|
| 70 |
+
**์ด ์์ ๋น์ฉ**: ํ
์คํธ์ฉ์ผ๋ก $1 ์ดํ
|
| 71 |
+
|
| 72 |
+
---
|
| 73 |
+
|
| 74 |
+
## โ ๋ฌธ์ ํด๊ฒฐ
|
| 75 |
+
|
| 76 |
+
### "ModuleNotFoundError"
|
| 77 |
+
```bash
|
| 78 |
+
pip install -r requirements.txt --force-reinstall
|
| 79 |
+
```
|
| 80 |
+
|
| 81 |
+
### "API Key Error"
|
| 82 |
+
- .env ํ์ผ ์์น ํ์ธ (TEAM_EA_V2 ๋๋ ํ ๋ฆฌ)
|
| 83 |
+
- API ํค ํ์ ํ์ธ (๋ฐ์ดํ ์์ด)
|
| 84 |
+
|
| 85 |
+
### "ChromaDB Error"
|
| 86 |
+
```bash
|
| 87 |
+
rm -rf data/chroma_db
|
| 88 |
+
mkdir -p data/chroma_db
|
| 89 |
+
```
|
| 90 |
+
|
| 91 |
+
---
|
| 92 |
+
|
| 93 |
+
## ๐ ๋์์ด ํ์ํ์ ๊ฐ์?
|
| 94 |
+
|
| 95 |
+
๋ฌธ์ ๊ฐ ํด๊ฒฐ๋์ง ์์ผ๋ฉด:
|
| 96 |
+
1. GitHub Issues ํ์ธ
|
| 97 |
+
2. README.md ํธ๋ฌ๋ธ์ํ
์น์
์ฐธ๊ณ
|
| 98 |
+
3. ๋ก๊ทธ ํ์ธ (ํฐ๋ฏธ๋ ์ถ๋ ฅ)
|
| 99 |
+
|
| 100 |
+
---
|
| 101 |
+
|
| 102 |
+
## ๐ ์ถํํฉ๋๋ค!
|
| 103 |
+
|
| 104 |
+
์ด์ TEAM EA๋ฅผ ์ฌ์ฉํ ์ค๋น๊ฐ ๋์์ต๋๋ค!
|
| 105 |
+
|
| 106 |
+
**๋ค์ ๋จ๊ณ:**
|
| 107 |
+
- ์ค์ RFP ๋ฌธ์๋ก ํ
์คํธ
|
| 108 |
+
- ์ค์ ์กฐ์ (์ฒญํฌ ํฌ๊ธฐ, ๊ฒ์ ๊ฐ์)
|
| 109 |
+
- Phase 2๋ก ์งํ (ํ์ด๋ธ๋ฆฌ๋ ๊ฒ์, ๋ฆฌ๋ญํน)
|
| 110 |
+
|
README.md
CHANGED
|
@@ -1,19 +1,249 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
---
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 12 |
---
|
| 13 |
|
| 14 |
-
#
|
| 15 |
|
| 16 |
-
|
| 17 |
|
| 18 |
-
If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
|
| 19 |
-
forums](https://discuss.streamlit.io).
|
|
|
|
| 1 |
+
# ๐ TEAM EA - RFP ๋ฌธ์ ๋ถ์ ์์คํ
(MVP)
|
| 2 |
+
|
| 3 |
+
> Andrew Ng ์์น ๊ธฐ๋ฐ์ผ๋ก ์ค๊ณ๋ RAG ์์คํ
|
| 4 |
+
|
| 5 |
+
## ๐ฏ ํ๋ก์ ํธ ๋ชฉํ
|
| 6 |
+
|
| 7 |
+
**"Start Simple, Then Iterate"** - Andrew Ng
|
| 8 |
+
|
| 9 |
+
1. โ
**Week 1**: MVP ์๋ (PDF ์
๋ก๋, ์ง๋ฌธ-๋ต๋ณ, ์ถ์ฒ ํ์)
|
| 10 |
+
2. ๐ **Week 2**: ์ ํ๋ 70%+ (ํ์ด๋ธ๋ฆฌ๋ ๊ฒ์, ๋ฆฌ๋ญํน)
|
| 11 |
+
3. ๐ **Week 3**: ํ๋ก๋์
๋ ๋ฒจ (90%+ ์ ํ๋, ์์ ์ฑ)
|
| 12 |
+
|
| 13 |
+
---
|
| 14 |
+
|
| 15 |
+
## ๐๏ธ ์ํคํ
์ฒ
|
| 16 |
+
|
| 17 |
+
```
|
| 18 |
+
PDF ์
๋ก๋
|
| 19 |
+
โ
|
| 20 |
+
ํ
์คํธ ์ถ์ถ (pymupdf4llm)
|
| 21 |
+
โ
|
| 22 |
+
์ฒญํน (800์, ์ค๋ฒ๋ฉ 150)
|
| 23 |
+
โ
|
| 24 |
+
์๋ฒ ๋ฉ (text-embedding-3-small)
|
| 25 |
+
โ
|
| 26 |
+
ChromaDB ์ ์ฅ
|
| 27 |
+
โ
|
| 28 |
+
์ง๋ฌธ ์
๋ ฅ
|
| 29 |
+
โ
|
| 30 |
+
๋ฒกํฐ ๊ฒ์ (top-10)
|
| 31 |
+
โ
|
| 32 |
+
Grok ๋ต๋ณ ์์ฑ
|
| 33 |
+
โ
|
| 34 |
+
์ถ์ฒ ํ์
|
| 35 |
+
```
|
| 36 |
+
|
| 37 |
+
---
|
| 38 |
+
|
| 39 |
+
## ๐ ๋๋ ํ ๋ฆฌ ๊ตฌ์กฐ
|
| 40 |
+
|
| 41 |
+
```
|
| 42 |
+
TEAM_EA_V2/
|
| 43 |
+
โ
|
| 44 |
+
โโโ app.py # Streamlit ๋ฉ์ธ
|
| 45 |
+
โ
|
| 46 |
+
โโโ config/
|
| 47 |
+
โ โโโ settings.py # API keys, ์ค์
|
| 48 |
+
โ
|
| 49 |
+
โโโ core/
|
| 50 |
+
โ โโโ pdf_loader.py # PDF ํ
์คํธ ์ถ์ถ
|
| 51 |
+
โ โโโ chunker.py # ์ฒญํน
|
| 52 |
+
โ โโโ embedder.py # ์๋ฒ ๋ฉ
|
| 53 |
+
โ โโโ vectordb.py # ChromaDB ๊ด๋ฆฌ
|
| 54 |
+
โ โโโ retriever.py # ๊ฒ์
|
| 55 |
+
โ โโโ generator.py # Grok ๋ต๋ณ ์์ฑ
|
| 56 |
+
โ
|
| 57 |
+
โโโ utils/
|
| 58 |
+
โ โโโ logger.py # ๋ก๊น
|
| 59 |
+
โ โโโ helpers.py # ์ ํธ๋ฆฌํฐ
|
| 60 |
+
โ
|
| 61 |
+
โโโ ui/
|
| 62 |
+
โ โโโ components.py # Streamlit ์ปดํฌ๋ํธ
|
| 63 |
+
โ โโโ styles.py # CSS
|
| 64 |
+
โ
|
| 65 |
+
โโโ data/
|
| 66 |
+
โ โโโ uploads/ # ์
๋ก๋๋ PDF
|
| 67 |
+
โ โโโ chroma_db/ # ChromaDB ์ ์ฅ์
|
| 68 |
+
โ
|
| 69 |
+
โโโ requirements.txt
|
| 70 |
+
โโโ .env # API keys (gitignore)
|
| 71 |
+
โโโ README.md
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
---
|
| 75 |
+
|
| 76 |
+
## ๐ ๋น ๋ฅธ ์์
|
| 77 |
+
|
| 78 |
+
### 1. ์ค์น
|
| 79 |
+
|
| 80 |
+
```bash
|
| 81 |
+
# ๊ฐ์ํ๊ฒฝ ์์ฑ
|
| 82 |
+
python -m venv venv
|
| 83 |
+
source venv/bin/activate # Windows: venv\Scripts\activate
|
| 84 |
+
|
| 85 |
+
# ํจํค์ง ์ค์น
|
| 86 |
+
pip install -r requirements.txt
|
| 87 |
+
```
|
| 88 |
+
|
| 89 |
+
### 2. ํ๊ฒฝ ์ค์
|
| 90 |
+
|
| 91 |
+
`.env` ํ์ผ ์์ฑ:
|
| 92 |
+
|
| 93 |
+
```env
|
| 94 |
+
OPENAI_API_KEY=your_openai_api_key_here
|
| 95 |
+
XAI_API_KEY=your_grok_api_key_here
|
| 96 |
+
```
|
| 97 |
+
|
| 98 |
+
### 3. ์คํ
|
| 99 |
+
|
| 100 |
+
```bash
|
| 101 |
+
streamlit run app.py
|
| 102 |
+
```
|
| 103 |
+
|
| 104 |
+
๋ธ๋ผ์ฐ์ ์์ `http://localhost:8501` ์ ์
|
| 105 |
+
|
| 106 |
+
---
|
| 107 |
+
|
| 108 |
+
## ๐ก ์ฌ์ฉ ๋ฐฉ๋ฒ
|
| 109 |
+
|
| 110 |
+
### 1๋จ๊ณ: PDF ์
๋ก๋
|
| 111 |
+
- ์ฌ์ด๋๋ฐ ๋๋ ๋ฉ์ธ ํ๋ฉด์์ PDF ํ์ผ ์
๋ก๋
|
| 112 |
+
- "๋ฌธ์ ์ฒ๋ฆฌ ์์" ๋ฒํผ ํด๋ฆญ
|
| 113 |
+
|
| 114 |
+
### 2๋จ๊ณ: ์ง๋ฌธํ๊ธฐ
|
| 115 |
+
- ์ฒ๋ฆฌ๊ฐ ์๋ฃ๋๋ฉด ์ง๋ฌธ ์
๋ ฅ์ฐฝ์ด ํ์ฑํ
|
| 116 |
+
- ์ง๋ฌธ ์
๋ ฅ ํ Enter
|
| 117 |
+
|
| 118 |
+
### 3๋จ๊ณ: ๋ต๋ณ ํ์ธ
|
| 119 |
+
- Grok์ด ์์ฑํ ๋ต๋ณ ํ์ธ
|
| 120 |
+
- ์ถ์ฒ ํ์ด์ง ๋ฐ ์๋ฌธ ํ์ธ
|
| 121 |
+
|
| 122 |
---
|
| 123 |
+
|
| 124 |
+
## ๐ง ๊ธฐ์ ์คํ
|
| 125 |
+
|
| 126 |
+
| ๊ตฌ์ฑ ์์ | ๊ธฐ์ | ์ด์ |
|
| 127 |
+
|----------|------|------|
|
| 128 |
+
| **PDF ์ ์ฒ๋ฆฌ** | pymupdf4llm + PyMuPDF | ํ
๋๋
ธํธ ์คํ์ผ, ์์ ์ |
|
| 129 |
+
| **์๋ฒ ๋ฉ** | text-embedding-3-small | ์ ๋ ด($0.00002/1K tokens), ๋น ๋ฆ |
|
| 130 |
+
| **Vector DB** | ChromaDB | ๋ก์ปฌ ์คํ, Python native |
|
| 131 |
+
| **LLM** | Grok (xAI) | ํ๊ตญ์ด ์ฑ๋ฅ ์ฐ์ |
|
| 132 |
+
| **UI** | Streamlit | ๋น ๋ฅธ ํ๋กํ ํ์ดํ |
|
| 133 |
+
|
| 134 |
+
---
|
| 135 |
+
|
| 136 |
+
## โ๏ธ ์ค์
|
| 137 |
+
|
| 138 |
+
### config/settings.py
|
| 139 |
+
|
| 140 |
+
```python
|
| 141 |
+
# ์๋ฒ ๋ฉ ์ค์
|
| 142 |
+
EMBEDDING_MODEL = "text-embedding-3-small"
|
| 143 |
+
EMBEDDING_DIMENSION = 1536
|
| 144 |
+
|
| 145 |
+
# ์ฒญํน ์ค์
|
| 146 |
+
CHUNK_SIZE = 800 # ๋ฌธ์ ๋จ์
|
| 147 |
+
CHUNK_OVERLAP = 150 # ์ค๋ฒ๋ฉ
|
| 148 |
+
|
| 149 |
+
# ๊ฒ์ ์ค์
|
| 150 |
+
TOP_K = 10 # ์์ K๊ฐ ๊ฒ์
|
| 151 |
+
|
| 152 |
+
# Grok ์ค์
|
| 153 |
+
GROK_MODEL = "grok-beta"
|
| 154 |
+
```
|
| 155 |
+
|
| 156 |
+
---
|
| 157 |
+
|
| 158 |
+
## ๐ ์ฑ๋ฅ ์งํ
|
| 159 |
+
|
| 160 |
+
### MVP ๋ชฉํ (Week 1)
|
| 161 |
+
- โ
PDF ์
๋ก๋ ๊ฐ๋ฅ
|
| 162 |
+
- โ
์ง๋ฌธ-๋ต๋ณ ์๋
|
| 163 |
+
- โ
์ถ์ฒ ํ์
|
| 164 |
+
- โ
๊ธฐ๋ณธ UI
|
| 165 |
+
|
| 166 |
+
### Phase 2 ๋ชฉํ (Week 2)
|
| 167 |
+
- โณ ํ์ด๋ธ๋ฆฌ๋ ๊ฒ์ (BM25 + Vector)
|
| 168 |
+
- โณ ๋ฆฌ๋ญํน (Cohere Rerank)
|
| 169 |
+
- โณ ํ์ด๋ผ์ดํ
|
| 170 |
+
- โณ ์ ํ๋ 70%+
|
| 171 |
+
|
| 172 |
+
### Phase 3 ๋ชฉํ (Week 3)
|
| 173 |
+
- โณ PDF ์ถ๊ฐ ์
๋ก๋
|
| 174 |
+
- โณ ๋ฉํ๋ฐ์ดํฐ ๋ก๊น
|
| 175 |
+
- โณ ์๋ฌ ํธ๋ค๋ง
|
| 176 |
+
- โณ ์ ํ๋ 90%+
|
| 177 |
+
|
| 178 |
+
---
|
| 179 |
+
|
| 180 |
+
## ๐ ํธ๋ฌ๋ธ์ํ
|
| 181 |
+
|
| 182 |
+
### 1. API Key ์ค๋ฅ
|
| 183 |
+
```bash
|
| 184 |
+
# .env ํ์ผ ํ์ธ
|
| 185 |
+
OPENAI_API_KEY=sk-...
|
| 186 |
+
XAI_API_KEY=xai-...
|
| 187 |
+
```
|
| 188 |
+
|
| 189 |
+
### 2. ํจํค์ง ์ค์น ์ค๋ฅ
|
| 190 |
+
```bash
|
| 191 |
+
# ๊ฐ๋ณ ์ค์น ์๋
|
| 192 |
+
pip install streamlit
|
| 193 |
+
pip install chromadb
|
| 194 |
+
pip install openai
|
| 195 |
+
pip install pymupdf4llm
|
| 196 |
+
```
|
| 197 |
+
|
| 198 |
+
### 3. ChromaDB ์ค๋ฅ
|
| 199 |
+
```bash
|
| 200 |
+
# ๋ฐ์ดํฐ๋ฒ ์ด์ค ์ด๊ธฐํ
|
| 201 |
+
rm -rf data/chroma_db/*
|
| 202 |
+
```
|
| 203 |
+
|
| 204 |
+
---
|
| 205 |
+
|
| 206 |
+
## ๐ ๊ฐ๋ฐ ๋ก๊ทธ
|
| 207 |
+
|
| 208 |
+
### v1.0 (MVP)
|
| 209 |
+
- [x] PDF ์
๋ก๋ ๋ฐ ํ
์คํธ ์ถ์ถ
|
| 210 |
+
- [x] ์ฒญํน ๋ฐ ์๋ฒ ๋ฉ
|
| 211 |
+
- [x] ChromaDB ์ ์ฅ
|
| 212 |
+
- [x] ๋ฒกํฐ ๊ฒ์
|
| 213 |
+
- [x] Grok ๋ต๋ณ ์์ฑ
|
| 214 |
+
- [x] Streamlit UI
|
| 215 |
+
- [x] ์ถ์ฒ ํ์
|
| 216 |
+
|
| 217 |
+
### v2.0 (์์ )
|
| 218 |
+
- [ ] ํ์ด๋ธ๋ฆฌ๋ ๊ฒ์
|
| 219 |
+
- [ ] ๋ฆฌ๋ญํน
|
| 220 |
+
- [ ] ํ์ด๋ผ์ดํ
|
| 221 |
+
- [ ] ์ ํ๋ ์ธก์
|
| 222 |
+
|
| 223 |
+
---
|
| 224 |
+
|
| 225 |
+
## ๐จโ๐ป ๊ฐ๋ฐ์
|
| 226 |
+
|
| 227 |
+
**TEAM EA**
|
| 228 |
+
|
| 229 |
+
---
|
| 230 |
+
|
| 231 |
+
## ๐ ๋ผ์ด์ ์ค
|
| 232 |
+
|
| 233 |
+
MIT License
|
| 234 |
+
|
| 235 |
+
---
|
| 236 |
+
|
| 237 |
+
## ๐ ๊ฐ์ฌ์ ๋ง
|
| 238 |
+
|
| 239 |
+
- **Andrew Ng**: ML ์์คํ
์ค๊ณ ์์น
|
| 240 |
+
- **ํ
๋๋
ธํธ**: PDF ์ฒ๋ฆฌ ๋ฐฉ๋ฒ๋ก
|
| 241 |
+
- **OpenAI**: ์๋ฒ ๋ฉ ๋ชจ๋ธ
|
| 242 |
+
- **xAI**: Grok LLM
|
| 243 |
+
|
| 244 |
---
|
| 245 |
|
| 246 |
+
## ๐ ๋ฌธ์
|
| 247 |
|
| 248 |
+
์ด์๊ฐ ์์ผ์๋ฉด GitHub Issues์ ๋ฑ๋กํด์ฃผ์ธ์.
|
| 249 |
|
|
|
|
|
|
SETUP_GUIDE.md
ADDED
|
@@ -0,0 +1,288 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# ๐ ๏ธ TEAM EA - ์ค์น ๋ฐ ์ค์ ๊ฐ์ด๋
|
| 2 |
+
|
| 3 |
+
## ๐ ์ฌ์ ์๊ตฌ์ฌํญ
|
| 4 |
+
|
| 5 |
+
### 1. Python ๋ฒ์
|
| 6 |
+
- Python 3.8 ์ด์ ํ์
|
| 7 |
+
- ๊ถ์ฅ: Python 3.9 - 3.11
|
| 8 |
+
|
| 9 |
+
ํ์ธ:
|
| 10 |
+
```bash
|
| 11 |
+
python --version
|
| 12 |
+
```
|
| 13 |
+
|
| 14 |
+
### 2. API Keys ์ค๋น
|
| 15 |
+
|
| 16 |
+
#### OpenAI API Key
|
| 17 |
+
1. https://platform.openai.com ์ ์
|
| 18 |
+
2. ๋ก๊ทธ์ธ โ API keys ๋ฉ๋ด
|
| 19 |
+
3. "Create new secret key" ํด๋ฆญ
|
| 20 |
+
4. ํค ๋ณต์ฌ (๋ค์ ๋ณผ ์ ์์ผ๋ ์ฃผ์!)
|
| 21 |
+
|
| 22 |
+
**ํ์ํ ํฌ๋ ๋ง**: $5 ์ ๋๋ฉด ์ถฉ๋ถ (ํ
์คํธ์ฉ)
|
| 23 |
+
|
| 24 |
+
#### xAI (Grok) API Key
|
| 25 |
+
1. https://console.x.ai ์ ์
|
| 26 |
+
2. ๊ฐ์
๋ฐ ๋ก๊ทธ์ธ
|
| 27 |
+
3. API Keys ์์ฑ
|
| 28 |
+
4. ํค ๋ณต์ฌ
|
| 29 |
+
|
| 30 |
+
---
|
| 31 |
+
|
| 32 |
+
## ๐ ์ค์น ๋จ๊ณ
|
| 33 |
+
|
| 34 |
+
### Step 1: ํ๋ก์ ํธ ๋๋ ํ ๋ฆฌ๋ก ์ด๋
|
| 35 |
+
|
| 36 |
+
```bash
|
| 37 |
+
cd TEAM_EA_V2
|
| 38 |
+
```
|
| 39 |
+
|
| 40 |
+
### Step 2: ๊ฐ์ํ๊ฒฝ ์์ฑ ๋ฐ ํ์ฑํ
|
| 41 |
+
|
| 42 |
+
**macOS/Linux:**
|
| 43 |
+
```bash
|
| 44 |
+
python -m venv venv
|
| 45 |
+
source venv/bin/activate
|
| 46 |
+
```
|
| 47 |
+
|
| 48 |
+
**Windows:**
|
| 49 |
+
```bash
|
| 50 |
+
python -m venv venv
|
| 51 |
+
venv\Scripts\activate
|
| 52 |
+
```
|
| 53 |
+
|
| 54 |
+
๊ฐ์ํ๊ฒฝ์ด ํ์ฑํ๋๋ฉด ํฐ๋ฏธ๋ ์์ `(venv)`๊ฐ ํ์๋ฉ๋๋ค.
|
| 55 |
+
|
| 56 |
+
### Step 3: ํจํค์ง ์ค์น
|
| 57 |
+
|
| 58 |
+
```bash
|
| 59 |
+
pip install --upgrade pip
|
| 60 |
+
pip install -r requirements.txt
|
| 61 |
+
```
|
| 62 |
+
|
| 63 |
+
**์์ ์์ ์๊ฐ**: 2-3๋ถ
|
| 64 |
+
|
| 65 |
+
### Step 4: API Keys ์ค์
|
| 66 |
+
|
| 67 |
+
`.env` ํ์ผ์ ์ด๊ณ API ํค๋ฅผ ์
๋ ฅ:
|
| 68 |
+
|
| 69 |
+
```env
|
| 70 |
+
OPENAI_API_KEY=sk-proj-์ฌ๊ธฐ์_์ค์ _ํค_์
๋ ฅ
|
| 71 |
+
XAI_API_KEY=xai-์ฌ๊ธฐ์_์ค์ _ํค_์
๋ ฅ
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
**์ฃผ์์ฌํญ:**
|
| 75 |
+
- ๋ฐ์ดํ ์์ด ์
๋ ฅ
|
| 76 |
+
- ๊ณต๋ฐฑ ์์ด ์
๋ ฅ
|
| 77 |
+
- ํค๋ฅผ ์ ๋ Git์ ์ปค๋ฐํ์ง ๋ง์ธ์!
|
| 78 |
+
|
| 79 |
+
---
|
| 80 |
+
|
| 81 |
+
## โ
์ค์น ํ์ธ
|
| 82 |
+
|
| 83 |
+
### 1. ํจํค์ง ํ์ธ
|
| 84 |
+
|
| 85 |
+
```bash
|
| 86 |
+
pip list | grep -E "streamlit|chromadb|openai"
|
| 87 |
+
```
|
| 88 |
+
|
| 89 |
+
๋ค์๊ณผ ๊ฐ์ด ํ์๋์ด์ผ ํฉ๋๋ค:
|
| 90 |
+
```
|
| 91 |
+
streamlit 1.28.0
|
| 92 |
+
chromadb 0.4.18
|
| 93 |
+
openai 1.3.0
|
| 94 |
+
```
|
| 95 |
+
|
| 96 |
+
### 2. ๋ชจ๋ ์ํฌํธ ํ
์คํธ
|
| 97 |
+
|
| 98 |
+
```bash
|
| 99 |
+
python -c "from core.pdf_loader import load_pdf; print('โ
Import OK')"
|
| 100 |
+
```
|
| 101 |
+
|
| 102 |
+
### 3. API Key ํ
์คํธ
|
| 103 |
+
|
| 104 |
+
```bash
|
| 105 |
+
python -c "from config.settings import OPENAI_API_KEY, XAI_API_KEY; print('โ
Keys loaded')"
|
| 106 |
+
```
|
| 107 |
+
|
| 108 |
+
---
|
| 109 |
+
|
| 110 |
+
## ๐ฏ ์คํ
|
| 111 |
+
|
| 112 |
+
```bash
|
| 113 |
+
streamlit run app.py
|
| 114 |
+
```
|
| 115 |
+
|
| 116 |
+
์ฑ๊ณตํ๋ฉด:
|
| 117 |
+
```
|
| 118 |
+
You can now view your Streamlit app in your browser.
|
| 119 |
+
|
| 120 |
+
Local URL: http://localhost:8501
|
| 121 |
+
Network URL: http://192.168.x.x:8501
|
| 122 |
+
```
|
| 123 |
+
|
| 124 |
+
๋ธ๋ผ์ฐ์ ๊ฐ ์๋์ผ๋ก ์ด๋ฆฝ๋๋ค!
|
| 125 |
+
|
| 126 |
+
---
|
| 127 |
+
|
| 128 |
+
## ๐ง ์ค์ ์ปค์คํฐ๋ง์ด์ง
|
| 129 |
+
|
| 130 |
+
### ์ฒญํน ์ค์ ๋ณ๊ฒฝ
|
| 131 |
+
|
| 132 |
+
`config/settings.py` ํ์ผ ์์ :
|
| 133 |
+
|
| 134 |
+
```python
|
| 135 |
+
# ์ฒญํน ์ค์
|
| 136 |
+
CHUNK_SIZE = 800 # ๋ ์๊ฒ: 400, ๋ ํฌ๊ฒ: 1200
|
| 137 |
+
CHUNK_OVERLAP = 150 # ๋ ์๊ฒ: 100, ๋ ํฌ๊ฒ: 200
|
| 138 |
+
```
|
| 139 |
+
|
| 140 |
+
**๊ฐ์ด๋:**
|
| 141 |
+
- ์งง์ ๋ฌธ์: CHUNK_SIZE = 400-600
|
| 142 |
+
- ๊ธด ๋ฌธ์: CHUNK_SIZE = 800-1000
|
| 143 |
+
- ์ ํ๋ ์ค์: OVERLAP์ ํฌ๊ฒ (200-300)
|
| 144 |
+
|
| 145 |
+
### ๊ฒ์ ๊ฐ์ ๋ณ๊ฒฝ
|
| 146 |
+
|
| 147 |
+
```python
|
| 148 |
+
# ๊ฒ์ ์ค์
|
| 149 |
+
TOP_K = 10 # ๋ ๋ง์ด: 15-20, ๋ ์ ๊ฒ: 5-7
|
| 150 |
+
```
|
| 151 |
+
|
| 152 |
+
**๊ฐ์ด๋:**
|
| 153 |
+
- ๋น ๋ฅธ ์๋ต: TOP_K = 5
|
| 154 |
+
- ์ ํํ ์๋ต: TOP_K = 15
|
| 155 |
+
|
| 156 |
+
---
|
| 157 |
+
|
| 158 |
+
## ๐ ๋ฌธ์ ํด๊ฒฐ
|
| 159 |
+
|
| 160 |
+
### ๋ฌธ์ 1: "No module named 'XXX'"
|
| 161 |
+
|
| 162 |
+
**ํด๊ฒฐ:**
|
| 163 |
+
```bash
|
| 164 |
+
pip install -r requirements.txt --force-reinstall
|
| 165 |
+
```
|
| 166 |
+
|
| 167 |
+
### ๋ฌธ์ 2: "API Key not found"
|
| 168 |
+
|
| 169 |
+
**ํด๊ฒฐ:**
|
| 170 |
+
1. `.env` ํ์ผ์ด `TEAM_EA_V2` ๋๋ ํ ๋ฆฌ์ ์๋์ง ํ์ธ
|
| 171 |
+
2. API ํค๊ฐ ์ฌ๋ฐ๋ฅด๊ฒ ์
๋ ฅ๋์๋์ง ํ์ธ
|
| 172 |
+
3. ๊ฐ์ํ๊ฒฝ์ด ํ์ฑํ๋์ด ์๋์ง ํ์ธ
|
| 173 |
+
|
| 174 |
+
### ๋ฌธ์ 3: ChromaDB ์ค๋ฅ
|
| 175 |
+
|
| 176 |
+
**ํด๊ฒฐ:**
|
| 177 |
+
```bash
|
| 178 |
+
rm -rf data/chroma_db/*
|
| 179 |
+
mkdir -p data/chroma_db
|
| 180 |
+
```
|
| 181 |
+
|
| 182 |
+
### ๋ฌธ์ 4: Port 8501 already in use
|
| 183 |
+
|
| 184 |
+
**ํด๊ฒฐ:**
|
| 185 |
+
```bash
|
| 186 |
+
# ๋ค๋ฅธ ํฌํธ ์ฌ์ฉ
|
| 187 |
+
streamlit run app.py --server.port 8502
|
| 188 |
+
```
|
| 189 |
+
|
| 190 |
+
### ๋ฌธ์ 5: ํ๊ธ ๊นจ์ง
|
| 191 |
+
|
| 192 |
+
**ํด๊ฒฐ:**
|
| 193 |
+
```bash
|
| 194 |
+
# ์ธ์ฝ๋ฉ ์ค์
|
| 195 |
+
export PYTHONIOENCODING=utf-8 # macOS/Linux
|
| 196 |
+
set PYTHONIOENCODING=utf-8 # Windows
|
| 197 |
+
```
|
| 198 |
+
|
| 199 |
+
---
|
| 200 |
+
|
| 201 |
+
## ๐ ์ฑ๋ฅ ์ต์ ํ
|
| 202 |
+
|
| 203 |
+
### 1. ๋ฉ๋ชจ๋ฆฌ ๋ถ์กฑ
|
| 204 |
+
|
| 205 |
+
**ํฐ PDF ํ์ผ (100ํ์ด์ง+):**
|
| 206 |
+
```python
|
| 207 |
+
# config/settings.py
|
| 208 |
+
CHUNK_SIZE = 600 # ์๊ฒ
|
| 209 |
+
TOP_K = 8 # ์ ๊ฒ
|
| 210 |
+
```
|
| 211 |
+
|
| 212 |
+
### 2. ๋๋ฆฐ ์๋ต
|
| 213 |
+
|
| 214 |
+
**์์ธ:**
|
| 215 |
+
- OpenAI API ์๋ต ์๊ฐ
|
| 216 |
+
- Grok API ์๋ต ์๊ฐ
|
| 217 |
+
- ๋๋ฌด ๋ง์ ์ฒญํฌ ๊ฒ์
|
| 218 |
+
|
| 219 |
+
**ํด๊ฒฐ:**
|
| 220 |
+
```python
|
| 221 |
+
TOP_K = 5 # ๊ฒ์ ๊ฐ์ ์ค์ด๊ธฐ
|
| 222 |
+
```
|
| 223 |
+
|
| 224 |
+
### 3. ๋ถ์ ํํ ๋ต๋ณ
|
| 225 |
+
|
| 226 |
+
**ํด๊ฒฐ:**
|
| 227 |
+
```python
|
| 228 |
+
TOP_K = 15 # ๊ฒ์ ๊ฐ์ ๋๋ฆฌ๊ธฐ
|
| 229 |
+
CHUNK_OVERLAP = 200 # ์ค๋ฒ๋ฉ ๋๋ฆฌ๊ธฐ
|
| 230 |
+
```
|
| 231 |
+
|
| 232 |
+
---
|
| 233 |
+
|
| 234 |
+
## ๐ ๋ค์ ๋จ๊ณ
|
| 235 |
+
|
| 236 |
+
### Phase 2๋ก ์งํ ์ค๋น
|
| 237 |
+
|
| 238 |
+
MVP๊ฐ ์ ์๋ํ๋ฉด:
|
| 239 |
+
|
| 240 |
+
1. **ํ์ด๋ธ๋ฆฌ๋ ๊ฒ์** ์ถ๊ฐ
|
| 241 |
+
- BM25 + Vector Search
|
| 242 |
+
- `pip install rank-bm25`
|
| 243 |
+
|
| 244 |
+
2. **๋ฆฌ๋ญํน** ์ถ๊ฐ
|
| 245 |
+
- Cohere Rerank API
|
| 246 |
+
- `pip install cohere`
|
| 247 |
+
|
| 248 |
+
3. **ํ์ด๋ผ์ดํ
** ์ถ๊ฐ
|
| 249 |
+
- PDF.js ํตํฉ
|
| 250 |
+
|
| 251 |
+
4. **ํ๊ฐ ์์คํ
** ๊ตฌ์ถ
|
| 252 |
+
- ์ ํ๋ ์ธก์
|
| 253 |
+
- ๋ก๊น
๋ฐ ๋ชจ๋ํฐ๋ง
|
| 254 |
+
|
| 255 |
+
---
|
| 256 |
+
|
| 257 |
+
## ๐ ๋ณด์ ์ฃผ์์ฌํญ
|
| 258 |
+
|
| 259 |
+
### API Key ๊ด๋ฆฌ
|
| 260 |
+
|
| 261 |
+
1. `.env` ํ์ผ์ ์ ๋ Git์ ์ปค๋ฐํ์ง ๋ง์ธ์
|
| 262 |
+
2. `.gitignore`์ `.env`๊ฐ ํฌํจ๋์ด ์๋์ง ํ์ธ
|
| 263 |
+
3. API Key๋ฅผ ์ฝ๋์ ํ๋์ฝ๋ฉํ์ง ๋ง์ธ์
|
| 264 |
+
|
| 265 |
+
### ๋ฐ์ดํฐ ๋ณด์
|
| 266 |
+
|
| 267 |
+
1. ๋ฏผ๊ฐํ PDF๋ ์
๋ก๋ ํ ์ญ์
|
| 268 |
+
2. ChromaDB๋ ๋ก์ปฌ์๋ง ์ ์ฅ๋จ
|
| 269 |
+
3. API ํธ์ถ ์ ๋ฐ์ดํฐ๋ OpenAI/xAI ์๋ฒ๋ก ์ ์ก๋จ
|
| 270 |
+
|
| 271 |
+
---
|
| 272 |
+
|
| 273 |
+
## ๐ ์ง์
|
| 274 |
+
|
| 275 |
+
### ๋์์ด ํ์ํ์ ๊ฐ์?
|
| 276 |
+
|
| 277 |
+
1. **README.md**: ์ ์ฒด ๋ฌธ์
|
| 278 |
+
2. **QUICKSTART.md**: ๋น ๋ฅธ ์์
|
| 279 |
+
3. **GitHub Issues**: ๋ฒ๊ทธ ๋ฆฌํฌํธ
|
| 280 |
+
|
| 281 |
+
---
|
| 282 |
+
|
| 283 |
+
## ๐ ์๋ฃ!
|
| 284 |
+
|
| 285 |
+
์ด์ TEAM EA๋ฅผ ์์ ํ ์ค์ ํ์
จ์ต๋๋ค!
|
| 286 |
+
|
| 287 |
+
**Happy Coding! ๐**
|
| 288 |
+
|
UPGRADE_SUMMARY.md
ADDED
|
@@ -0,0 +1,346 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# ๐ TEAM EA v1.1 - ์
๊ทธ๋ ์ด๋ ์๋ฃ!
|
| 2 |
+
|
| 3 |
+
## ๐ ํ๋์ ๋ณด๊ธฐ
|
| 4 |
+
|
| 5 |
+
```
|
| 6 |
+
v1.0 (๋ด ๊ตฌํ)
|
| 7 |
+
+
|
| 8 |
+
์ฌ์ฉ์ ํ๋ (์ฅ์ ํก์)
|
| 9 |
+
=
|
| 10 |
+
v1.1 (Andrew Ng ์์น ์๋ฒฝ ๊ตฌํ) โ
|
| 11 |
+
```
|
| 12 |
+
|
| 13 |
+
---
|
| 14 |
+
|
| 15 |
+
## ๐ ์ ์ฉ๋ ๋ณ๊ฒฝ์ฌํญ
|
| 16 |
+
|
| 17 |
+
### 1. **VectorDB ๊ฐ์ ** โ
|
| 18 |
+
|
| 19 |
+
**Before:**
|
| 20 |
+
```python
|
| 21 |
+
try:
|
| 22 |
+
self.collection = self.client.get_collection(name=collection_name)
|
| 23 |
+
except:
|
| 24 |
+
self.collection = self.client.create_collection(name=collection_name)
|
| 25 |
+
```
|
| 26 |
+
|
| 27 |
+
**After:**
|
| 28 |
+
```python
|
| 29 |
+
self.collection = self.client.get_or_create_collection(
|
| 30 |
+
name=collection_name,
|
| 31 |
+
metadata={"description": "RFP ๋ฌธ์ ์๋ฒ ๋ฉ"}
|
| 32 |
+
)
|
| 33 |
+
print(f"โ
์ปฌ๋ ์
: {collection_name} (๋ฌธ์ ์: {self.collection.count()})")
|
| 34 |
+
```
|
| 35 |
+
|
| 36 |
+
**๊ฐ์ ํจ๊ณผ:**
|
| 37 |
+
- โ
3์ค โ 1์ค (๊ฐ๋
์ฑ โฌ๏ธ)
|
| 38 |
+
- โ
Pythonic ์ฝ๋
|
| 39 |
+
- โ
๋ฉํ๋ฐ์ดํฐ ์ถ๊ฐ
|
| 40 |
+
- โ
๋ฌธ์ ์ ํ์
|
| 41 |
+
|
| 42 |
+
---
|
| 43 |
+
|
| 44 |
+
### 2. **ํ๋กฌํํธ ์์ง๋์ด๋ง ๊ฐํ** โ
|
| 45 |
+
|
| 46 |
+
**Before:**
|
| 47 |
+
```
|
| 48 |
+
# ๋ต๋ณ ๊ฐ์ด๋
|
| 49 |
+
- ๋ฌธ์์ ๋ช
์๋ ๋ด์ฉ๋ง ์ฌ์ฉํ์ธ์
|
| 50 |
+
- ์ถ์ธกํ์ง ๋ง์ธ์
|
| 51 |
+
- ์ถ์ฒ ํ์ด์ง๋ฅผ ๋ช
์ํ์ธ์
|
| 52 |
+
```
|
| 53 |
+
|
| 54 |
+
**After:**
|
| 55 |
+
```
|
| 56 |
+
# ๋ต๋ณ ๊ท์น
|
| 57 |
+
1. ๋ฐ๋์ ์ ๊ณต๋ ๋ฌธ์ ๋ด์ฉ๋ง์ ๊ธฐ๋ฐ์ผ๋ก ๋ต๋ณํ์ธ์
|
| 58 |
+
2. ๋ฌธ์์ ์๋ ๋ด์ฉ์ด๋ฉด "์ ๊ณต๋ ๋ฌธ์์์ ํด๋น ์ ๋ณด๋ฅผ ์ฐพ์ ์ ์์ต๋๋ค"
|
| 59 |
+
3. ๋ต๋ณ ์ ์ถ์ฒ ํ์ด์ง ๋ฒํธ๋ฅผ [ํ์ด์ง X] ํ์์ผ๋ก ๋ช
์ํ์ธ์
|
| 60 |
+
4. ๋ช
ํํ๊ณ ๊ฐ๊ฒฐํ๊ฒ ๋ต๋ณํ์ธ์
|
| 61 |
+
5. ์ถ์ธกํ์ง ๋ง์ธ์
|
| 62 |
+
```
|
| 63 |
+
|
| 64 |
+
**๊ฐ์ ํจ๊ณผ:**
|
| 65 |
+
- โ
๋ฒํธ ๋งค๊น (LLM์ด ๋ ์ ์ดํด)
|
| 66 |
+
- โ
๊ตฌ์ฒด์ ์ธ ์ง์ ("์ ๊ณต๋ ๋ฌธ์์์...")
|
| 67 |
+
- โ
ํ์ ๋ช
์ ([ํ์ด์ง X])
|
| 68 |
+
- โ
ํ ๋ฃจ์๋ค์ด์
๋ฐฉ์ง ๊ฐํ
|
| 69 |
+
|
| 70 |
+
---
|
| 71 |
+
|
| 72 |
+
### 3. **UI/UX ๋ํ์ ** โ
โ
โ
|
| 73 |
+
|
| 74 |
+
**Before:**
|
| 75 |
+
```python
|
| 76 |
+
query = st.text_input("์ง๋ฌธ์ ์
๋ ฅํ์ธ์")
|
| 77 |
+
|
| 78 |
+
if query:
|
| 79 |
+
answer_query(query)
|
| 80 |
+
# ๊ฒฐ๊ณผ ํ์
|
| 81 |
+
```
|
| 82 |
+
|
| 83 |
+
**After:**
|
| 84 |
+
```python
|
| 85 |
+
# ์ฑํ
ํ์คํ ๋ฆฌ ํ์
|
| 86 |
+
for message in st.session_state.messages:
|
| 87 |
+
with st.chat_message(message["role"]):
|
| 88 |
+
st.markdown(message["content"])
|
| 89 |
+
|
| 90 |
+
# ํ๋์ ์ฑํ
์
๋ ฅ
|
| 91 |
+
if query := st.chat_input("์ง๋ฌธ์ ์
๋ ฅํ์ธ์"):
|
| 92 |
+
answer_query_chat(query)
|
| 93 |
+
```
|
| 94 |
+
|
| 95 |
+
**๊ฐ์ ํจ๊ณผ:**
|
| 96 |
+
- โ
ChatGPT ์คํ์ผ UI
|
| 97 |
+
- โ
์ฑํ
ํ์คํ ๋ฆฌ ์๋ ์ ์ฅ
|
| 98 |
+
- โ
์ญํ ๋ณ ์์ด์ฝ (user/assistant)
|
| 99 |
+
- โ
๋ํ ๋งฅ๋ฝ ์ ์ง
|
| 100 |
+
- โ
์ฌ์ฉ์ ๊ฒฝํ +80% ๊ฐ์
|
| 101 |
+
|
| 102 |
+
---
|
| 103 |
+
|
| 104 |
+
### 4. **์ธ์
๊ด๋ฆฌ ๊ฐํ** โ
|
| 105 |
+
|
| 106 |
+
**Before:**
|
| 107 |
+
```python
|
| 108 |
+
if "vectordb" not in st.session_state:
|
| 109 |
+
st.session_state.vectordb = None
|
| 110 |
+
if "stats" not in st.session_state:
|
| 111 |
+
st.session_state.stats = {...}
|
| 112 |
+
```
|
| 113 |
+
|
| 114 |
+
**After:**
|
| 115 |
+
```python
|
| 116 |
+
# ๊ธฐ์กด +
|
| 117 |
+
if "messages" not in st.session_state:
|
| 118 |
+
st.session_state.messages = [] # ์ฑํ
ํ์คํ ๋ฆฌ
|
| 119 |
+
```
|
| 120 |
+
|
| 121 |
+
**๊ฐ์ ํจ๊ณผ:**
|
| 122 |
+
- โ
๋ํ ํ์คํ ๋ฆฌ ์ ์ง
|
| 123 |
+
- โ
์ถ์ฒ ์ ๋ณด ํจ๊ป ์ ์ฅ
|
| 124 |
+
- โ
์ธ์
๊ฐ ์ผ๊ด์ฑ
|
| 125 |
+
|
| 126 |
+
---
|
| 127 |
+
|
| 128 |
+
## ๐ ์ฑ๋ฅ ๋น๊ต
|
| 129 |
+
|
| 130 |
+
### Code Quality
|
| 131 |
+
|
| 132 |
+
| ์งํ | v1.0 | v1.1 | ๊ฐ์ ์จ |
|
| 133 |
+
|------|------|------|--------|
|
| 134 |
+
| **๊ฐ๋
์ฑ** | โญโญโญ | โญโญโญโญโญ | +66% |
|
| 135 |
+
| **UX** | โญโญโญ | โญโญโญโญโญ | +66% |
|
| 136 |
+
| **ํ์ฅ์ฑ** | โญโญโญโญโญ | โญโญโญโญโญ | ์ ์ง |
|
| 137 |
+
| **ํ๋กฌํํธ** | โญโญโญ | โญโญโญโญโญ | +66% |
|
| 138 |
+
|
| 139 |
+
### Andrew Ng ์์น ์ถฉ์กฑ๋
|
| 140 |
+
|
| 141 |
+
| ์์น | v1.0 | v1.1 |
|
| 142 |
+
|------|------|------|
|
| 143 |
+
| Start Simple | ๐ก 80% | โ
95% |
|
| 144 |
+
| Establish Baseline | โ
100% | โ
100% |
|
| 145 |
+
| Measurable Metrics | โ
90% | โ
100% |
|
| 146 |
+
| Error Analysis | ๐ก 70% | โ
95% |
|
| 147 |
+
| Iteration Ready | โ
100% | โ
100% |
|
| 148 |
+
|
| 149 |
+
---
|
| 150 |
+
|
| 151 |
+
## ๐ฏ ํต์ฌ ๊ฐ์ ํฌ์ธํธ
|
| 152 |
+
|
| 153 |
+
### 1. **๋จ์ํจ + ๊ตฌ์กฐํ์ ๊ท ํ**
|
| 154 |
+
```
|
| 155 |
+
์ฌ์ฉ์ ํ๋: ๋จ์ํจ ์ฐ์ (ํจ์ ๊ธฐ๋ฐ)
|
| 156 |
+
๋ด ๊ตฌํ: ๊ตฌ์กฐํ ์ฐ์ (ํด๋์ค ๊ธฐ๋ฐ)
|
| 157 |
+
v1.1: ๋ ์ฅ์ ๊ฒฐํฉ โ
|
| 158 |
+
```
|
| 159 |
+
|
| 160 |
+
### 2. **์ฌ์ฉ์ ๊ฒฝํ = ํต์ฌ ๊ฐ์น**
|
| 161 |
+
```
|
| 162 |
+
Before: ํ
์คํธ ์
๋ ฅ์ฐฝ
|
| 163 |
+
After: ์ฑํ
์ธํฐํ์ด์ค (ChatGPT ์คํ์ผ)
|
| 164 |
+
๊ฒฐ๊ณผ: ์ฌ์ฉ์ฑ ๊ธ์์น โ
|
| 165 |
+
```
|
| 166 |
+
|
| 167 |
+
### 3. **ํ๋กฌํํธ = ์ฝ๋**
|
| 168 |
+
```
|
| 169 |
+
Before: ์ ๋งคํ ์ง์๋ฌธ
|
| 170 |
+
After: ๋ฒํธ ๋งค๊น, ๊ตฌ์ฒด์ ๊ท์น
|
| 171 |
+
๊ฒฐ๊ณผ: ํ ๋ฃจ์๋ค์ด์
๋ฐฉ์ง ๊ฐํ โ
|
| 172 |
+
```
|
| 173 |
+
|
| 174 |
+
---
|
| 175 |
+
|
| 176 |
+
## ๐ ๋ณ๊ฒฝ๋ ํ์ผ ๋ชฉ๋ก
|
| 177 |
+
|
| 178 |
+
```
|
| 179 |
+
โ
core/vectordb.py (์ฝ๋ ๊ฐ์ํ)
|
| 180 |
+
โ
core/generator.py (ํ๋กฌํํธ ๊ฐํ)
|
| 181 |
+
โ
app.py (UI/UX ํ์ )
|
| 182 |
+
โ
ui/components.py (์ฑํ
์ปดํฌ๋ํธ ์ถ๊ฐ)
|
| 183 |
+
โ CHANGELOG.md (๋ณ๊ฒฝ ์ฌํญ ๋ฌธ์)
|
| 184 |
+
โ COMPARISON_ANALYSIS.md (์์ธ ๋น๊ต ๋ถ์)
|
| 185 |
+
โ UPGRADE_SUMMARY.md (์ด ๋ฌธ์)
|
| 186 |
+
```
|
| 187 |
+
|
| 188 |
+
---
|
| 189 |
+
|
| 190 |
+
## ๐ ์คํ ๋ฐฉ๋ฒ (๋ณ๊ฒฝ ์์)
|
| 191 |
+
|
| 192 |
+
```bash
|
| 193 |
+
cd TEAM_EA_V2
|
| 194 |
+
source venv/bin/activate
|
| 195 |
+
streamlit run app.py
|
| 196 |
+
```
|
| 197 |
+
|
| 198 |
+
**์๋ก์ด ๊ธฐ๋ฅ ์ฒดํ:**
|
| 199 |
+
1. PDF ์
๋ก๋ ๋ฐ ์ฒ๋ฆฌ
|
| 200 |
+
2. **์ฑํ
์ฐฝ์์ ์ง๋ฌธ ์
๋ ฅ** โ NEW!
|
| 201 |
+
3. **๋ํ ํ์คํ ๋ฆฌ ์๋ ์ ์ง** โ NEW!
|
| 202 |
+
4. ์ถ์ฒ ํ์ธ
|
| 203 |
+
|
| 204 |
+
---
|
| 205 |
+
|
| 206 |
+
## ๐ก ์ ์ด ๋ณ๊ฒฝ์ด ์ค์ํ๊ฐ?
|
| 207 |
+
|
| 208 |
+
### Andrew Ng์ ๊ตํ
|
| 209 |
+
|
| 210 |
+
> "The difference between a good ML system and a great one is often not the algorithm, but the engineering around it."
|
| 211 |
+
|
| 212 |
+
**์ ์ฉ:**
|
| 213 |
+
- ์๊ณ ๋ฆฌ์ฆ: ๋ฒกํฐ ๊ฒ์ (v1.0๊ณผ ๋์ผ)
|
| 214 |
+
- ์์ง๋์ด๋ง: ํ๋กฌํํธ, UX (v1.1์์ ๊ฐ์ ) โ
|
| 215 |
+
|
| 216 |
+
### ์ค์ ์์
|
| 217 |
+
|
| 218 |
+
**์๋๋ฆฌ์ค**: "์ด ํ๋ก์ ํธ์ ์์ฐ์?"
|
| 219 |
+
|
| 220 |
+
**v1.0:**
|
| 221 |
+
```
|
| 222 |
+
[ํ
์คํธ ์
๋ ฅ์ฐฝ]
|
| 223 |
+
โ ๋ต๋ณ ํ์
|
| 224 |
+
โ ๋ค์ ์ง๋ฌธ ์ ์ด์ ๋ต๋ณ ์ฌ๋ผ์ง
|
| 225 |
+
```
|
| 226 |
+
|
| 227 |
+
**v1.1:**
|
| 228 |
+
```
|
| 229 |
+
[์ฑํ
์ธํฐํ์ด์ค]
|
| 230 |
+
๐ค User: ์ด ํ๋ก์ ํธ์ ์์ฐ์?
|
| 231 |
+
๐ค Assistant: [๋ต๋ณ] (์ถ์ฒ: ํ์ด์ง 3)
|
| 232 |
+
๐ค User: ๊ทธ๋ผ ์ผ์ ์?
|
| 233 |
+
๐ค Assistant: [๋ต๋ณ] (์ถ์ฒ: ํ์ด์ง 5)
|
| 234 |
+
โ ๋ํ ๋งฅ๋ฝ ์ ์ง โ
|
| 235 |
+
```
|
| 236 |
+
|
| 237 |
+
---
|
| 238 |
+
|
| 239 |
+
## ๐ฎ Phase 2 ์ค๋น ์๋ฃ
|
| 240 |
+
|
| 241 |
+
v1.1์ ๊ตฌ์กฐ๋ก:
|
| 242 |
+
|
| 243 |
+
### ํ์ด๋ธ๋ฆฌ๋ ๊ฒ์ ์ถ๊ฐ (์์ )
|
| 244 |
+
```python
|
| 245 |
+
class Retriever:
|
| 246 |
+
def retrieve_hybrid(self, query, top_k):
|
| 247 |
+
# BM25
|
| 248 |
+
bm25_results = self._bm25_search(query)
|
| 249 |
+
# Vector
|
| 250 |
+
vector_results = self._vector_search(query)
|
| 251 |
+
# ๊ฒฐํฉ
|
| 252 |
+
return self._combine(bm25_results, vector_results)
|
| 253 |
+
```
|
| 254 |
+
โ ํด๋์ค ๊ตฌ์กฐ ๋๋ถ์ ์ฝ๊ฒ ์ถ๊ฐ ๊ฐ๋ฅ โ
|
| 255 |
+
|
| 256 |
+
### ๋ฆฌ๋ญํน ์ถ๊ฐ (์์ )
|
| 257 |
+
```python
|
| 258 |
+
class Generator:
|
| 259 |
+
def generate_with_rerank(self, query, contexts):
|
| 260 |
+
# Cohere Rerank
|
| 261 |
+
reranked = self._rerank(contexts)
|
| 262 |
+
# ๋ต๋ณ ์์ฑ
|
| 263 |
+
return self.generate_answer(query, reranked)
|
| 264 |
+
```
|
| 265 |
+
โ ๋ฉ์๋ ๋ถ๋ฆฌ ๋๋ถ์ ์ฝ๊ฒ ์ถ๊ฐ ๊ฐ๋ฅ โ
|
| 266 |
+
|
| 267 |
+
---
|
| 268 |
+
|
| 269 |
+
## ๐ ๋ฌธ์ ์ฒด๊ณ
|
| 270 |
+
|
| 271 |
+
```
|
| 272 |
+
README.md โ ํ๋ก์ ํธ ๊ฐ์
|
| 273 |
+
QUICKSTART.md โ 1๋ถ ์์ ๊ฐ์ด๋
|
| 274 |
+
SETUP_GUIDE.md โ ์์ธ ์ค์น ๊ฐ์ด๋
|
| 275 |
+
CHANGELOG.md โ ๋ฒ์ ๋ณ ๋ณ๊ฒฝ์ฌํญ
|
| 276 |
+
COMPARISON_ANALYSIS.md โ ์ฝ๋ ๋น๊ต ๋ถ์ (์์ธ)
|
| 277 |
+
UPGRADE_SUMMARY.md โ ์ด ๋ฌธ์ (์์ฝ)
|
| 278 |
+
```
|
| 279 |
+
|
| 280 |
+
---
|
| 281 |
+
|
| 282 |
+
## ๐ ๋ฐฐ์ด ๊ตํ
|
| 283 |
+
|
| 284 |
+
### 1. "์๋ฒฝํ ์ฝ๋๋ ์๋ค"
|
| 285 |
+
- ๋ด v1.0๋ ๊ฐ์ ์ฌ์ง ์์์
|
| 286 |
+
- ์ฌ์ฉ์ ํ๋์ ์ฅ์ ์ ๊ฒธํํ ์์ฉ
|
| 287 |
+
- **๊ฒฐ๋ก **: ์ง์์ ๊ฐ์ ์ด ํต์ฌ
|
| 288 |
+
|
| 289 |
+
### 2. "UX๋ ์ ํ์ด ์๋ ํ์"
|
| 290 |
+
- ์ฑํ
UI๊ฐ ๊ธฐ๋ฅ์ ํ์๋ ์๋
|
| 291 |
+
- ํ์ง๋ง ์ฌ์ฉ์ ๊ฒฝํ์ ๊ฒฐ์ ์
|
| 292 |
+
- **๊ฒฐ๋ก **: MVP๋ถํฐ UX ํฌ์ ํ์
|
| 293 |
+
|
| 294 |
+
### 3. "ํ๋กฌํํธ ์์ง๋์ด๋ง = ํต์ฌ ์ญ๋"
|
| 295 |
+
- LLM ์๋์ ์ฝ๋ = ํ๋กฌํํธ
|
| 296 |
+
- ๋ช
ํํ ์ง์๊ฐ ์ ํ๋ ๊ฒฐ์
|
| 297 |
+
- **๊ฒฐ๋ก **: ํ๋กฌํํธ๋ ์ฝ๋ ๋ฆฌ๋ทฐ ๋์
|
| 298 |
+
|
| 299 |
+
---
|
| 300 |
+
|
| 301 |
+
## ๐ ์ต์ข
ํ๊ฐ
|
| 302 |
+
|
| 303 |
+
### Andrew Ng์ ์๊ฐ
|
| 304 |
+
|
| 305 |
+
```python
|
| 306 |
+
def evaluate_mvp(system):
|
| 307 |
+
if system.is_working(): # โ
์๋ํจ
|
| 308 |
+
if system.is_measurable(): # โ
์ธก์ ๊ฐ๋ฅ
|
| 309 |
+
if system.is_simple(): # โ
๋จ์ํจ
|
| 310 |
+
if system.is_extendable(): # โ
ํ์ฅ ๊ฐ๋ฅ
|
| 311 |
+
return "PERFECT MVP โญโญโญโญโญ"
|
| 312 |
+
|
| 313 |
+
return "KEEP ITERATING"
|
| 314 |
+
|
| 315 |
+
print(evaluate_mvp(TEAM_EA_v1_1))
|
| 316 |
+
# ์ถ๋ ฅ: PERFECT MVP โญโญโญโญโญ
|
| 317 |
+
```
|
| 318 |
+
|
| 319 |
+
---
|
| 320 |
+
|
| 321 |
+
## ๐ ์ถํํฉ๋๋ค!
|
| 322 |
+
|
| 323 |
+
**๋ฏผ๊ฒฝ์ฑ๋, TEAM EA v1.1์ด ์์ฑ๋์์ต๋๋ค!**
|
| 324 |
+
|
| 325 |
+
### ๋ฌ์ฑํ ๊ฒ๋ค:
|
| 326 |
+
โ
Andrew Ng ์์น ์๋ฒฝ ๊ตฌํ
|
| 327 |
+
โ
์ฌ์ฉ์ ํ๋ ์ฅ์ ํก์
|
| 328 |
+
โ
๋ด ๊ตฌํ ์ฅ์ ์ ์ง
|
| 329 |
+
โ
Phase 2 ์ค๋น ์๋ฃ
|
| 330 |
+
โ
ํ๋์ UX
|
| 331 |
+
โ
๊ฐ๋ ฅํ ํ๋กฌํํธ
|
| 332 |
+
โ
ํ์ฅ ๊ฐ๋ฅํ ๊ตฌ์กฐ
|
| 333 |
+
|
| 334 |
+
### ๋ค์ ๋จ๊ณ:
|
| 335 |
+
1. **์ง๊ธ**: v1.1 ํ
์คํธ ๋ฐ ํผ๋๋ฐฑ
|
| 336 |
+
2. **Week 2**: Phase 2 (์ ํ๋ 70%+)
|
| 337 |
+
3. **Week 3**: Phase 3 (ํ๋ก๋์
90%+)
|
| 338 |
+
|
| 339 |
+
---
|
| 340 |
+
|
| 341 |
+
## ๐ ์ง๋ฌธ?
|
| 342 |
+
|
| 343 |
+
๊ถ๊ธํ ์ ์ด๋ ๊ฐ์ ์์ด๋์ด๊ฐ ์์ผ์๋ฉด ์ธ์ ๋ ๋ง์ํด์ฃผ์ธ์!
|
| 344 |
+
|
| 345 |
+
**Happy Coding! ๐**
|
| 346 |
+
|
app.py
ADDED
|
@@ -0,0 +1,351 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# app.py
|
| 2 |
+
"""PROBIN - Intelligent Document Analysis System"""
|
| 3 |
+
import streamlit as st
|
| 4 |
+
import os
|
| 5 |
+
import sys
|
| 6 |
+
import uuid
|
| 7 |
+
from pathlib import Path
|
| 8 |
+
from streamlit_pdf_viewer import pdf_viewer
|
| 9 |
+
|
| 10 |
+
# ํ๋ก์ ํธ ๋ฃจํธ ๊ฒฝ๋ก ์ถ๊ฐ
|
| 11 |
+
sys.path.insert(0, str(Path(__file__).parent))
|
| 12 |
+
|
| 13 |
+
from core.pdf_loader import load_pdf
|
| 14 |
+
from core.chunker import chunk_text
|
| 15 |
+
from core.embedder import embed_chunks
|
| 16 |
+
from core.vectordb import VectorDB
|
| 17 |
+
from core.retriever import Retriever
|
| 18 |
+
from core.generator import Generator
|
| 19 |
+
from ui.styles import get_custom_css
|
| 20 |
+
from ui.components import render_sources_with_relevance
|
| 21 |
+
from utils.pdf_utils import get_text_coordinates
|
| 22 |
+
from config.settings import (
|
| 23 |
+
CHUNK_SIZE, CHUNK_OVERLAP, TOP_K,
|
| 24 |
+
APP_NAME, APP_SUBTITLE, APP_ICON, SHOW_STATS, PDF_HEIGHT
|
| 25 |
+
)
|
| 26 |
+
|
| 27 |
+
# 1. ํ์ด์ง ์ค์
|
| 28 |
+
st.set_page_config(
|
| 29 |
+
page_title=f"{APP_NAME} - {APP_SUBTITLE}",
|
| 30 |
+
page_icon=APP_ICON,
|
| 31 |
+
layout="wide",
|
| 32 |
+
initial_sidebar_state="collapsed"
|
| 33 |
+
)
|
| 34 |
+
|
| 35 |
+
# 2. ์ธ์
์คํ
์ดํธ ์ด๊ธฐํ
|
| 36 |
+
if "session_id" not in st.session_state:
|
| 37 |
+
st.session_state.session_id = str(uuid.uuid4())[:8]
|
| 38 |
+
print(f"๐ ์ ์ธ์
ID ์์ฑ: {st.session_state.session_id}")
|
| 39 |
+
|
| 40 |
+
if "vectordb" not in st.session_state:
|
| 41 |
+
st.session_state.vectordb = None
|
| 42 |
+
if "retriever" not in st.session_state:
|
| 43 |
+
st.session_state.retriever = None
|
| 44 |
+
if "generator" not in st.session_state:
|
| 45 |
+
st.session_state.generator = Generator()
|
| 46 |
+
if "pdf_processed" not in st.session_state:
|
| 47 |
+
st.session_state.pdf_processed = False
|
| 48 |
+
if "messages" not in st.session_state:
|
| 49 |
+
st.session_state.messages = []
|
| 50 |
+
if "current_page" not in st.session_state:
|
| 51 |
+
st.session_state.current_page = 1
|
| 52 |
+
if "pdf_path" not in st.session_state:
|
| 53 |
+
st.session_state.pdf_path = None
|
| 54 |
+
if "pdf_bytes" not in st.session_state:
|
| 55 |
+
st.session_state.pdf_bytes = None
|
| 56 |
+
if "annotations" not in st.session_state:
|
| 57 |
+
st.session_state.annotations = []
|
| 58 |
+
if "zoom_level" not in st.session_state:
|
| 59 |
+
st.session_state.zoom_level = 500
|
| 60 |
+
|
| 61 |
+
# 3. CSS ์ ์ฉ
|
| 62 |
+
st.markdown(get_custom_css(), unsafe_allow_html=True)
|
| 63 |
+
|
| 64 |
+
# --------------------------------------------------------------------------
|
| 65 |
+
# ํจ์ ์ ์
|
| 66 |
+
# --------------------------------------------------------------------------
|
| 67 |
+
|
| 68 |
+
def render_welcome_screen():
|
| 69 |
+
"""์ฐ์ปด ํ๋ฉด (PDF ์
๋ก๋ ์ ์๋ง ํ์)"""
|
| 70 |
+
if not st.session_state.pdf_processed:
|
| 71 |
+
st.markdown(
|
| 72 |
+
f"""
|
| 73 |
+
<div id="welcome" class="hero-container">
|
| 74 |
+
<h1 class="hero-title">{APP_ICON} {APP_NAME}</h1>
|
| 75 |
+
<p class="hero-subtitle">Experience Intelligent Document Analysis with AI</p>
|
| 76 |
+
</div>
|
| 77 |
+
""",
|
| 78 |
+
unsafe_allow_html=True
|
| 79 |
+
)
|
| 80 |
+
|
| 81 |
+
|
| 82 |
+
def move_to_page(page_num, text_content):
|
| 83 |
+
"""ํ์ด์ง ์ด๋ ๋ฐ ํ์ด๋ผ์ดํธ (์ฆ์ ๋ฐ์)"""
|
| 84 |
+
st.session_state.current_page = page_num
|
| 85 |
+
|
| 86 |
+
if st.session_state.pdf_path:
|
| 87 |
+
highlights = get_text_coordinates(
|
| 88 |
+
str(st.session_state.pdf_path),
|
| 89 |
+
page_num,
|
| 90 |
+
text_content
|
| 91 |
+
)
|
| 92 |
+
st.session_state.annotations = highlights
|
| 93 |
+
|
| 94 |
+
# ์ฆ์ ํ์ด์ง ์ด๋ ๋ฐ์
|
| 95 |
+
st.rerun()
|
| 96 |
+
|
| 97 |
+
|
| 98 |
+
def reset_app():
|
| 99 |
+
"""์ฑ ์์ ์ด๊ธฐํ"""
|
| 100 |
+
print("\n๐ ์ฑ ์ ์ฒด ์ด๊ธฐํ ์์...")
|
| 101 |
+
|
| 102 |
+
# 1. ํ์ฌ ์ปฌ๋ ์
์ญ์
|
| 103 |
+
if st.session_state.vectordb is not None:
|
| 104 |
+
try:
|
| 105 |
+
print(f" ๐๏ธ ํ์ฌ ์ปฌ๋ ์
์ญ์ (์ธ์
: {st.session_state.session_id})")
|
| 106 |
+
st.session_state.vectordb.delete_collection()
|
| 107 |
+
print(" โ
์ปฌ๋ ์
์ญ์ ์๋ฃ")
|
| 108 |
+
except Exception as e:
|
| 109 |
+
print(f" โ ๏ธ ์ปฌ๋ ์
์ญ์ ์ค๋ฅ: {e}")
|
| 110 |
+
|
| 111 |
+
# 2. ์ ์ธ์
ID ์์ฑ
|
| 112 |
+
old_session_id = st.session_state.session_id
|
| 113 |
+
new_session_id = str(uuid.uuid4())[:8]
|
| 114 |
+
print(f" ๐ ์ธ์
ID ๋ณ๊ฒฝ: {old_session_id} โ {new_session_id}")
|
| 115 |
+
|
| 116 |
+
# 3. ์ธ์
์ด๊ธฐํ
|
| 117 |
+
keys_to_delete = list(st.session_state.keys())
|
| 118 |
+
for key in keys_to_delete:
|
| 119 |
+
del st.session_state[key]
|
| 120 |
+
|
| 121 |
+
# ์ ์ธ์
ID ์ค์
|
| 122 |
+
st.session_state.session_id = new_session_id
|
| 123 |
+
st.session_state.pdf_processed = False
|
| 124 |
+
st.session_state.pdf_path = None
|
| 125 |
+
st.session_state.pdf_bytes = None
|
| 126 |
+
|
| 127 |
+
print(" โ
์ธ์
์ด๊ธฐํ ์๋ฃ")
|
| 128 |
+
print(f"๐ ์ด๊ธฐํ ์๋ฃ! ์ ์ธ์
: {new_session_id}\n")
|
| 129 |
+
|
| 130 |
+
st.success("โ
์ด๊ธฐํ ์๋ฃ!")
|
| 131 |
+
st.info("๐ก **์ PDF๋ฅผ ์
๋ก๋ํ ์ค๋น๊ฐ ๋์์ต๋๋ค!**")
|
| 132 |
+
st.rerun()
|
| 133 |
+
|
| 134 |
+
|
| 135 |
+
def process_pdf(uploaded_file):
|
| 136 |
+
"""PDF ์ฒ๋ฆฌ ํ์ดํ๋ผ์ธ"""
|
| 137 |
+
try:
|
| 138 |
+
# ํ์ผ ์ ์ฅ
|
| 139 |
+
save_dir = Path("./data/uploads")
|
| 140 |
+
save_dir.mkdir(parents=True, exist_ok=True)
|
| 141 |
+
pdf_path = save_dir / uploaded_file.name
|
| 142 |
+
|
| 143 |
+
with open(pdf_path, "wb") as f:
|
| 144 |
+
f.write(uploaded_file.getbuffer())
|
| 145 |
+
|
| 146 |
+
st.session_state.pdf_path = pdf_path
|
| 147 |
+
st.session_state.pdf_bytes = uploaded_file.getvalue()
|
| 148 |
+
|
| 149 |
+
with st.spinner("๐ ๋ฌธ์๋ฅผ ๋ถ์ํ๊ณ ์์ต๋๋ค..."):
|
| 150 |
+
# 1. PDF ๋ก๏ฟฝ๏ฟฝ
|
| 151 |
+
print(f"\n๐ PDF ๋ก๋ ์ค: {uploaded_file.name}")
|
| 152 |
+
pdf_data = load_pdf(str(pdf_path))
|
| 153 |
+
st.session_state.total_pages = pdf_data["total_pages"]
|
| 154 |
+
print(f" โ
์ด {pdf_data['total_pages']} ํ์ด์ง")
|
| 155 |
+
|
| 156 |
+
# 2. ์ฒญํน
|
| 157 |
+
print(f"\nโ๏ธ ์ฒญํน ์ค...")
|
| 158 |
+
chunks = chunk_text(pdf_data["pages"], CHUNK_SIZE, CHUNK_OVERLAP)
|
| 159 |
+
st.session_state.total_chunks = len(chunks)
|
| 160 |
+
print(f" โ
์ด {len(chunks)}๊ฐ ์ฒญํฌ ์์ฑ")
|
| 161 |
+
|
| 162 |
+
# 3. ์๋ฒ ๋ฉ
|
| 163 |
+
print(f"\n๐ข ์๋ฒ ๋ฉ ์์ฑ ์ค...")
|
| 164 |
+
embedded_chunks = embed_chunks(chunks)
|
| 165 |
+
print(f" โ
์๋ฒ ๋ฉ ์๋ฃ")
|
| 166 |
+
|
| 167 |
+
# 4. VectorDB ์์ฑ
|
| 168 |
+
print(f"\n๐พ VectorDB ์ด๊ธฐํ (์ธ์
: {st.session_state.session_id})...")
|
| 169 |
+
|
| 170 |
+
if st.session_state.vectordb is not None:
|
| 171 |
+
print(" ๐๏ธ ๊ธฐ์กด ์ปฌ๋ ์
์ญ์ ")
|
| 172 |
+
try:
|
| 173 |
+
st.session_state.vectordb.delete_collection()
|
| 174 |
+
except Exception as e:
|
| 175 |
+
print(f" โ ๏ธ ์ญ์ ์ค๋ฅ: {e}")
|
| 176 |
+
|
| 177 |
+
print(" ๐ ์ VectorDB ์์ฑ")
|
| 178 |
+
st.session_state.vectordb = VectorDB(
|
| 179 |
+
session_id=st.session_state.session_id
|
| 180 |
+
)
|
| 181 |
+
|
| 182 |
+
initial_count = st.session_state.vectordb.count()
|
| 183 |
+
print(f" ๐ ์ด๊ธฐ ์ํ: {initial_count}๊ฐ ์ฒญํฌ")
|
| 184 |
+
|
| 185 |
+
# 5. ์ฒญํฌ ์ ์ฅ
|
| 186 |
+
print(f"\n ๐พ ์ฒญํฌ ์ ์ฅ: {len(embedded_chunks)}๊ฐ")
|
| 187 |
+
st.session_state.vectordb.add_chunks(embedded_chunks)
|
| 188 |
+
|
| 189 |
+
# 6. Retriever ์์ฑ
|
| 190 |
+
st.session_state.retriever = Retriever(st.session_state.vectordb)
|
| 191 |
+
|
| 192 |
+
# 7. ์ํ ์
๋ฐ์ดํธ
|
| 193 |
+
st.session_state.pdf_processed = True
|
| 194 |
+
|
| 195 |
+
# 8. ์ต์ข
ํ์ธ
|
| 196 |
+
final_count = st.session_state.vectordb.count()
|
| 197 |
+
print(f"\nโ
์ต์ข
: {final_count}๊ฐ ์ฒญํฌ ์ ์ฅ ์๋ฃ")
|
| 198 |
+
|
| 199 |
+
# 9. ์ด๊ธฐํ
|
| 200 |
+
st.session_state.messages = []
|
| 201 |
+
st.session_state.annotations = []
|
| 202 |
+
st.session_state.current_page = 1
|
| 203 |
+
|
| 204 |
+
print(f"\n๐ PDF ์ฒ๋ฆฌ ์๋ฃ! (์ธ์
: {st.session_state.session_id})\n")
|
| 205 |
+
st.success("โ
๋ฌธ์ ๋ถ์ ์๋ฃ!")
|
| 206 |
+
st.rerun()
|
| 207 |
+
|
| 208 |
+
except Exception as e:
|
| 209 |
+
st.error(f"โ ์ค๋ฅ ๋ฐ์: {str(e)}")
|
| 210 |
+
print(f"\nโ ์ค๋ฅ:")
|
| 211 |
+
import traceback
|
| 212 |
+
print(traceback.format_exc())
|
| 213 |
+
|
| 214 |
+
|
| 215 |
+
# --------------------------------------------------------------------------
|
| 216 |
+
# ๋ฉ์ธ UI
|
| 217 |
+
# --------------------------------------------------------------------------
|
| 218 |
+
|
| 219 |
+
# ์ฐ์ปด ํ๋ฉด ์ถ๋ ฅ
|
| 220 |
+
render_welcome_screen()
|
| 221 |
+
|
| 222 |
+
# --------------------------------------------------------------------------
|
| 223 |
+
# Sidebar
|
| 224 |
+
# --------------------------------------------------------------------------
|
| 225 |
+
with st.sidebar:
|
| 226 |
+
st.title(f"{APP_ICON} {APP_NAME}")
|
| 227 |
+
|
| 228 |
+
uploaded_file = st.file_uploader(
|
| 229 |
+
"PDF ํ์ผ ์
๋ก๋",
|
| 230 |
+
type=["pdf"],
|
| 231 |
+
key=f"pdf_uploader_{st.session_state.session_id}"
|
| 232 |
+
)
|
| 233 |
+
|
| 234 |
+
if uploaded_file and not st.session_state.pdf_processed:
|
| 235 |
+
process_pdf(uploaded_file)
|
| 236 |
+
|
| 237 |
+
st.divider()
|
| 238 |
+
|
| 239 |
+
if st.button("๐ ์ด๊ธฐํ", use_container_width=True):
|
| 240 |
+
reset_app()
|
| 241 |
+
|
| 242 |
+
# --------------------------------------------------------------------------
|
| 243 |
+
# PDF + Chat UI
|
| 244 |
+
# --------------------------------------------------------------------------
|
| 245 |
+
if st.session_state.pdf_processed:
|
| 246 |
+
|
| 247 |
+
col1, col2 = st.columns([5, 5], gap="medium")
|
| 248 |
+
|
| 249 |
+
# ์ผ์ชฝ: PDF ๋ทฐ์ด
|
| 250 |
+
with col1:
|
| 251 |
+
# ํด๋ฐ
|
| 252 |
+
toolbar1, toolbar2, toolbar3, toolbar4 = st.columns([1, 1, 2, 2])
|
| 253 |
+
|
| 254 |
+
with toolbar1:
|
| 255 |
+
if st.button("โ", help="์ด์ ํ์ด์ง"):
|
| 256 |
+
if st.session_state.current_page > 1:
|
| 257 |
+
st.session_state.current_page -= 1
|
| 258 |
+
st.rerun()
|
| 259 |
+
|
| 260 |
+
with toolbar2:
|
| 261 |
+
if st.button("โถ", help="๋ค์ ํ์ด์ง"):
|
| 262 |
+
if st.session_state.current_page < st.session_state.total_pages:
|
| 263 |
+
st.session_state.current_page += 1
|
| 264 |
+
st.rerun()
|
| 265 |
+
|
| 266 |
+
with toolbar3:
|
| 267 |
+
st.write(f"Page {st.session_state.current_page} / {st.session_state.total_pages}")
|
| 268 |
+
|
| 269 |
+
with toolbar4:
|
| 270 |
+
new_zoom = st.slider("Zoom", 500, 1200, st.session_state.zoom_level, label_visibility="collapsed")
|
| 271 |
+
if new_zoom != st.session_state.zoom_level:
|
| 272 |
+
st.session_state.zoom_level = new_zoom
|
| 273 |
+
st.rerun()
|
| 274 |
+
|
| 275 |
+
# PDF ๋ทฐ์ด
|
| 276 |
+
pdf_viewer(
|
| 277 |
+
input=st.session_state.pdf_bytes,
|
| 278 |
+
width=st.session_state.zoom_level,
|
| 279 |
+
annotations=st.session_state.annotations,
|
| 280 |
+
pages_to_render=[st.session_state.current_page],
|
| 281 |
+
render_text=True
|
| 282 |
+
)
|
| 283 |
+
|
| 284 |
+
# ์ค๋ฅธ์ชฝ: ์ฑํ
|
| 285 |
+
with col2:
|
| 286 |
+
st.markdown("### ๐ฌ PROBIN CHAT")
|
| 287 |
+
|
| 288 |
+
# ์ฑํ
์ปจํ
์ด๋ (์คํฌ๋กค ๊ฐ๋ฅ - ๋์ด ์ค์)
|
| 289 |
+
chat_container = st.container(height=500)
|
| 290 |
+
with chat_container:
|
| 291 |
+
# ์ฑํ
๊ธฐ๋ก์ด ์์ ๋ ๊ฐ์ด๋ ํ์
|
| 292 |
+
if not st.session_state.messages:
|
| 293 |
+
st.markdown("""
|
| 294 |
+
<div class="chat-placeholder">
|
| 295 |
+
<div class="placeholder-title">๐ ๋ฐ๊ฐ์์! ์ด๋ ๊ฒ ํ์ฉํด๋ณด์ธ์</div>
|
| 296 |
+
<ol class="placeholder-steps">
|
| 297 |
+
<li>AI๊ฐ ๋ฌธ์ ๋ด์ฉ์ ๋ถ์ํ์ฌ <strong>๋ต๋ณ๊ณผ ๊ทผ๊ฑฐ</strong>๋ฅผ ์ฐพ์์ค๋๋ค.</li>
|
| 298 |
+
<li>๋ต๋ณ์ <span class="highlight-box">๋
ธ๋์ ํ์ด๋ผ์ดํธ</span>๋ฅผ ํ์ธํ์ธ์.</li>
|
| 299 |
+
</ol>
|
| 300 |
+
</div>
|
| 301 |
+
""", unsafe_allow_html=True)
|
| 302 |
+
|
| 303 |
+
# ์ฑํ
๊ธฐ๋ก ํ์
|
| 304 |
+
else:
|
| 305 |
+
for idx, msg in enumerate(st.session_state.messages):
|
| 306 |
+
with st.chat_message(msg["role"]):
|
| 307 |
+
st.markdown(msg["content"])
|
| 308 |
+
|
| 309 |
+
if msg.get("sources"):
|
| 310 |
+
render_sources_with_relevance(
|
| 311 |
+
sources=msg["sources"],
|
| 312 |
+
message_idx=idx,
|
| 313 |
+
move_to_page_callback=move_to_page
|
| 314 |
+
)
|
| 315 |
+
|
| 316 |
+
# ์ฑํ
์
๋ ฅ
|
| 317 |
+
if query := st.chat_input("์ง๋ฌธ์ ์
๋ ฅํ์ธ์..."):
|
| 318 |
+
# ์ฌ์ฉ์ ๋ฉ์์ง ์ถ๊ฐ
|
| 319 |
+
st.session_state.messages.append({
|
| 320 |
+
"role": "user",
|
| 321 |
+
"content": query
|
| 322 |
+
})
|
| 323 |
+
|
| 324 |
+
# ๊ฒ์ ๋ฐ ๋ต๋ณ ์์ฑ
|
| 325 |
+
with st.spinner("๐ PROBIN์ด ๊ฒ์์ค์
๋๋ค..."):
|
| 326 |
+
print(f"\n๐ ์ง๋ฌธ: {query}")
|
| 327 |
+
|
| 328 |
+
retrieved_chunks = st.session_state.retriever.retrieve(query, TOP_K)
|
| 329 |
+
result = st.session_state.generator.generate_answer(query, retrieved_chunks)
|
| 330 |
+
|
| 331 |
+
# AI ๋ต๋ณ ์ถ๊ฐ
|
| 332 |
+
st.session_state.messages.append({
|
| 333 |
+
"role": "assistant",
|
| 334 |
+
"content": result["answer"],
|
| 335 |
+
"sources": result["sources"]
|
| 336 |
+
})
|
| 337 |
+
|
| 338 |
+
# ์ฒซ ๋ฒ์งธ ์ถ์ฒ๋ก ์ด๋
|
| 339 |
+
if result["sources"]:
|
| 340 |
+
top_source = result["sources"][0]
|
| 341 |
+
highlights = get_text_coordinates(
|
| 342 |
+
str(st.session_state.pdf_path),
|
| 343 |
+
top_source["page_num"],
|
| 344 |
+
top_source["text"]
|
| 345 |
+
)
|
| 346 |
+
st.session_state.annotations = highlights
|
| 347 |
+
st.session_state.current_page = top_source["page_num"]
|
| 348 |
+
|
| 349 |
+
print(f"โ
๋ต๋ณ ์๋ฃ\n")
|
| 350 |
+
|
| 351 |
+
st.rerun()
|
requirements.txt
CHANGED
|
@@ -1,3 +1,8 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
streamlit==1.28.0
|
| 2 |
+
chromadb==0.4.18
|
| 3 |
+
openai==1.3.0
|
| 4 |
+
pymupdf4llm==0.0.5
|
| 5 |
+
pdfplumber==0.10.3
|
| 6 |
+
python-dotenv==1.0.0
|
| 7 |
+
pymupdf>=1.24.2
|
| 8 |
+
|