# QP Search System
๋ฐ˜๋„์ฒด ์ œ์กฐ ๊ณต์ •๋ฌธ์„œ(QP) ๊ฒ€์ƒ‰์„ ์œ„ํ•œ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ RAG ํŒŒ์ดํ”„๋ผ์ธ. LangGraph ๊ธฐ๋ฐ˜ StateGraph๋กœ ๊ตฌํ˜„๋˜๋ฉฐ, ๋ณต์žก๋„ ๊ธฐ๋ฐ˜ ์ „๋žต ๋ถ„๊ธฐ โ†’ ๋ฉ€ํ‹ฐ์ฟผ๋ฆฌ ์ƒ์„ฑ โ†’ ํ…์ŠคํŠธ/๋น„์ „ ๋ณ‘๋ ฌ ๊ฒ€์ƒ‰ โ†’ ๋™์  ๊ฐ€์ค‘์น˜ ์œตํ•ฉ โ†’ Answerability ํ‰๊ฐ€ โ†’ ์กฐ๊ฑด๋ถ€ ์žฌ์ฒ˜๋ฆฌ์˜ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋”ฐ๋ฅธ๋‹ค.
## Infrastructure
| Role | Address | Notes |
|------|------|------|
| Search API (text) | `10.150.6.47:4275/search/text` | collection: `page` |
| Search API (image/ColQwen3.5) | `10.150.6.47:4275/search/text/colqwen35` | collection: `colqwen35` |
| Search API (image/Tomoro) | `10.150.6.47:4275/search/text/tomoro` | collection: `tomoro` |
| Dictionary API | `10.150.6.47:4276/search` | Domain abbreviation/term lookup |
| Qwen3-VL-235B | `10.150.6.160:8000` | Multimodal LLM |
Common search API parameters: `query` (string), `top_k` (int). Optional filter fields: `apply_region`, `apply_product`, `document_type`, `dept_name`. Pass them in a `filters` object; they are combined with AND.
> All servers run in an air-gapped closed network with no external internet access.
## Key Files
- `qp_rag_pipeline.py`: LangGraph StateGraph main pipeline (797 lines)
- Jupyter validation notebook (31 cells): for step-by-step pipeline testing
## Domain Context
Semiconductor process abbreviations are essential for understanding queries: DSP, DSG, EPI, LAP, CMP, STI. When handling a query that contains any of these abbreviations, always call the dictionary API (`4276/search`) first.
## Architecture Rules
### Pipeline Flow
Query input → query analysis → query generation → retrieval (text/vision in parallel) → result fusion and evaluation → (reprocessing or output)
### 1. Query Analysis
- Classify complexity (High/Medium/Low) and query type (Text/Vision/Hybrid)
- The query generation strategy depends on complexity
- The query type determines the fusion weight `alpha`: Text=0.8, Vision=0.4, Hybrid=0.6
### 2. Query Generation
- Strategy by complexity: High → original + expansion + reinterpretation + Text-HyDE / Medium → original + expansion + reinterpretation / Low → original + lightweight reinterpretation
- For queries containing abbreviations, expand the abbreviations via the dictionary API (`4276`), then rewrite
- Pass the generated `query_set` and `alpha` to the retrieval module
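The per-complexity strategies can be expressed as a table of variant names, with the actual rewriting delegated to a caller-supplied function. In this sketch the variant identifiers, `build_query_set`, and the `rewrite(variant, query)` callback signature are assumptions. In the real pipeline the callback would presumably call the LLM.

```python
from typing import Callable

# Variant names are illustrative labels for the strategies listed above.
STRATEGY_BY_COMPLEXITY: dict[str, tuple[str, ...]] = {
    "High": ("original", "expansion", "reinterpretation", "text_hyde"),
    "Medium": ("original", "expansion", "reinterpretation"),
    "Low": ("original", "light_reinterpretation"),
}


def build_query_set(
    query: str, complexity: str, rewrite: Callable[[str, str], str]
) -> list[str]:
    """Produce the query set for one complexity level, skipping empty or duplicate variants."""
    query_set: list[str] = []
    for variant in STRATEGY_BY_COMPLEXITY[complexity]:
        text = query if variant == "original" else rewrite(variant, query)
        if text and text not in query_set:
            query_set.append(text)
    return query_set


def fake_rewrite(variant: str, query: str) -> str:
    """Stand-in for an LLM call so the sketch runs offline."""
    return f"[{variant}] {query}"


low_set = build_query_set("STI depth spec", "Low", fake_rewrite)
```

Injecting the rewriter keeps the branching logic testable in the notebook without the Qwen3-VL endpoint.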
### 3. Retrieval
- Track 1 (text): BGE-M3 hybrid → Top-50 → Cross-Encoder reranking → Top-30
- Track 2 (vision): VLM-based VRAG → Top-30 (reranking skipped)
- The two tracks run in parallel
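One way to run the two tracks in parallel under the async/await convention is `asyncio.gather`. The sketch below assumes each track is exposed as an async function that takes a query string and returns a list of hits. `run_tracks` and the `SearchFn` alias are hypothetical, and the Top-50 → rerank → Top-30 steps of Track 1 are left to the supplied `text_search` function.

```python
import asyncio
from typing import Any, Awaitable, Callable

SearchFn = Callable[[str], Awaitable[list[dict[str, Any]]]]


async def run_tracks(
    query_set: list[str], text_search: SearchFn, vision_search: SearchFn
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Run every query on both tracks concurrently and flatten the hits per track."""

    async def run_one(search: SearchFn) -> list[dict[str, Any]]:
        # gather preserves input order, so hits stay grouped by query order.
        batches = await asyncio.gather(*(search(q) for q in query_set))
        return [hit for batch in batches for hit in batch]

    text_results, vision_results = await asyncio.gather(
        run_one(text_search), run_one(vision_search)
    )
    return text_results, vision_results
```

The function returns the `text_results` and `vision_results` pair that the fusion step consumes. Deduplicating hits across queries is not shown.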
### 4. Result Fusion and Evaluation
- Dynamically weighted linear combination: `S_final = alpha * S_text + (1 - alpha) * S_vision`
- Compute an Answerability Score with the VLM for the top 10 images of the fused results
- Trigger reprocessing if `S_avg < threshold` and `retry_count < 1`
- Reprocessing: VLM analysis of Top-K images → Vision-HyDE query generation → return to the query generation module
- Reprocessing is limited to at most one pass. Beware of the reducer result-accumulation bug: reset the results on every iteration
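The fusion formula and the retry condition above can be sketched as two small functions. Several details are assumptions, since the notes do not state them: both tracks are keyed by a shared document/page id, scores are already on a comparable scale, a hit missing from one track counts as 0.0 there, and `S_avg` is the arithmetic mean of the Answerability Scores. The threshold value is not given in the notes, so it stays a parameter.

```python
def fuse_scores(
    text_scores: dict[str, float], vision_scores: dict[str, float], alpha: float
) -> list[tuple[str, float]]:
    """Apply S_final = alpha * S_text + (1 - alpha) * S_vision and sort descending."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    doc_ids = set(text_scores) | set(vision_scores)
    fused = {
        doc_id: alpha * text_scores.get(doc_id, 0.0)
        + (1 - alpha) * vision_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Ties are broken by id so the ordering is deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


def should_retry(
    answerability: list[float], threshold: float, retry_count: int, max_retries: int = 1
) -> bool:
    """Retry only when the mean score is below threshold and no retry has run yet."""
    s_avg = sum(answerability) / len(answerability) if answerability else 0.0
    return s_avg < threshold and retry_count < max_retries
```

With `alpha = 0.6`, a page scoring 0.5 on text and 1.0 on vision gets `0.6 * 0.5 + 0.4 * 1.0 = 0.7`, which outranks a page found only by the text track with a score of 1.0 (fused score 0.6).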
### Data Flow Between Modules
- Query analysis → query generation: `complexity`, `query_type`, `alpha`
- Query generation → retrieval: `query_set`, `alpha`
- Retrieval → fusion/evaluation: `text_results`, `vision_results`, `alpha`
- Fusion/evaluation → query generation (reprocessing): `enhanced_query`, `alpha`, `retry_count + 1`
- Fusion/evaluation → output: `fused_results`, `S_avg`, `retrieval_retried`
## Code Style
- Python 3.10+, type hints required
- Use the async/await pattern (FastAPI integration)
- Use `loguru` for logging; no `print`
- Korean comments are allowed; write docstrings in Korean as well
## Testing & Validation
- Filter Milvus query results by `relevance_score >= 0.7`
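A minimal sketch of that filter, assuming each result is a dictionary carrying a `relevance_score` key and that a hit without the key should be dropped. The function name `filter_by_relevance` is hypothetical.

```python
from typing import Any

RELEVANCE_THRESHOLD = 0.7


def filter_by_relevance(
    hits: list[dict[str, Any]], threshold: float = RELEVANCE_THRESHOLD
) -> list[dict[str, Any]]:
    """Keep hits whose relevance_score is at or above the threshold, preserving order."""
    return [hit for hit in hits if hit.get("relevance_score", 0.0) >= threshold]
```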