---
title: RFPilot  # parentheses removed
emoji: 📄
colorFrom: blue
colorTo: green
sdk: docker  # uses Docker
app_port: 7860  # Streamlit port
pinned: false
license: mit
---

# Codeit-AI-1team-LLM-project

---

## Chatbot service demo

![chatbot_final](https://github.com/user-attachments/assets/1b321abb-6ba1-4063-be97-300036d8047a)

## Vector DB dashboard video (being split out into a separate service)

[Access link](https://vectordb-dashboard-dong.streamlit.app/)

![Vector_DB_v1](https://github.com/user-attachments/assets/1b12ecf9-a105-44c7-82a4-67744d82931b)

# 1. Project overview

- **B2G bid-support consulting startup: 'RFPilot'**
- A chatbot system that summarizes RFP documents and answers user questions in real time

> **Background**: Hundreds of corporate and government requests for proposals (RFPs) are posted every day, and each one runs to dozens of pages, so reviewing every document in full is not feasible. The manual process is inefficient and makes it hard to pick out the important information quickly.
>
> **Goal**: Build a chatbot that answers user questions in real time, searches the relevant proposals, and returns summarized information, so that consultants can work more efficiently.
>
> **Expected impact**: By surfacing the important information quickly through a RAG system, the chatbot shortens proposal review time and lets consultants focus on the consulting work itself.

---

# 2. How to use the project

## 🌐 Using the web service (general users)

**Try the 입찰메이트 chatbot right away!**

- 🤗 **Demo service**: [HuggingFace Space](https://huggingface.co/spaces/Dongjin1203/RFP_summary_chatbot)
- 💡 **How to use**:
  1. Open the link above
  2. Enter a question (e.g., "Find projects with a duration of 12 months or less")
  3. The AI analyzes the RFP documents and generates an answer
- ⚡ **Performance**: average response time under 1 minute
- 🔧 **Model**: Llama-3-Open-Ko-8B (Q4_K_M, T4 GPU)

---

## 💻 Setting up a local development environment (developers)

### Prerequisites

- Python 3.12.3 installed
- Poetry installed
- Repository cloned
- Dataset saved locally ([download link](https://drive.google.com/file/d/187QnN2VeCfa-nyFMcv8ZtBJP0JxTaY4U/view?usp=drive_link))
- (Optional) Quantized model file (.gguf) saved locally (not needed if you only use the GPT API)

### Environment setup

**1. Create the .env file**

```env
# Required: OpenAI API (for GPT models)
OPENAI_API_KEY="sk-..."

# Optional: experiment tracking (LangSmith, WandB)
WANDB_API_KEY="..."
LANGCHAIN_TRACING_V2=true
LANGSMITH_API_KEY="..."
LANGCHAIN_PROJECT="입찰메이트"

# Optional: when using a local GGUF model
USE_MODEL_HUB=false
GGUF_MODEL_PATH="./models/Llama-3-Open-Ko-8B.Q4_K_M.gguf"
GGUF_N_CTX=4096
GGUF_N_GPU_LAYERS=35
```

**2. Set up the virtual environment and install dependencies**

```powershell
# Move into the project folder
cd Codeit-AI-1team-LLM-project

# Configure the Poetry virtual environment
python -m poetry config virtualenvs.in-project true
python -m poetry env use 3.12.3
python -m poetry install

# Activate the virtual environment
# (on Poetry 2.x, `shell` requires the poetry-plugin-shell plugin)
python -m poetry shell
```

### How to run

**1. Preprocess the data and build the vector DB**

```powershell
# Run the full pipeline (preprocessing → embedding → vector DB)
python main.py --step all

# Or run one step at a time
python main.py --step preprocess  # preprocessing only
python main.py --step embed       # embedding only
python main.py --step vectordb    # vector DB only
```

**2. Vector DB dashboard (moved to a separate service)**

> 📝 **Note**: The vector DB dashboard has been split out into a separate repository.
> Access link: [입찰메이트-VectorDB-Dashboard](https://vectordb-dashboard-dong.streamlit.app/) (Chroma DB only)

**3. Test the chatbot locally**

```powershell
# Streamlit-based local chatbot UI
streamlit run src/visualization/chatbot_app.py
```

> ⚠️ **Caution**: When running locally, the GGUF model can be slow in a CPU-only environment.
> For quick tests, using the GPT API is recommended.

**4. Experiments and evaluation**

```powershell
# Interactive menu
python src/evaluation/run_experiment.py

# Run an experiment
python src/evaluation/run_experiment.py --run

# Compare experiment results
python src/evaluation/run_experiment.py --compare
```

# 3. Project structure

---

```
CODEIT-AI-1TEAM-LLM-PROJECT/
│
├── main.py                    # Entry point
├── models/                    # GGUF model (optional)
├── chroma_db/                 # Vector database
├── data/                      # Documents and vector DB storage (only the RAG data is public)
│   ├── files/                 # Original RFP documents
│   └── rag_chunks_final.csv   # Preprocessed RAG data (CSV)
├── notebooks/                 # Hugging Face model training code
├── src/
│   ├── loader/                # Document loading and preprocessing
│   ├── router/                # Query routing
│   ├── prompt/                # Dynamic prompts
│   ├── evaluation/            # LangSmith evaluation
│   ├── embedding/             # Embedding and vector DB creation
│   ├── retriever/             # Document retriever
│   ├── generator/             # Response generator
│   ├── visualization/         # UI
│   └── utils/                 # Shared helper modules
└── README.md
```

- `main.py`: Entry point for running the full RAG pipeline.
- `data/`: Stores the source documents, the generated vector DB, and related files.
- `models/`: Stores the quantized model file used when loading a local model.
- `src/loader`: Extracts text from PDF and HWP documents and splits it into semantic units.
- `src/router`: The query router classifies each question and triggers the matching service.
- `src/prompt`: Provides a different prompt depending on the model and the type of question.
- `src/evaluation`: Manages the LangSmith evaluation environment and runs experiments.
- `src/embedding`: Generates text embedding vectors and builds the Chroma DB.
- `src/retriever`: Retrieves documents relevant to the user's question from the vector DB.
- `src/generator`: Has the LLM generate a response based on the retrieved documents.
- `src/visualization`: Builds the Streamlit-based user interface.
- `notebooks/`: Fine-tunes the local model and produces the quantized model file.
- `src/utils`: Contains shared utility functions such as configuration checks and path setup.

# 4. Team

> We are a team that sticks to the fundamentals and works continuously to build a model that is usable in practice.

## 👨🏼‍💻 Members

|지동진|김진욱|이유노|박지윤|
|-----|------|------|-------|
|image|image|image|image|
|![https://github.com/Dongjin-1203](https://img.shields.io/badge/github-181717?style=for-the-badge&logo=github&logoColor=white)|![https://github.com/Jinuk93](https://img.shields.io/badge/github-181717?style=for-the-badge&logo=github&logoColor=white)|![https://github.com/Leeyuno0419](https://img.shields.io/badge/github-181717?style=for-the-badge&logo=github&logoColor=white)|![https://github.com/krapnuyij](https://img.shields.io/badge/github-181717?style=for-the-badge&logo=github&logoColor=white)|
|![hamubr1203@gmail.com](https://img.shields.io/badge/Gmail-D14836?style=for-the-badge&logo=gmail&logoColor=white)|![rlawlsdnr430@gmail.com](https://img.shields.io/badge/Gmail-D14836?style=for-the-badge&logo=gmail&logoColor=white)|![yoonolee0419@gmail.com](https://img.shields.io/badge/Gmail-D14836?style=for-the-badge&logo=gmail&logoColor=white)|![jiyun1147@gmail.com](https://img.shields.io/badge/Gmail-D14836?style=for-the-badge&logo=gmail&logoColor=white)|

## 👨🏼‍💻 Roles

|지동진|김진욱|이유노|박지윤|
|------|--------------|---------------|---------------|
|PM/AI RAG Lead|Data Scientist|AI Engineer(API, Prompt)|AI Engineer(HuggingFace, Prompt)|
|Overall project planning and schedule management. Design and implementation of the retrieval system (Retriever, Query Router). Development and optimization of the local embedding model. Dynamic prompt engineering and its application. Streamlit-based dashboard development. Deployment environment setup and system integration.|Data pipeline management. Document chunking strategy planning. Model baseline. Model quantization.|OpenAI model development. Prompt engineering.|Local embedding model development. Prompt engineering.|

---

# 5. Project timeline

image

---

# 6. Service description

## Service architecture

image

---

# Further Information

## Development stack and environment

- **Languages**: image image
- **Frameworks**: image image
- **Libraries**: image image image image
- **Cloud services**: image image
- **Tools**: image image image image image

## Collaboration tools

image image image image image

## Other links

### Project report

[Download the project report](https://drive.google.com/file/d/1p3HHeugJmaiJP4AQpxZZEzAiAngtaHr8/view?usp=sharing)

### Project slides (ppt)

[Download the project ppt](https://drive.google.com/file/d/1QM88Ayztv5TNaxTXi0z1Xhy6ngHLLKUm/view?usp=sharing)

### Individual collaboration logs

- 지동진 ([individual collaboration log](https://www.notion.so/2a2e8d29749a80fca2c7cdae7dfbf883?source=copy_link))
- 김진욱 ([individual collaboration log](https://www.notion.so/2a2e8d29749a807d8f7cf0afd33d3045?source=copy_link))
- 이유노 ([individual collaboration log](https://www.notion.so/2a2e8d29749a807abb25fbdb616f8b40?source=copy_link))
- 박지윤 ([individual collaboration log](https://www.notion.so/2a2e8d29749a806cac38f19f2357dad0?source=copy_link))
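
As a supplement to the `.env` example in section 2, here is a minimal Python sketch of how those settings might be read at startup. The variable names (`OPENAI_API_KEY`, `USE_MODEL_HUB`, `GGUF_MODEL_PATH`, `GGUF_N_CTX`, `GGUF_N_GPU_LAYERS`) come from that example. The `ModelSettings` class, the `load_settings` function, and the fallback defaults are hypothetical and are not taken from this repository's code. The sketch only reads variables that are already in the process environment; loading the `.env` file itself (for example with a library such as python-dotenv) is assumed to happen beforehand.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical container for the values shown in the .env example."""
    openai_api_key: str
    use_model_hub: bool
    gguf_model_path: str
    gguf_n_ctx: int
    gguf_n_gpu_layers: int


def _as_bool(value: str) -> bool:
    # Treat common truthy spellings as True; everything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None) -> ModelSettings:
    """Read settings from a mapping (defaults to os.environ).

    The fallback values mirror the .env example and are assumptions,
    not defaults confirmed by the project.
    """
    env = os.environ if env is None else env
    return ModelSettings(
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        use_model_hub=_as_bool(env.get("USE_MODEL_HUB", "false")),
        gguf_model_path=env.get(
            "GGUF_MODEL_PATH", "./models/Llama-3-Open-Ko-8B.Q4_K_M.gguf"
        ),
        gguf_n_ctx=int(env.get("GGUF_N_CTX", "4096")),
        gguf_n_gpu_layers=int(env.get("GGUF_N_GPU_LAYERS", "35")),
    )


if __name__ == "__main__":
    settings = load_settings()
    # Print only non-secret fields.
    print(settings.use_model_hub, settings.gguf_model_path, settings.gguf_n_ctx)
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.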