Commit 6a86523 by Dongjin1203 · verified · 1 Parent(s): 064e6e0

Update README.md

Files changed (1)
  1. README.md +138 -18
README.md CHANGED
@@ -1,25 +1,145 @@
 
  ---
- license: mit
- base_model:
- - beomi/Llama-3-Open-Ko-8B
  ---
 
- # Local embedding model for an RFP document summarization chatbot (Llama-3 Open Ko 8B (GGUF Q4_K_M))

- Korean RFP analysis model fine-tuned with QLoRA

- ## Model information
- - Base: beomi/Llama-3-Open-Ko-8B
- - Quantization: Q4_K_M
- - Size: 4.8GB
- - Use: question answering over bidding documents

- ## Usage
- ```python
- from huggingface_hub import hf_hub_download

- model_path = hf_hub_download(
- repo_id="{username}/llama3-ko-8b-gguf",
- filename="Llama-3-Open-Ko-8B.Q4_K_M.gguf"
- )
- ```
 
+ # Codeit-AI-1team-LLM-project
  ---
+ ## Chatbot service demo
+ ![VectorDB Dashboard](asset/chatbot.gif)
+
+ ## Vector DB dashboard video
+ ![VectorDB Dashboard](asset/vectorDB.gif)
+
+ # 1. Project overview
+ - **A consulting startup specializing in B2G bid support: 'RFPilot'**
+ - A chatbot system that summarizes RFP documents and answers user questions in real time
+ > **Background**: Hundreds of corporate and government requests for proposals (RFPs) are posted every day, and each one runs to dozens of pages, so reviewing them all is not feasible. The process is inefficient and makes it hard to grasp key information quickly.
+ >
+ > **Goal**: Develop a chatbot that answers user questions in real time, searches the relevant proposals, and returns summarized information, improving consultants' work efficiency.
+ >
+ > **Expected impact**: By delivering key information quickly through a RAG system, the service shortens proposal review time and lets consultants focus more on consulting work.
+ ---
+ # 2. Installation and execution (🪟 Windows)
+ ---
+ ### Prerequisites
+ - Python 3.12.3 installed
+ - Poetry installed
+ - Repository cloned
+ - Dataset saved locally
+ - Quantized model file (.gguf) saved
+ - .env created (API keys entered)
+
+ **How to configure the .env file**
+ ```env
+ OPENAI_API_KEY = "OpenAI API key"
+ WANDB_API_KEY = "WandB API key"
+ LANGCHAIN_TRACING_V2=true
+ LANGSMITH_API_KEY = "LangSmith API key"
+ LANGCHAIN_PROJECT = "LangSmith project name"
+ ```
+
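Not part of this commit: a minimal sketch of how the keys listed in the `.env` example could be checked before the pipeline starts. The key names come from the example above; the helper `missing_env_keys` is hypothetical, and the sketch assumes the `.env` values have already been loaded into the process environment (for example with `python-dotenv`'s `load_dotenv()`).

```python
import os

# Key names are taken from the .env example above; the helper itself is hypothetical.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "WANDB_API_KEY",
    "LANGSMITH_API_KEY",
    "LANGCHAIN_PROJECT",
)


def missing_env_keys(env=None):
    """Return the required keys that are unset or blank in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not str(env.get(key, "")).strip()]


if __name__ == "__main__":
    missing = missing_env_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All required keys are set.")
```

Failing early on a missing key is usually cheaper than discovering it partway through preprocessing or an evaluation run.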
+ **Running the code**
+ ```powershell
+ # 1. Move to the project folder
+ cd Codeit-AI-1team-LLM-project
+
+ # 2. Set up the virtual environment and install dependencies
+ python -m poetry config virtualenvs.in-project true
+ python -m poetry env use 3.12.3
+ python -m poetry install
+
+ # 3. Activate the virtual environment
+ python -m poetry env activate
+
+ # 4. Run (preprocessing through vector DB build)
+ python -m poetry run python main.py --step all
+
+ # 5. Run the vector DB dashboard
+ python -m poetry run streamlit run src/visualization/streamlit_app.py
+
+ # 6. Run the chatbot service
+ python -m poetry run streamlit run src/visualization/chatbot_app.py
+
+ # 7. Run LangSmith experiments (requires an API key and an existing project)
+ python -m poetry run python src/evaluation/run_experiment.py # interactive menu
+ python -m poetry run python src/evaluation/run_experiment.py --run # run an experiment
+ python -m poetry run python src/evaluation/run_experiment.py --compare # compare experiments
+ ```
+
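Not part of this commit: a rough sketch of how a `--step` option like the one passed to `main.py` above might be parsed. Only the value `all` appears in the README; the step names `preprocess` and `embed` and both helper functions are assumptions made for illustration, not the project's actual interface.

```python
import argparse

# Hypothetical step names; only "all" appears in the commands above.
STEPS = ("preprocess", "embed", "all")


def parse_args(argv=None):
    """Parse the --step option from an explicit argument list."""
    parser = argparse.ArgumentParser(description="Run pipeline steps (illustrative sketch).")
    parser.add_argument("--step", choices=STEPS, default="all", help="Which stage to run.")
    return parser.parse_args(argv)


def steps_to_run(step):
    """Expand "all" into every concrete step, in order."""
    return [s for s in STEPS if s != "all"] if step == "all" else [step]


# Example: the equivalent of `main.py --step all`
for name in steps_to_run(parse_args(["--step", "all"]).step):
    print(f"running step: {name}")
```

Expanding `all` into an ordered list keeps the single-step and full-pipeline code paths identical.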
+ # 3. Project structure
  ---
+ ```
+ CODEIT-AI-1TEAM-LLM-PROJECT/
+ │
+ ├── main.py               # Entry point
+ ├── models/               # Quantized files for loading the local model (private)
+ ├── data/                 # Documents and vector DB storage (private)
+ │   ├── files/            # hwp, pdf documents
+ │   └── data_list.csv     # RFP document metadata CSV
+ ├── src/
+ │   ├── loader/           # Document loading and preprocessing
+ │   ├── evaluation/       # LangSmith evaluation
+ │   ├── embedding/        # Embeddings, vector DB creation
+ │   ├── retriever/        # Document retriever
+ │   ├── generator/        # Response generator
+ │   ├── visualization/    # UI
+ │   ├── notebooks/        # Hugging Face model training code
+ │   └── utils/            # Shared function modules
+ └── README.md
+ ```
+ - `main.py`: Entry point for running the full RAG pipeline.
+ - `data/`: Stores the source documents, the generated vector DB, and related files.
+ - `models/`: Stores the quantized model files used to load the local model.
+ - `src/loader`: Extracts text from PDF and HWP documents and splits it into semantic units.
+ - `src/evaluation`: Manages the LangSmith evaluation environment and runs experiments.
+ - `src/embedding`: Generates text embedding vectors and builds the Chroma DB.
+ - `src/retriever`: Retrieves documents relevant to the user's question from the vector DB.
+ - `src/generator`: Has the LLM generate a response based on the retrieved documents.
+ - `src/notebooks`: Fine-tunes the local model and produces the quantized file.
+ - `src/visualization`: Builds the Streamlit-based user interface.
+ - `src/utils`: Contains shared utility functions such as configuration checks and path setup.
+
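Not part of this commit: a toy, standard-library sketch of the flow the module list describes (split, embed, retrieve, then build a prompt for generation). It is an illustration under stated assumptions, not the project's code: the README says the project uses Chroma DB and an LLM, whereas this sketch swaps in bag-of-words vectors with cosine similarity, and every function name and sample sentence is made up.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=40):
    """Split text into word-bounded chunks (a stand-in for semantic splitting)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: a bag-of-words term-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, context_chunks):
    """Assemble the prompt that a generator would pass to the LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up sample chunks, for illustration only.
chunks = [
    "The proposal submission deadline is 30 June.",
    "The project budget is capped at 500 million KRW.",
    "Bidders must hold an ISO 27001 certification.",
]
top = retrieve("What is the submission deadline?", chunks, k=1)
print(build_prompt("What is the submission deadline?", top))
```

In a real pipeline, `embed` would call an embedding model and `retrieve` would query the vector store. The point here is only the shape of the data passed between stages.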
+ # 4. Team introduction
+ > We are a team that stays true to the fundamentals and works tirelessly to build a model that can be used in practice.
+
+ ## 👨🏼‍💻 Members
+ |지동진|김진욱|이유노|박지윤|
+ |-----|------|------|-------|
+ |<img width="100" height="100" alt="image" src="https://github.com/user-attachments/assets/b9f1a52f-4304-496d-a19c-2d6b4775a5c3" />|<img width="100" height="100" alt="image" src="https://avatars.githubusercontent.com/u/80089860?v=4.png"/>|<img width="100" height="100" alt="image" src="https://github.com/user-attachments/assets/4e635630-f00c-4026-bb1d-c73ec05f37c8" />|<img width="100" height="100" alt="image" src="https://github.com/user-attachments/assets/088a073c-cf1c-40a1-97fb-1d2c1f1b8794" />|
+ |![https://github.com/Dongjin-1203](https://img.shields.io/badge/github-181717?style=for-the-badge&logo=github&logoColor=white)|![https://github.com/Jinuk93](https://img.shields.io/badge/github-181717?style=for-the-badge&logo=github&logoColor=white)|![https://github.com/Leeyuno0419](https://img.shields.io/badge/github-181717?style=for-the-badge&logo=github&logoColor=white)|![https://github.com/krapnuyij](https://img.shields.io/badge/github-181717?style=for-the-badge&logo=github&logoColor=white)|
+ |![hamubr1203@gmail.com](https://img.shields.io/badge/Gmail-D14836?style=for-the-badge&logo=gmail&logoColor=white)|![rlawlsdnr430@gmail.com](https://img.shields.io/badge/Gmail-D14836?style=for-the-badge&logo=gmail&logoColor=white)|![yoonolee0419@gmail.com](https://img.shields.io/badge/Gmail-D14836?style=for-the-badge&logo=gmail&logoColor=white)|![jiyun1147@gmail.com](https://img.shields.io/badge/Gmail-D14836?style=for-the-badge&logo=gmail&logoColor=white)|
+
+ ## 👨🏼‍💻 Role assignment
+ |지동진|김진욱|이유노|박지윤|
+ |------|--------------|---------------|---------------|
+ |PM/AI Engineer (Retriever, Pre-trained, PEFT)|Data Scientist|AI Engineer (API, Prompt)|AI Engineer (HuggingFace, PEFT)|
+ |Overall project lead. Ran team meetings. Managed the team collaboration environment. RAG development. Dashboard development. PEFT.|Built the training data. Wrote the data preprocessing pipeline. Derived insights and gathered and provided the information needed during development.|API model development. Prompt writing. Model improvement.|HuggingFace model training. Model improvement.|
+ ---
+ # 5. Project timeline
+ <img width="1580" height="807" alt="image" src="https://github.com/user-attachments/assets/57f6346a-663f-4ddd-a4b6-fafc2074ff71" />
+
+
+ ---
+ # 6. Service description
+
+ ## Service architecture
+ <img width="4208" height="2004" alt="image" src="https://github.com/user-attachments/assets/73a0db09-b858-4b69-b93b-a85f928225a9" />
+
+ ---
+ # Further Information
+
+ ## Development stack and environment
+ - **Language**: <img width="67" height="18" alt="image" src="https://github.com/user-attachments/assets/e8035e3d-cadb-48f5-a4ac-3693faca01a7" /> <img width="67" height="18" alt="image" src="https://github.com/user-attachments/assets/0658c7ba-8039-4dc3-96a2-7c1308b2fafc" />
+
+ - **Frameworks**: <img width="79" height="18" alt="image" src="https://github.com/user-attachments/assets/e8814092-7e1e-4b22-8d77-e04fd2b26ae6" /> <img width="79" height="18" alt="image" src="https://img.shields.io/badge/LangChain-ffffff?logo=langchain&logoColor=green" />

+ - **Libraries**: <img width="71" height="18" alt="image" src="https://github.com/user-attachments/assets/a428cd24-c8a5-4296-b6da-22eb322afa49" /> <img width="69" height="18" alt="image" src="https://github.com/user-attachments/assets/4325f1d3-d8ba-4bec-a746-4cad4993e925" /> <img width="103" height="18" alt="image" src="https://github.com/user-attachments/assets/a2009044-329d-4dde-b0dc-701122ff8149" /> <img width="53" height="18" alt="image" src="https://github.com/user-attachments/assets/f6225115-0b60-439e-8388-974a0365f8d6" />
+ - **Cloud services**: <img width="71" height="18" alt="image" src="https://img.shields.io/badge/Google%20Cloud-4285F4?&style=plastic&logo=Google%20Cloud&logoColor=white" />
+ - **Tools**: <img width="65" height="18" alt="image" src="https://github.com/user-attachments/assets/52f296c1-c878-4285-abe6-74842522e793" /> <img width="89" height="18" alt="image" src="https://github.com/user-attachments/assets/4ac10441-0753-4e94-9237-1ea6dc2034a2" /><img width="63" height="18" alt="image" src="https://github.com/user-attachments/assets/fea30130-c47c-4fa7-b3cb-7531481cfb28" /> <img width="89" height="18" alt="image" src="https://img.shields.io/badge/google_drive-white?style=for-the-badge&logo=google%20drive&logoColor=white&color=%23EA4336" />
 
 
 
+ ## Collaboration tools
+ <img width="69" height="18" alt="image" src="https://github.com/user-attachments/assets/2bc2fa93-b01e-4051-9b31-ab83301594df" />
+ <img width="63" height="18" alt="image" src="https://github.com/user-attachments/assets/6c44ddad-80a4-4098-9727-6dae9a8fcb1c" />
+ <img width="65" height="18" alt="image" src="https://github.com/user-attachments/assets/a85b2d0f-8cdc-43e7-8e14-da11708a33a4" />
+ <img width="89" height="18" alt="image" src="https://github.com/user-attachments/assets/28d7f511-a4fe-4aa5-9184-2d3a94a97f29" />
+ <img width="89" height="18" alt="image" src="https://img.shields.io/badge/weightsandbiases-%23FFBE00?style=for-the-badge&logo=wandb-%23FFBE00&logoColor=%23FFBE00" />
 
+ ## Other links