Update README.md
README.md
@@ -1,25 +1,145 @@
Korean RFP analysis model fine-tuned with QLoRA

## Model information
- Base: beomi/Llama-3-Open-Ko-8B
- Quantization: Q4_K_M
- Size: 4.8GB
- Purpose: question answering over bid documents
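
As a rough sanity check on the size listed above, the file size can be converted into an implied average number of bits per weight. This is only a sketch: the parameter count of about 8.03 billion is an assumption about the base model, not a value stated in this README, and the function name is hypothetical.

```python
def implied_bits_per_weight(file_size_gb: float, n_params: float) -> float:
    """Average bits stored per parameter, using decimal gigabytes (1 GB = 1e9 bytes)."""
    return file_size_gb * 1e9 * 8 / n_params


# 4.8 comes from the list above; 8.03e9 parameters is an assumed value.
bpw = implied_bits_per_weight(4.8, 8.03e9)
print(f"{bpw:.2f} bits per weight")
```

A result a little under 5 bits per weight is what one would expect from a mostly 4-bit quantization that keeps some tensors at higher precision.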
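
## Usage

```python
# Assumed reconstruction: the lines that opened this call did not survive.
# hf_hub_download is the huggingface_hub function that accepts repo_id and
# filename; the variable name model_path is hypothetical.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="{username}/llama3-ko-8b-gguf",
    filename="Llama-3-Open-Ko-8B.Q4_K_M.gguf"
)
```
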
# Codeit-AI-1team-LLM-project
---
## Chatbot service demo
![Chatbot demo](https://github.com/user-attachments/assets/5989ef65-64c5-4ad6-8a76-ab3e7f7f01a9)

## Vector DB dashboard example
![Vector DB](https://github.com/user-attachments/assets/db3f3d0a-5eb5-4d33-9c0a-7a5cb5c10660)

# 1. Project overview
- **'RFPilot': a consulting startup specializing in B2G bid support**
- A chatbot system that summarizes RFP documents and answers user questions in real time.
> **Background**: Hundreds of corporate and government requests for proposals (RFPs) are posted every day, and each one runs to dozens of pages, so reviewing all of them is not feasible. The process is inefficient and makes it hard to identify key information quickly.
>
> **Goal**: Develop a chatbot that answers user questions in real time and searches the relevant proposals to provide summaries, improving consultants' work efficiency.
>
> **Expected impact**: By delivering key information quickly through a RAG system, shorten proposal review time and let consultants focus more on consulting work.
---
# 2. Installation and running (🪟 Windows)
---
### Prerequisites
- Python 3.12.3 installed
- Poetry installed
- Repository cloned
- Dataset saved locally
- Quantized model file (.gguf) saved
- .env file created (API keys entered)

**Configuring the .env file**
```env
OPENAI_API_KEY="OpenAI API key"
WANDB_API_KEY="Weights & Biases API key"
LANGCHAIN_TRACING_V2=true
LANGSMITH_API_KEY="LangSmith API key"
LANGCHAIN_PROJECT="LangSmith project name"
```

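Before running the pipeline it can help to confirm that every key from the example above is actually set. The following is a minimal sketch using only the standard library; the helper names `parse_env_text` and `missing_keys` are hypothetical, and the project itself may load the file differently (for example with python-dotenv, which is an assumption).

```python
from typing import Mapping

# Key names taken from the .env example above.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "WANDB_API_KEY",
    "LANGSMITH_API_KEY",
    "LANGCHAIN_PROJECT",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate spaces around "=" and optional surrounding quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Mapping[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Illustrative input only; the values here are placeholders.
sample = 'OPENAI_API_KEY = "sk-test"\nLANGCHAIN_TRACING_V2=true\n'
parsed = parse_env_text(sample)
print(missing_keys(parsed))
```

Passing `os.environ` instead of `parsed` checks the live process environment in the same way.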

**Running the code**
```powershell
# 1. Move to the project folder
cd Codeit-AI-1team-LLM-project

# 2. Configure the virtual environment and install dependencies
python -m poetry config virtualenvs.in-project true
python -m poetry env use 3.12.3
python -m poetry install

# 3. Activate the virtual environment
python -m poetry env activate

# 4. Run the pipeline (preprocessing through vector DB build)
python -m poetry run python main.py --step all

# 5. Launch the vector DB dashboard
python -m poetry run streamlit run src/visualization/streamlit_app.py

# 6. Launch the chatbot service
python -m poetry run streamlit run src/visualization/chatbot_app.py

# 7. Run LangSmith experiments (requires a LangSmith API key and project)
python -m poetry run python src/evaluation/run_experiment.py            # interactive menu
python -m poetry run python src/evaluation/run_experiment.py --run      # run an experiment
python -m poetry run python src/evaluation/run_experiment.py --compare  # compare experiments
```

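The `--step all` flag in step 4 suggests a small command-line dispatcher. One way such an entry point could be organized is sketched below with `argparse`. Only the `--step` flag and the value `all` come from the commands above; the step names `preprocess` and `embed` and all function names are hypothetical, not taken from the actual `main.py`.

```python
import argparse

# Hypothetical step names; only "all" appears in the commands above.
STEP_ORDER = ["preprocess", "embed"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="RAG pipeline runner (sketch)")
    parser.add_argument(
        "--step",
        choices=STEP_ORDER + ["all"],
        default="all",
        help="pipeline stage to run, or 'all' for every stage in order",
    )
    return parser


def resolve_steps(step: str) -> list[str]:
    """Expand 'all' into the ordered list of stages."""
    return list(STEP_ORDER) if step == "all" else [step]


def run(argv=None) -> list[str]:
    args = build_parser().parse_args(argv)
    executed = []
    for name in resolve_steps(args.step):
        # A real pipeline would call the matching module here.
        executed.append(name)
    return executed


print(run(["--step", "all"]))
```

Keeping the stage order in one list means `all` and the single-stage choices stay in sync when a stage is added.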

# 3. Project structure
---
```
CODEIT-AI-1TEAM-LLM-PROJECT/
│
├── main.py                  # Entry point
├── models/                  # Quantized model files for local loading (private)
├── data/                    # Documents and vector DB storage (private)
│   ├── files/               # hwp, pdf documents
│   └── data_list.csv        # RFP document metadata (CSV)
├── src/
│   ├── loader/              # Document loading and preprocessing
│   ├── evaluation/          # LangSmith evaluation
│   ├── embedding/           # Embedding and vector DB creation
│   ├── retriever/           # Document retriever
│   ├── generator/           # Response generator
│   ├── visualization/       # UI
│   ├── notebooks/           # Hugging Face model training code
│   └── utils/               # Shared helper modules
└── README.md
```
- `main.py`: Entry point for running the full RAG pipeline.
- `data/`: Stores the source documents, the generated vector DB, and related files.
- `models/`: Stores the quantized model files used for local model loading.
- `src/loader`: Extracts text from PDF and HWP documents and splits it into semantic units.
- `src/evaluation`: Manages the LangSmith evaluation environment and runs experiments.
- `src/embedding`: Generates text embedding vectors and builds the Chroma DB.
- `src/retriever`: Searches the vector DB for documents relevant to the user's question.
- `src/generator`: The LLM generates a response based on the retrieved documents.
- `src/notebooks`: Fine-tunes the local model and produces the quantized file.
- `src/visualization`: Builds the Streamlit-based user interface.
- `src/utils`: Contains shared utility functions such as configuration checks and path setup.

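The loader, embedding, retriever, and generator stages described above can be illustrated end to end with a toy example. This is a sketch under stated assumptions, not the project's code: it swaps the real embedding model and Chroma DB for a bag-of-words vector and an in-memory list, the sample document text is made up, and every function name is hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Greedy sentence packing: a simple stand-in for semantic chunking."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by cosine similarity to the question and keep the top k."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be sent to the LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up sample text standing in for an extracted RFP.
document = (
    "The project budget is 500 million KRW. "
    "Proposals are due on March 15. "
    "The contractor must provide two years of maintenance."
)
chunks = split_into_chunks(document, max_chars=50)
top = retrieve("When are proposals due?", chunks, k=1)
prompt = build_prompt("When are proposals due?", top)
print(prompt)
```

In the real pipeline the ranking step would be a similarity query against the vector DB, and the prompt would be passed to the generator's LLM rather than printed.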

# 4. Team introduction
> A team that stays true to the fundamentals and works tirelessly to build models that are usable in practice.

## 👨🏼‍💻 Members
|μ§λμ§|κΉμ§μ±|μ΄μ λΈ|λ°μ§μ€|
|-----|------|------|-------|
|<img width="100" height="100" alt="image" src="https://github.com/user-attachments/assets/b9f1a52f-4304-496d-a19c-2d6b4775a5c3" />|<img width="100" height="100" alt="image" src="https://avatars.githubusercontent.com/u/80089860?v=4.png"/>|<img width="100" height="100" alt="image" src="https://github.com/user-attachments/assets/4e635630-f00c-4026-bb1d-c73ec05f37c8" />|<img width="100" height="100" alt="image" src="https://github.com/user-attachments/assets/088a073c-cf1c-40a1-97fb-1d2c1f1b8794" />|
|||||
|||||

## 👨🏼‍💻 Roles
|μ§λμ§|κΉμ§μ±|μ΄μ λΈ|λ°μ§μ€|
|------|--------------|---------------|---------------|
|PM/AI Engineer (Retriever, Pre-trained, PEFT)|Data Scientist|AI Engineer (API, Prompt)|AI Engineer (HuggingFace, PEFT)|
|Overall project lead. Runs team meetings. Manages the team collaboration environment. RAG development. Dashboard development. Responsible for PEFT.|Builds the training data. Creates the data preprocessing pipeline. Derives insights and gathers and provides information needed during development.|API model development. Prompt writing. Model improvement.|HuggingFace model training. Model improvement.|

---
# 5. Project timeline
<img width="1580" height="807" alt="image" src="https://github.com/user-attachments/assets/57f6346a-663f-4ddd-a4b6-fafc2074ff71" />

---
# 6. Service description

## Service architecture
<img width="4208" height="2004" alt="image" src="https://github.com/user-attachments/assets/73a0db09-b858-4b69-b93b-a85f928225a9" />

---
# Further Information

## Tech stack and development environment
- **Language**: <img width="67" height="18" alt="image" src="https://github.com/user-attachments/assets/e8035e3d-cadb-48f5-a4ac-3693faca01a7" /> <img width="67" height="18" alt="image" src="https://github.com/user-attachments/assets/0658c7ba-8039-4dc3-96a2-7c1308b2fafc" />

- **Framework**: <img width="79" height="18" alt="image" src="https://github.com/user-attachments/assets/e8814092-7e1e-4b22-8d77-e04fd2b26ae6" /> <img width="79" height="18" alt="image" src="https://img.shields.io/badge/LangChain-ffffff?logo=langchain&logoColor=green" />

- **Libraries**: <img width="71" height="18" alt="image" src="https://github.com/user-attachments/assets/a428cd24-c8a5-4296-b6da-22eb322afa49" /> <img width="69" height="18" alt="image" src="https://github.com/user-attachments/assets/4325f1d3-d8ba-4bec-a746-4cad4993e925" /> <img width="103" height="18" alt="image" src="https://github.com/user-attachments/assets/a2009044-329d-4dde-b0dc-701122ff8149" /> <img width="53" height="18" alt="image" src="https://github.com/user-attachments/assets/f6225115-0b60-439e-8388-974a0365f8d6" />
- **Cloud service**: <img width="71" height="18" alt="image" src="https://img.shields.io/badge/Google%20Cloud-4285F4?&style=plastic&logo=Google%20Cloud&logoColor=white" />
- **Tools**: <img width="65" height="18" alt="image" src="https://github.com/user-attachments/assets/52f296c1-c878-4285-abe6-74842522e793" /> <img width="89" height="18" alt="image" src="https://github.com/user-attachments/assets/4ac10441-0753-4e94-9237-1ea6dc2034a2" /><img width="63" height="18" alt="image" src="https://github.com/user-attachments/assets/fea30130-c47c-4fa7-b3cb-7531481cfb28" /> <img width="89" height="18" alt="image" src="https://img.shields.io/badge/google_drive-white?style=for-the-badge&logo=google%20drive&logoColor=white&color=%23EA4336" />

## Collaboration Tools
<img width="69" height="18" alt="image" src="https://github.com/user-attachments/assets/2bc2fa93-b01e-4051-9b31-ab83301594df" />
<img width="63" height="18" alt="image" src="https://github.com/user-attachments/assets/6c44ddad-80a4-4098-9727-6dae9a8fcb1c" />
<img width="65" height="18" alt="image" src="https://github.com/user-attachments/assets/a85b2d0f-8cdc-43e7-8e14-da11708a33a4" />
<img width="89" height="18" alt="image" src="https://github.com/user-attachments/assets/28d7f511-a4fe-4aa5-9184-2d3a94a97f29" />
<img width="89" height="18" alt="image" src="https://img.shields.io/badge/weightsandbiases-%23FFBE00?style=for-the-badge&logo=wandb-%23FFBE00&logoColor=%23FFBE00" />

## Other links