Codeit-AI-1team-LLM-project


Chatbot service demo

VectorDB Dashboard

벑터 DB λŒ€μ‹œλ³΄λ“œ μ˜μƒ


1. ν”„λ‘œμ νŠΈ κ°œμš”

  • B2G μž…μ°°μ§€μ› μ „λ¬Έ μ»¨μ„€νŒ… μŠ€νƒ€νŠΈμ—… – 'RFPilot'
  • RFP λ¬Έμ„œλ₯Ό μš”μ•½ν•˜κ³ , μ‚¬μš©μž μ§ˆλ¬Έμ— μ‹€μ‹œκ°„μœΌλ‘œ μ‘λ‹΅ν•˜λŠ” 챗봇 μ‹œμŠ€ν…œ

    λ°°κ²½: 맀일 수백 건의 κΈ°μ—… 및 μ •λΆ€ μ œμ•ˆμš”μ²­μ„œ(RFP)κ°€ κ²Œμ‹œλ˜λŠ”λ°, 각 μš”μ²­μ„œ λ‹Ή μˆ˜μ‹­ νŽ˜μ΄μ§€κ°€ λ„˜λŠ” 문건을 λͺ¨λ‘ κ²€ν† ν•˜λŠ” 것은 λΆˆκ°€λŠ₯ν•©λ‹ˆλ‹€. μ΄λŸ¬ν•œ 과정은 λΉ„νš¨μœ¨μ μ΄λ©°, μ€‘μš”ν•œ 정보λ₯Ό λΉ λ₯΄κ²Œ νŒŒμ•…ν•˜κΈ° μ–΄λ ΅μŠ΅λ‹ˆλ‹€.

    λͺ©ν‘œ: μ‚¬μš©μžμ˜ μ§ˆλ¬Έμ— μ‹€μ‹œκ°„μœΌλ‘œ μ‘λ‹΅ν•˜κ³ , κ΄€λ ¨ μ œμ•ˆμ„œλ₯Ό νƒμƒ‰ν•˜μ—¬ μš”μ•½ 정보λ₯Ό μ œκ³΅ν•˜λŠ” 챗봇을 κ°œλ°œν•˜μ—¬ μ»¨μ„€ν„΄νŠΈμ˜ 업무 νš¨μœ¨μ„ ν–₯μƒμ‹œν‚€κ³ μž ν•©λ‹ˆλ‹€.

    κΈ°λŒ€ 효과: RAG μ‹œμŠ€ν…œμ„ 톡해 μ€‘μš”ν•œ 정보λ₯Ό μ‹ μ†ν•˜κ²Œ μ œκ³΅ν•¨μœΌλ‘œμ¨, μ œμ•ˆμ„œ κ²€ν†  μ‹œκ°„μ„ λ‹¨μΆ•ν•˜κ³  μ»¨μ„€νŒ… 업무에 보닀 집쀑할 수 μžˆλŠ” ν™˜κ²½μ„ μ‘°μ„±ν•©λ‹ˆλ‹€.


2. Installation and Running (πŸͺŸ Windows)


Prerequisites

  • Python 3.12.3 installed
  • Poetry installed
  • Repository cloned
  • Dataset saved locally (data/)
  • Quantized model file (.gguf) saved (models/)
  • .env file created with your API keys

.env file setup

OPENAI_API_KEY="your OpenAI API key"
WANDB_API_KEY="your Weights & Biases (W&B) API key"
LANGCHAIN_TRACING_V2=true
LANGSMITH_API_KEY="your LangSmith API key"
LANGCHAIN_PROJECT="your LangSmith project name"
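A missing or empty key in .env usually only surfaces once the pipeline reaches the OpenAI or LangSmith call. The sketch below is a hypothetical, stdlib-only pre-flight check (`parse_env` and `missing_keys` are illustrative names, not part of this repository). It assumes the five keys from the .env example above are all required and handles only simple `KEY=value` lines, so it does not replace whatever loader the project itself uses.

```python
from pathlib import Path

# Assumption: every key from the .env example is required.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "WANDB_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGSMITH_API_KEY",
    "LANGCHAIN_PROJECT",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print(".env not found in the current directory")
    else:
        missing = missing_keys(parse_env(env_path.read_text(encoding="utf-8")))
        print("missing keys:", missing or "none")
```

Run it from the project folder before step 4; it reports which keys still need values without printing the keys themselves.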

Running the code

# 1. Move to the project folder
cd Codeit-AI-1team-LLM-project

# 2. Configure the virtual environment and install dependencies
python -m poetry config virtualenvs.in-project true
python -m poetry env use 3.12.3
python -m poetry install

# 3. Activate the virtual environment
#    (Poetry 2.x prints the activation command; run the command it prints.
#     This step is optional because the steps below use `poetry run`.)
python -m poetry env activate

# 4. Run the pipeline (preprocessing through vector DB build)
python -m poetry run python main.py --step all

# 5. Launch the vector DB dashboard
python -m poetry run streamlit run src/visualization/streamlit_app.py

# 6. Launch the chatbot service
python -m poetry run streamlit run src/visualization/chatbot_app.py

# 7. Run LangSmith experiments (requires a LangSmith API key and project)
python -m poetry run python src/evaluation/run_experiment.py              # interactive menu
python -m poetry run python src/evaluation/run_experiment.py --run        # run an experiment
python -m poetry run python src/evaluation/run_experiment.py --compare    # compare experiments

3. ν”„λ‘œμ νŠΈ ꡬ쑰


CODEIT-AI-1TEAM-LLM-PROJECT/
β”‚
β”œβ”€β”€ main.py                  # μ‹€ν–‰ μ§„μž…μ 
β”œβ”€β”€ models/                  # 둜컬 λͺ¨λΈ λ‘œλ“œμš© μ–‘μžν™” 파일 μ €μž₯ 폴더(λΉ„κ³΅κ°œ)
β”œβ”€β”€ data/                    # λ¬Έμ„œ 및 벑터DB μ €μž₯ 폴더(λΉ„κ³΅κ°œ)
β”‚   β”œβ”€β”€ files/               # hwp, pdf λ¬Έμ„œ
β”‚   └── data_list.csv        # RFP λ¬Έμ„œ 정보 csv
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ loader/              # λ¬Έμ„œ λ‘œλ”© 및 μ „μ²˜λ¦¬
β”‚   β”œβ”€β”€ evaluation/          # LangSmith 평가
β”‚   β”œβ”€β”€ embedding/           # μž„λ² λ”©, 벑터DB 생성
β”‚   β”œβ”€β”€ retriever/           # λ¬Έμ„œ 검색기
β”‚   β”œβ”€β”€ generator/           # 응닡 생성기
β”‚   β”œβ”€β”€ visualization/       # UI ꡬ성
β”‚   β”œβ”€β”€ notebooks/           # Hugging Face λͺ¨λΈ ν•™μŠ΅ μ½”λ“œ
β”‚   └── utils/               # 곡톡 ν•¨μˆ˜ λͺ¨λ“ˆ
└── README.md
  • main.py: 전체 RAG νŒŒμ΄ν”„λΌμΈ μ‹€ν–‰μ˜ μ§„μž…μ μž…λ‹ˆλ‹€.
  • data/: 원문 λ¬Έμ„œ, μƒμ„±λœ 벑터DB 등이 μ €μž₯λ©λ‹ˆλ‹€.
  • models/: 둜컬 λͺ¨λΈ λ‘œλ“œμš© μ–‘μžν™” λͺ¨λΈ νŒŒμΌμ„ μ €μž₯ν•˜λŠ” κ³³μž…λ‹ˆλ‹€.
  • src/loader: PDF, HWP λ¬Έμ„œλ₯Ό ν…μŠ€νŠΈλ‘œ μΆ”μΆœν•˜κ³  의미 λ‹¨μœ„λ‘œ λΆ„ν• ν•©λ‹ˆλ‹€.
  • src/evaluation: LangSmith 평가 ν™˜κ²½μ„ κ΄€λ¦¬ν•˜κ³  μ‹€ν—˜μ„ μ§„ν–‰ν•©λ‹ˆλ‹€.
  • src/embedding: ν…μŠ€νŠΈ μž„λ² λ”© 벑터λ₯Ό μƒμ„±ν•˜κ³  Chroma DBλ₯Ό κ΅¬μΆ•ν•©λ‹ˆλ‹€.
  • src/retriever: μ‚¬μš©μž μ§ˆλ¬Έμ— λŒ€ν•œ κ΄€λ ¨ λ¬Έμ„œλ₯Ό 벑터DBμ—μ„œ κ²€μƒ‰ν•©λ‹ˆλ‹€.
  • src/generator: κ²€μƒ‰λœ λ¬Έμ„œ 기반으둜 LLM이 응닡을 μƒμ„±ν•©λ‹ˆλ‹€.
  • src/notebooks: 둜컬 λͺ¨λΈμ„ Fine-Tuningν•˜μ—¬ μ–‘μžν™” νŒŒμΌμ„ μƒμ„±ν•©λ‹ˆλ‹€.
  • src/visualization: Streamlit 기반 μ‚¬μš©μž μΈν„°νŽ˜μ΄μŠ€λ₯Ό κ΅¬μ„±ν•©λ‹ˆλ‹€.
  • src/utils: μ„€μ • 확인, 경둜 μ„€μ • λ“± 곡톡 μœ ν‹Έλ¦¬ν‹° ν•¨μˆ˜λ“€μ„ ν¬ν•¨ν•©λ‹ˆλ‹€.

4. Team

We are a team that stays true to the fundamentals and works relentlessly to build models that are usable in practice.

πŸ‘¨πŸΌβ€πŸ’» 멀버 ꡬ성

지동진 κΉ€μ§„μš± μ΄μœ λ…Έ λ°•μ§€μœ€
image image image image
https://github.com/Dongjin-1203 https://github.com/Jinuk93 https://github.com/Leeyuno0419 https://github.com/krapnuyij
hamubr1203@gmail.com rlawlsdnr430@gmail.com yoonolee0419@gmail.com jiyun1147@gmail.com

πŸ‘¨πŸΌβ€πŸ’» μ—­ν•  λΆ„λ‹΄

지동진 κΉ€μ§„μš± μ΄μœ λ…Έ λ°•μ§€μœ€
PM/AI Enginner(Rettriever, Pre-trained, PEFT) Data Scientist AI Engineer(API, Prompt) AI Engineer(HuggingFace, PEFT)
ν”„λ‘œμ νŠΈ 총괄. νŒ€ 회의 μ§„ν–‰. νŒ€ ν˜μ—… ν™˜κ²½ 관리. RAG 개발. λŒ€μ‹œλ³΄λ“œ 개발, PEFT λ‹΄λ‹Ή ν•™μŠ΅ 데이터 ꡬ성. 데이터 μ „μ²˜λ¦¬ νŒŒμ΄ν”„λΌμΈ μž‘μ„±. κ°œλ°œκ°„ ν•„μš”ν•œ μΈμ‚¬μ΄νŠΈ λ„μΆœ 및 정보 μˆ˜μ§‘, 제곡 API λͺ¨λΈ 개발. ν”„λ‘¬ν”„νŠΈ μž‘μ„±. λͺ¨λΈ κ°œμ„  HuggingFace λͺ¨λΈ ν•™μŠ΅, λͺ¨λΈ κ°œμ„ 

5. ν”„λ‘œμ νŠΈ νƒ€μž„λΌμΈ

image

6. Service Description

Service Architecture

[image: service architecture]

Further Information

개발 μŠ€νƒ 및 κ°œλ°œν™˜κ²½

  • μ–Έμ–΄: image image

  • ν”„λ ˆμž„μ›Œν¬: image image

  • 라이브러리: image image image image

  • ν΄λΌμš°λ“œ μ„œλΉ„μŠ€: image

  • 도ꡬ: image imageimage image

ν˜‘μ—… Tools

image image image image image

기타 링크

Model (Hugging Face: Dongjin1203/RFP_Documents_chatbot)

  • Format: GGUF, 4-bit quantization
  • Model size: 8B params
  • Architecture: llama
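Before downloading the .gguf file, it helps to know roughly how much disk and memory the weights need. The back-of-the-envelope sketch below (the `gguf_weight_size_gb` helper is hypothetical) multiplies the parameter count by the bits per weight. Treat the result as a lower bound: 4-bit GGUF quant types also store per-block scales, so their effective rate is typically somewhat above 4 bits per weight (the 4.8 used below is an assumed illustrative value), and inference also needs room for the KV cache and runtime overhead.

```python
def gguf_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size of quantized weights in decimal GB (10**9 bytes).

    Ignores file metadata, the KV cache, and runtime overhead.
    """
    bytes_total = n_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


if __name__ == "__main__":
    # Nominal 4 bits/weight vs. an assumed ~4.8 bits/weight for a mixed 4-bit quant.
    for bpw in (4.0, 4.8):
        print(f"{bpw} bits/weight -> ~{gguf_weight_size_gb(8e9, bpw):.1f} GB")
```

For the 8B model listed above this gives about 4.0 GB at a nominal 4 bits per weight and about 4.8 GB at 4.8 bits per weight.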