Junhoee committed on
Commit
883c35f
·
verified ·
1 Parent(s): 93d6144

Update README.md

Files changed (1)
  1. README.md +42 -41
README.md CHANGED
@@ -9,33 +9,25 @@ app_file: app.py
9
  pinned: false
10
  ---
11
 
12
- # Megumin Chatbot Spec
13
-
14
- ## Overview
15
 
16
  A Gradio-based chatbot that talks in the Megumin persona.
17
- It uses a Google ADK Agent and, before answering, retrieves similar examples via RAG to reference her speech style and character lore.
18
 
19
- ## Core Components
20
 
21
  - LLM: `gemini-3.1-flash-lite-preview`
22
  - Agent: Google ADK `LlmAgent`
23
- - UI: Gradio
24
  - Retrieval: Gemini Embedding + FAISS
25
- - Session: `InMemorySessionService`
26
-
27
- ## How It Works
28
-
29
- 1. When a user question arrives, the Agent calls the RAG tool.
30
- 2. It finds the top-3 similar examples in FAISS based on the question embedding.
31
- 3. It responds in the Megumin persona, drawing on both the retrieved answer examples and the current question.
32
- 4. In multi-turn conversations, it keeps the most recent 6 turns and compresses anything earlier into a short summary.
33
 
34
- ## Data Layout
35
 
36
- - `megumin_qa_dataset.json`: original Q/A dataset
37
- - `megumin_questions.faiss`: question embedding index
38
- - `megumin_questions_meta.json`: mapping between index entries and the original records
 
39
 
40
  ## Run
41
 
@@ -43,46 +35,55 @@ It uses a Google ADK Agent and, before answering, retrieves similar examples via RAG
43
  python app_gradio.py
44
  ```
45
 
46
- Spaces entry point:
47
 
48
  ```bash
49
  python app.py
50
  ```
51
 
52
- ## Required Environment Variables
53
 
54
- - `GOOGLE_API_KEY`: Gemini API key
55
- - `HF_TOKEN`: required when reading a private dataset repo
56
 
57
- Recommended:
58
 
59
- ```env
60
- GOOGLE_API_KEY=your_gemini_api_key
61
- ```
 
 
 
 
62
 
63
- ## Building the Index
64
 
65
- To build the FAISS index from the original JSON:
 
 
 
 
 
66
 
67
- ```bash
68
- python scripts/build_faiss_index.py
69
- ```
70
 
71
- ## Namuwiki QA Conversion
72
 
73
- To convert selected Namuwiki articles into QA JSON:
 
 
 
74
 
75
  ```bash
76
- python scripts/crawl_namuwiki_to_qa.py --title 메구밍 --title 카즈마
77
  ```
78
 
79
- Conversion rules:
80
 
81
- - `question`: not phrased as a question; a subheading-style summary used for retrieval
82
- - `answer`: a neutral summary of roughly 200 characters
83
- - chunk overlap: 1-2 sentences
84
- - tables and images are excluded
85
 
86
- ## Deployment Notes
87
 
88
- On Hugging Face Spaces, the JSON, FAISS, and metadata files are downloaded from the dataset repo at runtime.
 
 
 
9
  pinned: false
10
  ---
11
 
12
+ # Megumin RAG Chat
 
 
13
 
14
  A Gradio-based chatbot that talks in the Megumin persona.
15
+ It uses Google ADK and answers by searching both the Megumin style data and the Namuwiki-based lore data.
16
 
17
+ ## Key Summary
18
 
19
  - LLM: `gemini-3.1-flash-lite-preview`
20
  - Agent: Google ADK `LlmAgent`
 
21
  - Retrieval: Gemini Embedding + FAISS
22
+ - UI: Gradio
23
+ - Data: dual RAG, with one corpus for style/persona and one for facts/lore
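
The dual RAG idea in the summary above can be sketched as two independent top-k lookups whose results are returned side by side. This is only an illustration under stated assumptions: it swaps in a pure-Python cosine ranking in place of Gemini Embedding + FAISS, and the names `top_k`, `dual_retrieve`, and the tiny 2-D vectors are hypothetical, not taken from this repository.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Corpus = List[Tuple[str, Vector]]  # (example id, embedding)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Vector, corpus: Corpus, k: int = 3) -> List[str]:
    """Return the ids of the k corpus entries most similar to the query."""
    ranked = sorted(corpus, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


def dual_retrieve(query: Vector, style: Corpus, facts: Corpus, k: int = 3) -> Dict[str, List[str]]:
    """Query the style corpus and the fact corpus separately with the same embedding."""
    return {"style": top_k(query, style, k), "facts": top_k(query, facts, k)}


# Hypothetical toy data standing in for real embeddings.
style_corpus = [("s1", [1.0, 0.0]), ("s2", [0.9, 0.1]), ("s3", [0.0, 1.0]), ("s4", [0.5, 0.5])]
fact_corpus = [("f1", [0.0, 1.0]), ("f2", [1.0, 1.0]), ("f3", [1.0, 0.2]), ("f4", [-1.0, 0.0])]
result = dual_retrieve([1.0, 0.0], style_corpus, fact_corpus)
```

Keeping the two corpora in separate indexes means a style example can never crowd a lore fact out of the top-3, which a single merged index would not guarantee.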
 
 
 
 
 
 
24
 
25
+ ## Current Features
26
 
27
+ - Stays in the Megumin persona
28
+ - Calls the RAG tool for every meaningful question
29
+ - References the top-3 style examples and the top-3 fact examples together
30
+ - Keeps the most recent 6 turns and compresses anything earlier into a short summary
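
The last feature above (keep 6 recent turns, summarize the rest) might look roughly like the following. It is a minimal sketch, not the project's implementation: `compress_history` and `naive_summary` are hypothetical names, and the placeholder summarizer simply joins and truncates user messages where the real app would presumably ask the LLM for a summary.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user message, assistant reply)


def naive_summary(turns: List[Turn], max_chars: int = 200) -> str:
    """Placeholder summarizer: join the user messages and truncate."""
    return " / ".join(user for user, _ in turns)[:max_chars]


def compress_history(
    turns: List[Turn],
    keep_last: int = 6,
    summarize: Callable[[List[Turn]], str] = naive_summary,
) -> Tuple[str, List[Turn]]:
    """Split history into (summary of older turns, most recent turns kept verbatim)."""
    split = max(len(turns) - keep_last, 0)
    older, recent = turns[:split], turns[split:]
    summary = summarize(older) if older else ""
    return summary, list(recent)


# Eight turns: the first two are summarized, the last six are kept as-is.
history = [(f"q{i}", f"a{i}") for i in range(1, 9)]
summary, recent = compress_history(history)
```

Passing the summarizer in as a parameter keeps the windowing logic testable without any model call.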
31
 
32
  ## Run
33
 
 
35
  python app_gradio.py
36
  ```
37
 
38
+ Hugging Face Spaces entry point:
39
 
40
  ```bash
41
  python app.py
42
  ```
43
 
44
+ ## Hugging Face Deployment
45
 
46
+ On Spaces, the app code and the dataset repo are kept separate.
 
47
 
48
+ The app repo usually needs only the following:
49
 
50
+ - `app.py`
51
+ - `app_gradio.py`
52
+ - `megumin_agent/`
53
+ - `scripts/`
54
+ - `docs/`
55
+ - `requirements.txt`
56
+ - `README.md`
57
 
58
+ Upload the following files to the dataset repo:
59
 
60
+ - `megumin_qa_dataset.json`
61
+ - `megumin_questions.faiss`
62
+ - `megumin_questions_meta.json`
63
+ - `namuwiki_qa.json`
64
+ - `namuwiki_questions.faiss`
65
+ - `namuwiki_questions_meta.json`
66
 
67
+ The Spaces runtime downloads the files above from the dataset repo.
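
One way the runtime download could be done is with `hf_hub_download` from `huggingface_hub`, called once per file with `repo_type="dataset"`. The sketch below is an assumption about the approach, not the code in `app.py`: `fetch_dataset_files` is a hypothetical helper, the repo id must be supplied by the caller, and the `download` parameter exists only so the logic can be exercised without network access.

```python
import os
from typing import Callable, Dict, Optional

# File names as listed for the dataset repo above.
DATASET_FILES = [
    "megumin_qa_dataset.json",
    "megumin_questions.faiss",
    "megumin_questions_meta.json",
    "namuwiki_qa.json",
    "namuwiki_questions.faiss",
    "namuwiki_questions_meta.json",
]


def fetch_dataset_files(
    repo_id: str,
    download: Optional[Callable[..., str]] = None,
) -> Dict[str, str]:
    """Download every dataset file and return a {file name: local path} mapping."""
    if download is None:
        # Imported lazily so this module still loads where huggingface_hub is not installed.
        from huggingface_hub import hf_hub_download

        download = hf_hub_download
    token = os.environ.get("HF_TOKEN")  # None is acceptable for a public dataset repo
    return {
        name: download(repo_id=repo_id, filename=name, repo_type="dataset", token=token)
        for name in DATASET_FILES
    }
```

In a Space this would be called once at startup, for example `paths = fetch_dataset_files("<owner>/<dataset-repo>")`, with the placeholder replaced by the actual dataset repo id.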
 
 
68
 
69
+ Required secrets:
70
 
71
+ - `GOOGLE_API_KEY`
72
+ - `HF_TOKEN` (when using a private dataset repo)
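
A startup check for these secrets could fail fast with a clear message instead of a later API error. The helper below is a hypothetical sketch (the name `missing_secrets` and the `private_dataset` flag are assumptions, not part of this repository); it treats `HF_TOKEN` as required only when the dataset repo is private, as noted above.

```python
import os
from typing import List, Mapping


def missing_secrets(env: Mapping[str, str], private_dataset: bool = False) -> List[str]:
    """Return the names of required secrets that are unset or empty in env."""
    required = ["GOOGLE_API_KEY"]
    if private_dataset:
        required.append("HF_TOKEN")
    return [name for name in required if not env.get(name)]


def require_secrets(private_dataset: bool = False) -> None:
    """Raise a RuntimeError naming any missing secrets in the process environment."""
    missing = missing_secrets(os.environ, private_dataset)
    if missing:
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
```

Calling `require_secrets(private_dataset=True)` before building the agent would surface a misconfigured Space immediately.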
73
+
74
+ ## Building the Index
75
 
76
  ```bash
77
+ python scripts/build_faiss_index.py
78
  ```
79
 
80
+ This script builds both of the following indexes:
81
 
82
+ - Style data index
83
+ - Namuwiki data index
 
 
84
 
85
+ ## Key Documents
86
 
87
+ - [ADK overview](docs/Google-ADK.md)
88
+ - [Data collection spec](docs/data-collection-spec.md)
89
+ - [Agent architecture spec](docs/agent-architecture.md)