dev-yuje committed · Commit 3e23aae · 1 Parent(s): 79ef842

feat: expand graph traversal with cross-article relation search and improve tool routing
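
To make "cross-article relation search" concrete, here is a minimal in-memory sketch of what the added Cypher block appears to compute: starting from one article, find other articles that mention at least one of the same companies, technologies, or services, and keep at most three. The `MENTIONS` data, the `related_articles` helper, and the sorted ordering are assumptions made for illustration only. The commit itself does this inside Neo4j, where the order of collected results is not guaranteed.

```python
# Labels checked by the EXISTS clauses in the commit's query (AIField is not one of them).
SHARED_LABELS = {"AICompany", "AITechnology", "AIService"}

# Hypothetical toy graph: article id -> set of (label, name) entities it MENTIONS.
MENTIONS = {
    "A": {("AICompany", "Hyundai"), ("AITechnology", "Robotics"), ("AIField", "Mobility")},
    "B": {("AICompany", "Hyundai")},
    "C": {("AITechnology", "LLM")},
    "D": {("AITechnology", "Robotics"), ("AIService", "Atlas")},
    "E": {("AIField", "Mobility")},
}


def related_articles(article, mentions, limit=3):
    """Return up to `limit` other articles that share a company, technology, or service."""
    # Only entities with one of the three checked labels can link two articles.
    seed = {e for e in mentions.get(article, set()) if e[0] in SHARED_LABELS}
    related = [
        other
        for other in sorted(mentions)  # sorted only to make the sketch deterministic
        if other != article and mentions[other] & seed
    ]
    return related[:limit]


print(related_articles("A", MENTIONS))  # ['B', 'D']
```

Article `E` shares only an `AIField` with `A`, so it is not treated as related, which matches the three `EXISTS` patterns in the query.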

Files changed (1)
  1. src/retrieval/finRetrieval.py +61 -9
src/retrieval/finRetrieval.py CHANGED
@@ -74,6 +74,19 @@ OPTIONAL MATCH (article)-[:MENTIONS]->(company:AICompany)
  OPTIONAL MATCH (company)-[:DEVELOPS]->(tech:AITechnology)
  OPTIONAL MATCH (company)-[:DEVELOPS]->(svc:AIService)
  OPTIONAL MATCH (article)-[:MENTIONS]->(field:AIField)
  RETURN
  node.chunk AS chunk,
  article.title AS article_title,
@@ -82,7 +95,9 @@ RETURN
  collect(DISTINCT company.name) AS companies,
  collect(DISTINCT tech.name) AS technologies,
  collect(DISTINCT svc.name) AS services,
- collect(DISTINCT field.name) AS fields
  """
@@ -141,6 +156,31 @@ CYPHER QUERY:
  RETURN a.title AS title, a.url AS url, a.published_date AS published_date, c.chunk AS chunk
  ORDER BY a.published_date DESC
  LIMIT 3""",
  ]

  # ──────────────────────────────────────────
@@ -205,7 +245,9 @@ _prompt_template = CustomRagTemplate(
  - **Issue development**: [specific background to how the issue arose and how it has progressed]

- - **Corporate trends**: [real-world business moves and responses of the key companies involved]

  - **Infrastructure/social factors**: [key factors such as power-grid shortages, public anxiety, and hardware constraints]
@@ -219,12 +261,13 @@ _prompt_template = CustomRagTemplate(
  - **Practical tip for cover letters/interviews**: [tailored guidance on how to connect this to your own competencies when writing a statement of motivation or a competency description]


- ### 📰 4. Supporting news sources (articles recommended by GraphRAG)

- > **Top 3 related news articles recommended by GraphRAG**
- > 1. *[Article title 1](Article URL 1)* - publication date/publisher
- > 2. *[Article title 2](Article URL 2)* - publication date/publisher
- > 3. *[Article title 3](Article URL 3)* - publication date/publisher

  ---
@@ -276,11 +319,20 @@ class LazyGraphRAG:
  tools=[
  vector_cypher_retriever.convert_to_tool(
  name="vector_retriever",
- description="Search based on keyword and semantic (content) similarity over news article text. Use when answering by jointly analyzing a news article's actual source (article title, URL) and the related company/technology/service graph.",
  ),
  text2cypher_retriever.convert_to_tool(
  name="text2cypher_retriever",
- description="Converts natural language to Cypher. Use for structural queries such as the service list of a specific company or the companies that hold a technology, or for searching all of the latest news, e.g. 'summarize recent articles'.",
  ),
  ],
  )
 
  OPTIONAL MATCH (company)-[:DEVELOPS]->(tech:AITechnology)
  OPTIONAL MATCH (company)-[:DEVELOPS]->(svc:AIService)
  OPTIONAL MATCH (article)-[:MENTIONS]->(field:AIField)
+
+ // Expand the traversal to related articles that mention the same company/technology/service (cross-article search)
+ OPTIONAL MATCH (related_article:Article)
+ WHERE related_article <> article
+ AND (
+ EXISTS { (related_article)-[:MENTIONS]->(:AICompany)<-[:MENTIONS]-(article) }
+ OR EXISTS { (related_article)-[:MENTIONS]->(:AITechnology)<-[:MENTIONS]-(article) }
+ OR EXISTS { (related_article)-[:MENTIONS]->(:AIService)<-[:MENTIONS]-(article) }
+ )
+ WITH
+ node, article, company, tech, svc, field,
+ collect(DISTINCT related_article.title)[..3] AS related_titles,
+ collect(DISTINCT related_article.url)[..3] AS related_urls
  RETURN
  node.chunk AS chunk,
  article.title AS article_title,

  collect(DISTINCT company.name) AS companies,
  collect(DISTINCT tech.name) AS technologies,
  collect(DISTINCT svc.name) AS services,
+ collect(DISTINCT field.name) AS fields,
+ related_titles AS related_article_titles,
+ related_urls AS related_article_urls
  """
 
  RETURN a.title AS title, a.url AS url, a.published_date AS published_date, c.chunk AS chunk
  ORDER BY a.published_date DESC
  LIMIT 3""",
+ """USER INPUT: Which AI technology has been attracting the most interest recently?
+ CYPHER QUERY:
+ MATCH (a:Article)-[:MENTIONS]->(t:AITechnology)
+ OPTIONAL MATCH (c:AICompany)-[:DEVELOPS]->(t)
+ WITH t, count(DISTINCT a) AS article_count, collect(DISTINCT c.name)[..3] AS companies, collect(DISTINCT a.title)[..3] AS article_titles, collect(DISTINCT a.url)[..3] AS article_urls
+ ORDER BY article_count DESC
+ RETURN t.name AS tech_name, t.description AS description, article_count, companies, article_titles, article_urls
+ LIMIT 5""",
+ """USER INPUT: Analyze AI technology trends
+ CYPHER QUERY:
+ MATCH (a:Article)-[:MENTIONS]->(t:AITechnology)
+ OPTIONAL MATCH (c:AICompany)-[:DEVELOPS]->(t)
+ WITH t, count(DISTINCT a) AS article_count, collect(DISTINCT c.name)[..3] AS companies, collect(DISTINCT a.title)[..2] AS article_titles, collect(DISTINCT a.url)[..2] AS article_urls
+ ORDER BY article_count DESC
+ RETURN t.name AS tech_name, article_count, companies, article_titles, article_urls
+ LIMIT 5""",
+ """USER INPUT: Show me AI news related to Hyundai Motor or robots
+ CYPHER QUERY:
+ MATCH (a:Article)-[:MENTIONS]->(c:AICompany)
+ WHERE c.name CONTAINS '현대' OR c.name CONTAINS '로봇'
+ OPTIONAL MATCH (a)-[:MENTIONS]->(t:AITechnology)
+ OPTIONAL MATCH (a)-[:MENTIONS]->(s:AIService)
+ RETURN a.title AS article_title, a.url AS article_url, a.published_date AS article_date,
+ collect(DISTINCT c.name) AS companies, collect(DISTINCT t.name) AS technologies, collect(DISTINCT s.name) AS services
+ ORDER BY a.published_date DESC LIMIT 5""",
  ]

  # ──────────────────────────────────────────
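
The first two few-shot examples above rank technologies by how many distinct articles mention them. As a rough sanity check on that aggregation, the same logic can be sketched in plain Python. The `PAIRS` data and the `top_technologies` helper are hypothetical, and the alphabetical tie-break is an assumption, since the Cypher queries specify no tie-break.

```python
from collections import defaultdict

# Hypothetical (article_title, technology_name) MENTIONS pairs; duplicates can occur.
PAIRS = [
    ("a1", "LLM"), ("a2", "LLM"), ("a3", "LLM"), ("a1", "LLM"),
    ("a1", "Robotics"), ("a4", "Robotics"),
    ("a5", "Vision"),
]


def top_technologies(pairs, limit=5, titles_per_tech=3):
    """Rank technologies by number of distinct mentioning articles, descending."""
    articles_by_tech = defaultdict(list)
    for title, tech in pairs:
        # Skip repeats so the count mirrors count(DISTINCT a).
        if title not in articles_by_tech[tech]:
            articles_by_tech[tech].append(title)
    # Sort by count descending, then by name (the tie-break is this sketch's assumption).
    ranked = sorted(articles_by_tech.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {
            "tech_name": tech,
            "article_count": len(titles),
            "article_titles": titles[:titles_per_tech],  # like collect(...)[..3]
        }
        for tech, titles in ranked[:limit]
    ]


for row in top_technologies(PAIRS):
    print(row)
```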
 
  - **Issue development**: [specific background to how the issue arose and how it has progressed]

+ - **Corporate trends**: [real-world business moves and responses of the key companies involved. If the context contains several companies/technologies, mention all of them]
+
+ - **Technology trends**: [compare/classify the key AI technologies that appear in the context and analyze the overall trend]

  - **Infrastructure/social factors**: [key factors such as power-grid shortages, public anxiety, and hardware constraints]
 
  - **Practical tip for cover letters/interviews**: [tailored guidance on how to connect this to your own competencies when writing a statement of motivation or a competency description]


+ ### 📰 4. Supporting news sources (articles retrieved by GraphRAG)

+ > List only article URLs that actually exist in the context, and never make up articles that do not exist.
+ > If articles were retrieved, list them in the format below; otherwise omit this section.
+ >
+ > Example:
+ > - *[Article title](Article URL)* - publication date

  ---
 
  tools=[
  vector_cypher_retriever.convert_to_tool(
  name="vector_retriever",
+ description=(
+ "Semantic-similarity search over news article text + traversal of the relationship graph of connected entities (companies, technologies, services, fields). "
+ "Use when analyzing news articles together with the related graph relationships for a specific topic/company/technology. "
+ "Examples: 'Hyundai Motor AI news', 'use cases of a specific technology'."
+ ),
  ),
  text2cypher_retriever.convert_to_tool(
  name="text2cypher_retriever",
+ description=(
+ "Converts natural language into a Neo4j Cypher query to aggregate over and traverse the graph structure. "
+ "'Most-mentioned technology', 'trend analysis', 'service list of a specific company', "
+ "'which companies develop technology X', 'summary of recent news', etc.: "
+ "always use for aggregation (COUNT/ORDER BY) or structural relationship queries."
+ ),
  ),
  ],
  )
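
The revised source section of the prompt template in this commit asks the model to cite only article URLs that are present in the retrieved context. A prompt instruction alone cannot enforce that, so one option is a post-hoc check on the generated answer. The sketch below is not part of the commit: the `ungrounded_links` helper, the regular expression, and the example URLs are all assumptions for illustration.

```python
import re

# Matches Markdown links of the form [title](http://... or https://...).
MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def ungrounded_links(answer, context_urls):
    """Return (title, url) pairs cited in `answer` whose URL is not in `context_urls`."""
    allowed = set(context_urls)
    return [(title, url) for title, url in MD_LINK.findall(answer) if url not in allowed]


# Hypothetical answer text and retrieved URLs.
answer = (
    "- *[Foo](https://example.com/a)* - 2024-01-01\n"
    "- *[Bar](https://example.com/zzz)*"
)
print(ungrounded_links(answer, ["https://example.com/a"]))
# [('Bar', 'https://example.com/zzz')]
```

An empty result means every cited link can be traced back to the context. A non-empty result could be logged or used to strip the offending lines before the answer is shown.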