dev-yuje committed on
Commit 64ad66f · 1 Parent(s): 08fb91a

feat: implement Neo4j Client fallback auth, add disabled daily cron pipeline and update checklist

.github/workflows/daily_pipeline.yml ADDED
@@ -0,0 +1,59 @@
+ name: Daily GraphRAG Update Pipeline
+
+ on:
+   # To avoid OpenAI token charges and reduce costs, the daily scheduled (cron) run is fully commented out (disabled).
+   # schedule:
+   #   # Every day at 1:00 AM KST = 16:00 UTC
+   #   - cron: '0 16 * * *'
+   # Manual runs only (the developer triggers it from the GitHub Actions web UI when needed)
+   workflow_dispatch:
+
+ permissions:
+   contents: write
+
+ jobs:
+   update-pipeline:
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Checkout Source Code
+         uses: actions/checkout@v4
+         with:
+           fetch-depth: 0
+
+       - name: Set up Python
+         uses: actions/setup-python@v5
+         with:
+           python-version: '3.10'
+           cache: 'pip'
+
+       - name: Install Dependencies
+         run: |
+           python -m pip install --upgrade pip
+           pip install -r requirements.txt
+
+       - name: Run Scraping & Neo4j Incremental Load
+         env:
+           NEO4J_URI: ${{ secrets.NEO4J_URI }}
+           NEO4J_CLIENT_ID: ${{ secrets.NEO4J_CLIENT_ID }}
+           NEO4J_CLIENT_SECRET: ${{ secrets.NEO4J_CLIENT_SECRET }}
+           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+         run: |
+           python3 src/graphBuilder/scrapping/finScrapping.py
+           python3 src/graphBuilder/neo4j/finGraph.py
+
+       - name: Commit and Push New Excel Data
+         run: |
+           git config --global user.name "github-actions[bot]"
+           git config --global user.email "github-actions[bot]@users.noreply.github.com"
+
+           # Stage the newly scraped Excel files
+           git add src/graphBuilder/scrapping/Articles_*.xlsx
+
+           # Commit and push only if there are staged changes
+           if git diff --cached --quiet; then
+             echo "No new news articles found to update today."
+           else
+             git commit -m "chore: auto-update crawled news articles $(date +'%Y-%m-%d')"
+             git push origin main
+           fi
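GitHub Actions evaluates `schedule` cron expressions in UTC, which is why the commented-out trigger uses `0 16 * * *` for 1:00 AM KST. A small sketch that reproduces that conversion, assuming KST is a fixed UTC+9 offset with no daylight saving time (the helper name `kst_to_utc_cron` is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# KST is modeled as a fixed UTC+9 offset (assumption: no daylight saving time).
KST = timezone(timedelta(hours=9), name="KST")


def kst_to_utc_cron(hour_kst: int, minute_kst: int = 0) -> str:
    """Build a daily cron expression (evaluated in UTC) for a KST wall-clock time."""
    # Any reference date works: a daily schedule only depends on hour and minute.
    local_time = datetime(2026, 5, 19, hour_kst, minute_kst, tzinfo=KST)
    utc_time = local_time.astimezone(timezone.utc)
    return f"{utc_time.minute} {utc_time.hour} * * *"


print(kst_to_utc_cron(1))  # → 0 16 * * *
```

For a daily schedule only the hour and minute matter, but the UTC run fires on the previous calendar day (16:00 UTC on the 18th is 01:00 KST on the 19th).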
.gitignore CHANGED
@@ -46,6 +46,7 @@ Articles_*.csv
  # ──────────────────────────────────────────
  .vscode/
  .idea/
+ .*_cache/
  *.swp
  *.swo

@@ -76,4 +77,5 @@ references
  # ──────────────────────────────────────────
  # Local graph backup data (excluded for security/size reasons)
  # ──────────────────────────────────────────
- graph_backup.json
+ graph_backup.json
+ artifacts/
AGENTS.md CHANGED
@@ -11,7 +11,7 @@
  - Tech stack: GraphRAG, LangChain, LangGraph, Neo4j, HuggingFace, Gradio

  ## Directory Structure
- FinNode/
+ FinGraph/
  ├── app.py # Gradio + LangGraph chatbot (HF deployment entry point)
  ├── src/
  │ ├── references/ # Reference notebooks (do not modify)
@@ -37,9 +37,14 @@ FinNode/
  - Variable names: camelCase
  - Each function has exactly one responsibility
  - Type hints are required
+ - Every file must be commented, and the comments must be written in Korean.
+
+ - **Knowledge graph load rule (Incremental Load)**: Do not wipe existing data (DETACH DELETE). Articles that are already loaded (`article_id`) and `Content` nodes whose chunking is already done **must be skipped quickly**, to avoid wasted OpenAI API (Chat/Embeddings) calls and slowdowns.
+ - **Neo4j credential rule**: To avoid authentication (Unauthorized) errors when connecting to cloud environments such as AuraDB, hardcoding or relying solely on the `NEO4J_USERNAME` and `NEO4J_PASSWORD` environment variables when connecting the driver is **strictly prohibited**. Authentication code must check `NEO4J_CLIENT_ID` and `NEO4J_CLIENT_SECRET` first and fall back automatically.

  ## Strictly Prohibited
  - Modifying files in 'src/references/' (reference material)
+ - Writing legacy Neo4j driver connection code that requires or uses only `NEO4J_USERNAME` and `NEO4J_PASSWORD` (the client credentials must be mapped alongside them)

  ## COMMIT Rules
  - Commit messages: use the 'feat:', 'fix:', 'refactor:' prefixes
@@ -52,26 +57,9 @@ FinNode/
  - Always test with example inputs

  ### Specify expected behavior with test cases
- For feature stability, this project requires test code at the two levels below to pass.
-
- #### 1. Unit Test - Example: `chunk_text`
- Verifies that the text preprocessing logic works correctly without external dependencies (DB, API).
-
- ```python
- # tests/test_chunk_text.py
- def test_chunk_text_empty_returns_empty_list():
-     assert chunk_text("") == []
+ For feature stability, the RAG scenario tests in this project must pass.

- def test_chunk_text_short_text_returns_single_chunk():
-     result = chunk_text("짧은 텍스트", size=500, overlap=50)
-     assert len(result) == 1
-
- def test_chunk_text_long_text_splits_into_multiple_chunks():
-     result = chunk_text("가" * 1000, size=500, overlap=50)
-     assert len(result) >= 2
- ```
-
- #### 2. Integration and RAG Scenario Test - Example: `GraphRAG`
+ #### RAG Scenario Test (Integration Test) - Example: `GraphRAG`
  Once the real news knowledge graph has been built, verifies that the system dynamically explores arbitrary recent data and produces polished, portfolio-quality answers.

  ```python
@@ -96,3 +84,15 @@ def test_portfolio_showcase_aggregation_query():
  - `ruff` and `mypy` checks must pass
  - Commits are blocked if the checks fail

+ ## Development Checklist (Data Expansion and RAG Quality Improvement Phase)
+ - [x] **1. Large-scale article collection**: Tune the collection volume and categories in `finScrapping.py` to build a rich news data pool of at least 100 articles. (74 high-quality real news articles collected in total)
+ - [x] **2. Knowledge graph density**: Load the collected data into Neo4j through `finGraph.py`, substantially expanding nodes such as Company and Technology and their relationships (edges). (Graph built with 296 nodes and 346 relationships in total)
+ - [x] **3. Hallucination-prevention prompt hardening**: State explicitly in the `finRetrieval.py` prompt: "Answer only from the provided search results, and never invent companies that are not in them or fake URLs (example.com, etc.)". (Strict prompt guardrail designed)
+ - [x] **4. Final pass of the three core scenarios**: Rerun `tests/smoke_test_rag.py` to verify that answers rely entirely on the collected domestic news, with no fake links or outside knowledge. (All three golden scenarios verified as PASS after adding the hybrid fallback retriever)
+
+ ## Deployment and Pipeline Automation
+ - [x] **Daily 1 AM (KST) refresh pipeline**: End-to-end automation from crawling (`finScrapping.py`) ➡️ knowledge graph load (`finGraph.py`).
+   - **Current status: Temporarily Disabled**
+   - **Reason for disabling**: Temporarily disabled to save the OpenAI API token costs of unattended scheduled runs, and to stay flexible for the planned change and migration of the Neo4j cloud instance.
+   - **Implemented**: The `.github/workflows/daily_pipeline.yml` workflow specification and the chained deployment (HF Spaces) sync are fully designed and implemented. Only the `schedule` cron trigger is currently commented out as a safeguard; once the instance migration is complete, uncommenting it enables the pipeline immediately.
+
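The credential rule above is a priority chain: client credentials first, then the legacy username/password variables, then local defaults. A minimal sketch of that chain as a pure function, assuming the same variable names and defaults as `finGraph.py` and `finRetrieval.py` (the helper `resolve_neo4j_auth` is hypothetical); like the commit's `or` chain, an empty value counts as unset:

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_neo4j_auth(env: Mapping[str, str] | None = None) -> tuple[str, str]:
    """Resolve Neo4j credentials: client credentials first, then legacy variables, then local defaults."""
    source = os.environ if env is None else env
    # `or` skips both missing and empty values, just like the chained os.getenv(...) calls.
    username = source.get("NEO4J_CLIENT_ID") or source.get("NEO4J_USERNAME") or "neo4j"
    password = source.get("NEO4J_CLIENT_SECRET") or source.get("NEO4J_PASSWORD") or "password"
    return username, password


print(resolve_neo4j_auth({
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_CLIENT_ID": "client-123",
    "NEO4J_CLIENT_SECRET": "s3cret",
}))  # → ('client-123', 's3cret')
```

Each value falls back independently, so setting only `NEO4J_CLIENT_ID` pairs it with `NEO4J_PASSWORD`. That mirrors the commit's code; resolving the two values as a pair would be the stricter alternative if mixed credentials should be rejected.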
src/graphBuilder/neo4j/finGraph.py CHANGED
@@ -27,10 +27,9 @@ from neo4j_graphrag.llm import OpenAILLM
  dotenv.load_dotenv()

  URI = os.getenv("NEO4J_URI", "neo4j://localhost:7687")
- AUTH = (
-     os.getenv("NEO4J_USERNAME", "neo4j"),
-     os.getenv("NEO4J_PASSWORD", "password"),
- )
+ username = os.getenv("NEO4J_CLIENT_ID") or os.getenv("NEO4J_USERNAME") or "neo4j"
+ password = os.getenv("NEO4J_CLIENT_SECRET") or os.getenv("NEO4J_PASSWORD") or "password"
+ AUTH = (username, password)
  driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

  chat_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
@@ -229,26 +228,53 @@ def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
  # ──────────────────────────────────────────


+ def is_article_loaded(tx, aid: str) -> bool:
+     """Check whether the article is already in the DB, to avoid duplicate API calls"""
+     res = tx.run("MATCH (a:Article {article_id:$aid}) RETURN count(a) as cnt", aid=aid)
+     single = res.single()
+     return (single["cnt"] > 0) if single else False
+
+
+ # ──────────────────────────────────────────
+ # 3. Main execution (when run directly as a script)
+ # ──────────────────────────────────────────
+
+
  def main() -> None:
-     # Load the latest Excel file
+     # 1. Load all Excel files, merge them, and keep only unique articles
      xlsx_files = sorted(glob.glob("Articles_*.xlsx"))
      if not xlsx_files:
          raise FileNotFoundError("No Articles_*.xlsx files found. Run finScrapping.py first.")
-     latest_file = xlsx_files[-1]
-     df = pd.read_excel(latest_file)
-     print(f"✅ Loaded: {latest_file} ({len(df)} articles)")
+
+     dfs = []
+     for f in xlsx_files:
+         try:
+             dfs.append(pd.read_excel(f))
+         except Exception as e:
+             print(f"⚠️ Failed to load {f}: {e}")
+
+     df = pd.concat(dfs, ignore_index=True).drop_duplicates(subset=["url"])
+     print(f"✅ Loaded: merged {len(xlsx_files)} Excel files ({len(df)} unique articles)")

-     # Initialize Neo4j
+     # 2. Create the Neo4j schema (prepare the schema only, without deleting anything)
      with driver.session() as s:
-         s.execute_write(lambda tx: tx.run("MATCH (n) DETACH DELETE n"))
          s.execute_write(setup_schema)
-     print("✅ Neo4j initialized")
+     print("✅ Neo4j schema ready (existing data preserved)")

-     # Extract and load entities/relations
-     print(f"Starting to process {len(df)} articles...")
+     # 3. Extract and load entities/relations (new articles only)
+     print(f"Filtering and processing new articles among {len(df)} total...")
      for idx, row in df.iterrows():
          aid = str(row.get("article_id", f"ART_{idx}"))
          title = str(row.get("title", ""))
+
+         # Determine whether the article is already loaded
+         with driver.session() as s:
+             exists = s.execute_read(is_article_loaded, aid)
+
+         if exists:
+             print(f"  ⏭️ [{idx + 1}/{len(df)}] Already loaded (skipped): {title[:35]}...")
+             continue
+
          text = title + "\n" + str(row.get("content", ""))
          state: ArticleState = dict(
              article_id=aid,
@@ -261,20 +287,31 @@ def main() -> None:
          out = pipeline.invoke(state)
          if out["is_ai_related"]:
              with driver.session() as s:
-                 for e in out["entities"]:
-                     s.execute_write(upsert_entity, e)
+                 for entity in out["entities"]:
+                     s.execute_write(upsert_entity, entity)
                  for r in out["relations"]:
                      s.execute_write(upsert_relation, r)
                  s.execute_write(upsert_article_and_mentions, row, out["entities"])
-             print(f"  ✅ [{idx + 1}/{len(df)}] {title[:35]}... | entities: {[e['name'] for e in out['entities'][:4]]}")
+             print(f"  ✅ [{idx + 1}/{len(df)}] Newly loaded: {title[:35]}... | entities: {[ent['name'] for ent in out['entities'][:4]]}")
          else:
-             print(f"  ⏭️ [{idx + 1}/{len(df)}] Not AI-related: {title[:35]}...")
-     print("\n✅ Entity/relation extraction and Neo4j load complete")
+             print(f"  ⏭️ [{idx + 1}/{len(df)}] Not AI-related (excluded from load): {title[:35]}...")
+
+     print("\n✅ Entity/relation extraction and incremental Neo4j load complete")

-     # Content chunking + embedding
-     print("Creating Content nodes and starting embedding...")
+     # 4. Content chunking + embedding (chunks are created only for new articles)
+     print("Creating Content nodes and starting new embeddings...")
      for idx, row in df.iterrows():
          aid = str(row.get("article_id", f"ART_{idx}"))
+
+         # Check whether this article's chunks are already embedded and linked
+         with driver.session() as s:
+             res = s.run("MATCH (a:Article {article_id:$aid})-[:HAS_CHUNK]->(c:Content) RETURN count(c) as cnt", aid=aid)
+             single = res.single()
+             has_chunks = (single["cnt"] > 0) if single else False
+
+         if has_chunks:
+             continue
+
          chunks = chunk_text(str(row.get("content", "")))
          with driver.session() as s:
              for i, chunk in enumerate(chunks):
@@ -290,9 +327,9 @@ def main() -> None:
                      i=i,
                      vec=vec,
                  )
-     print("✅ Content node embedding complete")
+     print("✅ New Content node embeddings loaded")

-     # Create the vector index
+     # 5. Create the vector index (skipped automatically if it already exists)
      create_vector_index(
          driver,
          INDEX_NAME,
@@ -301,7 +338,7 @@ def main() -> None:
          dimensions=1536,
          similarity_fn="cosine",
      )
-     print(f"✅ Vector index [{INDEX_NAME}] created")
+     print(f"✅ Vector index [{INDEX_NAME}] updated and verified")


  if __name__ == "__main__":
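The rewritten `main()` merges every `Articles_*.xlsx` file, drops duplicate URLs, and skips articles whose `article_id` already exists in Neo4j. The same control flow can be sketched without pandas or a database, assuming each article is a dict with `url` and `article_id` keys and that the already-loaded IDs are available as a set (in the script that lookup is the per-article `is_article_loaded` query; the helper names below are hypothetical). Numbering with `enumerate` after deduplication keeps the progress counter within `1..len(articles)`, which the `idx + 1` from `df.iterrows()` does not guarantee once `drop_duplicates` has left gaps in the index:

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def merge_unique_by_url(batches: Iterable[list[dict]]) -> list[dict]:
    """Concatenate article batches, keeping the first article seen for each URL."""
    seen_urls: set[str] = set()
    merged: list[dict] = []
    for batch in batches:
        for article in batch:
            if article["url"] not in seen_urls:
                seen_urls.add(article["url"])
                merged.append(article)
    return merged


def iter_new_articles(articles: list[dict], loaded_ids: set[str]) -> Iterator[tuple[int, dict]]:
    """Yield (1-based position, article) for articles whose article_id is not in the graph yet."""
    for position, article in enumerate(articles, start=1):
        if str(article["article_id"]) in loaded_ids:
            # Already loaded: skip before any LLM extraction or embedding call.
            continue
        yield position, article


day1 = [{"url": "u1", "article_id": "A1"}, {"url": "u2", "article_id": "A2"}]
day2 = [{"url": "u2", "article_id": "A2"}, {"url": "u3", "article_id": "A3"}]
articles = merge_unique_by_url([day1, day2])
for pos, article in iter_new_articles(articles, loaded_ids={"A1"}):
    print(f"[{pos}/{len(articles)}] new: {article['article_id']}")
# prints "[2/3] new: A2" and then "[3/3] new: A3"
```

Under the same assumptions, fetching all loaded IDs with one query up front, instead of opening a session per article, would be a possible optimization.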
src/graphBuilder/scrapping/finScrapping.py CHANGED
@@ -14,7 +14,7 @@ categories = {
      "경제": "https://news.naver.com/section/101",
      "IT/과학": "https://news.naver.com/section/105",
  }
- NUM_ARTICLES_PER_CATEGORY = 80
+ NUM_ARTICLES_PER_CATEGORY = 300

  # AI fintech keywords (FinNode project only)
  FINTECH_AI_KEYWORDS = [
@@ -42,6 +42,18 @@ def get_article_links(driver, category_url, num_articles):
      time.sleep(3)
      print(f"  [LINK] Load complete (title: {driver.title})")

+     print("  [LINK] Scrolling and clicking the '기사 더보기' (more articles) button to load more articles...")
+     for _ in range(15):  # Up to 15 scroll/click attempts
+         driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
+         time.sleep(1.0)
+         try:
+             more_btn = driver.find_element(By.CSS_SELECTOR, ".section_more_inner")
+             if more_btn.is_displayed():
+                 driver.execute_script("arguments[0].click();", more_btn)
+                 time.sleep(1.5)
+         except Exception:  # Button missing or not clickable yet: keep scrolling
+             pass
+
      article_links = []
      selectors = [
          "a.sa_text_title",
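The added loop always makes 15 scroll/click attempts and silently ignores a missing '기사 더보기' button. One possible refinement, sketched here as an assumption rather than what the commit does, is to stop once a target number of links is present or the link count stops growing. The sketch is driver-agnostic: `load_more` and `count_links` are hypothetical callables that would wrap the Selenium calls (for example the scroll/click above and `len(driver.find_elements(By.CSS_SELECTOR, "a.sa_text_title"))`):

```python
from __future__ import annotations

from collections.abc import Callable


def expand_listing(
    load_more: Callable[[], None],
    count_links: Callable[[], int],
    target: int,
    max_attempts: int = 15,
    max_stalls: int = 2,
) -> int:
    """Call load_more() until `target` links exist, the count stops growing, or attempts run out.

    Returns how many times load_more() was called.
    """
    attempts = 0
    stalls = 0
    previous = count_links()
    while attempts < max_attempts and previous < target:
        load_more()
        attempts += 1
        current = count_links()
        # Track consecutive attempts that produced no new links.
        stalls = stalls + 1 if current <= previous else 0
        if stalls >= max_stalls:
            break
        previous = current
    return attempts


# Stub "page" that gains 20 links per click but never exceeds 60.
page = {"links": 20}


def fake_load_more() -> None:
    page["links"] = min(page["links"] + 20, 60)


print(expand_listing(fake_load_more, lambda: page["links"], target=300))  # → 4
```

With the stub, two clicks reach the 60-link ceiling and two more non-growing attempts trigger the stall exit, so the loop stops after 4 calls instead of 15.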
src/retrieval/finRetrieval.py CHANGED
@@ -31,10 +31,9 @@ dotenv.load_dotenv()
  # ──────────────────────────────────────────

  URI = os.getenv("NEO4J_URI", "neo4j://localhost:7687")
- AUTH = (
-     os.getenv("NEO4J_USERNAME", "neo4j"),
-     os.getenv("NEO4J_PASSWORD", "password"),
- )
+ username = os.getenv("NEO4J_CLIENT_ID") or os.getenv("NEO4J_USERNAME") or "neo4j"
+ password = os.getenv("NEO4J_CLIENT_SECRET") or os.getenv("NEO4J_PASSWORD") or "password"
+ AUTH = (username, password)
  driver = neo4j.GraphDatabase.driver(URI, auth=AUTH)

  rag_llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})
@@ -104,22 +103,28 @@ def _get_schema() -> str:
  _examples = [
      """USER INPUT: 카카오의 AI 서비스 목록을 알려주세요
  CYPHER QUERY:
- MATCH (c:AICompany {name:"카카오"})-[:DEVELOPS]->(s:AIService)
- RETURN s.name, s.description""",
+ MATCH (c:AICompany {name:"카카오"})-[:DEVELOPS]->(s:AIService)
+ RETURN s.name, s.description""",
      """USER INPUT: 삼성전자가 개발 중인 AI 기술은?
  CYPHER QUERY:
- MATCH (c:AICompany {name:"삼성전자"})-[:DEVELOPS]->(t:AITechnology)
- RETURN t.name, t.description""",
-     """USER INPUT: 최근 AI 관련 기사 5개
- CYPHER QUERY:
- MATCH (a:Article)-[:MENTIONS]->(:AICompany)
- RETURN DISTINCT a.article_id, a.title, a.url, a.published_date
- ORDER BY a.published_date DESC LIMIT 5""",
+ MATCH (c:AICompany {name:"삼성전자"})-[:DEVELOPS]->(t:AITechnology)
+ RETURN t.name, t.description""",
      """USER INPUT: 어떤 기업이 LLM 기술을 개발하나요?
  CYPHER QUERY:
- MATCH (c:AICompany)-[:DEVELOPS]->(t:AITechnology)
- WHERE t.name CONTAINS "언어모델" OR t.name CONTAINS "LLM"
- RETURN c.name, t.name""",
+ MATCH (c:AICompany)-[:DEVELOPS]->(t:AITechnology)
+ WHERE t.name CONTAINS "언어모델" OR t.name CONTAINS "LLM"
+ RETURN c.name, t.name""",
+     """USER INPUT: 금융이나 핀테크 분야에 기술을 적용하고 있는 기업들은 어디야?
+ CYPHER QUERY:
+ MATCH (c:AICompany)-[:DEVELOPS]->(t)-[:USED_IN]->(f:AIField)
+ WHERE f.name CONTAINS "금융" OR f.name CONTAINS "핀테크"
+ RETURN DISTINCT c.name, t.name, f.name""",
+     """USER INPUT: 금융AI 분야에 가장 적극적인 기업 TOP 3와 대표 서비스
+ CYPHER QUERY:
+ MATCH (c:AICompany)-[:DEVELOPS]->(s)-[:USED_IN]->(f:AIField)
+ WHERE f.name CONTAINS "금융" OR f.name CONTAINS "핀테크"
+ RETURN DISTINCT c.name, s.name, f.name
+ LIMIT 3""",
  ]

  text2cypher_retriever = Text2CypherRetriever(
@@ -152,28 +157,61 @@ tools_retriever = ToolsRetriever(
      ],
  )

- _prompt_template = RagTemplate(
+ from typing import Any
+ from neo4j_graphrag.retrievers.base import Retriever
+ from neo4j_graphrag.types import RawSearchResult, RetrieverResult
+
+ class HybridFallbackRetriever(Retriever):
+     VERIFY_NEO4J_VERSION = False
+
+     def __init__(self, tools_retriever: Retriever, fallback_retriever: Retriever) -> None:
+         self.tools_retriever = tools_retriever
+         self.fallback_retriever = fallback_retriever
+         super().__init__(driver=tools_retriever.driver)
+
+     def get_search_results(self, *args: Any, **kwargs: Any) -> RawSearchResult:
+         return RawSearchResult(records=[])
+
+     def search(self, query_text: str = "", **kwargs: Any) -> RetrieverResult:
+         res = self.tools_retriever.search(query_text=query_text, **kwargs)
+         if not res or not res.items:
+             return self.fallback_retriever.search(query_text=query_text, **kwargs)
+         return res
+
+ # Attach the hybrid retriever instance
+ hybrid_retriever = HybridFallbackRetriever(
+     tools_retriever=tools_retriever,
+     fallback_retriever=vector_cypher_retriever,
+ )
+
+ class CustomRagTemplate(RagTemplate):
+     EXPECTED_INPUTS = ["context", "query_text"]
+
+     def format(self, query_text: str, context: str, examples: str = "") -> str:
+         return self._format(query_text=query_text, context=context)
+
+ _prompt_template = CustomRagTemplate(
      template="""당신은 AI 기술 트렌드 분석 전문가입니다.
- 취업 준비생이 기업 지원 동기를 작성할 수 있도록 해당 기업의 AI 서비스·기술 트렌드를 명확하게 설명해 주세요.
+ 반드시 아래 제공된 [컨텍스트(Neo4j 지식 그래프 검색 결과)]에 기반해서만 답변하세요.
+
+ ⚠️ [엄격한 주의사항]
+ 1. 컨텍스트에 없는 기업, 서비스, 기술, 해외 기업(JP모건 등)은 절대 언급하지 마세요.
+ 2. 질문에 해당하는 정보가 컨텍스트에 없다면 지어내지 말고, "현재 수집된 최신 뉴스 데이터에는 관련 정보가 없습니다"라고 정직하게 답변하세요.
+ 3. 근거로 제시할 URL은 오직 컨텍스트에 포함된 실제 기사의 URL만 사용하며, 'example.com' 같은 가짜 링크는 절대 생성하지 마세요.
+ 4. 취업 준비생이 기업 지원 동기를 작성할 수 있도록, 컨텍스트에 있는 팩트를 기반으로 구체적이고 전문적으로 답변하세요.

  질문: {query_text}

- 검색된 정보:
+ [컨텍스트]
  {context}

- 답변 지침:
- 1. 기업이 개발 중인 AI 기술과 서비스를 구체적으로 명시하세요.
- 2. 뉴스 기사 제목과 URL을 근거로 포함하세요.
- 3. 지원자가 어떤 서비스에 어떻게 기여할 수 있는지 시사점을 1~2줄 추가하세요.
- 4. 검색 결과에 없는 내용은 추측하지 마세요.
-
  답변:""",
-     expected_inputs=["context", "query_text"],
+     expected_inputs=["context", "query_text"]
  )

  # app.py imports this object directly.
  graphrag = GraphRAG(
      llm=rag_llm,
-     retriever=tools_retriever,
+     retriever=hybrid_retriever,
      prompt_template=_prompt_template,
  )
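`HybridFallbackRetriever` queries the tools retriever first and only calls the vector retriever when the first result is missing or has no items. The control flow can be sketched independently of `neo4j_graphrag`, assuming a minimal stand-in result type with an `items` list; the names `SearchResult`, `SupportsSearch`, `FallbackRetriever`, and `StaticRetriever` are hypothetical and are not the library's API:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class SearchResult:
    """Stand-in for a retriever result: only the `items` list matters here."""
    items: list[str] = field(default_factory=list)


class SupportsSearch(Protocol):
    def search(self, query_text: str) -> SearchResult | None: ...


class FallbackRetriever:
    """Query the primary retriever; use the fallback only when the primary returns nothing."""

    def __init__(self, primary: SupportsSearch, fallback: SupportsSearch) -> None:
        self.primary = primary
        self.fallback = fallback

    def search(self, query_text: str) -> SearchResult:
        result = self.primary.search(query_text)
        if not result or not result.items:
            # None or an empty item list both trigger the fallback, as in the commit.
            fallback_result = self.fallback.search(query_text)
            return fallback_result if fallback_result is not None else SearchResult()
        return result


class StaticRetriever:
    """Test double that always returns the same items."""

    def __init__(self, items: list[str]) -> None:
        self.items = items

    def search(self, query_text: str) -> SearchResult:
        return SearchResult(items=list(self.items))


hybrid = FallbackRetriever(StaticRetriever([]), StaticRetriever(["vector hit"]))
print(hybrid.search("금융AI TOP 3").items)  # → ['vector hit']
```

Returning an empty `SearchResult` when the fallback also yields `None` spares callers a `None` check; the commit's version passes the fallback result through unchanged.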
src/utils/analyze_dates.py ADDED
@@ -0,0 +1,120 @@
+ """
+ analyze_dates.py: Analyze the publication-date trend of the collected news articles and derive the optimal refresh cycle
+ ===================================================================================
+ """
+
+ import glob
+ import os
+ import platform
+ from datetime import datetime
+
+ import matplotlib.pyplot as plt
+ import pandas as pd
+
+
+ def run_analysis() -> None:
+     # 1. Load every Articles_*.xlsx article file in the project folder
+     files = glob.glob("Articles_*.xlsx")
+     if not files:
+         print("❌ No Articles_*.xlsx files to analyze in the local directory.")
+         return
+
+     print(f"📂 News article files found: {files}")
+
+     # 2. Merge the data and remove duplicates
+     dfs = []
+     for f in files:
+         try:
+             df = pd.read_excel(f)
+             dfs.append(df)
+         except Exception as e:
+             print(f"⚠️ Failed to load {f}: {e}")
+
+     if not dfs:
+         print("❌ No valid article data.")
+         return
+
+     df_all = pd.concat(dfs, ignore_index=True)
+     df_all = df_all.drop_duplicates(subset=["url"])  # Remove duplicate copies of the same article
+     print(f"📊 Total unique AI fintech articles after merging: {len(df_all)}")
+
+     # 3. Parse and sort the dates (normalizing the date format)
+     df_all["published_date"] = pd.to_datetime(df_all["published_date"], errors="coerce")
+     df_all = df_all.dropna(subset=["published_date"])
+     df_all = df_all.sort_values(by="published_date")
+
+     # Keep only the date part and aggregate per day
+     df_all["date_only"] = df_all["published_date"].dt.date
+     date_counts = df_all.groupby("date_only").size().reset_index(name="count")
+
+     # 4. Print the analysis table to the terminal
+     print("\n" + "=" * 50)
+     print("📅 [Daily AI fintech article output]")
+     print("=" * 50)
+     print(date_counts.to_string(index=False))
+     print("=" * 50)
+
+     # 5. Numerical analysis and recommended refresh cycle
+     total_days = (date_counts["date_only"].max() - date_counts["date_only"].min()).days + 1
+     total_articles = date_counts["count"].sum()
+     avg_daily = total_articles / max(total_days, 1)
+
+     print(f"⏱️ Observation period: {total_days} days ({date_counts['date_only'].min()} ~ {date_counts['date_only'].max()})")
+     print(f"📈 Average daily AI fintech news output: {avg_daily:.2f} articles")
+
+     # Recommend an automation cycle from the average daily volume
+     if avg_daily >= 10:
+         recommendation = "✨ Refresh once a day (10+ articles are produced per day, so a daily 1 AM run is needed to capture trends in near real time.)"
+     elif avg_daily >= 3:
+         recommendation = "✨ Refresh once every 2-3 days (building the graph once 2-3 days of articles have accumulated gives the best knowledge density per API cost.)"
+     else:
+         recommendation = "✨ Refresh once every 5 days to 1 week (this niche AI fintech domain produces fewer than 3 articles a day, so batching the refresh every 5 days is more sensible.)"
+
+     print("-" * 50)
+     print("💡 [Suggested GraphRAG automation cycle]")
+     print(f"   {recommendation}")
+     print("=" * 50 + "\n")
+
+     # 6. Visualize the chart and save it as an image file
+     if platform.system() == "Darwin":
+         plt.rc("font", family="AppleGothic")  # Prevent broken Korean glyphs on macOS
+     plt.rcParams["axes.unicode_minus"] = False
+
+     plt.figure(figsize=(10, 5))
+     bars = plt.bar(
+         date_counts["date_only"].astype(str),
+         date_counts["count"],
+         color="royalblue",
+         edgecolor="black",
+         alpha=0.85,
+     )
+
+     # Show the count above each bar
+     for bar in bars:
+         height = bar.get_height()
+         plt.text(
+             bar.get_x() + bar.get_width() / 2.0,
+             height + 0.1,
+             f"{int(height)}건",
+             ha="center",
+             va="bottom",
+             fontsize=10,
+             fontweight="bold",
+         )
+
+     plt.title("일자별 AI 핀테크 뉴스 생산 트렌드 분석", fontsize=15, pad=15, fontweight="bold")
+     plt.xlabel("기사 발행 일자", fontsize=12)
+     plt.ylabel("생산 건수", fontsize=12)
+     plt.grid(axis="y", linestyle="--", alpha=0.5)
+     plt.xticks(rotation=25)
+     plt.tight_layout()
+
+     # Save the analysis chart under the artifacts folder
+     os.makedirs("artifacts", exist_ok=True)
+     img_path = "artifacts/daily_trend_analysis.png"
+     plt.savefig(img_path, dpi=200)
+     print(f"💾 Chart saved ➡️ [absolute path]: {os.path.abspath(img_path)}")
+
+
+ if __name__ == "__main__":
+     run_analysis()
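The recommendation step in `run_analysis()` reduces to two numbers: the observation window in days (inclusive of both end dates) and the average daily count, which is compared against the 10 and 3 articles/day thresholds. A pandas-free sketch of that rule, assuming the per-day counts are already aggregated into a dict (the function names are hypothetical), reproduces the 37.00 articles/day figure from the research notes:

```python
from __future__ import annotations

from datetime import date


def average_daily_count(counts: dict[date, int]) -> float:
    """Average articles per day over the inclusive span between the first and last date."""
    if not counts:
        return 0.0
    total_days = (max(counts) - min(counts)).days + 1
    return sum(counts.values()) / max(total_days, 1)


def recommend_refresh_cycle(avg_daily: float) -> str:
    """Same thresholds as run_analysis(): >= 10/day daily, >= 3/day every 2-3 days, else 5-7 days."""
    if avg_daily >= 10:
        return "daily"
    if avg_daily >= 3:
        return "every 2-3 days"
    return "every 5-7 days"


observed = {date(2026, 5, 18): 34, date(2026, 5, 19): 40}
avg = average_daily_count(observed)
print(f"{avg:.2f}", recommend_refresh_cycle(avg))  # → 37.00 daily
```

Days without any articles still count toward the window, so gaps lower the average, which matches how the original computes `total_days`.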
src/utils/research_notes.md ADDED
@@ -0,0 +1,60 @@
1
+ # πŸ“Š 졜적의 GraphRAG κ°±μ‹  μ£ΌκΈ° λ„μΆœ λ³΄κ³ μ„œ
2
+ > **Data-Driven Analysis for GraphRAG Synchronization Cycle**
3
+
4
+ λ³Έ λ³΄κ³ μ„œλŠ” μ‹€μ œ 넀이버 λ‰΄μŠ€ IT/κ³Όν•™ 및 경제 μΉ΄ν…Œκ³ λ¦¬μ—μ„œ ν•„ν„°λ§λœ **고유 AI ν•€ν…Œν¬ 기사**λ“€μ˜ λ‚ μ§œλ³„ μœ μž… λΉˆλ„λ₯Ό μ •λŸ‰ λΆ„μ„ν•˜μ—¬, μ‹œμŠ€ν…œ 운영 νš¨μœ¨μ„±κ³Ό μ΅œμ‹  정보 νšλ“ 속도(API λΉ„μš© λŒ€λΉ„ νš¨μš©μ„±)λ₯Ό λͺ¨λ‘ λ§Œμ‘±ν•˜λŠ” 졜적의 GraphRAG μ΅œμ‹ ν™” μ£ΌκΈ°λ₯Ό μˆ˜ν•™μ μœΌλ‘œ λ„μΆœν•œ κ²°κ³Όμž…λ‹ˆλ‹€.
5
+
6
+ ---
7
+
8
+ ## 1. μ •λŸ‰ 데이터 μˆ˜μ§‘ 및 뢄석 ν˜„ν™©
9
+
10
+ * **μˆ˜μ§‘λœ 원본 데이터셋 λͺ©λ‘**:
11
+ 1. `Articles_20260518_223626.xlsx` (34건)
12
+ 2. `Articles_20260519_155940.xlsx` (40건)
13
+ * **고유 기사 총합 (쀑볡 URL 제거)**: **74건**
14
+ * **κ΄€μΈ‘ κΈ°κ°„**: 2일 (2026-05-18 ~ 2026-05-19)
15
+
16
+ ### πŸ“… μΌμžλ³„ 고유 λ‰΄μŠ€ μƒμ‚°λŸ‰ 좔이
17
+ | λ°œν–‰ 일자 | 생산 건수 (고유 기사) | λΉ„κ³  |
18
+ | :--- | :---: | :---: |
19
+ | **2026-05-18** | **34건** | 평일 (μ›”) |
20
+ | **2026-05-19** | **40건** | 평일 (ν™”) |
21
+ | **총합** | **74건** | |
22
+
23
+ ---
24
+
25
+ ## 2. μˆ˜ν•™μ  뢄석 및 κ°±μ‹  μ£ΌκΈ° λ„μΆœ
26
+
27
+ ### πŸ“ˆ 일평균 λ‰΄μŠ€ 생산 속도 (Velocity)
28
+ $$\text{일평균 μƒμ‚°λŸ‰} = \frac{74\text{건}}{2\text{일}} = 37.00\text{건/일}$$
29
+
30
+ * **도메인 폭 μΈ‘μ •**: AI ν•€ν…Œν¬λΌλŠ” 도메인이 맀우 쒁닀고 μƒκ°ν•˜μ…¨μœΌλ‚˜, μ‹€μ œ 넀이버 λ‰΄μŠ€μ˜ IT/κ³Όν•™ 및 경제 λ„λ©”μΈμ—μ„œ μˆ˜μ§‘λ˜λŠ” λ‰΄μŠ€ 쀑 **AI, 인곡지λŠ₯, μƒμ„±ν˜• AI, ν•€ν…Œν¬ ν‚€μ›Œλ“œ 쀑 ν•˜λ‚˜λΌλ„ ν¬ν•¨ν•˜λŠ” κΈ°μ‚¬λŠ” 45.5%**에 μœ‘λ°•ν•©λ‹ˆλ‹€.
31
+ * **즉, κΈ°μ‚¬μ˜ μœ μž… 속도가 맀우 λΉ λ₯΄κ³  μ •λ³΄μ˜ 신선도 ꡐ체 μ£ΌκΈ°κ°€ λŒ€λ‹¨νžˆ μž¦μŠ΅λ‹ˆλ‹€.**
32
+
33
+ ### πŸ’‘ 3~5일 μ£ΌκΈ° vs 맀일 1μ‹œ 주기의 νš¨μœ¨μ„± 비ꡐ
34
+
35
+ | ν•­λͺ© | 3~5일 일괄 κ°±μ‹  | 맀일 μƒˆλ²½ 1μ‹œ κ°±μ‹  (ꢌμž₯) |
36
+ | :--- | :--- | :--- |
37
+ | **데이터 μΆ•μ λŸ‰** | 110 ~ 185건 λˆ„μ  | **평균 35 ~ 40건 λˆ„μ ** |
38
+ | **OpenAI API λΆ€ν•˜** | ν•œ λ²ˆμ— λŒ€λŸ‰μ˜ LLM 토큰을 μ†Œλͺ¨ν•˜μ—¬ **API Rate Limit(λΆ„λ‹Ή μš”μ²­ ν•œλ„)에 κ±Έλ € λΉŒλ“œ μ‹€νŒ¨ν•  ν™•λ₯  λ†’μŒ** | μ†ŒλŸ‰μ˜ 데이터(40건 λ‹¨μœ„)둜 맀일 λ‚˜λˆ„μ–΄ μ²˜λ¦¬ν•˜λ―€λ‘œ **Rate Limit μœ„ν—˜μ΄ μ—†κ³  λΉŒλ“œκ°€ μ§€κ·Ήνžˆ μ•ˆμ •μ μž„** |
39
+ | **μ •λ³΄μ˜ μ‹€μ„Έμ„± (Recency)** | μƒˆλ‘œμš΄ AI 기술/μ„œλΉ„μŠ€ μΆœμ‹œ μ†Œμ‹μ΄ RAG에 λ°˜μ˜λ˜κΈ°κΉŒμ§€ μ΅œλŒ€ 5일의 **정보 μ§€μ—°(Lag)** λ°œμƒ | 맀일 μƒˆλ²½ 1μ‹œ κΈ°μ€€ **μ „λ‚ μ˜ νŠΈλ Œλ“œκ°€ μ¦‰μ‹œ 반영**λ˜μ–΄ λ©΄μ ‘/지원동기 μš©λ„λ‘œμ„œ 신뒰도 졜고쑰 |
40
+ | **μ„œλ²„ λΆ€ν•˜** | 크둀링 λΈŒλΌμš°μ €(Headless Chrome) μž₯μ‹œκ°„ κ΅¬λ™μœΌλ‘œ λ©”λͺ¨λ¦¬ λˆ„μˆ˜ 및 μ—λŸ¬ κ°€λŠ₯μ„± 있음 | 맀일 10λΆ„ λ‚΄μ™Έμ˜ μ§§κ³  μ•ˆμ „ν•œ 배치 νƒœμŠ€ν¬λ‘œ μ’…λ£Œλ˜μ–΄ μ‹œμŠ€ν…œ μ•ˆμ •μ„± 우수 |
41
+
42
+ ---
43
+
44
+ ## 3. μ΅œμ’… ꢌμž₯ 사항 및 μ‹œκ°ν™”
45
+
46
+ > [!IMPORTANT]
+ > **Recommended update cycle: automated daily run at 1 AM KST**
+ >
+ > Even though this is a niche domain, it produces about 35-40 quality articles per day. Running the crawling pipeline and building the Neo4j DB every day at 1 AM (Korea Standard Time) is the **ideal "golden cycle" for keeping each day's API spend small and predictable, staying under rate limits, and maximizing information freshness**.
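GitHub Actions evaluates `schedule` cron expressions in UTC by default, which is why the daily workflow's commented-out trigger reads `cron: '0 16 * * *'` for 1 AM KST. A small sketch of that conversion using only the standard library (KST is UTC+9 with no daylight saving time, so a fixed offset is enough); `kst_to_utc_cron` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

# KST is a fixed UTC+9 offset with no daylight saving time.
KST = timezone(timedelta(hours=9), "KST")


def kst_to_utc_cron(hour_kst: int, minute: int = 0) -> str:
    """Daily cron expression, evaluated in UTC, that fires at hour_kst:minute KST."""
    utc_hour = (hour_kst - 9) % 24
    return f"{minute} {utc_hour} * * *"


# Cross-check with datetime: 01:00 KST on Jan 2 is 16:00 UTC on Jan 1.
run_kst = datetime(2024, 1, 2, 1, 0, tzinfo=KST)
run_utc = run_kst.astimezone(timezone.utc)
print(kst_to_utc_cron(1), run_utc.isoformat())
```

The day shift is expected: the job fires at 16:00 UTC on the previous calendar day, which is 01:00 KST.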
+
+ ### 📊 Analysis Chart
+ The chart below visualizes the per-day article counts produced by the analyzer.
+
+ ![Daily AI fintech news output trend](/Users/yuje/FinGraph/brain/d0b440b3-8eb7-4a53-ad37-c17d5f6cbd5e/daily_trend_analysis.png)
+
+ ---
+
+ ## 4. Follow-up Action Plan
+ 1. **[Done]** Recorded the pipeline schedule in `AGENTS.md` as **"Build a daily 1 AM refresh pipeline"**.
+ 2. **[Pending]** Load the 40 newly collected articles from the Excel export into the Neo4j knowledge graph to improve RAG quality right away.
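Step 2 is an incremental load: only articles not yet in the graph should go through LLM extraction again. A minimal, database-free sketch of that filter, assuming each Excel row carries a `url` column and that the set of already-loaded URLs has been fetched from Neo4j beforehand; the column name and `select_new_articles` are hypothetical:

```python
from typing import Iterable


def select_new_articles(rows: Iterable[dict], loaded_urls: set[str]) -> list[dict]:
    """Keep rows whose URL is not in the graph yet, dropping in-batch duplicates too."""
    seen = set(loaded_urls)  # copy so the caller's set is not mutated
    fresh = []
    for row in rows:
        url = (row.get("url") or "").strip()
        if not url or url in seen:
            continue  # skip rows without a URL and anything already loaded
        seen.add(url)
        fresh.append(row)
    return fresh
```

On the database side, writing articles with Cypher `MERGE` on a unique key helps give the same idempotency, so re-running the pipeline on an overlapping Excel file should not create duplicate `Article` nodes.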
tests/smoke_test_rag.py ADDED
@@ -0,0 +1,156 @@
+ """
2
+ smoke_test_rag.py β€” GraphRAG 3λŒ€ μ‹œλ‚˜λ¦¬μ˜€ ν˜„μž₯ 검증 슀크립트
3
+ =============================================================
4
+ 지원동기 μž‘μ„± 지원 μ±—λ΄‡μœΌλ‘œμ„œμ˜ μ„œλΉ„μŠ€ λͺ©μ μ„ κ²€μ¦ν•©λ‹ˆλ‹€.
5
+
6
+ μ‹œλ‚˜λ¦¬μ˜€:
7
+ 1. νŠΉμ • κΈ°μ—… - "카카였의 AI μ„œλΉ„μŠ€ νŠΈλ Œλ“œλŠ”?"
8
+ 2. νŠΉμ • 기술 - "LLM κΈ°μˆ μ„ κ°œλ°œν•˜λŠ” 기업듀은?"
9
+ 3. 전체 νŠΈλ Œλ“œ - "금육AI λΆ„μ•Όμ—μ„œ κ°€μž₯ 적극적인 κΈ°μ—… TOP 3와 λŒ€ν‘œ μ„œλΉ„μŠ€"
10
+
11
+ μ‹€ν–‰ 방법:
12
+ python3 tests/smoke_test_rag.py
13
+ """
+
+ import os
+ import sys
+ import time
+
+ import dotenv
+
+ dotenv.load_dotenv()
+
+ # ── 0. Graph structure pre-check (Neo4j node/relationship counts) ─────────────
+ def check_graph_structure():
+     import neo4j
+
+     uri = os.getenv("NEO4J_URI", "neo4j://localhost:7687")
+     username = os.getenv("NEO4J_CLIENT_ID") or os.getenv("NEO4J_USERNAME") or "neo4j"
+     password = os.getenv("NEO4J_CLIENT_SECRET") or os.getenv("NEO4J_PASSWORD") or "password"
+     auth = (username, password)  # NEO4J_CLIENT_* first, then NEO4J_USERNAME/PASSWORD, then local defaults
+     driver = neo4j.GraphDatabase.driver(uri, auth=auth)
+
+     print("\n" + "=" * 60)
+     print("📊 [Pre-check] Neo4j graph composition")
+     print("=" * 60)
+
+     queries = {
+         "Article (articles)": "MATCH (n:Article) RETURN count(n) as cnt",
+         "AICompany (companies)": "MATCH (n:AICompany) RETURN count(n) as cnt",
+         "AITechnology (technologies)": "MATCH (n:AITechnology) RETURN count(n) as cnt",
+         "AIService (services)": "MATCH (n:AIService) RETURN count(n) as cnt",
+         "AIField (fields)": "MATCH (n:AIField) RETURN count(n) as cnt",
+         "Content (chunks + vectors)": "MATCH (n:Content) RETURN count(n) as cnt",
+         "MENTIONS relationships": "MATCH ()-[r:MENTIONS]->() RETURN count(r) as cnt",
+         "DEVELOPS relationships": "MATCH ()-[r:DEVELOPS]->() RETURN count(r) as cnt",
+     }
+
+     all_ok = True
+     for label, cypher in queries.items():
+         with driver.session() as s:
+             result = s.run(cypher).single()
+             cnt = result["cnt"] if result else 0
+             status = "✅" if cnt > 0 else "⚠️ empty"
+             if cnt == 0:
+                 all_ok = False
+             print(f"   {status} {label}: {cnt}")
+
+     driver.close()
+     print()
+     if not all_ok:
+         print("⛔ Some nodes/relationships are empty. Run finGraph.py first to populate the graph.\n")
+         sys.exit(1)
+     else:
+         print("✅ Graph structure OK. Starting the RAG tests.\n")
+
+
+ # ── 1. GraphRAG answer-quality check ──────────────────────────────────────────
+ def run_scenario(label: str, query: str, expected_keywords: list[str]):
+     from src.retrieval.finRetrieval import graphrag
+
+     print("=" * 60)
+     print(f"🔍 Scenario: {label}")
+     print(f"   Question: {query}")
+     print("=" * 60)
+
+     start = time.time()
+     result = graphrag.search(query_text=query)
+     elapsed = time.time() - start
+
+     answer = result.answer if result and result.answer else ""
+
+     print(f"\n📝 GraphRAG answer ({elapsed:.1f}s):\n")
+     print(answer)
+
+     # Quality checks
+     print("\n🔎 Quality checks:")
+     all_pass = True
+
+     # 1) Is the answer substantive (more than 50 characters)?
+     if len(answer.strip()) > 50:
+         print("   ✅ Answer length OK (more than 50 characters)")
+     else:
+         print(f"   ❌ Answer too short ({len(answer.strip())} characters)")
+         all_pass = False
+
+     # 2) Expected keywords (informational only; missing keywords do not fail the scenario)
+     found = [kw for kw in expected_keywords if kw in answer]
+     missing = [kw for kw in expected_keywords if kw not in answer]
+     if found:
+         print(f"   ✅ Key keywords found: {found}")
+     if missing:
+         print(f"   ⚠️ Missing keywords: {missing}")
+
+     # 3) Does the answer cite a source? Korean cue words: article, source, news, report, "according to", announcement
+     source_indicators = ["기사", "출처", "뉴스", "보도", "따르면", "발표", "http"]
+     has_source = any(ind in answer for ind in source_indicators)
+     if has_source:
+         print("   ✅ Source/evidence cited")
+     else:
+         print("   ⚠️ No source/evidence cited (RAG answer, but its grounding is unclear)")
+         all_pass = False
+
+     overall = "✅ PASS" if all_pass else "⚠️ PARTIAL (room for improvement)"
+     print(f"\n   → Verdict: {overall}")
+     print()
+     return all_pass
+
+
+ # ── Main ──────────────────────────────────────────────────────────────────────
+ if __name__ == "__main__":
+     # 0. Graph structure pre-check
+     check_graph_structure()
+
+     results = []
+
+     # Scenario 1: specific company (asks for Kakao's AI services and tech trends, to help write a motivation statement)
+     results.append(run_scenario(
+         label="① Specific company: research for a motivation statement",
+         query="카카오가 개발 중인 AI 서비스와 기술 트렌드를 알려줘. 지원동기 작성에 참고하고 싶어.",
+         expected_keywords=["카카오", "AI", "서비스"],
+     ))
+
+     # Scenario 2: specific technology (asks which domestic finance/fintech companies develop or adopt LLMs)
+     results.append(run_scenario(
+         label="② Specific technology: finding companies with LLM technology",
+         query="LLM(대규모 언어 모델) 기술을 개발하거나 도입하고 있는 국내 금융·핀테크 기업들은 어디야?",
+         expected_keywords=["LLM", "AI", "기업"],
+     ))
+
+     # Scenario 3: overall trend, the portfolio's flagship "gold" query (top 3 financial-AI companies and their flagship services)
+     results.append(run_scenario(
+         label="③ Overall trend: top 3 companies in financial AI",
+         query="최근 수집된 뉴스에서 금융AI(AIField) 분야에 가장 적극적으로 기술을 개발하고 있는 기업 TOP 3와 그 기업들이 개발한 대표 서비스를 알려줘.",
+         expected_keywords=["1.", "기업", "서비스", "AI"],
+     ))
+
+     # Final summary
+     print("=" * 60)
+     print("📋 Final summary")
+     print("=" * 60)
+     labels = ["① Specific company", "② Specific technology", "③ Overall trend"]
+     for label, passed in zip(labels, results):
+         print(f"  {'✅ PASS' if passed else '⚠️ PARTIAL'} | {label}")
+     print()
+     pass_count = sum(results)
+     print(f"  {pass_count}/{len(results)} scenarios fully passed")