Spaces: jeongsoo/ObsidianStyleGraphViewer


jeongsoo committed on Apr 27, 2025

Commit 5ccf0d4 (1 parent: 4e34450)

Initial commit

Files changed (9)

.gitignore +41 -0
Procfile +1 -0
README.md +48 -0
app.py +338 -0
data/child_mind_data.json +168 -0
huggingface-metadata.json +11 -0
main.py +327 -0
requirements.txt +9 -0
templates/index.html +197 -0

.gitignore ADDED

	@@ -0,0 +1,41 @@

+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+# Virtual Environment
+venv/
+env/
+ENV/
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+# Logs
+*.log
+# Cached data
+*.pickle
+*.pkl
+*.npy
+*.npz

Procfile ADDED

	@@ -0,0 +1 @@


+web: gunicorn app:app

README.md CHANGED

@@ -11,3 +11,51 @@ license: mit
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

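+# Korean Word Semantic Network Visualization
+This project is a web application that visualizes semantic relationships between Korean words in 3D space.
+## Key Features
+- Word embeddings generated with the BGE-M3 multilingual embedding model
+- Dimensionality reduction with the t-SNE algorithm
+- Word-to-word relationship analysis based on cosine similarity
+- Interactive 3D visualization with the Plotly library
+- Adjustable similarity threshold
+## Tech Stack
+- **Backend**: Flask
+- **Frontend**: Vue.js, Bootstrap 5
+- **Data processing**: SentenceTransformers, scikit-learn, numpy
+- **Visualization**: Plotly, NetworkX
+## Usage
+1. Adjust the similarity threshold with the slider in the web interface.
+2. Click the graph generation button to update the visualization.
+3. Rotate and zoom the 3D graph with the mouse to explore relationships between words.
+4. Hover over a word to see the word and its number of connections.
+## Interpreting the Visualization
+- **Position**: Semantically similar words tend to sit close together in 3D space.
+- **Edges**: Connect word pairs whose cosine similarity exceeds the threshold.
+- **Color**: Determined by each node's Z coordinate, so words with similar colors lie at a similar depth along the Z axis of the t-SNE layout.
+## Running Locally
+```bash
+pip install -r requirements.txt
+python app.py
+```
+## Notes
+This application is deployed on Hugging Face Spaces and can be used directly in a web browser.
+BGE-M3 is a multilingual embedding model, which makes it suitable for analyzing semantic relationships between Korean words.
+---
+© 2025 Korean Word Semantic Network Visualization project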
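
The README describes edges as word pairs whose cosine similarity exceeds the threshold. A minimal sketch of that rule in plain Python follows. It assumes toy 2-D vectors in place of real BGE-M3 embeddings, and `cosine` and `edges_above` are hypothetical helper names, not functions from this repository.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def edges_above(vectors, threshold):
    """Return (word_a, word_b, similarity) for each pair above the threshold."""
    edges = []
    for (w1, v1), (w2, v2) in combinations(vectors.items(), 2):
        sim = cosine(v1, v2)
        if sim > threshold:
            edges.append((w1, w2, sim))
    return edges

# Toy vectors, made up for illustration only
toy = {"학교": [1.0, 0.0], "선생님": [0.9, 0.1], "바람": [0.0, 1.0]}
edges = edges_above(toy, 0.7)
```

With these toy vectors only the first two words end up connected. Raising the threshold can only remove edges, which matches the slider hint that higher values give fewer connections.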

app.py ADDED

	@@ -0,0 +1,338 @@

+import json
+import os
+from flask import Flask, render_template, request, jsonify, send_from_directory
+from sentence_transformers import SentenceTransformer
+from sklearn.manifold import TSNE
+import matplotlib.pyplot as plt
+import matplotlib.font_manager as fm
+import numpy as np
+import platform
+import networkx as nx
+import plotly.graph_objects as go
+from sklearn.metrics.pairwise import cosine_similarity
+import plotly
+import joblib
+from datetime import datetime
+import time
+app = Flask(__name__)
+# Module-level caches for the model, embeddings, and rendered graphs
+model = None
+model_name = 'BAAI/bge-m3'
+embeddings_cache = {}
+graph_cache = {}
+# --- Korean font setup ---
+def set_korean_font():
+    """
+    Try to configure a Korean font for matplotlib based on the current OS,
+    and return the font name for Plotly to use.
+    """
+    system_name = platform.system()
+    plotly_font_name = None # font name passed to Plotly
+    # Pick a font per OS
+    if system_name == "Windows":
+        font_name = "Malgun Gothic"
+        plotly_font_name = "Malgun Gothic"
+    elif system_name == "Darwin":  # MacOS
+        font_name = "AppleGothic"
+        plotly_font_name = "AppleGothic"
+    elif system_name == "Linux":
+        # Preferred Korean font path and name on Linux
+        font_path = "/usr/share/fonts/truetype/nanum/NanumGothic.ttf"
+        plotly_font_name_linux = "NanumGothic" # Plotly refers to fonts by name
+        if os.path.exists(font_path):
+            font_name = fm.FontProperties(fname=font_path).get_name()
+            plotly_font_name = plotly_font_name_linux
+        else:
+            # Look for any installed font whose name contains 'Nanum'
+            try:
+                available_fonts = [f.name for f in fm.fontManager.ttflist]
+                nanum_fonts = [name for name in available_fonts if 'Nanum' in name]
+                if nanum_fonts:
+                    font_name = nanum_fonts[0]
+                    # Reuse the same name for Plotly (the exact name can differ between systems)
+                    plotly_font_name = font_name
+                else:
+                    # Fall back to fonts usually found on other OSes
+                    if "Malgun Gothic" in available_fonts:
+                        font_name = "Malgun Gothic"
+                        plotly_font_name = "Malgun Gothic"
+                    elif "AppleGothic" in available_fonts:
+                        font_name = "AppleGothic"
+                        plotly_font_name = "AppleGothic"
+                    else:
+                        font_name = None
+            except Exception as e:
+                font_name = None
+            if not font_name:
+                plotly_font_name = None # let Plotly use its default as well
+    else:  # other OS
+        font_name = None
+        plotly_font_name = None
+    # Apply the Matplotlib font setting
+    if font_name:
+        try:
+            plt.rc('font', family=font_name)
+            plt.rc('axes', unicode_minus=False)
+        except Exception as e:
+            plt.rcdefaults()
+            plt.rc('axes', unicode_minus=False)
+    else:
+        plt.rcdefaults()
+        plt.rc('axes', unicode_minus=False)
+    if not plotly_font_name:
+        plotly_font_name = 'sans-serif' # generic fallback for Plotly
+    return plotly_font_name # font name for Plotly to use
+# --- Data loading ---
+def load_words_from_json(filepath):
+    """Load the 'word' field of each entry in a JSON file into a list."""
+    try:
+        with open(filepath, 'r', encoding='utf-8') as f:
+            data = json.load(f)
+        # The top level is expected to be a list of objects
+        if isinstance(data, list):
+            # Skip entries with a missing or empty 'word'
+            words = [item['word'] for item in data if isinstance(item, dict) and item.get('word')]
+            return words
+        else:
+            print(f"Error: the top level of '{filepath}' is not a list.")
+            return None
+    except FileNotFoundError:
+        print(f"Error: file '{filepath}' not found.")
+        return None
+    except json.JSONDecodeError:
+        print(f"Error: '{filepath}' is not valid JSON.")
+        return None
+    except Exception as e:
+        print(f"Error while loading data: {e}")
+        return None
+def generate_graph(data_file_path='data/child_mind_data.json', similarity_threshold=0.7):
+    """Build the 3D word graph and return it as Plotly JSON (or an error dict)."""
+    global model, model_name, embeddings_cache, graph_cache
+    # Return the cached graph if this file/threshold pair was already rendered
+    cache_key = f"{data_file_path}_{similarity_threshold}"
+    if cache_key in graph_cache:
+        return graph_cache[cache_key]
+    # Korean font setup
+    plotly_font = set_korean_font()
+    # Load the data
+    word_list = load_words_from_json(data_file_path)
+    if not word_list:
+        return {"error": "Could not load the data."}
+    # Remove duplicates
+    word_list = sorted(set(word_list))
+    # Initialize the embedding model (first call only)
+    if model is None:
+        try:
+            model = SentenceTransformer(model_name)
+        except Exception as e:
+            return {"error": f"Failed to load the model: {e}"}
+    # Generate embeddings (using the cache when possible)
+    if data_file_path in embeddings_cache:
+        embeddings = embeddings_cache[data_file_path]
+    else:
+        try:
+            embeddings = model.encode(word_list, show_progress_bar=True, normalize_embeddings=True)
+            # Store the embeddings in the cache
+            embeddings_cache[data_file_path] = embeddings
+        except Exception as e:
+            return {"error": f"Failed to generate embeddings: {e}"}
+    # Generate 3D coordinates with t-SNE
+    effective_perplexity = min(30, len(word_list) - 1)
+    if effective_perplexity <= 0:
+        effective_perplexity = 5  # guard for very small datasets
+    try:
+        tsne = TSNE(n_components=3, random_state=42, perplexity=effective_perplexity, max_iter=1000, init='pca', learning_rate='auto')
+        embeddings_3d = tsne.fit_transform(embeddings)
+    except Exception as e:
+        return {"error": f"t-SNE dimensionality reduction failed: {e}"}
+    # Compute similarities and define edges
+    try:
+        similarity_matrix = cosine_similarity(embeddings)
+    except Exception as e:
+        return {"error": f"Similarity computation failed: {e}"}
+    edges = []
+    edge_weights = []
+    for i in range(len(word_list)):
+        for j in range(i + 1, len(word_list)):
+            similarity = similarity_matrix[i, j]
+            if similarity > similarity_threshold:
+                edges.append((word_list[i], word_list[j]))
+                edge_weights.append(similarity)
+    # Build the NetworkX graph
+    G = nx.Graph()
+    for i, word in enumerate(word_list):
+        G.add_node(word, pos=(embeddings_3d[i, 0], embeddings_3d[i, 1], embeddings_3d[i, 2]))
+    # Add edges with their weights
+    for edge, weight in zip(edges, edge_weights):
+        G.add_edge(edge[0], edge[1], weight=weight)
+    # Build the Plotly figure
+    # Collect edge coordinates
+    edge_x = []
+    edge_y = []
+    edge_z = []
+    if edges:
+        for edge in G.edges():
+            x0, y0, z0 = G.nodes[edge[0]]['pos']
+            x1, y1, z1 = G.nodes[edge[1]]['pos']
+            edge_x.extend([x0, x1, None])
+            edge_y.extend([y0, y1, None])
+            edge_z.extend([z0, z1, None])
+        # Scatter3d trace for the edges
+        edge_trace = go.Scatter3d(
+            x=edge_x, y=edge_y, z=edge_z,
+            mode='lines',
+            line=dict(width=1, color='#888'),
+            hoverinfo='none'
+        )
+    else:
+        edge_trace = go.Scatter3d(x=[], y=[], z=[], mode='lines')
+    # Collect node positions and labels
+    node_x = [G.nodes[node]['pos'][0] for node in G.nodes()]
+    node_y = [G.nodes[node]['pos'][1] for node in G.nodes()]
+    node_z = [G.nodes[node]['pos'][2] for node in G.nodes()]
+    node_text = list(G.nodes())
+    node_adjacencies = []
+    node_hover_text = []
+    for node, adjacencies in enumerate(G.adjacency()):
+        num_connections = len(adjacencies[1])
+        node_adjacencies.append(num_connections)
+        node_hover_text.append(f'{node_text[node]}<br>Connections: {num_connections}')
+    # Scatter3d trace for the nodes
+    node_trace = go.Scatter3d(
+        x=node_x, y=node_y, z=node_z,
+        mode='markers+text',
+        text=node_text,
+        hovertext=node_hover_text,
+        hoverinfo='text',
+        textposition='top center',
+        textfont=dict(
+            size=10,
+            color='black',
+            family=plotly_font
+        ),
+        marker=dict(
+            size=6,
+            color=node_z,
+            colorscale='Viridis',
+            opacity=0.9,
+            colorbar=dict(thickness=15, title='Node Depth (Z-axis)', xanchor='left', title_side='right')
+        )
+    )
+    # Layout
+    layout = go.Layout(
+        title=dict(
+            text=f'3D Graph of Lexical Semantic Similarity (BGE-M3, Threshold: {similarity_threshold})',
+            font=dict(size=16, family=plotly_font)
+        ),
+        showlegend=False,
+        hovermode='closest',
+        margin=dict(b=20, l=5, r=5, t=40),
+        scene=dict(
+            xaxis=dict(title='TSNE Dimension 1', showticklabels=False, backgroundcolor="rgb(230, 230,230)", gridcolor="white", zerolinecolor="white"),
+            yaxis=dict(title='TSNE Dimension 2', showticklabels=False, backgroundcolor="rgb(230, 230,230)", gridcolor="white", zerolinecolor="white"),
+            zaxis=dict(title='TSNE Dimension 3', showticklabels=False, backgroundcolor="rgb(230, 230,230)", gridcolor="white", zerolinecolor="white"),
+            aspectratio=dict(x=1, y=1, z=0.8)
+        )
+    )
+    # Build the figure
+    fig = go.Figure(data=[edge_trace, node_trace], layout=layout)
+    # Serialize to Plotly JSON
+    graph_json = fig.to_json()
+    # Cache the result
+    graph_cache[cache_key] = graph_json
+    return graph_json
+# Main page
+@app.route('/')
+def index():
+    return render_template('index.html')
+# Graph generation API
+@app.route('/generate-graph', methods=['POST'])
+def create_graph():
+    try:
+        data = request.get_json(silent=True) or {}
+        threshold = float(data.get('threshold', 0.7))
+        # Only the bundled dataset is handled for now. Supporting an uploaded
+        # file (use_default_data = false) would need extra handling here.
+        data_file = 'data/child_mind_data.json'
+        result = generate_graph(data_file, threshold)
+        if isinstance(result, dict):
+            # generate_graph reports failures as {"error": ...}
+            return jsonify(result)
+        # result is already a JSON string, so send it with a JSON content type
+        return app.response_class(result, mimetype='application/json')
+    except Exception as e:
+        return jsonify({'error': str(e)}), 500
+# Default dataset info API
+@app.route('/data-info')
+def get_data_info():
+    try:
+        data_file = 'data/child_mind_data.json'
+        words = load_words_from_json(data_file)
+        if words:
+            return jsonify({
+                'wordCount': len(words),
+                'sampleWords': words[:10]
+            })
+        else:
+            return jsonify({'error': 'Could not load the data.'}), 400
+    except Exception as e:
+        return jsonify({'error': str(e)}), 500
+if __name__ == '__main__':
+    # On Hugging Face Spaces the server imports the `app` object (see Procfile),
+    # so this block only runs for local development via `python app.py`.
+    app.run(host='0.0.0.0', port=7860)
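
app.py keeps rendered graphs in `graph_cache`, keyed by the data file and the threshold. The sketch below shows the same memoization idea with a tuple key and a rounded threshold, so that a slider value arriving as the string "0.70" and the float 0.7 share one cache entry. `cached_graph` and `fake_build` are hypothetical names used only for illustration, and the rounding to two decimals is an assumption based on the slider's 0.05 step.

```python
graph_cache = {}

def cached_graph(data_file, threshold, build):
    """Return build(data_file, threshold), computing it once per key."""
    key = (data_file, round(float(threshold), 2))
    if key not in graph_cache:
        graph_cache[key] = build(data_file, key[1])
    return graph_cache[key]

# Stand-in for the expensive embedding + t-SNE + Plotly pipeline
calls = []
def fake_build(path, threshold):
    calls.append((path, threshold))
    return f"graph for {path} at {threshold}"

first = cached_graph("data/child_mind_data.json", "0.70", fake_build)
second = cached_graph("data/child_mind_data.json", 0.7, fake_build)
```

Both calls return the same object and the builder runs once. Since t-SNE coordinates do not depend on the threshold, a further refinement could cache the 3D coordinates per data file and rebuild only the edges.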

data/child_mind_data.json ADDED

	@@ -0,0 +1,168 @@

+[
+  {"word": "학교"},
+  {"word": "선생님"},
+  {"word": "친구"},
+  {"word": "숙제"},
+  {"word": "책"},
+  {"word": "공부"},
+  {"word": "운동장"},
+  {"word": "놀이"},
+  {"word": "점심"},
+  {"word": "쉬는시간"},
+  {"word": "집"},
+  {"word": "가족"},
+  {"word": "엄마"},
+  {"word": "아빠"},
+  {"word": "형"},
+  {"word": "누나"},
+  {"word": "동생"},
+  {"word": "할머니"},
+  {"word": "할아버지"},
+  {"word": "밥"},
+  {"word": "빵"},
+  {"word": "우유"},
+  {"word": "과자"},
+  {"word": "장난감"},
+  {"word": "인형"},
+  {"word": "게임"},
+  {"word": "축구"},
+  {"word": "농구"},
+  {"word": "수영"},
+  {"word": "피아노"},
+  {"word": "기타"},
+  {"word": "강아지"},
+  {"word": "고양이"},
+  {"word": "새"},
+  {"word": "물고기"},
+  {"word": "해"},
+  {"word": "비"},
+  {"word": "바람"},
+  {"word": "눈"},
+  {"word": "봄"},
+  {"word": "여름"},
+  {"word": "가을"},
+  {"word": "겨울"},
+  {"word": "크다"},
+  {"word": "작다"},
+  {"word": "많다"},
+  {"word": "적다"},
+  {"word": "빠르다"},
+  {"word": "느리다"},
+  {"word": "재미있다"},
+  {"word": "어렵다"},
+  {"word": "쉽다"},
+  {"word": "예쁘다"},
+  {"word": "멋있다"},
+  {"word": "좋다"},
+  {"word": "나쁘다"},
+  {"word": "슬프다"},
+  {"word": "기쁘다"},
+  {"word": "뜨겁다"},
+  {"word": "차갑다"},
+  {"word": "무겁다"},
+  {"word": "가볍다"},
+  {"word": "가다"},
+  {"word": "오다"},
+  {"word": "먹다"},
+  {"word": "마시다"},
+  {"word": "자다"},
+  {"word": "일어나다"},
+  {"word": "놀다"},
+  {"word": "공부하다"},
+  {"word": "읽다"},
+  {"word": "쓰다"},
+  {"word": "그리다"},
+  {"word": "만들다"},
+  {"word": "보다"},
+  {"word": "듣다"},
+  {"word": "말하다"},
+  {"word": "웃다"},
+  {"word": "울다"},
+  {"word": "뛰다"},
+  {"word": "던지다"},
+  {"word": "잡다"},
+  {"word": "생각하다"},
+  {"word": "알다"},
+  {"word": "모르다"},
+  {"word": "좋아하다"},
+  {"word": "싫어하다"},
+  {"word": "국어"},
+  {"word": "수학"},
+  {"word": "영어"},
+  {"word": "과학"},
+  {"word": "사회"},
+  {"word": "음악"},
+  {"word": "미술"},
+  {"word": "체육"},
+  {"word": "숫자"},
+  {"word": "더하기"},
+  {"word": "빼기"},
+  {"word": "곱하기"},
+  {"word": "나누기"},
+  {"word": "분수"},
+  {"word": "도형"},
+  {"word": "색깔"},
+  {"word": "역사"},
+  {"word": "나라"},
+  {"word": "식물"},
+  {"word": "동물"},
+  {"word": "소리"},
+  {"word": "발표"},
+  {"word": "토론"},
+  {"word": "실험"},
+  {"word": "시험"},
+  {"word": "평가"},
+  {"word": "행복"},
+  {"word": "슬픔"},
+  {"word": "분노"},
+  {"word": "두려움"},
+  {"word": "놀람"},
+  {"word": "즐거움"},
+  {"word": "짜증"},
+  {"word": "질투"},
+  {"word": "시간"},
+  {"word": "공간"},
+  {"word": "규칙"},
+  {"word": "약속"},
+  {"word": "노력"},
+  {"word": "결과"},
+  {"word": "이유"},
+  {"word": "방법"},
+  {"word": "중요하다"},
+  {"word": "권리"},
+  {"word": "의무"},
+  {"word": "텔레비전"},
+  {"word": "컴퓨터"},
+  {"word": "휴대폰"},
+  {"word": "돈"},
+  {"word": "용돈"},
+  {"word": "선물"},
+  {"word": "옷"},
+  {"word": "신발"},
+  {"word": "부드럽다"},
+  {"word": "귀엽다"},
+  {"word": "멋지다"},
+  {"word": "조용하다"},
+  {"word": "크게"},
+  {"word": "작게"},
+  {"word": "빨리"},
+  {"word": "천천히"},
+  {"word": "자주"},
+  {"word": "가끔"},
+  {"word": "항상"},
+  {"word": "보통"},
+  {"word": "갑자기"},
+  {"word": "그리고"},
+  {"word": "그러나"},
+  {"word": "그래서"},
+  {"word": "왜냐하면"},
+  {"word": "~은"},
+  {"word": "~는"},
+  {"word": "~이"},
+  {"word": "~가"},
+  {"word": "~을"},
+  {"word": "~를"},
+  {"word": "~에게"},
+  {"word": "~와"},
+  {"word": "~과"}
+]
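
The dataset lives at data/child_mind_data.json, and a relative path to it is resolved against the process's working directory, which can differ between a local run, gunicorn, and a Space. A minimal sketch of looking the file up under an explicit base directory instead is shown below. `resolve_data_file` is a hypothetical helper, and the temporary directory only stands in for the repository root.

```python
import tempfile
from pathlib import Path

def resolve_data_file(base_dir, name="child_mind_data.json"):
    """Return the first existing candidate path for the dataset, or None."""
    base = Path(base_dir)
    for candidate in (base / "data" / name, base / name):
        if candidate.is_file():
            return candidate
    return None

# Demonstration against a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / "child_mind_data.json").write_text("[]", encoding="utf-8")
    found = resolve_data_file(tmp)
    found_name = found.relative_to(tmp).as_posix()
    missing = resolve_data_file(Path(tmp) / "nope")
```

In an application module, the base directory would typically be `Path(__file__).resolve().parent`, so the lookup no longer depends on where the server was started.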

huggingface-metadata.json ADDED

	@@ -0,0 +1,11 @@

+{
+    "title": "한국어 단어 의미 네트워크 시각화",
+    "emoji": "🌐",
+    "colorFrom": "indigo",
+    "colorTo": "purple",
+    "sdk": "docker",
+    "app_port": 7860,
+    "app_file": "app.py",
+    "pinned": false,
+    "license": "mit"
+}

main.py ADDED

	@@ -0,0 +1,327 @@

+import json
+from sentence_transformers import SentenceTransformer
+from sklearn.manifold import TSNE
+import matplotlib.pyplot as plt # matplotlib is needed by the font setup logic
+import matplotlib.font_manager as fm
+import numpy as np
+import platform
+import os
+import networkx as nx # graph structure
+import plotly.graph_objects as go # 3D visualization
+from sklearn.metrics.pairwise import cosine_similarity # similarity computation
+# --- Korean font setup ---
+def set_korean_font():
+    """
+    Try to configure a Korean font for matplotlib based on the current OS,
+    and return the font name for Plotly to use.
+    """
+    system_name = platform.system()
+    plotly_font_name = None # font name passed to Plotly
+    # Pick a font per OS
+    if system_name == "Windows":
+        font_name = "Malgun Gothic"
+        plotly_font_name = "Malgun Gothic"
+    elif system_name == "Darwin":  # MacOS
+        font_name = "AppleGothic"
+        plotly_font_name = "AppleGothic"
+    elif system_name == "Linux":
+        # Preferred Korean font path and name on Linux
+        font_path = "/usr/share/fonts/truetype/nanum/NanumGothic.ttf"
+        plotly_font_name_linux = "NanumGothic" # Plotly refers to fonts by name
+        if os.path.exists(font_path):
+            font_name = fm.FontProperties(fname=font_path).get_name()
+            plotly_font_name = plotly_font_name_linux
+            print(f"Using font: {font_name} from {font_path}")
+        else:
+            # Look for any installed font whose name contains 'Nanum'
+            try:
+                available_fonts = [f.name for f in fm.fontManager.ttflist]
+                nanum_fonts = [name for name in available_fonts if 'Nanum' in name]
+                if nanum_fonts:
+                    font_name = nanum_fonts[0]
+                    # Reuse the same name for Plotly (the exact name can differ between systems)
+                    plotly_font_name = font_name
+                    print(f"Found and using system font: {font_name}")
+                else:
+                    # Fall back to fonts usually found on other OSes
+                    if "Malgun Gothic" in available_fonts:
+                        font_name = "Malgun Gothic"
+                        plotly_font_name = "Malgun Gothic"
+                    elif "AppleGothic" in available_fonts:
+                        font_name = "AppleGothic"
+                        plotly_font_name = "AppleGothic"
+                    else:
+                        font_name = None
+                    if font_name: print(f"Trying fallback font: {font_name}")
+            except Exception as e:
+                print(f"Error finding Linux font: {e}")
+                font_name = None
+            if not font_name:
+                print("Warning: no Korean font was found on Linux. Falling back to the Matplotlib default font.")
+                plotly_font_name = None # let Plotly use its default as well
+    else:  # other OS
+        font_name = None
+        plotly_font_name = None
+    # Apply the Matplotlib font setting
+    if font_name:
+        try:
+            plt.rc('font', family=font_name)
+            plt.rc('axes', unicode_minus=False)
+            print(f"Matplotlib font set to: {font_name}")
+        except Exception as e:
+            print(f"Error setting Matplotlib font '{font_name}': {e}. Using default.")
+            plt.rcdefaults()
+            plt.rc('axes', unicode_minus=False)
+            # The Plotly font name could also be reset to the default here (optional)
+            # plotly_font_name = None
+    else:
+        print("Matplotlib Korean font not set. Using default font.")
+        plt.rcdefaults()
+        plt.rc('axes', unicode_minus=False)
+    if not plotly_font_name:
+        print("Plotly font name not explicitly found, will use Plotly default (sans-serif).")
+        plotly_font_name = 'sans-serif' # generic fallback for Plotly
+    print(f"Plotly will try to use font: {plotly_font_name}")
+    return plotly_font_name # font name for Plotly to use
+# --- Data loading ---
+def load_titles_from_json(filepath):
+    """Load the 'word' field of each entry in a JSON file into a list."""
+    try:
+        with open(filepath, 'r', encoding='utf-8') as f:
+            data = json.load(f)
+        # The top level is expected to be a list of objects
+        if isinstance(data, list):
+            # Skip entries with a missing or empty 'word'
+            titles = [item['word'] for item in data if isinstance(item, dict) and item.get('word')]
+            return titles
+        else:
+            print(f"Error: the top level of '{filepath}' is not a list.")
+            return None
+    except FileNotFoundError:
+        print(f"Error: file '{filepath}' not found.")
+        return None
+    except json.JSONDecodeError:
+        print(f"Error: '{filepath}' is not valid JSON.")
+        return None
+    except Exception as e:
+        print(f"Error while loading data: {e}")
+        return None
+# --- Main ---
+if __name__ == "__main__":
+    # Korean font setup (for matplotlib; also returns the font name for Plotly)
+    plotly_font = set_korean_font()
+    # --- Settings ---
+    data_file_path = 'data/child_mind_data.json'  # input data file path
+    embedding_model_name = 'BAAI/bge-m3'  # BGE-M3 embedding model
+    similarity_threshold = 0.7  # cosine similarity above which an edge is created (0.0 to 1.0)
+    tsne_perplexity = 30  # t-SNE perplexity (must be smaller than the number of samples)
+    tsne_max_iter = 1000  # number of t-SNE iterations
+    # ---
+    # 1. Load the data (list of words)
+    print(f"Loading data: {data_file_path}")
+    word_list = load_titles_from_json(data_file_path)
+    if not word_list:
+        print("No vocabulary data to visualize. Exiting.")
+        exit() # stop when there is no data
+    else:
+        print(f"Loaded {len(word_list)} valid words.")
+        # Remove duplicates (optional)
+        original_count = len(word_list)
+        word_list = sorted(list(set(word_list)))
+        if len(word_list) < original_count:
+            print(f"{len(word_list)} unique words remain after removing duplicates.")
+        # 2. Load the embedding model
+        print(f"Loading embedding model: {embedding_model_name} ...")
+        try:
+            model = SentenceTransformer(embedding_model_name)
+        except Exception as e:
+            print(f"Error: failed to load embedding model '{embedding_model_name}'. {e}")
+            print("Check your internet connection and the model name.")
+            exit()
+        print("Model loaded.")
+        # 3. Generate embeddings
+        print("Generating word embeddings...")
+        try:
+            # Request L2-normalized embeddings
+            embeddings = model.encode(word_list, show_progress_bar=True, normalize_embeddings=True)
+        except Exception as e:
+            print(f"Error: embedding generation failed. {e}")
+            exit()
+        print(f"Embeddings generated. Each word is now a {embeddings.shape[1]}-dimensional vector.")
+        # 4. Generate 3D coordinates with t-SNE
+        print("Generating 3D coordinates (t-SNE)...")
+        # Adjust perplexity (must be smaller than the number of samples)
+        effective_perplexity = min(tsne_perplexity, len(word_list) - 1)
+        if effective_perplexity <= 0:
+            print(f"Warning: too few samples ({len(word_list)}); forcing perplexity to 5.")
+            effective_perplexity = 5 # guard for very small datasets
+        try:
+            tsne = TSNE(n_components=3, random_state=42, perplexity=effective_perplexity, max_iter=tsne_max_iter, init='pca', learning_rate='auto')
+            embeddings_3d = tsne.fit_transform(embeddings)
+        except Exception as e:
+            print(f"Error: t-SNE dimensionality reduction failed. {e}")
+            exit()
+        print("3D coordinates generated.")
+        # 5. Compute similarities and define edges
+        print("Computing pairwise similarities and defining edges...")
+        try:
+            similarity_matrix = cosine_similarity(embeddings)
+        except Exception as e:
+            print(f"Error: cosine similarity computation failed. {e}")
+            exit()
+        edges = []
+        edge_weights = [] # weights usable for edge thickness, etc.
+        for i in range(len(word_list)):
+            for j in range(i + 1, len(word_list)): # skip duplicates and self-pairs
+                similarity = similarity_matrix[i, j]
+                if similarity > similarity_threshold:
+                    edges.append((word_list[i], word_list[j]))
+                    edge_weights.append(similarity) # use the similarity as the weight
+        print(f"Defined {len(edges)} edges above the similarity threshold ({similarity_threshold}).")
+        if not edges:
+            print("Warning: no edges were defined. The threshold may be too high or the words too dissimilar.")
+            # Continue so that the nodes are still shown without edges
+        # 6. Build the NetworkX graph
+        print("Building the NetworkX graph...")
+        G = nx.Graph()
+        for i, word in enumerate(word_list):
+            # Store the 3D coordinates as a node attribute
+            G.add_node(word, pos=(embeddings_3d[i, 0], embeddings_3d[i, 1], embeddings_3d[i, 2]))
+        # Add edges with their weights
+        for edge, weight in zip(edges, edge_weights):
+            G.add_edge(edge[0], edge[1], weight=weight)
+        print("NetworkX graph built.")
+        # --- 3D visualization with Plotly ---
+        print("Building the Plotly 3D figure...")
+        # Collect edge coordinates
+        edge_x = []
+        edge_y = []
+        edge_z = []
+        if edges: # only when there is at least one edge
+            for edge in G.edges():
+                x0, y0, z0 = G.nodes[edge[0]]['pos']
+                x1, y1, z1 = G.nodes[edge[1]]['pos']
+                edge_x.extend([x0, x1, None]) # None breaks the line between segments
+                edge_y.extend([y0, y1, None])
+                edge_z.extend([z0, z1, None])
+            # Scatter3d trace for the edges
+            edge_trace = go.Scatter3d(
+                x=edge_x, y=edge_y, z=edge_z,
+                mode='lines',
+                line=dict(width=1, color='#888'), # edge color and width
+                hoverinfo='none' # no hover info on edges
+            )
+        else:
+            edge_trace = go.Scatter3d(x=[], y=[], z=[], mode='lines') # empty trace when there are no edges
+        # Collect node positions and labels
+        node_x = [G.nodes[node]['pos'][0] for node in G.nodes()]
+        node_y = [G.nodes[node]['pos'][1] for node in G.nodes()]
+        node_z = [G.nodes[node]['pos'][2] for node in G.nodes()]
+        node_text = list(G.nodes()) # node names (words)
+        node_adjacencies = [] # number of connected edges (usable for marker size, etc.)
+        node_hover_text = [] # hover text per node
+        for node, adjacencies in enumerate(G.adjacency()):
+            num_connections = len(adjacencies[1])
+            node_adjacencies.append(num_connections)
+            node_hover_text.append(f'{node_text[node]}<br>Connections: {num_connections}')
+        # Scatter3d trace for the nodes
+        node_trace = go.Scatter3d(
+            x=node_x, y=node_y, z=node_z,
+            mode='markers+text',
+            text=node_text,
+            hovertext=node_hover_text,
+            hoverinfo='text',
+            textposition='top center',
+            textfont=dict(
+                size=10,
+                color='black',
+                family=plotly_font
+            ),
+            marker=dict(
+                size=6,
+                color=node_z,
+                colorscale='Viridis',
+                opacity=0.9,
+                colorbar=dict(thickness=15, title='Node Depth (Z-axis)', xanchor='left', title_side='right')
+                # titleside → title_side
+            )
+        )
+        # Layout
+        layout = go.Layout(
+            title=dict(
+                text=f'3D Graph of Lexical Semantic Similarity (BGE-M3, Threshold: {similarity_threshold})',
+                font=dict(size=16, family=plotly_font)
+            ),
+            showlegend=False,
+            hovermode='closest', # show info for the nearest data point
+            margin=dict(b=20, l=5, r=5, t=40), # margins
+            scene=dict( # 3D scene settings
+                xaxis=dict(title='TSNE Dimension 1', showticklabels=False, backgroundcolor="rgb(230, 230,230)", gridcolor="white", zerolinecolor="white"),
+                yaxis=dict(title='TSNE Dimension 2', showticklabels=False, backgroundcolor="rgb(230, 230,230)", gridcolor="white", zerolinecolor="white"),
+                zaxis=dict(title='TSNE Dimension 3', showticklabels=False, backgroundcolor="rgb(230, 230,230)", gridcolor="white", zerolinecolor="white"),
+                aspectratio=dict(x=1, y=1, z=0.8) # axis aspect ratio
+            ),
+            # Optional annotations
+            # annotations=[
+            #     dict(
+            #         showarrow=False,
+            #         text=f"Data: {data_file_path}<br>Model: {embedding_model_name}",
+            #         xref="paper", yref="paper",
+            #         x=0.005, y=0.005
+            #     )
+            # ]
+        )
+        # Build and show the figure
+        fig = go.Figure(data=[edge_trace, node_trace], layout=layout)
+        print("*"*20)
+        print(" Showing the interactive 3D graph. ")
+        print(" - Mouse wheel: zoom in/out")
+        print(" - Mouse drag: rotate")
+        print(" - Hover over a node: show the word and its number of connections")
+        print("*"*20)
+        # Save as an HTML file (optional)
+        # fig.write_html("3d_graph_visualization.html")
+        # print("Saved the graph to '3d_graph_visualization.html'.")
+        fig.show() # opens in a web browser or the IDE output pane
+        print("Graph displayed.")
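
The perplexity handling above clamps the requested value to the number of words minus one, then falls back to 5 when that result is not positive. As far as I know, scikit-learn rejects a perplexity that is not smaller than the sample count, so the fallback cannot succeed for a single word. A minimal sketch of a stricter helper, under that assumption, is given here. `effective_perplexity` as a function is hypothetical and is not part of main.py.

```python
def effective_perplexity(n_samples, requested=30):
    """Clamp a requested t-SNE perplexity so it stays below the sample count."""
    if n_samples < 2:
        # With fewer than two samples there is nothing to lay out
        raise ValueError("t-SNE needs at least two samples")
    return min(requested, n_samples - 1)

large = effective_perplexity(166)   # plenty of samples: keep the requested value
small = effective_perplexity(10)    # few samples: clamp to n_samples - 1
```

Failing early with a clear message keeps the "too few words" case separate from genuine t-SNE errors.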

requirements.txt ADDED

	@@ -0,0 +1,9 @@

+flask==2.2.3
+sentence-transformers==2.2.2
+scikit-learn==1.2.2
+numpy==1.24.3
+matplotlib==3.7.1
+networkx==3.1
+plotly==5.14.1
+joblib==1.2.0
+gunicorn==20.1.0
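
requirements.txt pins scikit-learn==1.2.2, while the t-SNE calls in app.py and main.py pass `max_iter`. To my knowledge that keyword was introduced in a later scikit-learn release and earlier releases call it `n_iter`, so it is worth checking against the installed version. The sketch below picks whichever name a constructor accepts. `iteration_kwarg`, `OldStyle`, and `NewStyle` are hypothetical names, and the two classes are stand-ins rather than real scikit-learn estimators.

```python
import inspect

def iteration_kwarg(estimator_cls, value, names=("max_iter", "n_iter")):
    """Return {name: value} using the first name the constructor accepts."""
    params = inspect.signature(estimator_cls.__init__).parameters
    for name in names:
        if name in params:
            return {name: value}
    raise TypeError(f"{estimator_cls.__name__} accepts none of {names}")

# Stand-ins for two releases of the same estimator
class OldStyle:
    def __init__(self, n_iter=1000):
        self.n_iter = n_iter

class NewStyle:
    def __init__(self, max_iter=1000):
        self.max_iter = max_iter

old_kwargs = iteration_kwarg(OldStyle, 500)
new_kwargs = iteration_kwarg(NewStyle, 500)
```

Applied to the real class, the call would look like `TSNE(n_components=3, **iteration_kwarg(TSNE, 1000))`. Pinning a scikit-learn version that matches the keyword used in the code is the simpler alternative.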

templates/index.html ADDED

	@@ -0,0 +1,197 @@

+<!DOCTYPE html>
+<html lang="ko">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Korean Word Semantic Network Visualization</title>
+    <!-- Plotly.js -->
+    <script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
+    <!-- Bootstrap CSS -->
+    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/css/bootstrap.min.css" rel="stylesheet">
+    <style>
+        body {
+            font-family: 'Malgun Gothic', 'AppleGothic', 'NanumGothic', sans-serif;
+            padding: 20px;
+            background-color: #f8f9fa;
+        }
+        #graphContainer {
+            height: 80vh;
+            width: 100%;
+            border-radius: 8px;
+            background-color: #fff;
+            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
+            margin-bottom: 20px;
+        }
+        .controls-container {
+            background-color: #fff;
+            padding: 20px;
+            border-radius: 8px;
+            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
+            margin-bottom: 20px;
+        }
+        .spinner {
+            position: absolute;
+            top: 50%;
+            left: 50%;
+            transform: translate(-50%, -50%);
+            z-index: 1000;
+        }
+        .info-section {
+            margin-top: 20px;
+            background-color: #fff;
+            padding: 20px;
+            border-radius: 8px;
+            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
+        }
+        h1, h3 {
+            color: #343a40;
+        }
+        .btn-primary {
+            background-color: #5a67d8;
+            border-color: #5a67d8;
+        }
+        .btn-primary:hover {
+            background-color: #4c51bf;
+            border-color: #4c51bf;
+        }
+    </style>
+</head>
+<body>
+    <div class="container">
+        <div class="row mb-4">
+            <div class="col-12">
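+                <h1 class="text-center my-4">Korean Word Semantic Network Visualization</h1>
+                <div class="alert alert-info" role="alert">
+                    This tool visualizes semantic relationships between Korean words in 3D space. It uses the BGE-M3 multilingual embedding model to measure semantic similarity between words.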
+                </div>
+            </div>
+        </div>
+        <div class="row">
+            <div class="col-md-3">
+                <div class="controls-container">
+                    <h3>설정</h3>
+                    <div class="mb-3">
+                        <label for="thresholdSlider" class="form-label">유사도 임계값 ({{ threshold }})</label>
+                        <input type="range" class="form-range" id="thresholdSlider" min="0.1" max="0.9" step="0.05" v-model="threshold">
+                        <div class="form-text">Higher value = stricter connection criterion (fewer edges)</div>
+                    </div>
+                    <div class="d-grid gap-2">
+                        <button class="btn btn-primary" @click="generateGraph" :disabled="isLoading">
+                            Generate graph
+                        </button>
+                    </div>
+                    <div class="alert alert-danger mt-3" role="alert" v-if="errorMessage">{{ errorMessage }}</div>
+                    <div class="info-section">
+                        <h3>Data info</h3>
+                        <div v-if="dataInfo">
+                            <p><strong>Word count:</strong> {{ dataInfo.wordCount }}</p>
+                            <p><strong>Sample words:</strong></p>
+                            <div class="text-muted">{{ dataInfo.sampleWords.join(', ') }}</div>
+                        </div>
+                        <div v-else>
+                            <p class="text-muted">Loading data info...</p>
+                        </div>
+                    </div>
+                </div>
+            </div>
+            <div class="col-md-9">
+                <div id="graphContainer" style="position: relative;">
+                    <div class="spinner" v-if="isLoading">
+                        <div class="spinner-border text-primary" role="status">
+                            <span class="visually-hidden">Loading...</span>
+                        </div>
+                    </div>
+                </div>
+                <div class="alert alert-secondary">
+                    <h5>Controls:</h5>
+                    <ul>
+                        <li>Mouse wheel: zoom in/out</li>
+                        <li>Mouse drag: rotate</li>
+                        <li>Right mouse button drag: pan</li>
+                        <li>Hover over a word: view details</li>
+                    </ul>
+                </div>
+            </div>
+        </div>
+        <div class="row mt-4">
+            <div class="col-12">
+                <div class="card">
+                    <div class="card-body">
+                        <h3 class="card-title">About this visualization</h3>
+                        <p class="card-text">
+                            This tool uses the following techniques to visualize a network of Korean words:
+                        </p>
+                        <ul>
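+                            <li><strong>BAAI/bge-m3 embeddings:</strong> A recent multilingual embedding model that also performs well on Korean.</li>
+                            <li><strong>t-SNE dimensionality reduction:</strong> Projects the high-dimensional vectors into 3D space so that semantic relationships can be visualized.</li>
+                            <li><strong>Cosine similarity:</strong> Measures semantic similarity based on the angle between word vectors.</li>
+                            <li><strong>Plotly visualization:</strong> Provides an interactive 3D visualization.</li>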
+                        </ul>
+                        <p class="card-text">
+                            Each word is shown as a point in 3D space, and words with high similarity are joined by connecting lines (edges). Color varies with the z-axis value.
+                        </p>
+                    </div>
+                </div>
+            </div>
+        </div>
+    </div>
+    <!-- Vue.js -->
+    <script src="https://cdn.jsdelivr.net/npm/vue@3.2.36/dist/vue.global.prod.js"></script>
+    <!-- Axios -->
+    <script src="https://cdn.jsdelivr.net/npm/axios/dist/axios.min.js"></script>
+    <script>
+        const app = Vue.createApp({
+            data() {
+                return {
+                    threshold: 0.7,
+                    isLoading: false,
+                    dataInfo: null,
+                    errorMessage: ''
+                }
+            },
+            mounted() {
+                this.loadDataInfo();
+                this.generateGraph();
+            },
+            methods: {
+                loadDataInfo() {
+                    axios.get('/data-info')
+                        .then(response => {
+                            this.dataInfo = response.data;
+                        })
+                        .catch(error => {
+                            console.error('Error loading data info:', error);
+                        });
+                },
+                generateGraph() {
+                    this.isLoading = true;
+                    this.errorMessage = '';
+                    axios.post('/generate-graph', {
+                        threshold: this.threshold,
+                        use_default_data: true
+                    })
+                        .then(response => {
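+                            // axios parses JSON responses automatically; only parse when the body arrives as a string.
+                            const graphData = response.data;
+                            const figure = typeof graphData === 'string' ? JSON.parse(graphData) : graphData;
+                            Plotly.newPlot('graphContainer', figure);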
+                            this.isLoading = false;
+                        })
+                        .catch(error => {
+                            console.error('Error generating graph:', error);
+                            this.errorMessage = 'An error occurred while generating the graph.';
+                            this.isLoading = false;
+                        });
+                }
+            }
+        });
+        app.mount('.container');
+    </script>
+</body>
+</html>
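
The template above describes edges as connections between word pairs whose cosine similarity exceeds the slider threshold. The following is a minimal JavaScript sketch of that rule, not the code in this commit: the function names `cosineSimilarity` and `edgesAboveThreshold` and the `{ source, target, similarity }` edge shape are hypothetical, chosen only for illustration, and the actual computation is assumed to happen on the server behind `/generate-graph`.

```javascript
// Illustrative sketch only; these helpers are hypothetical and not part of the commit.

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Collect one edge per unordered word pair whose similarity exceeds the threshold.
function edgesAboveThreshold(words, vectors, threshold) {
  const edges = [];
  for (let i = 0; i < words.length; i++) {
    for (let j = i + 1; j < words.length; j++) {
      const similarity = cosineSimilarity(vectors[i], vectors[j]);
      if (similarity > threshold) {
        edges.push({ source: words[i], target: words[j], similarity });
      }
    }
  }
  return edges;
}
```

Under this rule, raising the threshold can only remove edges, which matches the slider hint that a higher value means fewer edges.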