Wen1201 committed on
Commit
7d4f1a2
·
verified ·
1 Parent(s): 9fa6123

Upload 8 files

.streamlitconfig.toml.txt ADDED
@@ -0,0 +1,14 @@
+ [theme]
+ primaryColor = "#2d6ca2"
+ backgroundColor = "#e8f1f8"
+ secondaryBackgroundColor = "#f0f7fc"
+ textColor = "#2b3a67"
+ font = "sans serif"
+
+ [server]
+ maxUploadSize = 200
+ enableCORS = false
+ enableXsrfProtection = true
+
+ [browser]
+ gatherUsageStats = false
BC_imputed_micerf_period13_fid_course_D4.csv ADDED
The diff for this file is too large to render. See raw diff
 
README.md CHANGED
@@ -1,12 +1,240 @@
- ---
- title: Bayesian Network
- emoji:
- colorFrom: yellow
- colorTo: purple
- sdk: gradio
- sdk_version: 5.49.1
- app_file: app.py
- pinned: false
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+ # 🔬 Bayesian Network Analysis System
+
+ A complete Bayesian network analysis system with an integrated AI assistant that helps interpret the results.
+
+ ## ✨ Key Features
+
+ ### 1. Bayesian Network Analysis
+ - ✅ Multiple structure-learning algorithms: NB, TAN, CL, Hill Climbing, PC
+ - ✅ Automatic feature detection and handling (categorical/continuous variables)
+ - ✅ Comprehensive model evaluation metrics
+ - ✅ Interactive network structure visualization
+ - ✅ Conditional probability table lookup
+ - ✅ ROC curves and confusion matrices
+
+ ### 2. AI Q&A Assistant
+ - ✅ Auto-generated analysis summaries
+ - ✅ Explanations of model metrics
+ - ✅ Improvement suggestions
+ - ✅ Network structure interpretation
+ - ✅ Multi-turn conversation support
+
+ ### 3. Multi-User Support
+ - ✅ Session isolation
+ - ✅ Thread safety
+ - ✅ Independent per-session result storage
+
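The multi-user scheme listed above (session IDs, a thread lock, per-session storage) can be sketched with stdlib primitives alone; `SessionStore` is an illustrative name, not a class from this repo:

```python
import threading
import uuid

class SessionStore:
    """Minimal per-session result store: a class-level dict guarded by a lock."""
    _lock = threading.Lock()
    _results = {}

    @classmethod
    def save(cls, session_id, results):
        # The lock keeps concurrent writers from corrupting the shared dict
        with cls._lock:
            cls._results[session_id] = results

    @classmethod
    def load(cls, session_id):
        with cls._lock:
            return cls._results.get(session_id)

# Each user gets a unique session id, so results never collide
sid = str(uuid.uuid4())
SessionStore.save(sid, {"accuracy": 91.2})
print(SessionStore.load(sid))  # {'accuracy': 91.2}
```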
+ ## 🚀 Quick Start
+
+ ### Run Locally
+
+ ```bash
+ # 1. Clone the project
+ git clone <your-repo-url>
+ cd bayesian-network-app
+
+ # 2. Install dependencies
+ pip install -r requirements.txt
+
+ # 3. Place the default dataset
+ # Put BC_imputed_micerf_period13_fid_course_D4.csv in the project root
+
+ # 4. Run the app
+ streamlit run app.py
+ ```
+
+ ### Deploy to Hugging Face Spaces
+
+ 1. **Create a new Space**
+    - Go to https://huggingface.co/spaces
+    - Click "Create new Space"
+    - Choose the Streamlit SDK
+    - Set a Space name
+
+ 2. **Upload the files**
+ ```
+ your-space/
+ ├── app.py
+ ├── bn_core.py
+ ├── llm_assistant.py
+ ├── utils.py
+ ├── requirements.txt
+ ├── BC_imputed_micerf_period13_fid_course_D4.csv (optional)
+ └── README.md
+ ```
+
+ 3. **Configure the Space settings**
+    - SDK: Streamlit
+    - Python version: 3.10
+    - Hardware: CPU Basic (free), or upgrade for better performance
+
+ 4. **Push to Hugging Face**
+ ```bash
+ git add .
+ git commit -m "Initial commit"
+ git push
+ ```
+
+ ## 📋 File Structure
+
+ ```
+ ├── app.py             # Main application (Streamlit interface)
+ ├── bn_core.py         # Bayesian network core logic
+ ├── llm_assistant.py   # LLM Q&A assistant
+ ├── utils.py           # Utility functions (visualization, data handling)
+ ├── requirements.txt   # Python package dependencies
+ ├── README.md          # Documentation
+ └── BC_imputed_micerf_period13_fid_course_D4.csv  # Default dataset (optional)
+ ```
+
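A small pre-push sanity check against the tree above could look like the sketch below; `missing_files` is a hypothetical helper, not part of this repo:

```python
from pathlib import Path

# Expected project files, taken from the tree above (dataset is optional)
REQUIRED = ["app.py", "bn_core.py", "llm_assistant.py", "utils.py", "requirements.txt"]

def missing_files(root="."):
    """Return which of the expected project files are absent under root."""
    return [name for name in REQUIRED if not (Path(root) / name).exists()]

print(missing_files("./no-such-dir"))  # all five names when run outside the repo
```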
+ ## 🎯 Usage
+
+ ### 1. Set the API Key
+ - Enter your OpenAI API Key in the sidebar
+ - The key is only valid for the current session and is never stored
+
+ ### 2. Choose a Data Source
+ - **Use the default dataset**: the built-in breast cancer dataset
+ - **Upload your own data**: CSV format is supported
+
+ ### 3. Configure the Model
+ - Select categorical and continuous features
+ - Select the target variable (must be binary)
+ - Set model parameters:
+   - Test set proportion
+   - Network structure-learning algorithm
+   - Parameter estimation method
+   - Other hyperparameters
+
+ ### 4. Run the Analysis
+ - Click "Run Analysis" to start training
+ - Wait for the analysis to finish (typically 10-60 seconds)
+ - Review the results:
+   - Network structure graph
+   - Performance metrics
+   - Confusion matrix
+   - ROC curve
+   - Conditional probability tables
+
+ ### 5. Use the AI Assistant
+ - Switch to the "AI Assistant" tab
+ - Ask questions about the analysis results
+ - Use the quick-question buttons for common information:
+   - 📊 Analysis summary
+   - 🎯 Performance evaluation
+   - 🔍 Structure explanation
+   - ⚠️ Limitations
+   - 💡 Improvement suggestions
+
+ ## 🔧 Technical Architecture
+
+ ### Backend
+ - **pgmpy**: Bayesian network modeling and inference
+ - **scikit-learn**: data splitting and evaluation metrics
+ - **pandas/numpy**: data processing
+
+ ### Frontend
+ - **Streamlit**: web application framework
+ - **Plotly**: interactive visualization
+
+ ### AI Integration
+ - **OpenAI GPT-4**: Q&A assistant
+ - Custom prompt engineering
+ - Context management
+
+ ### Multi-User Support
+ - Session isolation mechanism
+ - Thread locks to keep data consistent
+ - Independent per-session result storage
+
+ ## 📊 Supported Algorithms
+
+ | Algorithm | Description | When to Use |
+ |-----------|-------------|-------------|
+ | **NB** | Naive Bayes | Fast and simple; good for a first pass |
+ | **TAN** | Tree-Augmented Naive Bayes | More flexible than NB while keeping a tree structure |
+ | **CL** | Chow-Liu Tree | Learns the optimal tree structure |
+ | **HC** | Hill Climbing | Explores more complex structures; requires a scoring method |
+ | **PC** | PC Algorithm | Based on conditional independence tests; requires a significance level |
+
+ ## 📈 Evaluation Metrics
+
+ - **Accuracy**: overall accuracy
+ - **Precision**: proportion of predicted positives that are actually positive
+ - **Recall**: proportion of actual positives that are correctly predicted
+ - **F1-Score**: harmonic mean of precision and recall
+ - **AUC**: area under the ROC curve
+ - **G-mean**: geometric mean (suited to imbalanced data)
+ - **P-mean**: an alternative balance metric
+ - **Specificity**: proportion of actual negatives that are correctly predicted
+
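The threshold-based metrics above all derive from the four confusion-matrix counts. A quick sketch with made-up counts (note that G-mean is conventionally the geometric mean of sensitivity and specificity):

```python
import math

# Hypothetical confusion-matrix counts: true/false negatives and positives
tn, fp, fn, tp = 50, 10, 5, 35

sensitivity = tp / (tp + fn)   # recall: correctly predicted positives
specificity = tn / (tn + fp)   # correctly predicted negatives
precision = tp / (tp + fp)     # predicted positives that are real
f1 = 2 * precision * sensitivity / (precision + sensitivity)
g_mean = math.sqrt(sensitivity * specificity)  # balances both error types

print(round(specificity, 3))  # 0.833
print(round(g_mean, 3))       # 0.854
```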
+ ## ⚠️ Notes
+
+ ### Data Requirements
+ 1. CSV format
+ 2. The target variable must be binary (0/1 or similar)
+ 3. Categorical features may not have more than 40 unique values
+ 4. Avoid excessive missing values
+
+ ### API Key Security
+ - The API key is stored only in session state
+ - It is never logged or uploaded
+ - Each user supplies their own API key
+
+ ### Performance Considerations
+ - Large datasets (>10000 rows) may take a while
+ - The PC algorithm is slower than the others
+ - Test on a small sample first
+
+ ## 🐛 FAQ
+
+ ### Q1: Training failed. What should I do?
+ **A**: Check:
+ - for excessive missing values
+ - for categorical features with too many unique values
+ - that the target variable is binary
+ - a different algorithm if the above look fine
+
+ ### Q2: The AI assistant does not respond?
+ **A**: Confirm that:
+ - the OpenAI API key is correct
+ - you have a network connection
+ - the API key has remaining quota
+
+ ### Q3: How can I improve model performance?
+ **A**: Try:
+ - feature engineering (creating new features)
+ - adjusting the number of bins for continuous variables
+ - different algorithms
+ - the Bayesian estimator with a tuned equivalent_sample_size
+
+ ### Q4: Will concurrent users conflict?
+ **A**: No. The system uses:
+ - a unique session_id per user
+ - thread locks protecting shared resources
+ - independent per-session result storage
+
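Q3's binning suggestion trades resolution against CPT sparsity: fewer bins mean better-populated conditional probability tables, while more bins preserve detail. A stdlib-only sketch of equal-width binning (the app itself uses `pandas.cut`):

```python
def equal_width_bins(values, n_bins):
    """Discretize continuous values into equal-width bin labels, mimicking
    the label style the app builds from pandas.cut edges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    labels = []
    for v in values:
        # Clamp so the maximum value falls in the last bin
        idx = min(int((v - lo) / width), n_bins - 1) if width > 0 else 0
        labels.append(f"{round(edges[idx], 2)}-{round(edges[idx + 1], 2)}")
    return labels

print(equal_width_bins([1.0, 2.5, 4.0], 2))  # ['1.0-2.5', '2.5-4.0', '2.5-4.0']
```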
+ ## 🔄 Changelog
+
+ ### v1.0.0 (2025-01)
+ - ✅ Initial release
+ - ✅ Five structure-learning algorithms
+ - ✅ Integrated OpenAI GPT-4 assistant
+ - ✅ Full visualization suite
+ - ✅ Multi-user support
+
+ ## 📝 License
+
+ This project is a rewrite of the original Django project; it keeps the original functionality and adds the AI assistant.
+
+ ## 🤝 Contributing
+
+ Issues and pull requests are welcome!
+
+ ## 📧 Contact
+
+ For questions, please reach out via GitHub Issues.
+
  ---
 
+ **Enjoy! 🎉**
app.py ADDED
@@ -0,0 +1,462 @@
+ import streamlit as st
+ import pandas as pd
+ import numpy as np
+ import plotly.graph_objects as go
+ import plotly.express as px
+ from io import BytesIO
+ import base64
+ import json
+ from datetime import datetime
+ import uuid
+
+ # Page configuration
+ st.set_page_config(
+     page_title="Bayesian Network Analysis System",
+     page_icon="🔬",
+     layout="wide",
+     initial_sidebar_state="expanded"
+ )
+
+ # Import custom modules
+ from bn_core import BayesianNetworkAnalyzer
+ from llm_assistant import LLMAssistant
+ from utils import (
+     plot_roc_curve,
+     plot_confusion_matrix,
+     plot_probability_distribution,
+     generate_network_graph,
+     create_cpd_table
+ )
+
+ # Initialize session state
+ if 'session_id' not in st.session_state:
+     st.session_state.session_id = str(uuid.uuid4())
+ if 'analysis_results' not in st.session_state:
+     st.session_state.analysis_results = None
+ if 'chat_history' not in st.session_state:
+     st.session_state.chat_history = []
+ if 'model_trained' not in st.session_state:
+     st.session_state.model_trained = False
+
+ # Title
+ st.title("🔬 Bayesian Network Analysis System")
+ st.markdown("---")
+
+ # Sidebar - OpenAI API Key
+ with st.sidebar:
+     st.header("⚙️ Configuration")
+
+     api_key = st.text_input(
+         "OpenAI API Key",
+         type="password",
+         help="Enter your OpenAI API key to use the AI assistant"
+     )
+
+     if api_key:
+         st.session_state.api_key = api_key
+         st.success("✅ API Key loaded")
+
+     st.markdown("---")
+
+     # Data source selection
+     st.subheader("📊 Data Source")
+     data_source = st.radio(
+         "Select data source:",
+         ["Use Default Dataset", "Upload Your Data"]
+     )
+
+     uploaded_file = None
+     if data_source == "Upload Your Data":
+         uploaded_file = st.file_uploader(
+             "Upload CSV file",
+             type=['csv'],
+             help="Upload your dataset in CSV format"
+         )
+
+ # Main content area
+ tab1, tab2 = st.tabs(["📈 Analysis", "💬 AI Assistant"])
+
+ # Tab 1: analysis interface
+ with tab1:
+     col1, col2 = st.columns([2, 1])
+
+     with col1:
+         st.header("Model Configuration")
+
+         # Load the data
+         if data_source == "Use Default Dataset":
+             # Use the bundled default dataset
+             @st.cache_data
+             def load_default_data():
+                 # Path to the default dataset
+                 df = pd.read_csv("BC_imputed_micerf_period13_fid_course_D4.csv")
+                 return df
+
+             try:
+                 df = load_default_data()
+                 st.success(f"✅ Default dataset loaded: {df.shape[0]} rows, {df.shape[1]} columns")
+             except Exception:  # was a bare except; catch only real errors
+                 st.error("❌ Default dataset not found. Please upload your own data.")
+                 df = None
+         else:
+             if uploaded_file:
+                 df = pd.read_csv(uploaded_file)
+                 st.success(f"✅ Data loaded: {df.shape[0]} rows, {df.shape[1]} columns")
+             else:
+                 st.info("👆 Please upload a CSV file to begin")
+                 df = None
+
+         if df is not None:
+             # Feature selection
+             st.subheader("🎯 Feature Selection")
+
+             # Auto-detect feature types
+             numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
+             categorical_cols = df.select_dtypes(include=['object', 'category']).columns.tolist()
+
+             # Binary variables (candidates for the target)
+             binary_cols = [col for col in df.columns if df[col].nunique() == 2]
+
+             col_feat1, col_feat2 = st.columns(2)
+
+             with col_feat1:
+                 st.markdown("**Categorical Features**")
+                 cat_features = st.multiselect(
+                     "Select categorical features:",
+                     options=categorical_cols,
+                     default=categorical_cols[:5] if len(categorical_cols) > 0 else []
+                 )
+
+             with col_feat2:
+                 st.markdown("**Continuous Features**")
+                 con_features = st.multiselect(
+                     "Select continuous features:",
+                     options=numeric_cols,
+                     default=numeric_cols[:3] if len(numeric_cols) > 0 else []
+                 )
+
+             # Target variable
+             target_variable = st.selectbox(
+                 "🎯 Target Variable (Y):",
+                 options=binary_cols,
+                 help="Must be a binary classification variable"
+             )
+
+             # Validate the selection
+             selected_features = cat_features + con_features
+             if target_variable in selected_features:
+                 st.error("❌ Target variable cannot be in feature list!")
+                 st.stop()
+
+             st.markdown("---")
+
+             # Model parameters
+             st.subheader("⚙️ Model Parameters")
+
+             col_param1, col_param2, col_param3 = st.columns(3)
+
+             with col_param1:
+                 test_fraction = st.slider(
+                     "Test Dataset Proportion:",
+                     min_value=0.1,
+                     max_value=0.5,
+                     value=0.25,
+                     step=0.05
+                 )
+
+                 algorithm = st.selectbox(
+                     "Network Structure:",
+                     options=['NB', 'TAN', 'CL', 'HC', 'PC'],
+                     format_func=lambda x: {
+                         'NB': 'Naive Bayes',
+                         'TAN': 'Tree-Augmented Naive Bayes',
+                         'CL': 'Chow-Liu',
+                         'HC': 'Hill Climbing',
+                         'PC': 'PC Algorithm'
+                     }[x]
+                 )
+
+             with col_param2:
+                 estimator = st.selectbox(
+                     "Parameter Estimator:",
+                     options=['ml', 'bn'],
+                     format_func=lambda x: {
+                         'ml': 'Maximum Likelihood',
+                         'bn': 'Bayesian Estimator'
+                     }[x]
+                 )
+
+                 if estimator == 'bn':
+                     equivalent_sample_size = st.number_input(
+                         "Equivalent Sample Size:",
+                         min_value=1,
+                         value=3,
+                         step=1
+                     )
+                 else:
+                     equivalent_sample_size = 3
+
+                 # Conditional parameters
+                 if algorithm == 'HC':
+                     score_method = st.selectbox(
+                         "Scoring Method:",
+                         options=['BIC', 'AIC', 'K2', 'BDeu', 'BDs']
+                     )
+                 else:
+                     score_method = 'BIC'
+
+             with col_param3:
+                 if algorithm == 'PC':
+                     sig_level = st.number_input(
+                         "Significance Level:",
+                         min_value=0.01,
+                         max_value=1.0,
+                         value=0.05,
+                         step=0.01
+                     )
+                 else:
+                     sig_level = 0.05
+
+                 n_bins = st.number_input(
+                     "Number of Bins (for continuous):",
+                     min_value=3,
+                     max_value=20,
+                     value=10,
+                     step=1
+                 )
+
+             # Run-analysis button
+             st.markdown("---")
+
+             if st.button("🚀 Run Analysis", type="primary", use_container_width=True):
+                 with st.spinner("🔄 Training Bayesian Network..."):
+                     try:
+                         # Initialize the analyzer
+                         analyzer = BayesianNetworkAnalyzer(
+                             session_id=st.session_state.session_id
+                         )
+
+                         # Run the analysis
+                         results = analyzer.run_analysis(
+                             df=df,
+                             cat_features=cat_features,
+                             con_features=con_features,
+                             target_variable=target_variable,
+                             test_fraction=test_fraction,
+                             algorithm=algorithm,
+                             estimator=estimator,
+                             equivalent_sample_size=equivalent_sample_size,
+                             score_method=score_method,
+                             sig_level=sig_level,
+                             n_bins=n_bins
+                         )
+
+                         # Store the results
+                         st.session_state.analysis_results = results
+                         st.session_state.model_trained = True
+
+                         st.success("✅ Analysis completed!")
+                         st.rerun()
+
+                     except Exception as e:
+                         st.error(f"❌ Error during analysis: {str(e)}")
+                         st.exception(e)
+
+     with col2:
+         st.header("Quick Stats")
+
+         if df is not None:
+             st.metric("Total Samples", df.shape[0])
+             st.metric("Total Features", df.shape[1])
+             st.metric("Selected Features", len(selected_features) if 'selected_features' in locals() else 0)
+
+         if st.session_state.model_trained:
+             st.success("✅ Model Trained")
+         else:
+             st.info("⏳ Awaiting Training")
+
+     # Display results
+     if st.session_state.analysis_results:
+         st.markdown("---")
+         st.header("📊 Analysis Results")
+
+         results = st.session_state.analysis_results
+
+         # Network structure
+         st.subheader("🕸️ Bayesian Network Structure")
+         network_fig = generate_network_graph(results['model'])
+         st.plotly_chart(network_fig, use_container_width=True)
+
+         # Performance metrics
+         st.subheader("📈 Performance Metrics")
+
+         col_m1, col_m2 = st.columns(2)
+
+         with col_m1:
+             st.markdown("**Training Set**")
+             train_metrics = results['train_metrics']
+
+             metric_cols = st.columns(4)
+             metric_cols[0].metric("Accuracy", f"{train_metrics['accuracy']:.2f}%")
+             metric_cols[1].metric("Precision", f"{train_metrics['precision']:.2f}%")
+             metric_cols[2].metric("Recall", f"{train_metrics['recall']:.2f}%")
+             metric_cols[3].metric("F1-Score", f"{train_metrics['f1']:.2f}%")
+
+             # Confusion matrix
+             conf_fig_train = plot_confusion_matrix(
+                 train_metrics['confusion_matrix'],
+                 title="Training Set Confusion Matrix"
+             )
+             st.plotly_chart(conf_fig_train, use_container_width=True)
+
+             # ROC curve
+             roc_fig_train = plot_roc_curve(
+                 train_metrics['fpr'],
+                 train_metrics['tpr'],
+                 train_metrics['auc'],
+                 title="Training Set ROC Curve"
+             )
+             st.plotly_chart(roc_fig_train, use_container_width=True)
+
+         with col_m2:
+             st.markdown("**Test Set**")
+             test_metrics = results['test_metrics']
+
+             metric_cols = st.columns(4)
+             metric_cols[0].metric("Accuracy", f"{test_metrics['accuracy']:.2f}%")
+             metric_cols[1].metric("Precision", f"{test_metrics['precision']:.2f}%")
+             metric_cols[2].metric("Recall", f"{test_metrics['recall']:.2f}%")
+             metric_cols[3].metric("F1-Score", f"{test_metrics['f1']:.2f}%")
+
+             # Confusion matrix
+             conf_fig_test = plot_confusion_matrix(
+                 test_metrics['confusion_matrix'],
+                 title="Test Set Confusion Matrix"
+             )
+             st.plotly_chart(conf_fig_test, use_container_width=True)
+
+             # ROC curve
+             roc_fig_test = plot_roc_curve(
+                 test_metrics['fpr'],
+                 test_metrics['tpr'],
+                 test_metrics['auc'],
+                 title="Test Set ROC Curve"
+             )
+             st.plotly_chart(roc_fig_test, use_container_width=True)
+
+         # Conditional probability tables
+         st.subheader("📋 Conditional Probability Tables")
+
+         selected_node = st.selectbox(
+             "Select a node to view its CPD:",
+             options=list(results['cpds'].keys())
+         )
+
+         if selected_node:
+             cpd_df = create_cpd_table(results['cpds'][selected_node])
+             st.dataframe(cpd_df, use_container_width=True)
+
+         # Structure scores
+         st.subheader("📊 Model Scores")
+
+         score_cols = st.columns(5)
+         scores = results['scores']
+         score_cols[0].metric("Log-Likelihood", f"{scores['log_likelihood']:.2f}")
+         score_cols[1].metric("BIC Score", f"{scores['bic']:.2f}")
+         score_cols[2].metric("K2 Score", f"{scores['k2']:.2f}")
+         score_cols[3].metric("BDeu Score", f"{scores['bdeu']:.2f}")
+         score_cols[4].metric("BDs Score", f"{scores['bds']:.2f}")
+
+ # Tab 2: AI assistant
+ with tab2:
+     st.header("💬 AI Analysis Assistant")
+
+     if not st.session_state.get('api_key'):
+         st.warning("⚠️ Please enter your OpenAI API Key in the sidebar to use the AI assistant.")
+     elif not st.session_state.model_trained:
+         st.info("ℹ️ Please train a model first in the Analysis tab to use the AI assistant.")
+     else:
+         # Initialize the LLM assistant
+         if 'llm_assistant' not in st.session_state:
+             st.session_state.llm_assistant = LLMAssistant(
+                 api_key=st.session_state.api_key,
+                 session_id=st.session_state.session_id
+             )
+
+         # Show the chat history
+         chat_container = st.container()
+
+         with chat_container:
+             for message in st.session_state.chat_history:
+                 with st.chat_message(message["role"]):
+                     st.markdown(message["content"])
+
+         # Chat input
+         if prompt := st.chat_input("Ask me anything about your analysis results..."):
+             # Append the user message
+             st.session_state.chat_history.append({
+                 "role": "user",
+                 "content": prompt
+             })
+
+             with st.chat_message("user"):
+                 st.markdown(prompt)
+
+             # Get the AI response
+             with st.chat_message("assistant"):
+                 with st.spinner("Thinking..."):
+                     response = st.session_state.llm_assistant.get_response(
+                         user_message=prompt,
+                         analysis_results=st.session_state.analysis_results
+                     )
+                     st.markdown(response)
+
+             # Append the assistant message
+             st.session_state.chat_history.append({
+                 "role": "assistant",
+                 "content": response
+             })
+
+         # Quick-question buttons
+         st.markdown("---")
+         st.subheader("💡 Quick Questions")
+
+         quick_questions = [
+             "📊 Give me a summary of the analysis results",
+             "🎯 What is the model's performance?",
+             "🔍 Explain the Bayesian Network structure",
+             "⚠️ What are the limitations of this model?",
+             "💡 How can I improve the model?"
+         ]
+
+         cols = st.columns(len(quick_questions))
+         for idx, (col, question) in enumerate(zip(cols, quick_questions)):
+             if col.button(question, key=f"quick_{idx}"):
+                 st.session_state.chat_history.append({
+                     "role": "user",
+                     "content": question
+                 })
+
+                 response = st.session_state.llm_assistant.get_response(
+                     user_message=question,
+                     analysis_results=st.session_state.analysis_results
+                 )
+
+                 st.session_state.chat_history.append({
+                     "role": "assistant",
+                     "content": response
+                 })
+
+                 st.rerun()
+
+ # Footer
+ st.markdown("---")
+ st.markdown(
+     """
+     <div style='text-align: center'>
+         <p>🔬 Bayesian Network Analysis System | Built with Streamlit</p>
+         <p>Powered by OpenAI GPT-4 | Session ID: {}</p>
+     </div>
+     """.format(st.session_state.session_id[:8]),
+     unsafe_allow_html=True
+ )
bn_core.py ADDED
@@ -0,0 +1,410 @@
+ import pandas as pd
+ import numpy as np
+ from pgmpy.models import BayesianNetwork
+ from pgmpy.estimators import (
+     TreeSearch, HillClimbSearch, PC,
+     MaximumLikelihoodEstimator, BayesianEstimator,
+     BicScore, AICScore, K2Score, BDeuScore, BDsScore
+ )
+ from pgmpy.inference import VariableElimination
+ from sklearn.model_selection import train_test_split
+ from sklearn.metrics import (
+     confusion_matrix, accuracy_score, precision_score,
+     recall_score, f1_score, roc_curve, roc_auc_score
+ )
+ from pgmpy.metrics import log_likelihood_score, structure_score
+ import threading
+ from datetime import datetime
+
+ class BayesianNetworkAnalyzer:
+     """
+     Bayesian network analyzer.
+     Supports concurrent users; each session is handled independently.
+     """
+
+     # Class-level lock for thread safety
+     _lock = threading.Lock()
+
+     # Analysis results stored per session
+     _session_results = {}
+
+     def __init__(self, session_id):
+         """
+         Initialize the analyzer.
+
+         Args:
+             session_id: unique session identifier
+         """
+         self.session_id = session_id
+         self.model = None
+         self.inference = None
+         self.train_data = None
+         self.test_data = None
+         self.bins_dict = {}
+
+     def run_analysis(self, df, cat_features, con_features, target_variable,
+                      test_fraction=0.25, algorithm='NB', estimator='ml',
+                      equivalent_sample_size=3, score_method='BIC',
+                      sig_level=0.05, n_bins=10):
+         """
+         Run the full Bayesian network analysis.
+
+         Args:
+             df: raw data frame
+             cat_features: list of categorical features
+             con_features: list of continuous features
+             target_variable: name of the target variable
+             test_fraction: test set proportion
+             algorithm: structure-learning algorithm
+             estimator: parameter estimation method
+             equivalent_sample_size: equivalent sample size (for Bayesian estimation)
+             score_method: scoring method (for Hill Climbing)
+             sig_level: significance level (for the PC algorithm)
+             n_bins: number of bins for continuous variables
+
+         Returns:
+             dict: all analysis results
+         """
+
+         with self._lock:
+             try:
+                 # 1. Preprocess the data
+                 processed_df = self._preprocess_data(
+                     df, cat_features, con_features, target_variable, n_bins
+                 )
+
+                 # 2. Train/test split
+                 self.train_data, self.test_data = train_test_split(
+                     processed_df,
+                     test_size=test_fraction,
+                     random_state=42,
+                     stratify=processed_df[target_variable]
+                 )
+
+                 # 3. Learn the network structure
+                 self.model = self._learn_structure(
+                     algorithm, score_method, sig_level, target_variable
+                 )
+
+                 # 4. Estimate the parameters
+                 self._fit_parameters(estimator, equivalent_sample_size)
+
+                 # 5. Initialize the inference engine
+                 self.inference = VariableElimination(self.model)
+
+                 # 6. Evaluate the model
+                 train_metrics = self._evaluate_model(
+                     self.train_data, target_variable, "train"
+                 )
+                 test_metrics = self._evaluate_model(
+                     self.test_data, target_variable, "test"
+                 )
+
+                 # 7. Collect the CPDs
+                 cpds = self._get_all_cpds()
+
+                 # 8. Compute structure scores
+                 scores = self._calculate_scores()
+
+                 # 9. Assemble the results
+                 results = {
+                     'model': self.model,
+                     'inference': self.inference,
+                     'train_metrics': train_metrics,
+                     'test_metrics': test_metrics,
+                     'cpds': cpds,
+                     'scores': scores,
+                     'parameters': {
+                         'algorithm': algorithm,
+                         'estimator': estimator,
+                         'test_fraction': test_fraction,
+                         'n_features': len(cat_features) + len(con_features),
+                         'cat_features': cat_features,
+                         'con_features': con_features,
+                         'target_variable': target_variable,
+                         'n_bins': n_bins
+                     },
+                     'timestamp': datetime.now().isoformat()
+                 }
+
+                 # Save to the per-session results
+                 self._session_results[self.session_id] = results
+
+                 return results
+
+             except Exception as e:
+                 raise Exception(f"Analysis failed: {str(e)}")
+
+     def _preprocess_data(self, df, cat_features, con_features,
+                          target_variable, n_bins):
+         """Preprocess the data."""
+         # Select the required columns
+         selected_columns = cat_features + con_features + [target_variable]
+         processed_df = df[selected_columns].copy()
+
+         # Handle missing values
+         processed_df = processed_df.dropna()
+
+         # Discretize continuous variables by binning
+         for col in con_features:
+             if col in processed_df.columns:
+                 # Record the bin edges
+                 bin_edges = pd.cut(
+                     processed_df[col],
+                     bins=n_bins,
+                     retbins=True,
+                     duplicates='drop'
+                 )[1]
+
+                 self.bins_dict[col] = bin_edges
+
+                 # Build the bin labels
+                 bin_labels = [
+                     f"{round(bin_edges[i], 2)}-{round(bin_edges[i+1], 2)}"
+                     for i in range(len(bin_edges) - 1)
+                 ]
+
+                 # Apply the binning
+                 processed_df[col] = pd.cut(
+                     processed_df[col],
+                     bins=bin_edges,
+                     labels=bin_labels,
+                     include_lowest=True
+                 ).astype(str)
+
+         # Make sure categorical variables are strings
+         for col in cat_features:
+             if col in processed_df.columns:
+                 processed_df[col] = processed_df[col].astype(str)
+
+         # Make sure the target variable is an integer
+         if target_variable in processed_df.columns:
+             processed_df[target_variable] = processed_df[target_variable].astype(int)
+
+         return processed_df
+
+     def _learn_structure(self, algorithm, score_method, sig_level, target_variable):
+         """Learn the network structure."""
+
+         if algorithm == 'NB':
+             # Naive Bayes
+             edges = [
+                 (target_variable, feature)
+                 for feature in self.train_data.columns
+                 if feature != target_variable
+             ]
+             model = BayesianNetwork(edges)
+
+         elif algorithm == 'TAN':
+             # Tree-Augmented Naive Bayes
+             tan_search = TreeSearch(self.train_data)
+             structure = tan_search.estimate(
+                 estimator_type='tan',
+                 class_node=target_variable
+             )
+             model = BayesianNetwork(structure.edges())
+
+         elif algorithm == 'CL':
+             # Chow-Liu
+             tan_search = TreeSearch(self.train_data)
+             structure = tan_search.estimate(
+                 estimator_type='chow-liu',
+                 class_node=target_variable
+             )
+             model = BayesianNetwork(structure.edges())
+
+         elif algorithm == 'HC':
+             # Hill Climbing
+             hc = HillClimbSearch(self.train_data)
+
+             # Pick the scoring method
+             scoring_methods = {
+                 'BIC': BicScore(self.train_data),
+                 'AIC': AICScore(self.train_data),
+                 'K2': K2Score(self.train_data),
+                 'BDeu': BDeuScore(self.train_data),
+                 'BDs': BDsScore(self.train_data)
+             }
+
+             structure = hc.estimate(
+                 scoring_method=scoring_methods[score_method]
+             )
+             model = BayesianNetwork(structure.edges())
+
+         elif algorithm == 'PC':
+             # PC Algorithm
+             pc = PC(self.train_data)
+
+             # Try decreasing max_cond_vars until estimation succeeds
+             for max_cond in [5, 4, 3, 2, 1]:
+                 try:
+                     structure = pc.estimate(
+                         significance_level=sig_level,
+                         max_cond_vars=max_cond,
+                         ci_test='chi_square',
+                         variant='stable'
+                     )
+
+                     # Check that the result is usable
+                     if structure.edges():
+                         model = BayesianNetwork(structure.edges())
+                         break
+                 except Exception:  # was a bare except
+                     continue
+             else:
+                 # Fall back to Naive Bayes if every attempt fails
+                 edges = [
+                     (target_variable, feature)
+                     for feature in self.train_data.columns
+                     if feature != target_variable
+                 ]
+                 model = BayesianNetwork(edges)
+
+         else:
+             raise ValueError(f"Unknown algorithm: {algorithm}")
+
+         return model
+
+     def _fit_parameters(self, estimator, equivalent_sample_size):
+         """Estimate the parameters."""
+         if estimator == 'bn':
+             self.model.fit(
+                 self.train_data,
+                 estimator=BayesianEstimator,
+                 equivalent_sample_size=equivalent_sample_size
+             )
+         else:
+             self.model.fit(
+                 self.train_data,
+                 estimator=MaximumLikelihoodEstimator
+             )
+
+     def _predict_probabilities(self, data, target_variable):
+         """Predict probabilities."""
+         true_labels = []
+         predicted_probs = []
+
+         model_nodes = set(self.model.nodes())
+
+         for idx, row in data.iterrows():
+             # Build the evidence
+             evidence = {
+                 k: v for k, v in row.drop(target_variable).to_dict().items()
+                 if k in model_nodes
+             }
+
+             true_label = row[target_variable]
+             true_labels.append(true_label)
+
+             try:
+                 result = self.inference.query(
+                     variables=[target_variable],
+                     evidence=evidence
+                 )
+                 probs = result.values
+                 predicted_probs.append(probs)
+             except Exception:
+                 # Mark rows where inference fails; they are filtered out below
+                 predicted_probs.append(None)
+
+         # Keep only the valid results
+         valid_data = [
+             (label, prob)
+             for label, prob in zip(true_labels, predicted_probs)
+             if prob is not None and len(prob) > 1
+         ]
+
+         if not valid_data:
+             return [], []
+
+         valid_labels, valid_probs = zip(*valid_data)
+         prob_array = np.array([prob[1] for prob in valid_probs])
+
+         return list(valid_labels), prob_array
+
+     def _evaluate_model(self, data, target_variable, dataset_name):
+         """Evaluate model performance."""
+         # Predict
+         true_labels, pred_probs = self._predict_probabilities(
+             data, target_variable
+         )
+
+         if len(true_labels) == 0:
+             return {
+                 'accuracy': 0,
+                 'precision': 0,
+                 'recall': 0,
+                 'f1': 0,
+                 'auc': 0,
+                 'confusion_matrix': [[0, 0], [0, 0]],
+                 'fpr': [0],
+                 'tpr': [0]
+             }
+
+         # Binary predictions
+         pred_labels = (pred_probs >= 0.5).astype(int)
+
+         # Compute the metrics (as percentages)
+         accuracy = accuracy_score(true_labels, pred_labels) * 100
+         precision = precision_score(true_labels, pred_labels, zero_division=0) * 100
+         recall = recall_score(true_labels, pred_labels, zero_division=0) * 100
+         f1 = f1_score(true_labels, pred_labels, zero_division=0) * 100
+
+         # ROC curve
+         fpr, tpr, _ = roc_curve(true_labels, pred_probs)
+         auc = roc_auc_score(true_labels, pred_probs)
+
+         # Confusion matrix
+         cm = confusion_matrix(true_labels, pred_labels).tolist()
+
+         # G-mean and P-mean
+         tn, fp, fn, tp = confusion_matrix(true_labels, pred_labels).ravel()
+         sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0
+         specificity = tn / (tn + fp) if (tn + fp) > 0 else 0
+         g_mean = np.sqrt(sensitivity * precision / 100) * 100  # geometric mean of sensitivity and precision
+         p_mean = np.sqrt(specificity * sensitivity) * 100      # geometric mean of specificity and sensitivity
+
+         return {
+             'accuracy': accuracy,
+             'precision': precision,
+             'recall': recall,
+             'f1': f1,
+             'auc': auc,
+             'g_mean': g_mean,
+             'p_mean': p_mean,
+             'specificity': specificity * 100,
+             'confusion_matrix': cm,
+             'fpr': fpr.tolist(),
+             'tpr': tpr.tolist(),
+             'predicted_probs': pred_probs.tolist()
+         }
+
+    def _get_all_cpds(self):
+        """Return the conditional probability table (CPD) of every node."""
+        cpds = {}
+        for node in self.model.nodes():
+            cpd = self.model.get_cpds(node)
+            cpds[node] = cpd
+        return cpds
+
+    def _calculate_scores(self):
+        """Compute model fit and structure scores on the training data."""
+        scores = {
+            'log_likelihood': log_likelihood_score(self.model, self.train_data),
+            'bic': structure_score(self.model, self.train_data, scoring_method='bic'),
+            'k2': structure_score(self.model, self.train_data, scoring_method='k2'),
+            'bdeu': structure_score(self.model, self.train_data, scoring_method='bdeu'),
+            'bds': structure_score(self.model, self.train_data, scoring_method='bds')
+        }
+        return scores
+
+    @classmethod
+    def get_session_results(cls, session_id):
+        """Return the stored results for a specific session."""
+        return cls._session_results.get(session_id)
+
+    @classmethod
+    def clear_session_results(cls, session_id):
+        """Delete the stored results for a specific session."""
+        if session_id in cls._session_results:
+            del cls._session_results[session_id]
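The `get_session_results` / `clear_session_results` classmethods above implement a class-level registry keyed by session id. A minimal, self-contained sketch of the same pattern (hypothetical `SessionStore` name; the lock reflects the thread-safety claim in the README):

```python
import threading

class SessionStore:
    """Class-level registry keyed by session_id, guarded by a lock."""
    _results = {}
    _lock = threading.Lock()

    @classmethod
    def save(cls, session_id, results):
        with cls._lock:
            cls._results[session_id] = results

    @classmethod
    def get(cls, session_id):
        # dict.get returns None for an unknown session
        return cls._results.get(session_id)

    @classmethod
    def clear(cls, session_id):
        with cls._lock:
            cls._results.pop(session_id, None)  # no-op if the session is unknown
```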
llm_assistant.py ADDED
@@ -0,0 +1,265 @@
+ from openai import OpenAI
+ import json
+ import numpy as np
+
+ class LLMAssistant:
+     """
+     LLM Q&A assistant that helps users understand
+     their Bayesian network analysis results.
+     """
+
+     def __init__(self, api_key, session_id):
+         """
+         Initialize the LLM assistant.
+
+         Args:
+             api_key: OpenAI API key
+             session_id: unique session identifier
+         """
+         self.client = OpenAI(api_key=api_key)
+         self.session_id = session_id
+         self.conversation_history = []
+
+         # System prompt
+         self.system_prompt = """You are an expert data scientist specializing in Bayesian Networks and machine learning.
+ Your role is to help users understand their Bayesian Network analysis results.
+
+ You should:
+ 1. Explain complex statistical concepts in simple terms
+ 2. Provide insights about model performance metrics
+ 3. Suggest improvements when asked
+ 4. Explain the structure and relationships in the Bayesian Network
+ 5. Help interpret conditional probability tables (CPTs)
+ 6. Discuss limitations and assumptions of the model
+
+ Always be clear, concise, and educational. Use examples when helpful.
+ Format your responses with proper markdown for better readability."""
+
+     def get_response(self, user_message, analysis_results):
+         """
+         Get an AI response.
+
+         Args:
+             user_message: the user's message
+             analysis_results: analysis results dict
+
+         Returns:
+             str: AI response
+         """
+
+         # Prepare the context information
+         context = self._prepare_context(analysis_results)
+
+         # Append the user message to the history
+         self.conversation_history.append({
+             "role": "user",
+             "content": user_message
+         })
+
+         # Build the message list
+         messages = [
+             {"role": "system", "content": self.system_prompt},
+             {"role": "system", "content": f"Analysis Context:\n{context}"}
+         ] + self.conversation_history
+
+         try:
+             # Call the OpenAI API
+             response = self.client.chat.completions.create(
+                 model="gpt-4o-mini",
+                 messages=messages,
+                 temperature=0.7,
+                 max_tokens=1500
+             )
+
+             assistant_message = response.choices[0].message.content
+
+             # Append the assistant reply to the history
+             self.conversation_history.append({
+                 "role": "assistant",
+                 "content": assistant_message
+             })
+
+             return assistant_message
+
+         except Exception as e:
+             return f"❌ Error: {str(e)}\n\nPlease check your API key and try again."
+
+     def _prepare_context(self, results):
+         """Build a context string that summarizes the analysis results."""
+
+         if not results:
+             return "No analysis results available yet."
+
+         # Extract the key information
+         params = results['parameters']
+         train_metrics = results['train_metrics']
+         test_metrics = results['test_metrics']
+         scores = results['scores']
+
+         # Build the context string
+         context = f"""
+ ## Model Configuration
+ - Algorithm: {params['algorithm']}
+ - Estimator: {params['estimator']}
+ - Number of Features: {params['n_features']}
+ - Categorical: {len(params['cat_features'])}
+ - Continuous: {len(params['con_features'])}
+ - Target Variable: {params['target_variable']}
+ - Test Set Proportion: {params['test_fraction']:.0%}
+
+ ## Training Set Performance
+ - Accuracy: {train_metrics['accuracy']:.2f}%
+ - Precision: {train_metrics['precision']:.2f}%
+ - Recall: {train_metrics['recall']:.2f}%
+ - F1-Score: {train_metrics['f1']:.2f}%
+ - AUC: {train_metrics['auc']:.4f}
+ - G-mean: {train_metrics['g_mean']:.2f}%
+ - P-mean: {train_metrics['p_mean']:.2f}%
+ - Specificity: {train_metrics['specificity']:.2f}%
+
+ ## Test Set Performance
+ - Accuracy: {test_metrics['accuracy']:.2f}%
+ - Precision: {test_metrics['precision']:.2f}%
+ - Recall: {test_metrics['recall']:.2f}%
+ - F1-Score: {test_metrics['f1']:.2f}%
+ - AUC: {test_metrics['auc']:.4f}
+ - G-mean: {test_metrics['g_mean']:.2f}%
+ - P-mean: {test_metrics['p_mean']:.2f}%
+ - Specificity: {test_metrics['specificity']:.2f}%
+
+ ## Model Scores
+ - Log-Likelihood: {scores['log_likelihood']:.2f}
+ - BIC Score: {scores['bic']:.2f}
+ - K2 Score: {scores['k2']:.2f}
+ - BDeu Score: {scores['bdeu']:.2f}
+ - BDs Score: {scores['bds']:.2f}
+
+ ## Network Structure
+ - Total Nodes: {len(results['model'].nodes())}
+ - Total Edges: {len(results['model'].edges())}
+ - Network Edges: {list(results['model'].edges())[:10]}... (showing first 10)
+ """
+
+         return context
+
+     def generate_summary(self, analysis_results):
+         """
+         Automatically generate a summary of the analysis results.
+
+         Args:
+             analysis_results: analysis results dict
+
+         Returns:
+             str: summary text
+         """
+
+         summary_prompt = """Based on the analysis results provided in the context, please generate a comprehensive summary that includes:
+
+ 1. **Model Overview**: Brief description of the model type and configuration
+ 2. **Performance Analysis**:
+    - Overall model performance on both training and test sets
+    - Comparison between training and test performance (overfitting/underfitting)
+    - Key strengths and weaknesses
+ 3. **Network Structure Insights**: What the learned structure tells us about variable relationships
+ 4. **Recommendations**: Specific suggestions for improvement
+ 5. **Limitations**: Important caveats and limitations to consider
+
+ Format the summary in clear markdown with appropriate sections and bullet points."""
+
+         return self.get_response(summary_prompt, analysis_results)
+
+     def explain_metric(self, metric_name, analysis_results):
+         """
+         Explain a specific metric.
+
+         Args:
+             metric_name: name of the metric
+             analysis_results: analysis results dict
+
+         Returns:
+             str: explanation of the metric
+         """
+
+         explain_prompt = f"""Please explain the following metric in the context of this analysis:
+
+ Metric: {metric_name}
+
+ Include:
+ 1. What this metric measures
+ 2. The value obtained in this analysis (training and test)
+ 3. How to interpret this value
+ 4. What it tells us about model performance
+ 5. How it relates to other metrics in the analysis"""
+
+         return self.get_response(explain_prompt, analysis_results)
+
+     def suggest_improvements(self, analysis_results):
+         """
+         Provide improvement suggestions.
+
+         Args:
+             analysis_results: analysis results dict
+
+         Returns:
+             str: improvement suggestions
+         """
+
+         improve_prompt = """Based on the current model performance and configuration, please provide specific, actionable recommendations for improvement.
+
+ Consider:
+ 1. Feature engineering opportunities
+ 2. Algorithm selection
+ 3. Hyperparameter tuning
+ 4. Data quality issues
+ 5. Model complexity trade-offs
+
+ Prioritize recommendations by potential impact."""
+
+         return self.get_response(improve_prompt, analysis_results)
+
+     def explain_network_structure(self, analysis_results):
+         """
+         Explain the network structure.
+
+         Args:
+             analysis_results: analysis results dict
+
+         Returns:
+             str: explanation of the network structure
+         """
+
+         structure_prompt = """Please explain the learned Bayesian Network structure:
+
+ 1. What are the key relationships (edges) discovered?
+ 2. What do these relationships tell us about the domain?
+ 3. Are there any surprising or interesting patterns?
+ 4. How does the structure relate to the target variable?
+ 5. What are the implications for prediction and inference?"""
+
+         return self.get_response(structure_prompt, analysis_results)
+
+     def compare_algorithms(self, analysis_results):
+         """
+         Compare the available algorithms.
+
+         Args:
+             analysis_results: analysis results dict
+
+         Returns:
+             str: algorithm comparison
+         """
+
+         compare_prompt = f"""The current model uses the {analysis_results['parameters']['algorithm']} algorithm.
+
+ Please:
+ 1. Explain the characteristics of this algorithm
+ 2. Compare it with other available algorithms (NB, TAN, CL, HC, PC)
+ 3. Discuss when this algorithm is most appropriate
+ 4. Suggest if a different algorithm might be better for this dataset
+ 5. Explain the trade-offs involved"""
+
+         return self.get_response(compare_prompt, analysis_results)
+
+     def reset_conversation(self):
+         """Reset the conversation history."""
+         self.conversation_history = []
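`get_response` above prepends two system messages (the role prompt, then the analysis context) to the running conversation history before each API call. A standalone sketch of that assembly step (hypothetical `build_messages` name, no network access):

```python
def build_messages(system_prompt, context, history, user_message):
    """Assemble the chat payload: system prompt, analysis context,
    then the conversation so far ending with the new user turn."""
    turns = history + [{"role": "user", "content": user_message}]
    return [
        {"role": "system", "content": system_prompt},
        {"role": "system", "content": f"Analysis Context:\n{context}"},
    ] + turns
```

Keeping the context in its own system message, rebuilt per call, means stale metrics are never baked into the stored history.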
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ streamlit==1.31.0
+ pandas==2.1.4
+ numpy==1.26.3
+ plotly==5.18.0
+ scikit-learn==1.4.0
+ pgmpy==0.1.25
+ networkx==3.2.1
+ openai==1.12.0
+ graphviz==0.20.1
utils.py ADDED
@@ -0,0 +1,387 @@
+ import plotly.graph_objects as go
+ import plotly.express as px
+ import pandas as pd
+ import numpy as np
+ import networkx as nx
+ from plotly.subplots import make_subplots
+
+ def plot_roc_curve(fpr, tpr, auc, title="ROC Curve"):
+     """
+     Plot the ROC curve.
+
+     Args:
+         fpr: false positive rates
+         tpr: true positive rates
+         auc: area under the curve
+         title: chart title
+
+     Returns:
+         plotly figure
+     """
+     fig = go.Figure()
+
+     # ROC curve
+     fig.add_trace(go.Scatter(
+         x=fpr,
+         y=tpr,
+         mode='lines',
+         name=f'ROC Curve (AUC = {auc:.4f})',
+         line=dict(color='#2d6ca2', width=2)
+     ))
+
+     # Diagonal (random classifier)
+     fig.add_trace(go.Scatter(
+         x=[0, 1],
+         y=[0, 1],
+         mode='lines',
+         name='Random Classifier',
+         line=dict(color='gray', width=1, dash='dash')
+     ))
+
+     fig.update_layout(
+         title=title,
+         xaxis_title='False Positive Rate',
+         yaxis_title='True Positive Rate',
+         width=600,
+         height=500,
+         template='plotly_white',
+         legend=dict(x=0.6, y=0.1)
+     )
+
+     return fig
+
+ def plot_confusion_matrix(cm, title="Confusion Matrix"):
+     """
+     Plot the confusion matrix.
+
+     Args:
+         cm: confusion matrix (2x2 list)
+         title: chart title
+
+     Returns:
+         plotly figure
+     """
+     # Convert to a numpy array
+     cm_array = np.array(cm)
+
+     # Compute cell percentages
+     cm_percent = cm_array / cm_array.sum() * 100
+
+     # Build cell labels
+     labels = [
+         [f'{cm_array[i][j]}<br>({cm_percent[i][j]:.1f}%)'
+          for j in range(2)]
+         for i in range(2)
+     ]
+
+     fig = go.Figure(data=go.Heatmap(
+         z=cm_array,
+         x=['Predicted: 0', 'Predicted: 1'],
+         y=['Actual: 0', 'Actual: 1'],
+         text=labels,
+         texttemplate='%{text}',
+         textfont={"size": 14},
+         colorscale='Blues',
+         showscale=True
+     ))
+
+     fig.update_layout(
+         title=title,
+         width=500,
+         height=450,
+         template='plotly_white'
+     )
+
+     return fig
+
+ def plot_probability_distribution(probs, title="Probability Distribution"):
+     """
+     Plot the distribution of predicted probabilities.
+
+     Args:
+         probs: list of predicted probabilities
+         title: chart title
+
+     Returns:
+         plotly figure
+     """
+     fig = go.Figure()
+
+     fig.add_trace(go.Histogram(
+         x=probs,
+         nbinsx=20,
+         name='Predicted Probabilities',
+         marker=dict(
+             color='#2d6ca2',
+             line=dict(color='white', width=1)
+         )
+     ))
+
+     fig.update_layout(
+         title=title,
+         xaxis_title='Predicted Probability for Class 1',
+         yaxis_title='Frequency',
+         width=700,
+         height=400,
+         template='plotly_white',
+         showlegend=False
+     )
+
+     fig.update_xaxes(range=[0, 1])
+
+     return fig
+
+ def generate_network_graph(model):
+     """
+     Generate the Bayesian network structure graph.
+
+     Args:
+         model: BayesianNetwork model
+
+     Returns:
+         plotly figure
+     """
+     # Build a NetworkX directed graph
+     G = nx.DiGraph()
+     G.add_edges_from(model.edges())
+
+     # Use a spring layout, falling back to a circular layout
+     try:
+         pos = nx.spring_layout(G, k=2, iterations=50, seed=42)
+     except Exception:
+         pos = nx.circular_layout(G)
+
+     # Collect edge coordinates
+     edge_x = []
+     edge_y = []
+     for edge in G.edges():
+         x0, y0 = pos[edge[0]]
+         x1, y1 = pos[edge[1]]
+         edge_x.extend([x0, x1, None])
+         edge_y.extend([y0, y1, None])
+
+     edge_trace = go.Scatter(
+         x=edge_x, y=edge_y,
+         line=dict(width=2, color='#888'),
+         hoverinfo='none',
+         mode='lines',
+         showlegend=False
+     )
+
+     # Collect node coordinates and labels
+     node_x = []
+     node_y = []
+     node_text = []
+     for node in G.nodes():
+         x, y = pos[node]
+         node_x.append(x)
+         node_y.append(y)
+         node_text.append(node)
+
+     node_trace = go.Scatter(
+         x=node_x, y=node_y,
+         mode='markers+text',
+         hoverinfo='text',
+         text=node_text,
+         textposition="top center",
+         showlegend=False,
+         marker=dict(
+             size=30,
+             color='#2d6ca2',
+             line=dict(width=2, color='white')
+         )
+     )
+
+     # Draw an arrow annotation along each directed edge
+     annotations = []
+     for edge in G.edges():
+         x0, y0 = pos[edge[0]]
+         x1, y1 = pos[edge[1]]
+
+         annotations.append(
+             dict(
+                 ax=x0, ay=y0,
+                 axref='x', ayref='y',
+                 x=x1, y=y1,
+                 xref='x', yref='y',
+                 showarrow=True,
+                 arrowhead=2,
+                 arrowsize=1,
+                 arrowwidth=2,
+                 arrowcolor='#888'
+             )
+         )
+
+     fig = go.Figure(data=[edge_trace, node_trace])
+
+     fig.update_layout(
+         title='Bayesian Network Structure',
+         title_font_size=16,
+         showlegend=False,
+         hovermode='closest',
+         margin=dict(b=20, l=5, r=5, t=40),
+         annotations=annotations,
+         xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
+         yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
+         width=900,
+         height=700,
+         template='plotly_white'
+     )
+
+     return fig
+
+ def create_cpd_table(cpd):
+     """
+     Build a DataFrame view of a conditional probability table.
+
+     Args:
+         cpd: a pgmpy TabularCPD object
+
+     Returns:
+         pandas DataFrame
+     """
+     if cpd is None:
+         return pd.DataFrame()
+
+     # Variable information
+     variable = cpd.variable
+     evidence_vars = cpd.variables[1:] if len(cpd.variables) > 1 else []
+
+     # Root node (no parents)
+     if not evidence_vars:
+         values = np.round(cpd.values.flatten(), 4)
+         df = pd.DataFrame(
+             {variable: values},
+             index=[f"{variable}({i})" for i in range(len(values))]
+         )
+         return df
+
+     # Node with parents
+     evidence_card = cpd.cardinality[1:]
+
+     # Enumerate every combination of parent states
+     from itertools import product
+     column_values = list(product(*[range(card) for card in evidence_card]))
+
+     # Build the multi-level column index
+     columns = pd.MultiIndex.from_tuples(
+         [tuple(f"{var}({val})" for var, val in zip(evidence_vars, vals))
+          for vals in column_values],
+         names=evidence_vars
+     )
+
+     # Reshape the CPD values to (variable states, parent combinations)
+     reshaped_values = cpd.values.reshape(len(cpd.values), -1)
+     reshaped_values = np.round(reshaped_values, 4)
+
+     # Assemble the DataFrame
+     df = pd.DataFrame(
+         reshaped_values,
+         index=[f"{variable}({i})" for i in range(len(cpd.values))],
+         columns=columns
+     )
+
+     return df
+
+ def create_metrics_comparison_table(train_metrics, test_metrics):
+     """
+     Build a table comparing training-set and test-set metrics.
+
+     Args:
+         train_metrics: training-set metrics dict
+         test_metrics: test-set metrics dict
+
+     Returns:
+         pandas DataFrame
+     """
+     metrics_data = {
+         'Metric': [
+             'Accuracy', 'Precision', 'Recall', 'F1-Score',
+             'AUC', 'G-mean', 'P-mean', 'Specificity'
+         ],
+         'Training Set': [
+             f"{train_metrics['accuracy']:.2f}%",
+             f"{train_metrics['precision']:.2f}%",
+             f"{train_metrics['recall']:.2f}%",
+             f"{train_metrics['f1']:.2f}%",
+             f"{train_metrics['auc']:.4f}",
+             f"{train_metrics['g_mean']:.2f}%",
+             f"{train_metrics['p_mean']:.2f}%",
+             f"{train_metrics['specificity']:.2f}%"
+         ],
+         'Test Set': [
+             f"{test_metrics['accuracy']:.2f}%",
+             f"{test_metrics['precision']:.2f}%",
+             f"{test_metrics['recall']:.2f}%",
+             f"{test_metrics['f1']:.2f}%",
+             f"{test_metrics['auc']:.4f}",
+             f"{test_metrics['g_mean']:.2f}%",
+             f"{test_metrics['p_mean']:.2f}%",
+             f"{test_metrics['specificity']:.2f}%"
+         ]
+     }
+
+     df = pd.DataFrame(metrics_data)
+     return df
+
+ def export_results_to_json(results, filename="analysis_results.json"):
+     """
+     Export the results as JSON.
+
+     Args:
+         results: analysis results dict
+         filename: output file name (not used here; the JSON string is returned)
+
+     Returns:
+         JSON string
+     """
+     import json
+
+     # Drop objects that cannot be serialized
+     exportable_results = {
+         'parameters': results['parameters'],
+         'train_metrics': {
+             k: v for k, v in results['train_metrics'].items()
+             if k not in ['fpr', 'tpr', 'predicted_probs']
+         },
+         'test_metrics': {
+             k: v for k, v in results['test_metrics'].items()
+             if k not in ['fpr', 'tpr', 'predicted_probs']
+         },
+         'scores': results['scores'],
+         'network_edges': list(results['model'].edges()),
+         'timestamp': results['timestamp']
+     }
+
+     return json.dumps(exportable_results, indent=2)
+
+ def calculate_performance_gap(train_metrics, test_metrics):
+     """
+     Compute the performance gap between the training and test sets.
+
+     Args:
+         train_metrics: training-set metrics
+         test_metrics: test-set metrics
+
+     Returns:
+         dict: performance gaps
+     """
+     gaps = {
+         'accuracy_gap': train_metrics['accuracy'] - test_metrics['accuracy'],
+         'precision_gap': train_metrics['precision'] - test_metrics['precision'],
+         'recall_gap': train_metrics['recall'] - test_metrics['recall'],
+         'f1_gap': train_metrics['f1'] - test_metrics['f1'],
+         'auc_gap': train_metrics['auc'] - test_metrics['auc']
+     }
+
+     # Flag potential overfitting from the average absolute gap
+     avg_gap = np.mean([abs(v) for v in gaps.values()])
+     overfitting_status = "High" if avg_gap > 10 else "Moderate" if avg_gap > 5 else "Low"
+
+     gaps['average_gap'] = avg_gap
+     gaps['overfitting_risk'] = overfitting_status
+
+     return gaps
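`calculate_performance_gap` flags overfitting from the average absolute train-test gap. A quick worked example with made-up metric values (thresholds copied from the function; `numpy` swapped for plain arithmetic to keep it dependency-free):

```python
# Hypothetical metrics: percentage scale, except auc on the 0-1 scale as above
train = {'accuracy': 92.0, 'precision': 90.0, 'recall': 88.0, 'f1': 89.0, 'auc': 0.95}
test = {'accuracy': 80.0, 'precision': 78.0, 'recall': 75.0, 'f1': 76.0, 'auc': 0.85}

# Per-metric gaps, then the average absolute gap
gaps = {k: train[k] - test[k] for k in train}
avg_gap = sum(abs(v) for v in gaps.values()) / len(gaps)

# Same thresholds as calculate_performance_gap: >10 High, >5 Moderate, else Low
risk = "High" if avg_gap > 10 else "Moderate" if avg_gap > 5 else "Low"
```

With these numbers the average gap is about 10.02, so the risk label comes out "High". Note the unscaled `auc_gap` (0.1 here) drags the average down, so large percentage-scale gaps can be masked when AUC is mixed in on its 0-1 scale.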