Spaces: Wen1201/BayesianPyMc1

Wen1201 committed on Jan 15

Commit 662a0be · 1 Parent(s): 754d71d

Upload 7 files

Files changed (7)

README.md +287 -7
app.py +654 -0
bayesian_core.py +306 -0
bayesian_llm_assistant.py +349 -0
bayesian_utils.py +349 -0
pokemon_speed_meta_results.csv +19 -0
requirements.txt +12 -0

README.md CHANGED

@@ -1,12 +1,292 @@
 ---
-title: BayesianPyMc1
-emoji: 💻
-colorFrom: pink
-colorTo: purple
-sdk: gradio
-sdk_version: 6.3.0
 app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Pokemon Speed Bayesian Analysis System
+emoji: 🔬
+colorFrom: blue
+colorTo: indigo
+sdk: streamlit
+sdk_version: 1.31.0
 app_file: app.py
 pinned: false
+---
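+# 🎲 Bayesian Hierarchical Model Analysis System - Effect of Pokémon Speed on Win Rate
+## 📋 Overview
+A Bayesian hierarchical model analysis system built on Streamlit and PyMC. It analyzes how Pokémon speed affects win rate and pairs the analysis with an AI assistant that explains the statistics and suggests battle strategy.
+## 🎯 Main Features
+### 1. Bayesian Hierarchical Model Analysis
+- ✅ MCMC sampling (NUTS sampler)
+- ✅ Posterior distribution estimation
+- ✅ 95% HDI (highest density interval)
+- ✅ Hierarchical structure (information is borrowed across types)
+- ✅ Heterogeneity assessment (sigma)
+- ✅ Convergence diagnostics (R-hat, ESS)
+### 2. Four Visualizations
+- 📈 **Trace Plot**: MCMC convergence diagnostics
+- 📊 **Posterior Plot**: posterior distributions
+- 🌲 **Forest Plot**: per-type effects
+- 🔍 **DAG**: model structure graph
+### 3. AI Assistant
+- 💬 Natural-language chat
+- 📖 Explanations of statistical metrics
+- 🎮 Battle strategy advice
+- 📚 Bayesian statistics tutoring
+- 🔍 In-depth analysis of results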
+## 📦 Installation
+### 1. Python Requirements
+```bash
+# Requires Python 3.11
+python --version  # should print Python 3.11.x
+```
+### 2. Install Dependencies
+```bash
+pip install -r requirements.txt
+```
+### 3. Install Graphviz (used to render the DAG)
+#### Windows
+```bash
+# Using Chocolatey
+choco install graphviz
+# Or download from the official site: https://graphviz.org/download/
+```
+#### macOS
+```bash
+brew install graphviz
+```
+#### Linux (Ubuntu/Debian)
+```bash
+sudo apt-get update
+sudo apt-get install graphviz
+```
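+### 4. Prepare the Data
+Place the Pokémon speed battle CSV file in the same directory, named `pokemon_speed_meta_results.csv`.
+**Required data format:**
+| Trial_Type | rc | nc  | rt | nt  |
+|------------|----|-----|----|-----|
+| Water      | 45 | 100 | 62 | 100 |
+| Fire       | 38 | 100 | 55 | 100 |
+| Grass      | 42 | 100 | 58 | 100 |
+**Column descriptions:**
+- `Trial_Type`: type name (e.g., Water, Fire, Grass)
+- `rc`: wins in the control group (slower Pokémon)
+- `nc`: total battles in the control group
+- `rt`: wins in the treatment group (faster Pokémon)
+- `nt`: total battles in the treatment group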
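
The column rules above can be checked before any sampling starts. The following is a minimal sketch using only the standard library; `validate_rows` and `SAMPLE` are hypothetical names introduced for illustration and are not part of this commit (the app's own checks live in `bayesian_core.py`).

```python
import csv
import io

REQUIRED = ("Trial_Type", "rc", "nc", "rt", "nt")

# Sample rows taken from the format table in the README.
SAMPLE = """Trial_Type,rc,nc,rt,nt
Water,45,100,62,100
Fire,38,100,55,100
Grass,42,100,58,100
"""


def validate_rows(csv_text):
    """Parse CSV text and check the required columns and count constraints."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        counts = {k: int(row[k]) for k in REQUIRED[1:]}
        if min(counts.values()) < 0:
            raise ValueError(f"line {line_no}: counts must be non-negative")
        if counts["rc"] > counts["nc"] or counts["rt"] > counts["nt"]:
            raise ValueError(f"line {line_no}: wins exceed total battles")
        rows.append({"Trial_Type": row["Trial_Type"], **counts})
    return rows


rows = validate_rows(SAMPLE)
print(len(rows), rows[0])
```

Rejecting a row where wins exceed total battles early gives a clearer error than a failure inside the binomial likelihood.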
+### 5. Set the Google Gemini API Key
+- Enter your Google Gemini API Key in the left sidebar
+- The API Key is used by the AI assistant
+- Get an API Key: https://makersuite.google.com/app/apikey
+### 6. Run the App
+```bash
+streamlit run app.py
+```
+## 🔧 File Structure
+```
+bayesian_analysis/
+├── app.py                       # Streamlit main app
+├── bayesian_core.py             # Core Bayesian model logic
+├── bayesian_llm_assistant.py    # AI chat assistant
+├── bayesian_utils.py            # Visualization utilities
+├── requirements.txt             # Dependencies
+├── README.md                    # Documentation
+└── pokemon_speed_meta_results.csv  # Data file (provide your own)
+```
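+## 📊 Usage
+### Step 1: Load the Data
+1. Choose "Use default dataset" or "Upload your data"
+2. If you upload a file, make sure the CSV format is correct
+### Step 2: Set the MCMC Parameters
+Set these in the left sidebar:
+- **Samples**: default 2000 (recommended 1000-5000)
+- **Tune**: default 1000 (recommended 500-2000)
+- **Chains**: default 2 (recommended 2-4)
+- **Target acceptance rate**: default 0.95 (recommended 0.90-0.99)
+### Step 3: Run the Analysis
+1. Click the "Start Bayesian Analysis" button
+2. Wait for MCMC sampling to finish (this may take several minutes)
+3. Review the five result tabs:
+   - **📊 Overview**: key metrics and summary
+   - **📈 Trace & Posterior**: convergence diagnostics and posterior distributions
+   - **🌲 Forest Plot**: comparison of per-type effects
+   - **🔍 DAG Model Graph**: visualization of the model structure
+   - **📋 Detailed Report**: full text report
+### Step 4: Use the AI Assistant
+1. Switch to the "AI Assistant" tab
+2. Type a question in the chat box, or click a quick-question button
+3. The AI responds with explanations and advice based on the analysis results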
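+## 💡 Statistical Metrics
+### Bayesian Hierarchical Model Parameters
+#### d (overall effect)
+- **Meaning**: the mean log odds ratio across all types
+- **Interpretation**: d > 0 means higher speed is beneficial overall
+#### sigma (between-type variation)
+- **Meaning**: how much the response to speed differs between types
+- **Interpretation**:
+  - sigma < 0.3: low heterogeneity (types respond similarly)
+  - 0.3 ≤ sigma ≤ 0.5: moderate heterogeneity
+  - sigma > 0.5: high heterogeneity (types respond very differently)
+#### or_speed (odds ratio)
+- **Meaning**: exp(d), the multiplier that higher speed applies to the odds of winning
+- **Interpretation**: OR = 1.5 means the odds of winning for the faster group are 1.5 times the odds for the slower group (a ratio of odds, not of win rates)
+#### delta (type-specific effect)
+- **Meaning**: the individual speed effect for each type
+- **Interpretation**: delta[i] is the log odds ratio for the i-th type, that is, how that type responds to speed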
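
Because `d` and `delta` live on the log-odds scale, an odds ratio is easy to misread as a ratio of win rates. The sketch below works one example through; the baseline win rate of 0.45 is a made-up illustrative value, and `win_rate_from_log_odds_ratio` is a hypothetical helper, not a function from this commit.

```python
import math


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def win_rate_from_log_odds_ratio(baseline_rate, d):
    """Apply a log odds ratio d to a baseline win rate and return the new win rate."""
    new_odds = odds(baseline_rate) * math.exp(d)
    return new_odds / (1.0 + new_odds)


d = math.log(1.5)   # log odds ratio corresponding to OR = 1.5
slow = 0.45         # hypothetical win rate of the slower group
fast = win_rate_from_log_odds_ratio(slow, d)

print(f"OR = {math.exp(d):.2f}")
print(f"slow win rate = {slow:.3f}, fast win rate = {fast:.3f}")
print(f"ratio of win rates = {fast / slow:.3f}")
```

With these inputs the faster group's win rate is 27/49, about 0.551, so the win rate itself rises by a factor of about 1.22 even though the odds ratio is 1.5.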
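+### HDI (Highest Density Interval)
+- **95% HDI**: given the model and the data, the parameter lies in this interval with 95% posterior probability, and it is the narrowest such interval
+- **Difference from a confidence interval**: the HDI is a Bayesian credible interval, so it is a direct probability statement about the parameter
+### Convergence Diagnostics
+#### R-hat
+- **Target**: < 1.1
+- **Meaning**: compares between-chain variance with within-chain variance
+- **Interpretation**: R-hat ≈ 1.0 indicates good convergence
+#### ESS (Effective Sample Size)
+- **Target**: > 100 (preferably > 400)
+- **Meaning**: the number of effectively independent samples after accounting for autocorrelation
+- **Interpretation**: the higher the ESS, the more precise the estimate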
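
To make the R-hat idea concrete, here is a small sketch of the classic Gelman-Rubin statistic. This is an assumption made for illustration: ArviZ, which this project uses, reports a more robust rank-normalized split R-hat, so its numbers will differ. `gelman_rubin` and the toy chains are hypothetical.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic (non-split) Gelman-Rubin R-hat for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    # W: average within-chain sample variance.
    w = mean([variance(c) for c in chains])
    # B: between-chain variance, scaled by the chain length.
    b = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


mixed = [[0.1, -0.2, 0.3, 0.0, -0.1], [0.0, 0.2, -0.3, 0.1, -0.1]]
stuck = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]

print(f"well-mixed chains: R-hat = {gelman_rubin(mixed):.3f}")
print(f"separated chains:  R-hat = {gelman_rubin(stuck):.3f}")
```

Chains that explore the same region give a value near 1, while chains stuck in different regions inflate the between-chain term and push R-hat far above the 1.1 threshold.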
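+## 🎮 Use Cases
+### 1. Speed Importance Analysis
+Determine how strongly speed affects the overall win rate
+### 2. Type Heterogeneity Assessment
+Find out which types depend heavily on speed and which do not
+### 3. Team-Building Strategy
+Choose suitable Pokémon based on the statistical results
+### 4. Teaching
+Learn the principles and applications of Bayesian hierarchical models
+## ⚙️ Technical Architecture
+### Core Technologies
+- **Streamlit**: web application framework
+- **PyMC**: Bayesian inference engine
+- **ArviZ**: Bayesian analysis visualization
+- **pandas**: data processing
+- **plotly**: interactive visualization
+- **matplotlib**: static charts
+- **Google Gemini**: AI assistant
+### Model Features
+- ✅ Hierarchical structure (information is borrowed across types)
+- ✅ NUTS sampler (automatic tuning)
+- ✅ Full convergence diagnostics
+- ✅ Multiple parallel chains
+- ✅ Session isolation (multi-user support)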
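
For reference, the hierarchical structure can be written out as equations. This is a transcription of the priors and likelihood defined in `bayesian_core.py` in this commit, with $i$ indexing the types; the second argument of each normal distribution is a variance, and the Gamma prior is parameterized by shape and rate.

```latex
\begin{aligned}
r^{c}_{i} &\sim \mathrm{Binomial}(n^{c}_{i},\, p^{c}_{i}), & \operatorname{logit}(p^{c}_{i}) &= \mu_i \\
r^{t}_{i} &\sim \mathrm{Binomial}(n^{t}_{i},\, p^{t}_{i}), & \operatorname{logit}(p^{t}_{i}) &= \mu_i + \delta_i \\
\delta_i &\sim \mathcal{N}(d,\, \sigma^2), & \mu_i &\sim \mathcal{N}(0,\, 10^2) \\
d &\sim \mathcal{N}(0,\, 10^2), & \tau &\sim \mathrm{Gamma}(0.001,\, 0.001) \\
\sigma &= 1/\sqrt{\tau}, & \mathrm{OR} &= e^{d}
\end{aligned}
```

The shared parameters $d$ and $\sigma$ are what let each type's $\delta_i$ borrow information from the others: types with few battles are pulled toward $d$, more strongly when $\sigma$ is small.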
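+## 🔒 Privacy and Security
+- All analysis runs locally
+- Session data is stored separately per session
+- Results older than 1 hour are cleaned up automatically
+- The API Key is not persisted (it is kept only in the session state)
+## 📝 Sample Questions (for the AI Assistant)
+### Basic Concepts
+- "What is a Bayesian hierarchical model?"
+- "Why use the HDI instead of a p-value?"
+- "What is shrinkage?"
+### Interpreting Results
+- "Why is d positive? What does that mean?"
+- "What does sigma tell us?"
+- "Which types are most sensitive to speed?"
+### Practical Application
+- "How should I build my team?"
+- "Do Flying types need speed?"
+- "What does this result imply for competitive battling?"
+### Statistical Methods
+- "What is the difference between Bayesian and frequentist statistics?"
+- "Why does R-hat matter?"
+- "Why use a hierarchical model instead of analyzing each type separately?"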
+## 🚀 Advanced Features
+### Custom MCMC Parameters
+Adjust according to data size and complexity:
+- Small data (< 10 types): Samples=1000, Chains=2
+- Medium data (10-20 types): Samples=2000, Chains=2
+- Large data (> 20 types): Samples=3000, Chains=4
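
The rule of thumb above can be expressed as a small lookup. This is only a sketch: `suggest_mcmc_settings` is a hypothetical helper that does not exist in this commit, and the thresholds are copied directly from the list.

```python
def suggest_mcmc_settings(n_types):
    """Return suggested MCMC settings for a dataset with n_types types."""
    if n_types < 1:
        raise ValueError("n_types must be at least 1")
    if n_types < 10:
        return {"samples": 1000, "chains": 2}
    if n_types <= 20:
        return {"samples": 2000, "chains": 2}
    return {"samples": 3000, "chains": 4}


for n in (5, 18, 30):
    print(n, suggest_mcmc_settings(n))
```

These values are starting points; the convergence diagnostics should still decide whether more draws or chains are needed.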
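+### Diagnostic Tips
+1. Check that the trace plot is stationary
+2. Confirm that R-hat < 1.1
+3. Make sure ESS > 400
+4. Check that the posterior distributions look reasonable
+## 🐛 FAQ
+### Q1: The DAG cannot be generated
+**A**: The system-level Graphviz package is required; see step 3 of "Installation"
+### Q2: MCMC is slow
+**A**:
+- Reduce the number of samples or chains
+- Use a faster machine
+- Simplify the model structure
+### Q3: R-hat > 1.1
+**A**:
+- Increase the number of samples
+- Increase the tuning period
+- Increase the number of chains
+- Check the data for problems
+### Q4: The AI assistant returns an error
+**A**:
+- Check that the API Key is correct
+- Check the network connection
+- Reload the page
+## 📧 Contact
+For questions or suggestions, please contact the development team.
+## 📄 License
+This project is for academic research and teaching use only.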
 ---
+**Powered by Streamlit, PyMC & Google Gemini** 🎲

app.py ADDED

	@@ -0,0 +1,654 @@

+import streamlit as st
+import pandas as pd
+import uuid
+from datetime import datetime, timedelta
+import atexit
+import os
+import sys
+# Page configuration
+st.set_page_config(
+    page_title="Bayesian Hierarchical Model - Pokémon Speed Analysis",
+    page_icon="🎲",
+    layout="wide",
+    initial_sidebar_state="expanded"
+)
+# Custom CSS
+st.markdown("""
+<style>
+    .streamlit-expanderHeader {
+        background-color: #e8f1f8;
+        border: 1px solid #b0cfe8;
+        border-radius: 5px;
+        font-weight: 600;
+        color: #1b4f72;
+    }
+    .streamlit-expanderHeader:hover {
+        background-color: #d0e7f8;
+    }
+    .stMetric {
+        background-color: #f8fbff;
+        padding: 10px;
+        border-radius: 5px;
+        border: 1px solid #d0e4f5;
+    }
+    .stButton > button {
+        width: 100%;
+        border-radius: 20px;
+        font-weight: 600;
+        transition: all 0.3s ease;
+    }
+    .stButton > button:hover {
+        transform: translateY(-2px);
+        box-shadow: 0 4px 8px rgba(0,0,0,0.2);
+    }
+    .success-box {
+        background-color: #d4edda;
+        border: 1px solid #c3e6cb;
+        border-radius: 5px;
+        padding: 10px;
+        margin: 10px 0;
+    }
+    .warning-box {
+        background-color: #fff3cd;
+        border: 1px solid #ffeaa7;
+        border-radius: 5px;
+        padding: 10px;
+        margin: 10px 0;
+    }
+</style>
+""", unsafe_allow_html=True)
+# Import project modules
+from bayesian_core import BayesianHierarchicalAnalyzer
+from bayesian_llm_assistant import BayesianLLMAssistant
+from bayesian_utils import (
+    plot_trace,
+    plot_posterior,
+    plot_forest,
+    plot_model_dag,
+    create_summary_table,
+    create_trial_results_table,
+    export_results_to_text,
+    plot_odds_ratio_comparison
+)
+# Cleanup helper
+def cleanup_old_sessions():
+    """Clear sessions whose results are more than 1 hour old."""
+    current_time = datetime.now()
+    for session_id in list(BayesianHierarchicalAnalyzer._session_results.keys()):
+        result = BayesianHierarchicalAnalyzer._session_results.get(session_id)
+        if result:
+            result_time = datetime.fromisoformat(result['timestamp'])
+            if current_time - result_time > timedelta(hours=1):
+                BayesianHierarchicalAnalyzer.clear_session_results(session_id)
+# Register the cleanup function to run at interpreter exit
+atexit.register(cleanup_old_sessions)
+# Initialize session state
+if 'session_id' not in st.session_state:
+    st.session_state.session_id = str(uuid.uuid4())
+if 'analysis_results' not in st.session_state:
+    st.session_state.analysis_results = None
+if 'chat_history' not in st.session_state:
+    st.session_state.chat_history = []
+if 'analyzer' not in st.session_state:
+    st.session_state.analyzer = None
+if 'trace_img' not in st.session_state:
+    st.session_state.trace_img = None
+if 'posterior_img' not in st.session_state:
+    st.session_state.posterior_img = None
+if 'forest_img' not in st.session_state:
+    st.session_state.forest_img = None
+if 'dag_img' not in st.session_state:
+    st.session_state.dag_img = None
+# Title
+st.title("🎲 Bayesian Hierarchical Model Analysis")
+st.markdown("### Bayesian hierarchical analysis of how Pokémon speed affects win rate")
+st.markdown("---")
+# Sidebar
+with st.sidebar:
+    st.header("⚙️ Settings")
+    # Google Gemini API Key
+    api_key = st.text_input(
+        "Google Gemini API Key",
+        type="password",
+        help="Enter your Google Gemini API Key to use the AI assistant"
+    )
+    if api_key:
+        st.session_state.api_key = api_key
+        st.success("✅ API Key loaded")
+    st.markdown("---")
+    # MCMC parameters
+    st.subheader("🔬 MCMC Parameters")
+    n_samples = st.number_input(
+        "Samples",
+        min_value=500,
+        max_value=10000,
+        value=2000,
+        step=500,
+        help="Number of draws per chain"
+    )
+    n_tune = st.number_input(
+        "Tune",
+        min_value=200,
+        max_value=5000,
+        value=1000,
+        step=200,
+        help="Number of tuning (warm-up) samples"
+    )
+    n_chains = st.selectbox(
+        "Chains",
+        options=[1, 2, 4],
+        index=1,
+        help="Number of chains run in parallel"
+    )
+    target_accept = st.slider(
+        "Target acceptance rate",
+        min_value=0.80,
+        max_value=0.99,
+        value=0.95,
+        step=0.01,
+        help="Target acceptance rate for the NUTS sampler"
+    )
+    st.markdown("---")
+    # Cleanup button
+    if st.button("🧹 Clear expired data"):
+        cleanup_old_sessions()
+        st.success("✅ Cleanup complete")
+        st.rerun()
+    st.markdown("---")
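+    # Data source selection
+    st.subheader("📊 Data Source")
+    data_source = st.radio(
+        "Choose a data source:",
+        ["Use default dataset", "Upload your data"]
+    )
+    uploaded_file = None
+    if data_source == "Upload your data":
+        uploaded_file = st.file_uploader(
+            "Upload a CSV file",
+            type=['csv'],
+            help="Upload Pokémon speed battle data"
+        )
+        with st.expander("📖 Data format"):
+            st.markdown("""
+            **Required columns:**
+            - `Trial_Type`: type name (e.g., Water, Fire, Grass)
+            - `rc`: wins in the control group (slower Pokémon)
+            - `nc`: total battles in the control group
+            - `rt`: wins in the treatment group (faster Pokémon)
+            - `nt`: total battles in the treatment group
+            **Example:**
+            ```
+            Trial_Type,rc,nc,rt,nt
+            Water,45,100,62,100
+            Fire,38,100,55,100
+            Grass,42,100,58,100
+            ```
+            """)
+    st.markdown("---")
+    # About this system
+    with st.expander("ℹ️ About this system"):
+        st.markdown("""
+        **Bayesian Hierarchical Model Analysis System**
+        This system uses a Bayesian hierarchical model to analyze the effect of speed on Pokémon win rate
+        while accounting for heterogeneity between types.
+        **Main features:**
+        - 🎲 Bayesian inference and posterior distributions
+        - 📊 Hierarchical model (information is borrowed across types)
+        - 📈 4 visualizations
+        - 💬 AI assistant explanations
+        - 🎮 Battle strategy advice
+        **Use cases:**
+        - Analyze how speed affects different types
+        - Understand heterogeneity between types
+        - Build battle strategies grounded in statistics
+        """)
+# Main content area: two tabs
+tab1, tab2 = st.tabs(["📊 Bayesian Analysis", "💬 AI Assistant"])
+# Tab 1: Bayesian analysis
+with tab1:
+    st.header("📊 Bayesian Hierarchical Model Analysis")
+    # Load the data
+    if data_source == "Use default dataset":
+        # Check that the default dataset exists
+        default_data_path = "pokemon_speed_meta_results.csv"
+        if os.path.exists(default_data_path):
+            df = pd.read_csv(default_data_path)
+            st.success(f"✅ Default dataset loaded ({len(df)} types)")
+        else:
+            st.warning("⚠️ Default dataset not found; please upload your data")
+            df = None
+    else:
+        if uploaded_file is not None:
+            df = pd.read_csv(uploaded_file)
+            st.success(f"✅ Data loaded ({len(df)} types)")
+        else:
+            df = None
+            st.info("📁 Please upload a CSV file in the sidebar")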
+    if df is not None:
+        # Data preview
+        with st.expander("👀 Data preview"):
+            st.dataframe(df, use_container_width=True)
+        st.markdown("---")
+        # Analysis button
+        col1, col2, col3 = st.columns([1, 2, 1])
+        with col2:
+            analyze_button = st.button(
+                "🔬 Start Bayesian Analysis",
+                type="primary",
+                use_container_width=True
+            )
+        # Run the analysis
+        if analyze_button:
+            with st.spinner(f"Running Bayesian analysis... ({n_samples} draws × {n_chains} chains)"):
+                try:
+                    # Initialize the analyzer
+                    if st.session_state.analyzer is None:
+                        st.session_state.analyzer = BayesianHierarchicalAnalyzer(st.session_state.session_id)
+                    # Load the data
+                    st.session_state.analyzer.load_data(df)
+                    # Run the analysis
+                    results = st.session_state.analyzer.run_analysis(
+                        n_samples=n_samples,
+                        n_tune=n_tune,
+                        n_chains=n_chains,
+                        target_accept=target_accept
+                    )
+                    st.session_state.analysis_results = results
+                    # Generate the plots
+                    with st.spinner("Generating visualizations..."):
+                        st.session_state.trace_img = plot_trace(st.session_state.analyzer.trace)
+                        st.session_state.posterior_img = plot_posterior(st.session_state.analyzer.trace)
+                        st.session_state.forest_img = plot_forest(
+                            st.session_state.analyzer.trace,
+                            results['trial_labels']
+                        )
+                        st.session_state.dag_img = plot_model_dag(st.session_state.analyzer)
+                    st.success("✅ Analysis complete!")
+                    st.balloons()
+                except Exception as e:
+                    st.error(f"❌ Analysis failed: {str(e)}")
+        # Show the results
+        if st.session_state.analysis_results is not None:
+            results = st.session_state.analysis_results
+            st.markdown("---")
+            st.subheader("📊 Analysis Results")
+            # Create the 5 result tabs
+            result_tabs = st.tabs([
+                "📊 Overview",
+                "📈 Trace & Posterior",
+                "🌲 Forest Plot",
+                "🔍 DAG Model Graph",
+                "📋 Detailed Report"
+            ])
+            # Tab: Overview
+            with result_tabs[0]:
+                st.markdown("### 🎯 Overall Effect Summary")
+                overall = results['overall']
+                interp = results['interpretation']
+                # Key metrics
+                col1, col2, col3 = st.columns(3)
+                with col1:
+                    st.metric(
+                        "d (overall effect)",
+                        f"{overall['d_mean']:.4f}",
+                        delta=f"HDI: [{overall['d_hdi_low']:.3f}, {overall['d_hdi_high']:.3f}]"
+                    )
+                with col2:
+                    st.metric(
+                        "Odds ratio (OR)",
+                        f"{overall['or_mean']:.3f}",
+                        delta=f"HDI: [{overall['or_hdi_low']:.3f}, {overall['or_hdi_high']:.3f}]"
+                    )
+                with col3:
+                    st.metric(
+                        "sigma (heterogeneity)",
+                        f"{overall['sigma_mean']:.4f}",
+                        delta=f"HDI: [{overall['sigma_hdi_low']:.3f}, {overall['sigma_hdi_high']:.3f}]"
+                    )
+                st.markdown("---")
+                # Interpretation of the results
+                st.markdown("### 📖 Interpretation")
+                st.info(f"""
+                **Overall effect**: {interp['overall_effect']}
+                **Significance**: {interp['overall_significance']}
+                **Effect size**: {interp['effect_size']}
+                **Heterogeneity**: {interp['heterogeneity']}
+                """)
+                st.markdown("---")
+                # Convergence diagnostics
+                st.markdown("### 🔍 Model Convergence Diagnostics")
+                diag = results['diagnostics']
+                col1, col2 = st.columns(2)
+                with col1:
+                    st.markdown("**R-hat** (should be < 1.1):")
+                    if diag['rhat_d']:
+                        st.metric("R-hat (d)", f"{diag['rhat_d']:.4f}",
+                                 delta="✓ Good" if diag['rhat_d'] < 1.1 else "✗ Needs improvement")
+                    if diag['rhat_sigma']:
+                        st.metric("R-hat (sigma)", f"{diag['rhat_sigma']:.4f}",
+                                 delta="✓ Good" if diag['rhat_sigma'] < 1.1 else "✗ Needs improvement")
+                with col2:
+                    st.markdown("**Effective sample size (ESS)**:")
+                    if diag['ess_d']:
+                        st.metric("ESS (d)", f"{int(diag['ess_d'])}")
+                    if diag['ess_sigma']:
+                        st.metric("ESS (sigma)", f"{int(diag['ess_sigma'])}")
+                if diag['converged']:
+                    st.success("✅ The model has converged; the results are reliable")
+                else:
+                    st.warning("⚠️ The model may not have fully converged; consider more samples or chains")
+                st.markdown("---")
+                # Summary table
+                st.markdown("### 📊 Statistical Summary")
+                summary_df = create_summary_table(results)
+                st.dataframe(summary_df, use_container_width=True)
+                st.markdown("---")
+                # Per-type results
+                st.markdown("### 🎮 Detailed Results by Type")
+                trial_df = create_trial_results_table(results)
+                st.dataframe(trial_df, use_container_width=True)
+                st.markdown("---")
+                # Odds ratio comparison chart
+                st.markdown("### 📊 Speed Effect Comparison by Type")
+                or_fig = plot_odds_ratio_comparison(results)
+                st.plotly_chart(or_fig, use_container_width=True)
+            # Tab: Trace & Posterior
+            with result_tabs[1]:
+                st.markdown("### 📈 Trace Plot (convergence diagnostics)")
+                st.markdown("""
+                **What the trace plot shows**:
+                - Whether MCMC sampling has converged
+                - Left panel: sampling trace (should look like a "fuzzy caterpillar")
+                - Right panel: posterior density
+                """)
+                if st.session_state.trace_img:
+                    st.image(st.session_state.trace_img, use_column_width=True)
+                else:
+                    st.info("Run the analysis first to generate the trace plot")
+                st.markdown("---")
+                st.markdown("### 📊 Posterior Plot (posterior distributions)")
+                st.markdown("""
+                **What the posterior plot shows**:
+                - The posterior distribution of each parameter
+                - The 95% HDI (highest density interval)
+                - The posterior mean
+                """)
+                if st.session_state.posterior_img:
+                    st.image(st.session_state.posterior_img, use_column_width=True)
+                else:
+                    st.info("Run the analysis first to generate the posterior plot")
+            # Tab: Forest Plot
+            with result_tabs[2]:
+                st.markdown("### 🌲 Forest Plot (per-type effects)")
+                st.markdown("""
+                **What the forest plot shows**:
+                - The speed effect (delta) for each type
+                - Point: mean effect
+                - Line: 95% HDI
+                - ★ marker: significant positive effect (HDI excludes 0)
+                - ☆ marker: significant negative effect
+                """)
+                if st.session_state.forest_img:
+                    st.image(st.session_state.forest_img, use_column_width=True)
+                else:
+                    st.info("Run the analysis first to generate the forest plot")
+            # Tab: DAG model graph
+            with result_tabs[3]:
+                st.markdown("### 🔍 Model Structure Graph (DAG)")
+                st.markdown("""
+                **What the DAG (directed acyclic graph) shows**:
+                - The hierarchical structure of the model
+                - Dependencies between variables
+                - Circle/ellipse: random variable
+                - Rectangle: observed data
+                - Diamond: derived variable
+                """)
+                if st.session_state.dag_img:
+                    st.image(st.session_state.dag_img, use_column_width=True)
+                else:
+                    st.warning("⚠️ Could not generate the DAG (Graphviz may need to be installed)")
+                    st.markdown("""
+                    **Install Graphviz:**
+                    - Windows: `choco install graphviz`
+                    - Mac: `brew install graphviz`
+                    - Ubuntu: `sudo apt-get install graphviz`
+                    """)
+            # Tab: Detailed report
+            with result_tabs[4]:
+                st.markdown("### 📋 Full Analysis Report")
+                # Generate the text report
+                text_report = export_results_to_text(results)
+                st.text_area(
+                    "Report",
+                    text_report,
+                    height=500
+                )
+                # Download button
+                st.download_button(
+                    label="📥 Download full report (.txt)",
+                    data=text_report,
+                    file_name=f"bayesian_report_{results['timestamp'][:10]}.txt",
+                    mime="text/plain"
+                )
+# Tab 2: AI assistant
+with tab2:
+    st.header("💬 AI Analysis Assistant")
+    if not st.session_state.get('api_key'):
+        st.warning("⚠️ Enter your Google Gemini API Key in the sidebar to use the AI assistant")
+    elif st.session_state.analysis_results is None:
+        st.info("ℹ️ Run an analysis on the analysis tab first")
+    else:
+        # Initialize the LLM assistant
+        if 'llm_assistant' not in st.session_state:
+            st.session_state.llm_assistant = BayesianLLMAssistant(
+                api_key=st.session_state.api_key,
+                session_id=st.session_state.session_id
+            )
+        # Chat container
+        chat_container = st.container()
+        with chat_container:
+            for message in st.session_state.chat_history:
+                with st.chat_message(message["role"]):
+                    st.markdown(message["content"])
+        # User input
+        if prompt := st.chat_input("Ask anything about the analysis results..."):
+            # Append the user message
+            st.session_state.chat_history.append({
+                "role": "user",
+                "content": prompt
+            })
+            with st.chat_message("user"):
+                st.markdown(prompt)
+            # AI response
+            with st.chat_message("assistant"):
+                with st.spinner("Thinking..."):
+                    try:
+                        response = st.session_state.llm_assistant.get_response(
+                            user_message=prompt,
+                            analysis_results=st.session_state.analysis_results
+                        )
+                        st.markdown(response)
+                    except Exception as e:
+                        error_msg = f"❌ Error: {str(e)}\n\nPlease check the API key or rephrase the question."
+                        st.error(error_msg)
+                        response = error_msg
+            # Append the assistant response
+            st.session_state.chat_history.append({
+                "role": "assistant",
+                "content": response
+            })
+        st.markdown("---")
+        # Quick-question buttons
+        st.subheader("💡 Quick Questions")
+        quick_questions = [
+            "📊 Summarize this analysis",
+            "🎯 Explain d and the odds ratio",
+            "🔍 Explain sigma (heterogeneity)",
+            "❓ What is a hierarchical model?",
+            "🆚 Bayesian vs frequentist",
+            "⚔️ Battle strategy advice",
+            "🎮 Compare types"
+        ]
+        cols = st.columns(4)
+        for idx, question in enumerate(quick_questions):
+            col_idx = idx % 4
+            if cols[col_idx].button(question, key=f"quick_{idx}"):
+                # Dispatch to the assistant method that matches the question
+                if "Summarize" in question:
+                    response = st.session_state.llm_assistant.generate_summary(
+                        st.session_state.analysis_results
+                    )
+                elif "odds ratio" in question:
+                    response = st.session_state.llm_assistant.explain_metric(
+                        'd',
+                        st.session_state.analysis_results
+                    )
+                elif "sigma" in question or "heterogeneity" in question:
+                    response = st.session_state.llm_assistant.explain_metric(
+                        'sigma',
+                        st.session_state.analysis_results
+                    )
+                elif "hierarchical model" in question:
+                    response = st.session_state.llm_assistant.explain_hierarchical_model()
+                elif "Bayesian" in question and "frequentist" in question:
+                    response = st.session_state.llm_assistant.explain_bayesian_vs_frequentist()
+                elif "strategy" in question:
+                    response = st.session_state.llm_assistant.battle_strategy_advice(
+                        st.session_state.analysis_results
+                    )
+                elif "Compare" in question:
+                    response = st.session_state.llm_assistant.compare_types(
+                        st.session_state.analysis_results
+                    )
+                else:
+                    response = st.session_state.llm_assistant.get_response(
+                        question,
+                        st.session_state.analysis_results
+                    )
+                st.session_state.chat_history.append({
+                    "role": "user",
+                    "content": question
+                })
+                st.session_state.chat_history.append({
+                    "role": "assistant",
+                    "content": response
+                })
+                st.rerun()
+        # Reset-conversation button
+        st.markdown("---")
+        if st.button("🔄 Reset conversation"):
+            st.session_state.llm_assistant.reset_conversation()
+            st.session_state.chat_history = []
+            st.success("✅ Conversation reset")
+            st.rerun()
+# Footer
+st.markdown("---")
+st.markdown(
+    f"""
+    <div style='text-align: center'>
+        <p>🎲 Bayesian Hierarchical Model Analysis for Pokémon Speed | Built with Streamlit & PyMC</p>
+        <p>Session ID: {st.session_state.session_id[:8]} | Powered by Google Gemini 2.0 Flash</p>
+    </div>
+    """,
+    unsafe_allow_html=True
+)
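
The one-hour expiry rule in `cleanup_old_sessions` can be isolated as a pure function, which makes it easy to test without Streamlit or the analyzer class. This is a sketch under that assumption; `expired_session_ids` is a hypothetical helper, not part of this commit, and it only mirrors the timestamp comparison used above.

```python
from datetime import datetime, timedelta


def expired_session_ids(session_results, now, max_age=timedelta(hours=1)):
    """Return the ids of sessions whose stored timestamp is older than max_age."""
    expired = []
    for session_id, result in session_results.items():
        if not result:
            continue  # nothing stored for this session
        created = datetime.fromisoformat(result["timestamp"])
        if now - created > max_age:
            expired.append(session_id)
    return expired


now = datetime(2024, 1, 15, 12, 0, 0)
sessions = {
    "a": {"timestamp": (now - timedelta(minutes=30)).isoformat()},
    "b": {"timestamp": (now - timedelta(hours=2)).isoformat()},
}
print(expired_session_ids(sessions, now))
```

Passing `now` in explicitly keeps the function deterministic. Note that `atexit` handlers run only when the interpreter shuts down, so during a long-running app the sidebar button is the path that actually triggers cleanup.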

bayesian_core.py ADDED

	@@ -0,0 +1,306 @@

+import pandas as pd
+import numpy as np
+import pymc as pm
+import arviz as az
+import threading
+from datetime import datetime
+import warnings
+warnings.filterwarnings('ignore')
+class BayesianHierarchicalAnalyzer:
+    """
+    Bayesian hierarchical model analyzer.
+    Analyzes the effect of Pokémon speed on win rate across types.
+    """
+    # Class-level lock for thread safety
+    _lock = threading.Lock()
+    # Stores the analysis results of each session
+    _session_results = {}
+    def __init__(self, session_id):
+        """
+        Initialize the analyzer.
+        Args:
+            session_id: unique session identifier
+        """
+        self.session_id = session_id
+        self.df = None
+        self.model = None
+        self.trace = None
+    def load_data(self, csv_path_or_df):
+        """
+        Load the data.
+        Args:
+            csv_path_or_df: path to a CSV file, or a DataFrame
+        Expected columns:
+            - Trial_Type: type name (e.g., Water, Fire, Grass)
+            - rc: wins in the control group (slower Pokémon)
+            - nc: total battles in the control group
+            - rt: wins in the treatment group (faster Pokémon)
+            - nt: total battles in the treatment group
+        """
+        if isinstance(csv_path_or_df, str):
+            self.df = pd.read_csv(csv_path_or_df)
+        else:
+            self.df = csv_path_or_df.copy()
+        # Check for the required columns
+        required_cols = ['Trial_Type', 'rc', 'nc', 'rt', 'nt']
+        missing_cols = [col for col in required_cols if col not in self.df.columns]
+        if missing_cols:
+            raise ValueError(f"Data is missing required columns: {missing_cols}")
+        return True
+    def validate_data(self):
+        """Validate the loaded data."""
+        if self.df is None:
+            raise ValueError("Load the data first")
+        # Check that the count columns are numeric
+        for col in ['rc', 'nc', 'rt', 'nt']:
+            if not pd.api.types.is_numeric_dtype(self.df[col]):
+                raise ValueError(f"Column {col} must be numeric")
+        # Check logical constraints: wins cannot exceed total battles
+        if (self.df['rc'] > self.df['nc']).any():
+            raise ValueError("rc (wins) cannot exceed nc (total battles)")
+        if (self.df['rt'] > self.df['nt']).any():
+            raise ValueError("rt (wins) cannot exceed nt (total battles)")
+        return True
+    def run_analysis(self, n_samples=2000, n_tune=1000, n_chains=2, target_accept=0.95):
+        """
+        Run the Bayesian hierarchical model analysis.
+        Args:
+            n_samples: number of MCMC draws per chain
+            n_tune: number of tuning samples
+            n_chains: number of chains
+            target_accept: target acceptance rate
+        Returns:
+            dict: dictionary containing all analysis results
+        """
+        with self._lock:
+            try:
+                self.validate_data()
+                # Prepare the data
+                trial_labels = self.df['Trial_Type'].values
+                num_trials = len(self.df)
+                # Build the model
+                with pm.Model() as self.model:
+                    # --- Priors ---
+                    d = pm.Normal('d', mu=0, sigma=10)
+                    tau = pm.Gamma('tau', alpha=0.001, beta=0.001)
+                    sigma = pm.Deterministic('sigma', 1 / pm.math.sqrt(tau))
+                    # --- Trial-specific (per-type) effects ---
+                    mu = pm.Normal('mu', mu=0, sigma=10, shape=num_trials)
+                    delta = pm.Normal('delta', mu=d, sigma=1 / pm.math.sqrt(tau), shape=num_trials)
+                    # --- Logit link and likelihood ---
+                    pc = pm.Deterministic('pc', pm.math.invlogit(mu))
+                    pt = pm.Deterministic('pt', pm.math.invlogit(mu + delta))
+                    rc_obs = pm.Binomial('rc_obs', n=self.df['nc'].values, p=pc, observed=self.df['rc'].values)
+                    rt_obs = pm.Binomial('rt_obs', n=self.df['nt'].values, p=pt, observed=self.df['rt'].values)
+                    # --- Derived quantities ---
+                    delta_new = pm.Normal('delta_new', mu=d, sigma=1 / pm.math.sqrt(tau))
+                    or_speed = pm.Deterministic('or_speed', pm.math.exp(d))
+                    # Run MCMC sampling
+                    self.trace = pm.sample(
+                        draws=n_samples,
+                        tune=n_tune,
+                        chains=n_chains,
+                        target_accept=target_accept,
+                        return_inferencedata=True,
+                        progressbar=False  # disable the progress bar inside Streamlit
+                    )
+                # Compute summary statistics
+                summary = az.summary(self.trace, var_names=['d', 'sigma', 'or_speed'], hdi_prob=0.95)
+                # Compute per-type delta statistics
+                delta_posterior = self.trace.posterior['delta'].values.reshape(-1, num_trials)
+                delta_mean = delta_posterior.mean(axis=0)
+                delta_std = delta_posterior.std(axis=0)
+                delta_hdi = az.hdi(self.trace, var_names=['delta'], hdi_prob=0.95)['delta'].values
+                # Flag significance (HDI excludes 0)
+                delta_significant = (delta_hdi[:, 0] > 0) | (delta_hdi[:, 1] < 0)
+                # Compute win rates for the control and treatment groups
+                pc_posterior = self.trace.posterior['pc'].values.reshape(-1, num_trials)
+                pt_posterior = self.trace.posterior['pt'].values.reshape(-1, num_trials)
+                pc_mean = pc_posterior.mean(axis=0)
+                pt_mean = pt_posterior.mean(axis=0)
+                # Assemble the results
+                results = {
+                    'timestamp': datetime.now().isoformat(),
+                    'n_trials': num_trials,
+                    'trial_labels': trial_labels.tolist(),
+                    # Overall effect
+                    'overall': {
+                        'd_mean': float(summary.loc['d', 'mean']),
+                        'd_sd': float(summary.loc['d', 'sd']),
+                        'd_hdi_low': float(summary.loc['d', 'hdi_2.5%']),
+                        'd_hdi_high': float(summary.loc['d', 'hdi_97.5%']),
+                        'sigma_mean': float(summary.loc['sigma', 'mean']),
+                        'sigma_sd': float(summary.loc['sigma', 'sd']),
+                        'sigma_hdi_low': float(summary.loc['sigma', 'hdi_2.5%']),
+                        'sigma_hdi_high': float(summary.loc['sigma', 'hdi_97.5%']),
+                        'or_mean': float(summary.loc['or_speed', 'mean']),
+                        'or_sd': float(summary.loc['or_speed', 'sd']),
+                        'or_hdi_low': float(summary.loc['or_speed', 'hdi_2.5%']),
+                        'or_hdi_high': float(summary.loc['or_speed', 'hdi_97.5%']),
+                    },
+                    # Per-type effects
+                    'by_trial': {
+                        'delta_mean': delta_mean.tolist(),
+                        'delta_std': delta_std.tolist(),
+                        'delta_hdi_low': delta_hdi[:, 0].tolist(),
+                        'delta_hdi_high': delta_hdi[:, 1].tolist(),
+                        'delta_significant': delta_significant.tolist(),
+                        'pc_mean': pc_mean.tolist(),
+                        'pt_mean': pt_mean.tolist(),
+                    },
+                    # 原始資料
+                    'data': self.df.to_dict('records'),
+                    # 模型參數
+                    'model_params': {
+                        'n_samples': n_samples,
+                        'n_tune': n_tune,
+                        'n_chains': n_chains,
+                        'target_accept': target_accept
+                    },
+                    # 收斂診斷
+                    'diagnostics': self._compute_diagnostics(summary),
+                    # 解釋
+                    'interpretation': self._interpret_results(
+                        summary.loc['or_speed', 'mean'],
+                        summary.loc['or_speed', 'hdi_2.5%'],
+                        summary.loc['or_speed', 'hdi_97.5%'],
+                        summary.loc['sigma', 'mean']
+                    )
+                }
+                # 儲存到 session results
+                self._session_results[self.session_id] = results
+                return results
+            except Exception as e:
+                # Chain the original exception so the traceback keeps the root cause
+                raise RuntimeError(f"分析失敗: {e}") from e
+    def _compute_diagnostics(self, summary):
+        """Compute convergence diagnostics (R-hat and bulk ESS) for d and sigma."""
+        try:
+            # R-hat (should be close to 1.0)
+            rhat_d = float(summary.loc['d', 'r_hat']) if 'r_hat' in summary.columns else None
+            rhat_sigma = float(summary.loc['sigma', 'r_hat']) if 'r_hat' in summary.columns else None
+            # ESS (effective sample size)
+            ess_d = float(summary.loc['d', 'ess_bulk']) if 'ess_bulk' in summary.columns else None
+            ess_sigma = float(summary.loc['sigma', 'ess_bulk']) if 'ess_bulk' in summary.columns else None
+            return {
+                'rhat_d': rhat_d,
+                'rhat_sigma': rhat_sigma,
+                'ess_d': ess_d,
+                'ess_sigma': ess_sigma,
+                'converged': (rhat_d is None or rhat_d < 1.1) and (rhat_sigma is None or rhat_sigma < 1.1)
+            }
+        except Exception:
+            # Diagnostics could not be read; report every field as unknown
+            return {
+                'converged': None,
+                'rhat_d': None,
+                'rhat_sigma': None,
+                'ess_d': None,
+                'ess_sigma': None
+            }
+    def _interpret_results(self, or_mean, or_low, or_high, sigma_mean):
+        """解釋分析結果"""
+        # 整體效應顯著性
+        if or_low > 1:
+            overall_effect = "速度快的寶可夢顯著更容易獲勝"
+            overall_significance = "顯著正效應"
+        elif or_high < 1:
+            overall_effect = "速度慢的寶可夢顯著更容易獲勝（罕見）"
+            overall_significance = "顯著負效應"
+        else:
+            overall_effect = "速度對勝率無顯著影響"
+            overall_significance = "不顯著"
+        # 效果大小
+        if or_mean > 2:
+            effect_size = "大效果 (OR > 2)"
+        elif or_mean > 1.5:
+            effect_size = "中等效果 (OR > 1.5)"
+        elif or_mean > 1:
+            effect_size = "小效果 (OR > 1)"
+        elif or_mean == 1:
+            effect_size = "無差異 (OR = 1)"
+        else:
+            effect_size = "反向效果 (OR < 1)"
+        # 異質性評估
+        if sigma_mean > 0.5:
+            heterogeneity = "高異質性 - 不同屬性對速度的反應差異很大"
+        elif sigma_mean > 0.3:
+            heterogeneity = "中等異質性 - 不同屬性對速度的反應有一定差異"
+        else:
+            heterogeneity = "低異質性 - 不同屬性對速度的反應相似"
+        return {
+            'overall_effect': overall_effect,
+            'overall_significance': overall_significance,
+            'effect_size': effect_size,
+            'heterogeneity': heterogeneity
+        }
+    def get_model_graph(self):
+        """生成模型 DAG 圖（返回 graphviz 物件）"""
+        if self.model is None:
+            raise ValueError("請先執行分析")
+        try:
+            gv = pm.model_to_graphviz(self.model)
+            return gv
+        except Exception as e:
+            raise Exception(f"無法生成 DAG 圖: {str(e)}")
+    @classmethod
+    def get_session_results(cls, session_id):
+        """獲取特定 session 的結果"""
+        return cls._session_results.get(session_id)
+    @classmethod
+    def clear_session_results(cls, session_id):
+        """清除特定 session 的結果"""
+        if session_id in cls._session_results:
+            del cls._session_results[session_id]
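
The per-type significance flag in `bayesian_core.py` marks an effect as significant when its 95% HDI excludes 0. The following is a minimal sketch of that rule on synthetic draws, using a simple "narrowest window" HDI written in pure Python. The helper names `hdi_from_samples` and `excludes_zero` are hypothetical and are not part of the commit. ArviZ's `az.hdi` is what the app actually calls, and its result may differ slightly from this simplified version.

```python
import math
import random

def hdi_from_samples(samples, prob=0.95):
    """Return the narrowest interval that contains `prob` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(prob * n)  # number of draws the interval must cover
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

def excludes_zero(low, high):
    """Same rule as `(low > 0) | (high < 0)` applied to one interval."""
    return low > 0 or high < 0

rng = random.Random(0)  # synthetic draws, not output from the real model
clear_effect = [rng.gauss(1.0, 0.1) for _ in range(4000)]
no_effect = [rng.gauss(0.0, 0.1) for _ in range(4000)]

for name, draws in [("clear_effect", clear_effect), ("no_effect", no_effect)]:
    low, high = hdi_from_samples(draws)
    print(name, round(low, 3), round(high, 3), excludes_zero(low, high))
```

The narrowest-window approach assumes a unimodal posterior; for multimodal posteriors a single interval can be misleading.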

bayesian_llm_assistant.py ADDED

	@@ -0,0 +1,349 @@

+import google.generativeai as genai
+import json
+class BayesianLLMAssistant:
+    """
+    貝氏階層模型 LLM 問答助手
+    協助用戶理解貝氏分析結果
+    """
+    def __init__(self, api_key, session_id):
+        """
+        初始化 LLM 助手
+        Args:
+            api_key: Google Gemini API key
+            session_id: 唯一的 session 識別碼
+        """
+        genai.configure(api_key=api_key)
+        self.model = genai.GenerativeModel('gemini-2.0-flash-exp')
+        self.session_id = session_id
+        self.conversation_history = []
+        # 系統提示詞
+        self.system_prompt = """You are an expert Bayesian statistician specializing in hierarchical models and meta-analysis, particularly in the context of Pokémon battle statistics.
+**IMPORTANT - Language Instruction:**
+- Always respond in the SAME language as the user's question
+- If user asks in Traditional Chinese (繁體中文), respond in Traditional Chinese
+- If user asks in English, respond in English
+- Maintain language consistency throughout the conversation
+你是一位精通貝氏階層模型和統合分析的統計專家，特別專注於寶可夢對戰統計分析。
+Your role is to help users understand Bayesian hierarchical model results analyzing how speed affects win rates across different Pokémon types.
+你的角色是幫助使用者理解貝氏階層模型分析結果，了解速度如何影響不同屬性寶可夢的勝率。
+You should:
+1. Explain Bayesian concepts in simple, accessible terms
+2. Interpret posterior distributions, HDI (Highest Density Interval), and credible intervals
+3. Explain hierarchical structure and why it's useful
+4. Help users understand heterogeneity (sigma) between types
+5. Discuss the practical significance of results for Pokémon battles
+6. Provide insights about which types benefit most from speed
+7. Suggest battle strategies based on the statistical findings
+8. Clarify differences between Bayesian and frequentist approaches
+9. Explain MCMC diagnostics (R-hat, ESS) when relevant
+你應該：
+1. 用簡單易懂的方式解釋貝氏概念
+2. 詮釋後驗分佈、HDI（最高密度區間）和可信區間
+3. 解釋階層結構及其優勢
+4. 幫助使用者理解屬性間的異質性（sigma）
+5. 討論統計結果對寶可夢對戰的實際意義
+6. 提供哪些屬性最受益於速度的見解
+7. 根據統計發現提出對戰策略建議
+8. 說明貝氏方法與頻率論方法的差異
+9. 適時解釋 MCMC 診斷指標（R-hat、ESS）
+Key concepts to explain when relevant:
+- **Bayesian Hierarchical Model**: Borrows strength across types, shrinkage effect
+- **Prior & Posterior**: How data updates beliefs
+- **HDI (Highest Density Interval)**: 95% most credible values
+- **d (overall effect)**: Average log odds ratio across all types
+- **sigma (between-type variation)**: How much types differ in speed effect
+- **delta (type-specific effects)**: Each type's individual speed effect
+- **Odds Ratio**: exp(d) - how much more likely fast Pokémon are to win
+- **MCMC**: Markov Chain Monte Carlo sampling method
+- **Convergence**: R-hat < 1.1, good ESS (effective sample size)
+重要概念解釋（當相關時）：
+- **貝氏階層模型**：跨屬性借用資訊，收縮效應
+- **先驗與後驗**：資料如何更新信念
+- **HDI（最高密度區間）**：95% 最可信的數值範圍
+- **d（整體效應）**：所有屬性的平均對數勝算比
+- **sigma（屬性間變異）**：不同屬性的速度效應差異程度
+- **delta（屬性特定效應）**：每個屬性的個別速度效應
+- **勝算比**：exp(d) - 速度快的寶可夢獲勝的可能性倍數
+- **MCMC**：馬可夫鏈蒙地卡羅抽樣方法
+- **收斂性**：R-hat < 1.1，良好的 ESS（有效樣本數）
+When discussing Pokémon battles:
+- Connect statistical findings to battle mechanics
+- Explain why speed matters (determines attack order)
+- Discuss type advantages and strategies
+- Use Pokémon-specific terminology naturally
+- Consider the meta-game implications
+討論寶可夢對戰時：
+- 將統計發現連結到對戰機制
+- 解釋為何速度重要（決定攻擊順序）
+- 討論屬性優勢和策略
+- 自然地使用寶可夢相關術語
+- 考慮競技環境的影響
+Always be clear, educational, and engaging. Use examples when helpful.
+Format responses with proper markdown for better readability.
+請務必清晰、具教育性、引人入勝。適時使用範例說明。使用適當的 Markdown 格式以提升可讀性。"""
+    def get_response(self, user_message, analysis_results=None):
+        """
+        獲取 AI 回應
+        Args:
+            user_message: 用戶訊息
+            analysis_results: 分析結果字典（可選）
+        Returns:
+            str: AI 回應
+        """
+        # 準備上下文資訊
+        context = ""
+        if analysis_results:
+            context = self._prepare_context(analysis_results)
+        # 添加用戶訊息到歷史
+        self.conversation_history.append({
+            "role": "user",
+            "content": user_message
+        })
+        try:
+            # 構建完整的提示詞
+            full_prompt = self.system_prompt
+            if context:
+                full_prompt += f"\n\n## Current Analysis Context:\n{context}"
+            # 構建對話歷史文字
+            conversation_text = "\n\n## Conversation History:\n"
+            for msg in self.conversation_history[:-1]:
+                role = "User" if msg["role"] == "user" else "Assistant"
+                conversation_text += f"\n{role}: {msg['content']}\n"
+            # 組合最終提示詞
+            final_prompt = full_prompt + conversation_text + f"\nUser: {user_message}\n\nAssistant:"
+            # 調用 Gemini API
+            response = self.model.generate_content(
+                final_prompt,
+                generation_config=genai.types.GenerationConfig(
+                    temperature=0.7,
+                    max_output_tokens=4000,
+                )
+            )
+            assistant_message = response.text
+            # 添加助手回應到歷史
+            self.conversation_history.append({
+                "role": "assistant",
+                "content": assistant_message
+            })
+            return assistant_message
+        except Exception as e:
+            return f"❌ Error: {str(e)}\n\nPlease check your API key and try again."
+    def _prepare_context(self, results):
+        """準備分析結果的上下文資訊"""
+        if not results:
+            return "目前尚無分析結果。No analysis results available yet."
+        overall = results['overall']
+        interp = results['interpretation']
+        diag = results['diagnostics']
+        # 找出顯著的屬性
+        sig_types = [
+            results['trial_labels'][i]
+            for i, sig in enumerate(results['by_trial']['delta_significant'])
+            if sig
+        ]
+        context = f"""
+## Current Bayesian Hierarchical Model Analysis | 目前的貝氏階層模型分析
+### Overall Effect | 整體效應
+- **d (Log Odds Ratio) | d（對數勝算比）**:
+  - Mean | 平均: {overall['d_mean']:.4f}
+  - SD | 標準差: {overall['d_sd']:.4f}
+  - 95% HDI: [{overall['d_hdi_low']:.4f}, {overall['d_hdi_high']:.4f}]
+- **sigma (Between-type Variation) | sigma（屬性間變異）**:
+  - Mean | 平均: {overall['sigma_mean']:.4f}
+  - SD | 標準差: {overall['sigma_sd']:.4f}
+  - 95% HDI: [{overall['sigma_hdi_low']:.4f}, {overall['sigma_hdi_high']:.4f}]
+- **Odds Ratio | 勝算比**:
+  - Mean | 平均: {overall['or_mean']:.4f}
+  - SD | 標準差: {overall['or_sd']:.4f}
+  - 95% HDI: [{overall['or_hdi_low']:.4f}, {overall['or_hdi_high']:.4f}]
+### Model Diagnostics | 模型診斷
+- **R-hat (d)**: {format(diag['rhat_d'], '.4f') if diag['rhat_d'] is not None else 'N/A'} {'✓' if diag['rhat_d'] is not None and diag['rhat_d'] < 1.1 else '✗'}
+- **R-hat (sigma)**: {format(diag['rhat_sigma'], '.4f') if diag['rhat_sigma'] is not None else 'N/A'} {'✓' if diag['rhat_sigma'] is not None and diag['rhat_sigma'] < 1.1 else '✗'}
+- **ESS (d)**: {int(diag['ess_d']) if diag['ess_d'] else 'N/A'}
+- **ESS (sigma)**: {int(diag['ess_sigma']) if diag['ess_sigma'] else 'N/A'}
+- **Convergence | 收斂狀態**: {'✓ Converged 已收斂' if diag['converged'] else '✗ Not Converged 未收斂'}
+### Interpretation | 結果解釋
+- **Overall Effect | 整體效應**: {interp['overall_effect']}
+- **Significance | 顯著性**: {interp['overall_significance']}
+- **Effect Size | 效果大小**: {interp['effect_size']}
+- **Heterogeneity | 異質性**: {interp['heterogeneity']}
+### Significant Types | 顯著的屬性
+{len(sig_types)} out of {results['n_trials']} types show significant speed effects:
+{len(sig_types)} 個屬性（共 {results['n_trials']} 個）顯示顯著的速度效應：
+{', '.join(sig_types) if sig_types else 'None 無'}
+### Number of Types Analyzed | 分析的屬性數量
+{results['n_trials']} types in total 共 {results['n_trials']} 個屬性
+### Key Finding | 關鍵發現
+{
+    f"On average, faster Pokémon are {overall['or_mean']:.2f} times more likely to win than slower ones (95% HDI: [{overall['or_hdi_low']:.2f}, {overall['or_hdi_high']:.2f}]). 平均而言，速度較快的寶可夢獲勝的可能性是速度較慢者的 {overall['or_mean']:.2f} 倍（95% HDI: [{overall['or_hdi_low']:.2f}, {overall['or_hdi_high']:.2f}]）。"
+    if overall['or_mean'] > 1
+    else f"Interestingly, the data suggests no clear speed advantage or even a slight disadvantage. 有趣的是，資料顯示速度並無明顯優勢，甚至可能略有劣勢。"
+}
+The variation between types (sigma = {overall['sigma_mean']:.3f}) indicates {interp['heterogeneity'].lower()}.
+屬性間的變異（sigma = {overall['sigma_mean']:.3f}）表示{interp['heterogeneity'].lower()}。
+"""
+        return context
+    def generate_summary(self, analysis_results):
+        """自動生成分析結果總結"""
+        summary_prompt = """請根據提供的貝氏階層模型分析結果生成一份完整的總結報告，包含：
+1. **模型目的**：簡述這個階層模型在分析什麼
+2. **整體發現**：
+   - 速度對勝率有什麼整體影響？
+   - d 和勝算比告訴我們什麼？
+   - HDI 的意義是什麼？
+3. **屬性間差異**：
+   - sigma 告訴我們什麼？
+   - 哪些屬性特別受速度影響？
+4. **模型品質**：
+   - 模型收斂得好嗎？（R-hat、ESS）
+   - 結果可信嗎？
+5. **實戰啟示**：
+   - 訓練師如何運用這些資訊？
+   - 哪些屬性應該優先考慮速度？
+請用清楚的繁體中文 Markdown 格式撰寫，包含適當的章節標題。"""
+        return self.get_response(summary_prompt, analysis_results)
+    def explain_metric(self, metric_name, analysis_results):
+        """解釋特定指標"""
+        metric_explanations = {
+            'd': 'd (整體對數勝算比)',
+            'sigma': 'sigma (屬性間變異)',
+            'or_speed': 'Odds Ratio (勝算比)',
+            'hdi': '95% HDI (最高密度區間)',
+            'delta': 'delta (屬性特定效應)',
+            'rhat': 'R-hat (收斂診斷)',
+            'ess': 'ESS (有效樣本數)'
+        }
+        metric_display = metric_explanations.get(metric_name, metric_name)
+        explain_prompt = f"""請在這次貝氏階層模型分析的脈絡下，解釋以下指標：
+指標：{metric_display}
+請包含：
+1. 這個指標在貝氏統計中測量什麼？
+2. 在本次分析中得到的數值是多少？
+3. 如何從寶可夢對戰的角度詮釋這個數值？
+4. 與頻率論統計的對應指標有何不同？
+5. 有什麼需要注意的限制或注意事項？
+請用繁體中文回答。"""
+        return self.get_response(explain_prompt, analysis_results)
+    def explain_bayesian_vs_frequentist(self):
+        """解釋貝氏與頻率論的差異"""
+        explain_prompt = """請用簡單的方式解釋貝氏統計和頻率論統計的差異，特別是在寶可夢對戰分析的情境下。
+請涵蓋：
+1. 兩者的根本哲學差異是什麼？
+2. p 值 vs HDI（可信區間）有什麼不同？
+3. 為什麼我們用階層模型來分析多個屬性？
+4. 貝氏方法的優勢和限制是什麼？
+5. 什麼時候該用貝氏、什麼時候該用頻率論？
+請用寶可夢的實際例子讓說明更具體易懂，全程使用繁體中文。"""
+        return self.get_response(explain_prompt, None)
+    def explain_hierarchical_model(self):
+        """解釋階層模型的概念"""
+        explain_prompt = """請用簡單的方式解釋貝氏階層模型，特別是在寶可夢屬性分析的情境下。
+請涵蓋：
+1. 什麼是階層模型？為什麼要用階層結構？
+2. 「借用資訊」(borrowing strength) 是什麼意思？
+3. 收縮效應 (shrinkage) 如何運作？
+4. 為什麼階層模型適合分析多個屬性？
+5. d、sigma、delta 之間的關係是什麼？
+請用寶可夢的實際例子讓說明更具體易懂，全程使用繁體中文。"""
+        return self.get_response(explain_prompt, None)
+    def battle_strategy_advice(self, analysis_results):
+        """提供對戰策略建議"""
+        strategy_prompt = """根據貝氏階層模型的分析結果，請為寶可夢訓練師提供實際的對戰策略建議。
+請考慮：
+1. 整體而言，速度對勝率的影響有多大？
+2. 哪些屬性特別受益於速度？哪些不受影響？
+3. 訓練師在組建隊伍時應該如何權衡速度？
+4. 有沒有屬性可以忽略速度、專注其他數值？
+5. 對競技對戰有什麼啟示？
+請具體且可操作，使用繁體中文回答。"""
+        return self.get_response(strategy_prompt, analysis_results)
+    def compare_types(self, analysis_results):
+        """比較不同屬性"""
+        compare_prompt = """請比較分析結果中不同屬性對速度的反應差異。
+請說明：
+1. 哪些屬性對速度最敏感？為什麼？
+2. 哪些屬性對速度不敏感？可能的原因是什麼？
+3. 屬性間的異質性（sigma）告訴我們什麼？
+4. 有沒有令人意外的發現？
+5. 這些差異對組隊策略有什麼啟示？
+請用繁體中文回答。"""
+        return self.get_response(compare_prompt, analysis_results)
+    def reset_conversation(self):
+        """重置對話歷史"""
+        self.conversation_history = []
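
`get_response` sends Gemini a single flat string built from the system prompt, the optional analysis context, the earlier turns, and the new user message. A minimal sketch of that assembly step, with no API call, is shown here. The function name `build_prompt` and the sample messages are hypothetical and only illustrate the layout.

```python
def build_prompt(system_prompt, history, user_message, context=""):
    """Flatten system prompt, optional context, and prior turns into one string."""
    parts = [system_prompt]
    if context:
        parts.append(f"\n\n## Current Analysis Context:\n{context}")
    parts.append("\n\n## Conversation History:\n")
    for msg in history:
        role = "User" if msg["role"] == "user" else "Assistant"
        parts.append(f"\n{role}: {msg['content']}\n")
    # The newest user message goes last, followed by the cue for the model.
    parts.append(f"\nUser: {user_message}\n\nAssistant:")
    return "".join(parts)

history = [
    {"role": "user", "content": "What is sigma?"},
    {"role": "assistant", "content": "The between-type standard deviation."},
]
prompt = build_prompt("You are a statistician.", history, "And d?")
print(prompt)
```

Because every earlier turn is replayed on each call, the prompt grows with the conversation; capping `history` to the most recent turns is one possible way to bound its size.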

bayesian_utils.py ADDED

	@@ -0,0 +1,349 @@

+import plotly.graph_objects as go
+import plotly.express as px
+import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib
+matplotlib.use('Agg')  # 使用非互動式後端
+import arviz as az
+import io
+import base64
+from PIL import Image
+def plot_trace(trace, var_names=['d', 'sigma']):
+    """
+    繪製 Trace Plot（MCMC 收斂診斷）
+    Args:
+        trace: ArviZ InferenceData 物件
+        var_names: 要繪製的變數名稱
+    Returns:
+        PIL Image
+    """
+    fig, axes = plt.subplots(len(var_names), 2, figsize=(14, 4 * len(var_names)))
+    if len(var_names) == 1:
+        axes = axes.reshape(1, -1)
+    az.plot_trace(trace, var_names=var_names, axes=axes)
+    plt.tight_layout()
+    # 轉換為圖片
+    buf = io.BytesIO()
+    plt.savefig(buf, format='png', dpi=300, bbox_inches='tight')
+    buf.seek(0)
+    img = Image.open(buf)
+    plt.close()
+    return img
+def plot_posterior(trace, var_names=['d', 'sigma', 'or_speed'], hdi_prob=0.95):
+    """
+    繪製後驗分佈圖
+    Args:
+        trace: ArviZ InferenceData 物件
+        var_names: 要繪製的變數名稱
+        hdi_prob: HDI 機率
+    Returns:
+        PIL Image
+    """
+    # az.plot_posterior returns Matplotlib axes, not a Figure, so the result is not stored
+    az.plot_posterior(trace, var_names=var_names, hdi_prob=hdi_prob, figsize=(14, 5))
+    plt.tight_layout()
+    # 轉換為圖片
+    buf = io.BytesIO()
+    plt.savefig(buf, format='png', dpi=300, bbox_inches='tight')
+    buf.seek(0)
+    img = Image.open(buf)
+    plt.close()
+    return img
+def plot_forest(trace, trial_labels, title='Effect of Speed on Win Rate by Type'):
+    """
+    繪製 Forest Plot（各屬性效應）
+    Args:
+        trace: ArviZ InferenceData 物件
+        trial_labels: 屬性標籤列表
+        title: 圖表標題
+    Returns:
+        PIL Image
+    """
+    num_trials = len(trial_labels)
+    # 計算統計量
+    delta_posterior = trace.posterior['delta'].values.reshape(-1, num_trials)
+    delta_mean = delta_posterior.mean(axis=0)
+    delta_hdi = az.hdi(trace, var_names=['delta'], hdi_prob=0.95)['delta'].values
+    # 建立圖表
+    fig, ax = plt.subplots(figsize=(12, max(10, num_trials * 0.4)))
+    y_pos = np.arange(num_trials)
+    # Draw the 95% HDI of each type as a horizontal line (a credible interval, not a confidence interval)
+    ax.hlines(y_pos, delta_hdi[:, 0], delta_hdi[:, 1], color='steelblue', linewidth=3, label='95% HDI')
+    # 繪製平均值（點）
+    ax.scatter(delta_mean, y_pos, color='darkblue', s=120, zorder=3,
+               edgecolors='white', linewidth=1.5, label='Mean')
+    # 標註顯著的點
+    for i, (mean, hdi) in enumerate(zip(delta_mean, delta_hdi)):
+        if hdi[0] > 0:  # 顯著正效應
+            ax.text(mean, i, ' ★', fontsize=15, ha='left', va='center', color='gold')
+        elif hdi[1] < 0:  # 顯著負效應
+            ax.text(mean, i, ' ☆', fontsize=15, ha='left', va='center', color='red')
+    # 設定軸
+    ax.set_yticks(y_pos)
+    ax.set_yticklabels(trial_labels, fontsize=11)
+    ax.invert_yaxis()
+    ax.axvline(0, color='red', linestyle='--', linewidth=2, label='No Effect (δ=0)')
+    ax.set_xlabel('Delta (Log Odds Ratio)', fontsize=13)
+    ax.set_title(title, fontsize=15, fontweight='bold', pad=20)
+    ax.legend(loc='lower right')
+    ax.grid(axis='x', alpha=0.3)
+    plt.tight_layout()
+    # 轉換為圖片
+    buf = io.BytesIO()
+    plt.savefig(buf, format='png', dpi=300, bbox_inches='tight')
+    buf.seek(0)
+    img = Image.open(buf)
+    plt.close()
+    return img
+def plot_model_dag(analyzer):
+    """
+    繪製模型 DAG 圖
+    Args:
+        analyzer: BayesianHierarchicalAnalyzer 物件
+    Returns:
+        PIL Image 或 None
+    """
+    try:
+        gv = analyzer.get_model_graph()
+        # 轉換為 PNG
+        png_bytes = gv.pipe(format='png')
+        # 轉換為 PIL Image
+        img = Image.open(io.BytesIO(png_bytes))
+        return img
+    except Exception as e:
+        print(f"無法生成 DAG 圖: {e}")
+        return None
+def create_summary_table(results):
+    """
+    創建結果摘要表格
+    Args:
+        results: 分析結果字典
+    Returns:
+        pandas DataFrame
+    """
+    overall = results['overall']
+    summary_data = {
+        '參數': ['d (整體效應)', 'sigma (屬性間變異)', 'or_speed (勝算比)'],
+        '平均值': [
+            f"{overall['d_mean']:.4f}",
+            f"{overall['sigma_mean']:.4f}",
+            f"{overall['or_mean']:.4f}"
+        ],
+        '標準差': [
+            f"{overall['d_sd']:.4f}",
+            f"{overall['sigma_sd']:.4f}",
+            f"{overall['or_sd']:.4f}"
+        ],
+        '95% HDI 下界': [
+            f"{overall['d_hdi_low']:.4f}",
+            f"{overall['sigma_hdi_low']:.4f}",
+            f"{overall['or_hdi_low']:.4f}"
+        ],
+        '95% HDI 上界': [
+            f"{overall['d_hdi_high']:.4f}",
+            f"{overall['sigma_hdi_high']:.4f}",
+            f"{overall['or_hdi_high']:.4f}"
+        ]
+    }
+    return pd.DataFrame(summary_data)
+def create_trial_results_table(results):
+    """
+    創建各屬性結果表格
+    Args:
+        results: 分析結果字典
+    Returns:
+        pandas DataFrame
+    """
+    trial_labels = results['trial_labels']
+    by_trial = results['by_trial']
+    data = results['data']
+    trial_data = {
+        '屬性': trial_labels,
+        'Delta (平均)': [f"{x:.4f}" for x in by_trial['delta_mean']],
+        'Delta (標準差)': [f"{x:.4f}" for x in by_trial['delta_std']],
+        '95% HDI 下界': [f"{x:.4f}" for x in by_trial['delta_hdi_low']],
+        '95% HDI 上界': [f"{x:.4f}" for x in by_trial['delta_hdi_high']],
+        '顯著性': ['★ 顯著' if sig else '不顯著' for sig in by_trial['delta_significant']],
+        '控制組勝率': [f"{x:.2%}" for x in by_trial['pc_mean']],
+        '實驗組勝率': [f"{x:.2%}" for x in by_trial['pt_mean']],
+        '控制組 (勝/總)': [f"{d['rc']}/{d['nc']}" for d in data],
+        '實驗組 (勝/總)': [f"{d['rt']}/{d['nt']}" for d in data]
+    }
+    return pd.DataFrame(trial_data)
+def export_results_to_text(results):
+    """
+    匯出結果為純文字格式
+    Args:
+        results: 分析結果字典
+    Returns:
+        str: 格式化的文字報告
+    """
+    overall = results['overall']
+    interp = results['interpretation']
+    diag = results['diagnostics']
+    report = f"""
+==============================================
+貝氏階層模型分析報告
+==============================================
+分析時間: {results['timestamp']}
+屬性數量: {results['n_trials']}
+----------------------------------------------
+1. 整體效應摘要
+----------------------------------------------
+d (整體效應 - Log OR):
+  - 平均值: {overall['d_mean']:.4f}
+  - 標準差: {overall['d_sd']:.4f}
+  - 95% HDI: [{overall['d_hdi_low']:.4f}, {overall['d_hdi_high']:.4f}]
+sigma (屬性間變異):
+  - 平均值: {overall['sigma_mean']:.4f}
+  - 標準差: {overall['sigma_sd']:.4f}
+  - 95% HDI: [{overall['sigma_hdi_low']:.4f}, {overall['sigma_hdi_high']:.4f}]
+or_speed (勝算比):
+  - 平均值: {overall['or_mean']:.4f}
+  - 標準差: {overall['or_sd']:.4f}
+  - 95% HDI: [{overall['or_hdi_low']:.4f}, {overall['or_hdi_high']:.4f}]
+----------------------------------------------
+2. 模型收斂診斷
+----------------------------------------------
+R-hat (d): {format(diag['rhat_d'], '.4f') if diag['rhat_d'] is not None else 'N/A'}
+R-hat (sigma): {format(diag['rhat_sigma'], '.4f') if diag['rhat_sigma'] is not None else 'N/A'}
+ESS (d): {int(diag['ess_d']) if diag['ess_d'] else 'N/A'}
+ESS (sigma): {int(diag['ess_sigma']) if diag['ess_sigma'] else 'N/A'}
+收斂狀態: {'✓ 已收斂' if diag['converged'] else '✗ 未收斂'}
+----------------------------------------------
+3. 結果解釋
+----------------------------------------------
+整體效應: {interp['overall_effect']}
+顯著性: {interp['overall_significance']}
+效果大小: {interp['effect_size']}
+異質性: {interp['heterogeneity']}
+----------------------------------------------
+4. 各屬性詳細結果
+----------------------------------------------
+"""
+    # 添加各屬性的詳細資訊
+    trial_labels = results['trial_labels']
+    by_trial = results['by_trial']
+    for i, label in enumerate(trial_labels):
+        sig_marker = "★" if by_trial['delta_significant'][i] else " "
+        report += f"""
+{sig_marker} {label}:
+  Delta (平均): {by_trial['delta_mean'][i]:.4f}
+  95% HDI: [{by_trial['delta_hdi_low'][i]:.4f}, {by_trial['delta_hdi_high'][i]:.4f}]
+  控制組勝率: {by_trial['pc_mean'][i]:.2%}
+  實驗組勝率: {by_trial['pt_mean'][i]:.2%}
+  勝率差異: {(by_trial['pt_mean'][i] - by_trial['pc_mean'][i]):.2%}
+"""
+    report += """
+==============================================
+"""
+    return report
+def plot_odds_ratio_comparison(results):
+    """
+    繪製各屬性的勝算比比較圖（Plotly 版本）
+    Args:
+        results: 分析結果字典
+    Returns:
+        plotly figure
+    """
+    trial_labels = results['trial_labels']
+    delta_mean = results['by_trial']['delta_mean']
+    # 轉換為勝算比
+    or_values = [np.exp(d) for d in delta_mean]
+    # 排序
+    sorted_indices = np.argsort(or_values)[::-1]
+    sorted_labels = [trial_labels[i] for i in sorted_indices]
+    sorted_or = [or_values[i] for i in sorted_indices]
+    sorted_sig = [results['by_trial']['delta_significant'][i] for i in sorted_indices]
+    # 顏色標記
+    colors = ['#2ecc71' if sig else '#95a5a6' for sig in sorted_sig]
+    fig = go.Figure()
+    fig.add_trace(go.Bar(
+        x=sorted_or,
+        y=sorted_labels,
+        orientation='h',
+        marker=dict(
+            color=colors,
+            line=dict(color='white', width=1)
+        ),
+        text=[f'{or_val:.2f}' for or_val in sorted_or],
+        textposition='outside',
+        hovertemplate='%{y}<br>OR: %{x:.3f}<extra></extra>'
+    ))
+    # 參考線 (OR = 1)
+    fig.add_vline(x=1, line_dash="dash", line_color="red", line_width=2)
+    fig.update_layout(
+        title='各屬性速度效應（勝算比）',
+        xaxis_title='Odds Ratio',
+        yaxis_title='',
+        width=800,
+        height=max(400, len(trial_labels) * 25),
+        template='plotly_white',
+        showlegend=False
+    )
+    return fig
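
`plot_odds_ratio_comparison` converts each type's posterior mean log odds ratio with `np.exp(delta_mean)`. That value is not the posterior mean of the odds ratio itself: exponentiating the mean of the draws gives a smaller number than averaging the exponentiated draws (Jensen's inequality). The sketch below illustrates the gap on synthetic draws. The draw values are assumptions, and whether the overall `or_speed` summary is computed as the mean of `exp(d)` draws depends on model code not shown in this section.

```python
import math
import random
import statistics

rng = random.Random(1)
# Synthetic log-odds-ratio draws for one type (assumed values, not model output).
delta_draws = [rng.gauss(0.8, 0.5) for _ in range(5000)]

# What the bar chart shows: exp of the posterior mean of delta.
or_of_mean = math.exp(statistics.fmean(delta_draws))
# Posterior mean of the odds ratio: average of exp(delta) over the draws.
mean_of_or = statistics.fmean(math.exp(d) for d in delta_draws)
# Posterior median of the odds ratio, which is less sensitive to the right tail.
median_or = statistics.median(math.exp(d) for d in delta_draws)

print(f"exp(mean delta) = {or_of_mean:.3f}")
print(f"mean exp(delta) = {mean_of_or:.3f}")
print(f"median OR       = {median_or:.3f}")
```

When comparing the per-type bars with an overall odds ratio, it helps to check that both numbers were summarized on the same scale.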

pokemon_speed_meta_results.csv ADDED

	@@ -0,0 +1,19 @@

+Trial_Type,rt,nt,rc,nc
+Bug,2229,3142,800,3660
+Dark,1559,2083,369,931
+Drago,1264,1715,298,889
+Elect,1935,2499,373,1174
+Fairy,310,432,309,1320
+Fight,800,1134,402,1458
+Fire,2547,3530,487,1535
+Flyin,102,107,39,110
+Ghost,639,937,331,1259
+Grass,1591,2196,1418,4598
+Groun,1100,1529,529,1574
+Ice,826,1288,354,1296
+Norma,4258,5748,1107,3989
+Poiso,997,1571,431,1411
+Psych,2002,2747,334,1926
+Rock,864,1255,998,3392
+Steel,609,804,428,1584
+Water,3601,5492,1814,5793
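
As a quick sanity check on the data, the raw (unpooled) win rates and log odds ratios can be computed directly from the four count columns. The sketch below uses three rows copied from the CSV above and assumes that `rt`/`nt` are wins and battles for the faster (treatment) group and `rc`/`nc` the same for the slower (control) group. The helper name `raw_log_odds_ratio` is hypothetical.

```python
import csv
import io
import math

# Three rows copied from pokemon_speed_meta_results.csv.
CSV_TEXT = """Trial_Type,rt,nt,rc,nc
Fairy,310,432,309,1320
Flyin,102,107,39,110
Water,3601,5492,1814,5793
"""

def raw_log_odds_ratio(rt, nt, rc, nc):
    """Unpooled log odds ratio of the treatment group versus the control group."""
    odds_t = rt / (nt - rt)  # wins / losses in the treatment group
    odds_c = rc / (nc - rc)  # wins / losses in the control group
    return math.log(odds_t / odds_c)

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in rows:
    rt, nt, rc, nc = (int(row[k]) for k in ("rt", "nt", "rc", "nc"))
    log_or = raw_log_odds_ratio(rt, nt, rc, nc)
    print(row["Trial_Type"], f"pt={rt / nt:.3f}", f"pc={rc / nc:.3f}", f"logOR={log_or:.3f}")
```

Rows with few battles, such as `Flyin` (107 and 110), give extreme raw estimates; these are the cases where a hierarchical model is expected to pull the type-specific effect toward the overall mean. The helper divides by the number of losses, so a row with zero losses in either group would need special handling.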

requirements.txt ADDED Viewed

	@@ -0,0 +1,12 @@

+streamlit==1.31.0
+pandas==2.1.4
+numpy==1.26.3
+plotly==5.18.0
+pymc==5.10.4
+arviz==0.17.0
+matplotlib==3.8.2
+google-generativeai>=0.3.0
+pillow==10.2.0
+graphviz==0.20.1
+scipy==1.11.4
+pytensor==2.18.6