Wen1201 committed on
Commit
79eea14
·
verified ·
1 Parent(s): 5663bd8

Upload 6 files

README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- title: Pokemon Speed Bayesian Analysis System
3
  emoji: 🔬
4
  colorFrom: blue
5
  colorTo: indigo
@@ -9,253 +9,323 @@ app_file: app.py
9
  pinned: false
10
  ---
11
 
12
- # Pokemon Speed Bayesian Analysis System
13
 
14
- A comprehensive web-based system for analyzing the impact of speed on Pokemon win rates using Bayesian hierarchical meta-analysis, with an AI assistant for interpreting results.
15
 
16
- ## Features
17
 
18
- ### 🔬 **Bayesian Hierarchical Modeling**
19
- - PyMC-based MCMC sampling
20
- - Hierarchical structure to borrow strength across Pokemon types
21
- - Type-specific and overall effect estimation
22
 
23
- ### 📊 **Interactive Visualizations**
24
- - **Trace Plots**: Check MCMC convergence
25
- - **Posterior Distributions**: Visualize parameter uncertainty with HDI
26
- - **Forest Plots**: Compare effects across Pokemon types
27
- - **Win Rate Comparisons**: See actual win rate differences
28
- - **Heterogeneity Analysis**: Understand between-type variation
29
 
30
- ### 🤖 **AI-Powered Assistant**
31
- - GPT-4 integration for result interpretation
32
- - Natural language Q&A about analysis results
33
- - Automatic summary generation
34
- - Statistical concept explanations
35
- - Type-specific insights
36
 
37
- ### 📥 **Export Capabilities**
38
- - JSON format for full results
39
- - CSV format for type-specific data
40
- - Downloadable reports
 
 
41
 
42
- ## 🚀 Quick Start
43
-
44
- ### Installation
45
 
 
46
  ```bash
47
- # Install dependencies
48
- pip install -r requirements.txt
49
-
50
- # Run the application
51
- streamlit run app.py
52
  ```
53
 
54
- ### Usage
55
-
56
- 1. **Configure Settings** (Sidebar)
57
- - Enter your OpenAI API Key for AI features
58
- - Upload your data CSV or use example data
59
- - Adjust MCMC parameters if needed
60
 
61
- 2. **Run Analysis** (Data & Analysis tab)
62
- - Click "🚀 Run Analysis"
63
- - Wait for MCMC sampling to complete (2-5 minutes)
64
- - View results and convergence diagnostics
65
-
66
- 3. **Explore Visualizations** (Visualizations tab)
67
- - Trace plots for convergence checking
68
- - Posterior distributions with HDI
69
- - Forest plots for type comparisons
70
- - Win rate comparisons
71
-
72
- 4. **Ask Questions** (AI Assistant tab)
73
- - Use quick question buttons
74
- - Chat with AI about results
75
- - Get concept explanations
76
- - Request improvement suggestions
77
 
78
- 5. **Export Results** (Export Results tab)
79
- - Download as JSON or CSV
80
- - Review export preview
 
81
 
82
- ## 📁 Data Format
 
 
 
83
 
84
- Your CSV file should contain the following columns:
 
85
 
86
- | Column | Description |
87
- |--------|-------------|
88
- | `Trial_Type` | Pokemon type name (e.g., "Fire", "Water") |
89
- | `rc` | Control group (slow) win count |
90
- | `nc` | Control group total battles |
91
- | `rt` | Treatment group (fast) win count |
92
- | `nt` | Treatment group total battles |
93
 
94
- **Example:**
 
 
 
 
 
 
95
 
 
96
  ```csv
97
  Trial_Type,rc,nc,rt,nt
98
- Fire,45,100,58,100
99
- Water,52,110,63,105
100
- Electric,48,95,61,98
 
101
  ```
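The column requirements above can be checked before any sampling starts. The following is a minimal sketch using only the standard library; `load_rows` and `REQUIRED_COLUMNS` are hypothetical names for illustration and are not taken from the application code.

```python
import csv
import io

# Column names come from the data-format table; the helper itself is illustrative.
REQUIRED_COLUMNS = ["Trial_Type", "rc", "nc", "rt", "nt"]


def load_rows(csv_text):
    """Parse CSV text, check the required columns, and sanity-check the counts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing required columns: {', '.join(missing)}")
    rows = []
    for line in reader:
        row = {"Trial_Type": line["Trial_Type"]}
        for key in ("rc", "nc", "rt", "nt"):
            row[key] = int(line[key])
        # Wins can never exceed the number of battles in either group.
        if not (0 <= row["rc"] <= row["nc"] and 0 <= row["rt"] <= row["nt"]):
            raise ValueError(f"Wins exceed battles for type {row['Trial_Type']}")
        rows.append(row)
    return rows


EXAMPLE = """Trial_Type,rc,nc,rt,nt
Fire,45,100,58,100
Water,52,110,63,105
Electric,48,95,61,98
"""

rows = load_rows(EXAMPLE)
```

Failing early on a malformed file is cheaper than discovering the problem after a multi-minute MCMC run.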
102
 
103
- ## 🔬 Statistical Model
 
 
 
104
 
105
- ### Hierarchical Structure
106
-
107
- ```
108
- Overall Effect (d, τ)
109
-
110
- Type-Specific Effects (δᵢ, μᵢ)
111
-
112
- Observed Win Rates (rc, rt)
113
  ```
114
 
115
- ### Key Parameters
116
 
117
- - **d**: Overall log odds ratio of speed effect
118
- - **OR (Odds Ratio)**: exp(d) - multiplicative effect on odds
119
- - **σ (sigma)**: Between-type heterogeneity
120
- - **δᵢ (delta)**: Type-specific speed effects
121
- - **μᵢ (mu)**: Type-specific baseline win rates
122
-
123
- ### Priors
124
-
125
- ```python
126
- d ~ Normal(0, 10) # Overall effect
127
- τ ~ Gamma(0.001, 0.001) # Precision
128
- σ = 1/√τ # Heterogeneity
129
- μᵢ ~ Normal(0, 10) # Baseline rates
130
- δᵢ ~ Normal(d, σ) # Type effects
131
  ```
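The priors above do not spell out the likelihood. One plausible reading, assuming a binomial likelihood with a logit link (an assumption; the README lists only the priors), is:

```latex
\begin{aligned}
r_{c,i} &\sim \mathrm{Binomial}(n_{c,i},\, p_{c,i}), &
r_{t,i} &\sim \mathrm{Binomial}(n_{t,i},\, p_{t,i}), \\
\operatorname{logit}(p_{c,i}) &= \mu_i, &
\operatorname{logit}(p_{t,i}) &= \mu_i + \delta_i, \\
\delta_i &\sim \mathrm{Normal}(d,\, \sigma), &
\sigma &= 1/\sqrt{\tau}.
\end{aligned}
```

Under this reading, $\delta_i$ is the log odds ratio for type $i$ and $d$ is the mean of the distribution that the type effects are drawn from.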
132
 
133
- ## 📊 Interpreting Results
134
-
135
- ### Log Odds Ratio (d)
136
- - **d > 0**: Speed increases win probability
137
- - **d < 0**: Speed decreases win probability
138
- - **d ≈ 0**: No effect
139
-
140
- ### Odds Ratio (OR)
141
- - **OR = 1.5**: Faster Pokemon have 1.5x the odds of winning
142
- - **OR = 2.0**: Faster Pokemon have 2x the odds of winning (this is not the same as being twice as likely to win)
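Odds ratios act on odds, not on probabilities, so the change in win rate depends on the baseline. A small sketch of the conversion follows; the function names are illustrative and not part of the application.

```python
import math


def odds_ratio(d):
    """Convert a log odds ratio d into an odds ratio."""
    return math.exp(d)


def shifted_win_probability(baseline, d):
    """Win probability of the fast group, given the slow group's win
    probability `baseline` and a log odds ratio d between the groups."""
    odds = baseline / (1.0 - baseline) * math.exp(d)
    return odds / (1.0 + odds)
```

For example, with an odds ratio of 2 a baseline of 0.50 moves to 2/3, while a baseline of 0.90 moves to 18/19; neither is "twice as likely".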
143
-
144
- ### 95% HDI (Highest Density Interval)
145
- - Bayesian credible interval
146
- - 95% probability the true value falls within this range
147
- - **HDI excludes 0**: Effect is "statistically credible"
148
-
149
- ### Convergence Diagnostics
150
-
151
- **R-hat (Gelman-Rubin)**
152
- - < 1.01: Excellent convergence
153
- - ⚠️ 1.01-1.05: Acceptable but check
154
- - > 1.05: Poor convergence, resample
155
-
156
- **ESS (Effective Sample Size)**
157
- - > 400: Good
158
- - ⚠️ 100-400: Marginal
159
- - < 100: Insufficient, increase samples
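The thresholds above translate directly into two small classifiers. This sketch uses hypothetical function names; the cut-offs are the ones listed in this section.

```python
def rate_r_hat(value):
    """Classify an R-hat value using the thresholds listed above."""
    if value < 1.01:
        return "good"
    if value < 1.05:
        return "check"
    return "poor"


def rate_ess(value):
    """Classify an effective sample size using the thresholds listed above."""
    if value > 400:
        return "good"
    if value > 100:
        return "check"
    return "poor"
```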
160
-
161
- ## 🤖 AI Assistant Features
162
-
163
- ### Quick Actions
164
- - **Generate Summary**: Comprehensive analysis overview
165
- - **Explain Results**: Simple interpretation
166
- - **Suggest Improvements**: Data and model enhancements
167
-
168
- ### Concept Explanations
169
- - Log Odds Ratio
170
- - Odds Ratio
171
- - HDI (Highest Density Interval)
172
- - Heterogeneity
173
- - Hierarchical Model
174
- - Convergence Diagnostics
175
-
176
- ### Custom Questions
177
- Ask anything about your analysis:
178
- - "Which Pokemon type benefits most from speed?"
179
- - "Is the heterogeneity high in my analysis?"
180
- - "Should I trust these results based on R-hat?"
181
- - "What does an odds ratio of 1.6 mean practically?"
182
-
183
- ## 🛠️ Technical Stack
184
-
185
- - **Backend**: Python 3.8+
186
- - **Bayesian Inference**: PyMC 5.x
187
- - **Diagnostics**: ArviZ
188
- - **Visualization**: Plotly
189
- - **Web Framework**: Streamlit
190
- - **AI**: OpenAI GPT-4o-mini
191
-
192
- ## ⚙️ Configuration
193
-
194
- ### MCMC Parameters
195
-
196
- **Samples** (default: 2000)
197
- - More samples = more accurate but slower
198
- - Recommended: 2000-5000 for production
199
-
200
- **Tuning** (default: 1000)
201
- - Warm-up iterations discarded
202
- - Recommended: 500-1500
203
-
204
- **Target Accept** (default: 0.95)
205
- - Higher = more accurate but slower
206
- - Recommended: 0.90-0.98
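The three settings can be bundled and validated in one place. The sketch below is hypothetical: `SamplingConfig` is not part of the application, and the only detail taken from the source is that the analyzer accepts `samples`, `tune`, and `target_accept` keyword arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingConfig:
    """Defaults mirror the values documented above."""
    samples: int = 2000
    tune: int = 1000
    target_accept: float = 0.95

    def __post_init__(self):
        if self.samples <= 0 or self.tune < 0:
            raise ValueError("samples must be positive and tune non-negative")
        if not 0.0 < self.target_accept < 1.0:
            raise ValueError("target_accept must lie strictly between 0 and 1")

    def as_kwargs(self):
        """Return the settings as keyword arguments for a sampling call."""
        return {
            "samples": self.samples,
            "tune": self.tune,
            "target_accept": self.target_accept,
        }
```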
207
-
208
- ## 🔍 Example Analysis
209
-
210
- Using the example dataset (18 Pokemon types):
211
-
212
- **Typical Results:**
213
- - **Overall Effect (d)**: ~0.35 (95% HDI: [0.18, 0.52])
214
- - **Odds Ratio**: ~1.42 (faster Pokemon have 42% higher odds)
215
- - **Heterogeneity (σ)**: ~0.15 (low, effects are consistent across types)
216
- - **Win Rate Increase**: ~7% on average
217
-
218
- **Interpretation:**
219
- > Across all Pokemon types, faster Pokemon have approximately 1.4x the odds of winning compared to slower Pokemon. This translates to an average win rate increase of about 7 percentage points. The effect is relatively consistent across types (low heterogeneity).
220
-
221
- ## ⚠️ Limitations
222
 
223
- 1. **Computational Time**: MCMC can take several minutes
224
- 2. **API Costs**: AI features require OpenAI API credits
225
- 3. **Data Requirements**: Need sufficient sample sizes per type
226
- 4. **Causality**: Analysis shows association, not causation
227
- 5. **Assumptions**: Binary outcomes, independent battles
228
 
229
- ## 📚 References
 
 
 
 
230
 
231
- ### Statistical Methods
232
- - Gelman, A. et al. (2013). *Bayesian Data Analysis*
233
- - Kruschke, J. (2014). *Doing Bayesian Data Analysis*
 
 
 
234
 
235
- ### Software
236
- - [PyMC Documentation](https://www.pymc.io/)
237
- - [ArviZ Documentation](https://arviz-devs.github.io/)
238
- - [Streamlit Documentation](https://docs.streamlit.io/)
 
239
 
240
- ## 🤝 Contributing
241
 
242
- Suggestions and improvements welcome! Consider:
243
- - Adding more visualization types
244
- - Implementing model comparison (DIC, WAIC)
245
- - Supporting multiple outcome types
246
- - Adding more AI assistant features
247
 
248
- ## 📄 License
249
 
250
- MIT License - feel free to use and modify
251
 
252
- ## 🙏 Acknowledgments
253
 
254
- - **PyMC Team** for excellent Bayesian modeling tools
255
- - **OpenAI** for GPT-4 API
256
- - **Streamlit** for the web framework
257
- - **Pokemon Community** for inspiring this analysis
258
 
259
- ---
260
 
261
- **Made with ⚡ for Pokemon trainers who love statistics**
 
 
 
 
1
  ---
2
+ title: BayePyMC
3
  emoji: 🔬
4
  colorFrom: blue
5
  colorTo: indigo
 
9
  pinned: false
10
  ---
11
 
12
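+ # Bayesian Hierarchical Model Analysis System: Effect of Pokemon Speed on Win Rate
13
 
14
+ ## 📋 Overview
15
 
16
+ A Bayesian hierarchical model analysis system built on Streamlit and PyMC. It is designed to analyze how Pokemon speed affects win rates across types, and it includes an AI assistant that provides in-depth statistical explanations and battle strategy suggestions.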
17
 
18
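+ ## 🎯 Key Features
19
 
20
+ ### 1. Bayesian Hierarchical Model Analysis
21
+ - MCMC (Markov Chain Monte Carlo) sampling
22
+ - Hierarchical structure (borrowing information across types)
23
+ - Full uncertainty quantification
24
+ - Posterior distribution estimation
25
+ - Convergence diagnostics
26
 
27
+ ### 2. Complete Visualization (4 plots + 1 text summary)
28
+ - 🔀 **DAG**: visualizes the model structure
29
+ - 📉 **Trace Plot**: MCMC convergence diagnostics
30
+ - 🎯 **Posterior Plot**: posterior distributions
31
+ - 🌲 **Forest Plot**: per-type effects
32
+ - 📋 **Text Summary**: table of statistical results
33
 
34
+ ### 3. AI Assistant
35
+ - 💬 Natural-language conversation (bilingual support)
36
+ - 📖 Explanations of statistical concepts (Bayesian inference, hierarchical models)
37
+ - 🎮 Battle strategy suggestions
38
+ - 🔍 In-depth analysis of results
39
+ - 📚 Detailed parameter descriptions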
40
 
41
+ ## 📦 Installation
 
 
42
 
43
+ ### 1. Install the Python dependencies
44
  ```bash
45
+ pip install -r bayesian_requirements.txt
 
 
 
 
46
  ```
47
 
48
+ ### 2. Install Graphviz (a system-level package, used to render the DAG)
 
 
 
 
 
49
 
50
+ **Windows (using Chocolatey):**
51
+ ```bash
52
+ choco install graphviz
53
+ ```
 
 
 
 
 
 
 
 
 
 
 
 
54
 
55
+ **Mac:**
56
+ ```bash
57
+ brew install graphviz
58
+ ```
59
 
60
+ **Ubuntu/Debian:**
61
+ ```bash
62
+ sudo apt-get install graphviz
63
+ ```
64
 
65
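+ ### 3. Prepare the data
66
+ Place the Pokemon speed analysis CSV file in the same directory, named `pokemon_speed_meta_results.csv`.
67
 
68
+ **Required data format:**
69
 
70
+ | Column | Description | Example |
71
+ |------|------|------|
72
+ | `Trial_Type` | Pokemon type | Water, Fire, Grass |
73
+ | `rc` | Control group (slow) wins | 45 |
74
+ | `nc` | Control group total battles | 100 |
75
+ | `rt` | Treatment group (fast) wins | 60 |
76
+ | `nt` | Treatment group total battles | 100 |
77
 
78
+ **Example data:**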
79
  ```csv
80
  Trial_Type,rc,nc,rt,nt
81
+ Water,45,100,60,100
82
+ Fire,38,100,55,100
83
+ Grass,42,100,58,100
84
+ Electric,50,100,65,100
85
  ```
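Before fitting the hierarchical model, it can help to look at the raw per-type effect in the example rows. The sketch below computes an empirical log odds ratio with a 0.5 continuity correction; the correction and the function name are illustrative choices, not something the application is known to use.

```python
import math


def empirical_log_or(rc, nc, rt, nt):
    """Raw log odds ratio of fast vs. slow wins for one type.

    Adds 0.5 to every cell so that a type with zero wins or zero
    losses still yields a finite value.
    """
    fast_wins, fast_losses = rt + 0.5, (nt - rt) + 0.5
    slow_wins, slow_losses = rc + 0.5, (nc - rc) + 0.5
    return math.log((fast_wins * slow_losses) / (fast_losses * slow_wins))


# Water row from the example data: rc=45, nc=100, rt=60, nt=100
water_effect = empirical_log_or(45, 100, 60, 100)
```

A positive value means the fast group won more often than the slow group for that type; the hierarchical model then pulls these raw values toward the overall effect.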
86
 
87
+ ### 4. Set the Google Gemini API key
88
+ - Enter your Google Gemini API key in the left sidebar
89
+ - The API key is used for the AI assistant
90
+ - Get an API key: https://ai.google.dev/
91
 
92
+ ### 5. Run the app
93
+ ```bash
94
+ streamlit run bayesian_app.py
 
 
 
 
 
95
  ```
96
 
97
+ ## 🔧 File Structure
98
 
99
+ ```
100
+ bayesian_hierarchical_model/
101
+ ├── bayesian_app.py # Streamlit main app
102
+ ├── bayesian_core.py # Core Bayesian hierarchical model logic
103
+ ├── bayesian_llm_assistant.py # AI chat assistant
104
+ ├── bayesian_requirements.txt # Dependencies
105
+ ├── README.md # Documentation
106
+ └── pokemon_speed_meta_results.csv # Data file (supply your own)
 
 
 
 
 
 
107
  ```
108
 
109
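+ ## 📊 Usage
110
+
111
+ ### Step 1: Load the data
112
+ 1. Choose "Use default dataset" or "Upload your data"
113
+ 2. If uploading, make sure the CSV format is correct (it must contain the required columns)
114
+
115
+ ### Step 2: Set sampling parameters (optional)
116
+ 1. Expand "Advanced settings" to adjust the MCMC parameters
117
+ 2. **Suggested settings**
118
+ - Samples: 2000 (more = more accurate but slower)
119
+ - Tuning: 1000
120
+ - Chains: 1 (multiple chains can reveal convergence problems)
121
+ - Target Accept: 0.95
122
+
123
+ ### Step 3: Run the analysis
124
+ 1. Click the "Start Bayesian analysis" button
125
+ 2. Wait for the analysis to finish (usually 2-5 minutes)
126
+ 3. Review the four result sub-pages:
127
+ - **📊 Overview**: key metrics, summary, detailed per-type results
128
+ - **📉 Trace Plot**: convergence diagnostics
129
+ - **🎯 Posterior**: posterior distributions
130
+ - **🌲 Forest Plot**: comparison of per-type effects
131
+
132
+ ### Step 4: Use the AI assistant
133
+ 1. Switch to the "AI Assistant" page
134
+ 2. Type a question in the chat box, or click a quick-question button
135
+ 3. The AI provides explanations and suggestions based on the analysis results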
136
+
137
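+ ## 💡 Statistical Metrics
138
+
139
+ ### Key parameters
140
+
141
+ | Parameter | Description | Interpretation |
142
+ |------|------|------|
143
+ | **d** | Overall mean effect (log OR) | Average speed effect across all types |
144
+ | **sigma** | Between-type variation | How much the response to speed differs between types |
145
+ | **or_speed** | Speed odds ratio (exp(d)) | Multiplier on the odds of winning for faster Pokemon |
146
+ | **delta[i]** | Effect for type i | Speed effect for that type (relative to the overall effect) |
147
+
148
+ ### Decision criteria
149
+
150
+ **Significance:**
151
+ - 95% HDI excludes 0 → the effect is significant (credibly nonzero)
152
+ - 95% HDI includes 0 → the effect is not significant
153
+
154
+ **Reading the odds ratio:**
155
+ - OR > 1: being faster helps
156
+ - OR = 1: no difference
157
+ - OR < 1: being slower helps (rare)
158
+
159
+ **Convergence diagnostics:**
160
+ - The trace plot should look like a "caterpillar" (stationary, well mixed)
161
+ - There should be no obvious trend or periodicity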
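The HDI rule above can be sketched from raw posterior draws. The code below finds the narrowest interval containing 95% of the samples and checks whether it excludes zero. It is an illustrative standard-library version; the app lists ArviZ in its stack and presumably relies on that instead.

```python
import math


def hdi(samples, prob=0.95):
    """Narrowest interval that contains `prob` of the sorted samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(prob * n))  # number of draws the interval must cover
    # Slide a window of k consecutive draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def excludes_zero(lower, upper):
    """True when the interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0
```

Calling `excludes_zero(*hdi(draws))` on the posterior draws of `d` or of one `delta[i]` applies the significance criterion listed above.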
162
+
163
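+ ## 🎮 Use Cases
164
+
165
+ ### 1. Type-specific analysis
166
+ Identify which Pokemon types benefit most from speed (for example, Electric or Flying)
167
+
168
+ ### 2. Team-building strategy
169
+ Use the statistical results to decide whether to prioritize speed training
170
+
171
+ ### 3. Understanding battle mechanics
172
+ Understand how much speed matters in different battle situations
173
+
174
+ ### 4. Teaching
175
+ Learn the principles and applications of Bayesian hierarchical models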
176
+
177
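+ ## 📈 Reading the Plots
178
+
179
+ ### 1️⃣ DAG (model structure)
180
+ - **Purpose**: shows the dependencies between variables
181
+ - **Elements**:
182
+ - Circle/ellipse: random variable
183
+ - Rectangle: observed data
184
+ - Diamond: derived variable
185
+ - Arrow: dependency
186
+
187
+ ### 2️⃣ Trace Plot (convergence diagnostics)
188
+ - **Left column**: MCMC sampling trace
189
+ - **Right column**: posterior density
190
+ - **Good convergence**: the trace looks like a "caterpillar", stationary with no trend
191
+ - **Warning signs**: trends, stuck chains, poor mixing
192
+
193
+ ### 3️⃣ Posterior Plot (posterior distributions)
194
+ - Shows the posterior distributions of d, sigma, and or_speed
195
+ - Automatically marks the 95% HDI
196
+ - Shows the mean
197
+
198
+ ### 4️⃣ Forest Plot (per-type effects)
199
+ - **The most important plot!**
200
+ - Y axis: types
201
+ - X axis: delta (log OR)
202
+ - Point: mean effect
203
+ - Line: 95% credible interval
204
+ - Asterisk: significant effect
205
+ - Red dashed line: no-effect reference line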
206
+
207
+ ## ⚙️ Technical Architecture
208
+
209
+ ### Core technologies
210
+ - **Streamlit**: web application framework
211
+ - **PyMC**: Bayesian inference engine
212
+ - **ArviZ**: visualization for Bayesian analysis
213
+ - **NumPy/Pandas**: numerical computing and data handling
214
+ - **Matplotlib**: plotting
215
+ - **Google Gemini**: AI assistant
216
+
217
+ ### Statistical methods
218
+ - **Hierarchical Bayesian Model**
219
+ - **MCMC Sampling**: Markov Chain Monte Carlo sampling
220
+ - **Logit Link Function**
221
+ - **Partial Pooling**: borrowing information across groups
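Partial pooling can be illustrated with the textbook normal-normal shrinkage formula. This is a simplified sketch that treats the overall mean and the between-type standard deviation as known; the actual model estimates them jointly by MCMC, and the function name is hypothetical.

```python
def shrink(estimate, se, pooled_mean, sigma):
    """Precision-weighted compromise between a type's own estimate
    and the pooled mean (normal-normal model with known sigma)."""
    weight = (1.0 / se**2) / (1.0 / se**2 + 1.0 / sigma**2)
    return weight * estimate + (1.0 - weight) * pooled_mean
```

A type with little data (large `se`) is pulled strongly toward the pooled mean, while a type with plenty of data keeps most of its own estimate.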
222
+
223
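+ ### Design highlights
224
+ - ✅ Session isolation (multi-user support)
225
+ - ✅ Thread safety
226
+ - ✅ Automatic cleanup of expired data
227
+ - ✅ Responsive UI
228
+ - ✅ Progress bar feedback
229
+ - ✅ Comprehensive error handling
230
+
231
+ ## 🔒 Privacy and Security
232
+
233
+ - All analysis runs locally
234
+ - Session data is stored separately per session
235
+ - Data older than 1 hour is cleaned up automatically
236
+ - The API key is not stored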
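The one-hour cleanup rule could be implemented along the following lines. This is a hypothetical sketch: the session store layout and the `last_seen` field are assumptions made for illustration, not details taken from the application.

```python
import time

SESSION_TTL_SECONDS = 3600  # one hour, matching the rule stated above


def purge_expired(sessions, now=None, ttl=SESSION_TTL_SECONDS):
    """Delete sessions idle for longer than `ttl` seconds.

    `sessions` maps a session id to a dict with a `last_seen` timestamp.
    Returns the ids that were removed.
    """
    now = time.time() if now is None else now
    expired = [sid for sid, info in sessions.items() if now - info["last_seen"] > ttl]
    for sid in expired:
        del sessions[sid]
    return expired
```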
237
+
238
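+ ## 📝 Example Questions (for the AI assistant)
239
+
240
+ ### Basic concepts
241
+ - "What is Bayesian statistics?"
242
+ - "What is a hierarchical model?"
243
+ - "What are the prior, the posterior, and the likelihood?"
244
+ - "How does an HDI differ from a confidence interval?"
245
+
246
+ ### Interpreting results
247
+ - "What does the d parameter mean?"
248
+ - "What does a large sigma indicate?"
249
+ - "How do I judge whether the speed effect is significant?"
250
+ - "Why are some types significant and others not?"
251
+
252
+ ### Convergence diagnostics
253
+ - "How do I read a trace plot?"
254
+ - "What is a caterpillar plot?"
255
+ - "Has my model converged?"
256
+
257
+ ### Practical application
258
+ - "Give me a summary of the analysis"
259
+ - "Which types benefit most from speed?"
260
+ - "How should I build my team?"
261
+ - "What does this imply for battle strategy?"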
262
+
263
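+ ## 🆚 Comparison with the McNemar System
264
+
265
+ | Aspect | McNemar system | Bayesian hierarchical model |
266
+ |------|--------------|--------------|
267
+ | Method | Frequentist statistics | Bayesian inference |
268
+ | Data | Paired data (win vs. loss) | Two independent groups (fast vs. slow) |
269
+ | Unit of analysis | Single feature | Multiple types analyzed together |
270
+ | Output | p-value, OR | Posterior distribution, HDI |
271
+ | Hierarchy | None | Yes (borrows information across types) |
272
+ | Uncertainty | Point estimate + CI | Full posterior distribution |
273
+ | Small samples | May be unstable | Robust (borrows information) |
274
+
275
+ ## 🚀 Roadmap
276
+
277
+ - [ ] Joint analysis of multiple features (speed + attack + HP)
278
+ - [ ] Model comparison (DIC, WAIC)
279
+ - [ ] Predicting the effect for new types
280
+ - [ ] Interactive posterior predictive checks
281
+ - [ ] Export of a full PDF report
282
+ - [ ] Batch analysis of multiple datasets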
283
+
284
+ ## 🐛 Troubleshooting
285
+
286
+ ### Q1: The DAG cannot be generated
287
+ **A**: Make sure the system-level Graphviz package is installed
288
+ ```bash
289
+ # Check whether it is installed
290
+ dot -V
291
 
292
+ # If it is not installed, follow the installation steps above
293
+ ```
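The same check can be done from Python before attempting to render the DAG. This is a minimal sketch using `shutil.which` from the standard library; the function name is illustrative.

```python
import shutil


def graphviz_available(executable="dot"):
    """Return True if the Graphviz `dot` binary can be found on PATH."""
    return shutil.which(executable) is not None
```

An app could call this once at startup and show an installation hint instead of failing when the DAG is requested.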
 
 
 
294
 
295
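+ ### Q2: MCMC sampling is too slow
296
+ **A**: Reduce the number of samples or adjust the parameters
297
+ - Reduce Samples (at the cost of precision)
298
+ - Increase Chains (to use multiple cores)
299
+ - Lower Target Accept (this may affect convergence)
300
 
301
+ ### Q3: The trace plot shows non-convergence
302
+ **A**: Try the following
303
+ - Increase the tuning samples
304
+ - Increase Samples
305
+ - Raise Target Accept
306
+ - Check the data for problems
307
 
308
+ ### Q4: The AI assistant does not work
309
+ **A**: Check that
310
+ - The API key is correct
311
+ - The analysis has been run
312
+ - The network connection is working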
313
 
314
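+ ## 📧 Contact
315
 
316
+ Questions and suggestions are welcome; please contact the development team.
317
 
318
+ ## 📄 License
319
 
320
+ This project is intended for academic research and teaching use only.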
321
 
322
+ ---
323
 
324
+ **Powered by PyMC, ArviZ & Google Gemini** 🚀
 
 
 
325
 
326
+ ## 🎓 Further Reading
327
 
328
+ - [PyMC documentation](https://www.pymc.io/)
329
+ - [ArviZ documentation](https://arviz-devs.github.io/arviz/)
330
+ - [Bayesian Data Analysis (Gelman et al.)](http://www.stat.columbia.edu/~gelman/book/)
331
+ - [Hierarchical models tutorial](https://www.pymc.io/projects/examples/en/latest/case_studies/hierarchical_partial_pooling.html)
app.py CHANGED
@@ -1,657 +1,621 @@
1
- """
2
- Pokemon Speed Bayesian Analysis System with LLM Assistant
3
- A comprehensive web application for analyzing speed effects on win rates
4
- """
5
-
6
  import streamlit as st
7
  import pandas as pd
8
- import numpy as np
9
- from datetime import datetime
10
- import io
11
- import json
12
-
13
- # Import custom modules
14
- from bayesian_core import BayesianSpeedAnalyzer
15
- from llm_assistant import LLMAssistant
16
- from utils import (
17
- plot_trace, plot_posterior, plot_forest,
18
- plot_win_rate_comparison, plot_heterogeneity,
19
- create_results_table, create_type_results_table
20
- )
21
 
22
- # ===== Page configuration =====
23
  st.set_page_config(
24
- page_title="Pokemon Speed Analysis",
25
  page_icon="⚡",
26
  layout="wide",
27
  initial_sidebar_state="expanded"
28
  )
29
 
30
- # ===== Custom CSS =====
31
  st.markdown("""
32
  <style>
33
- .main-header {
34
- font-size: 2.5rem;
35
- font-weight: bold;
36
- color: #2d6ca2;
37
- text-align: center;
38
- margin-bottom: 1rem;
 
 
 
 
 
 
 
 
 
39
  }
40
- .sub-header {
41
- font-size: 1.2rem;
42
- color: #666;
43
- text-align: center;
44
- margin-bottom: 2rem;
45
  }
46
- .metric-card {
47
- background-color: #f0f2f6;
48
- padding: 1rem;
49
- border-radius: 0.5rem;
50
- border-left: 4px solid #2d6ca2;
51
  }
52
- .stAlert {
53
- margin-top: 1rem;
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
54
  }
55
  </style>
56
  """, unsafe_allow_html=True)
57
 
58
- # ===== Session state initialization =====
59
- if 'analyzer' not in st.session_state:
60
- st.session_state.analyzer = None
61
- if 'results' not in st.session_state:
62
- st.session_state.results = None
63
- if 'trace' not in st.session_state:
64
- st.session_state.trace = None
65
- if 'llm_assistant' not in st.session_state:
66
- st.session_state.llm_assistant = None
 
 
 
 
 
 
 
 
 
 
 
 
 
 
67
  if 'chat_history' not in st.session_state:
68
  st.session_state.chat_history = []
69
- if 'data' not in st.session_state:
70
- st.session_state.data = None
 
 
 
 
 
71
 
72
- # ===== Sidebar =====
73
  with st.sidebar:
74
- st.markdown("### ⚙️ Configuration")
75
 
76
- # OpenAI API Key
77
  api_key = st.text_input(
78
- "OpenAI API Key",
79
  type="password",
80
- help="Required for AI Assistant features"
81
  )
82
 
83
  if api_key:
84
- st.success("✅ API Key provided")
85
- # Initialize the LLM assistant
86
- if st.session_state.llm_assistant is None:
87
- session_id = f"pokemon_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
88
- st.session_state.llm_assistant = LLMAssistant(api_key, session_id)
89
- else:
90
- st.warning("⚠️ Enter API Key to enable AI features")
91
 
92
  st.markdown("---")
93
 
94
- # Data upload
95
- st.markdown("### 📁 Data Upload")
96
-
97
- uploaded_file = st.file_uploader(
98
- "Upload CSV file",
99
- type=['csv'],
100
- help="CSV should contain: Trial_Type, rc, nc, rt, nt"
101
- )
102
-
103
- # Use example data
104
- use_example = st.checkbox("Use example data", value=True)
105
 
106
  st.markdown("---")
107
 
108
- # Analysis parameters
109
- st.markdown("### 🔧 Analysis Parameters")
110
-
111
- n_samples = st.slider(
112
- "MCMC Samples",
113
- min_value=500,
114
- max_value=5000,
115
- value=2000,
116
- step=500,
117
- help="Number of posterior samples to draw"
118
  )
119
 
120
- n_tune = st.slider(
121
- "Tuning Steps",
122
- min_value=500,
123
- max_value=3000,
124
- value=1000,
125
- step=500,
126
- help="Number of warm-up iterations"
127
- )
128
-
129
- target_accept = st.slider(
130
- "Target Accept Rate",
131
- min_value=0.80,
132
- max_value=0.99,
133
- value=0.95,
134
- step=0.01,
135
- help="MCMC acceptance rate (higher = more accurate but slower)"
136
- )
 
 
 
 
 
 
 
 
137
 
138
  st.markdown("---")
139
 
140
- # About
141
- with st.expander("ℹ️ About"):
142
- st.markdown("""
143
- **Pokemon Speed Bayesian Analysis**
 
 
 
 
 
 
 
 
144
 
145
- A hierarchical Bayesian meta-analysis system to evaluate
146
- whether faster Pokemon have higher win rates across different types.
 
 
 
 
 
 
147
 
148
- **Features:**
149
- - Bayesian hierarchical modeling
150
- - MCMC convergence diagnostics
151
- - Interactive visualizations
152
- - AI-powered result interpretation
 
153
 
154
- **Powered by:**
155
- - PyMC (Bayesian inference)
156
- - ArviZ (diagnostics)
157
- - GPT-4 (AI assistant)
158
- - Streamlit (web interface)
159
  """)
160
 
161
- # ===== Title =====
162
- st.markdown('<div class="main-header">⚡ Pokemon Speed Bayesian Analysis System</div>', unsafe_allow_html=True)
163
- st.markdown('<div class="sub-header">Hierarchical Bayesian Meta-Analysis with AI Assistant</div>', unsafe_allow_html=True)
164
 
165
- # ===== Data loading =====
166
- def load_data():
167
- """Load the uploaded data or generate example data"""
168
 
169
- if uploaded_file is not None:
170
- try:
 
 
 
 
 
 
 
 
 
 
171
  df = pd.read_csv(uploaded_file)
172
-
173
- # Validate required columns
174
- required_cols = ['Trial_Type', 'rc', 'nc', 'rt', 'nt']
175
- missing_cols = [col for col in required_cols if col not in df.columns]
176
-
177
- if missing_cols:
178
- st.error(f"❌ Missing required columns: {', '.join(missing_cols)}")
179
- return None
180
-
181
- st.success(f"✅ Loaded {len(df)} Pokemon types from uploaded file")
182
- return df
183
-
184
- except Exception as e:
185
- st.error(f"❌ Error loading file: {str(e)}")
186
- return None
187
 
188
- elif use_example:
189
- # Generate example data (18 types)
190
- types = [
191
- 'Normal', 'Fire', 'Water', 'Electric', 'Grass', 'Ice',
192
- 'Fighting', 'Poison', 'Ground', 'Flying', 'Psychic', 'Bug',
193
- 'Rock', 'Ghost', 'Dragon', 'Dark', 'Steel', 'Fairy'
194
- ]
195
-
196
- np.random.seed(42)
197
-
198
- data = []
199
- for ptype in types:
200
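- # Simulated data: faster Pokemon usually have a higher win rate
201
- base_win_rate = 0.50
202
- speed_effect = np.random.normal(0.08, 0.03) # mean boost of 8 percentage points, standard deviation of 3
203
-
204
- nc = np.random.randint(80, 120) # control group sample size
205
- nt = np.random.randint(80, 120) # treatment group sample size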
206
-
207
- pc = np.clip(base_win_rate + np.random.normal(0, 0.05), 0.3, 0.7)
208
- pt = np.clip(pc + speed_effect, 0.3, 0.7)
209
-
210
- rc = int(nc * pc)
211
- rt = int(nt * pt)
212
-
213
- data.append({
214
- 'Trial_Type': ptype,
215
- 'rc': rc,
216
- 'nc': nc,
217
- 'rt': rt,
218
- 'nt': nt
219
- })
220
-
221
- df = pd.DataFrame(data)
222
- st.info("ℹ️ Using example data (18 Pokemon types)")
223
- return df
224
-
225
- return None
226
-
227
- # Load the data
228
- if st.session_state.data is None:
229
- st.session_state.data = load_data()
230
-
231
- # ===== Tabs =====
232
- tab1, tab2, tab3, tab4 = st.tabs([
233
- "📊 Data & Analysis",
234
- "📈 Visualizations",
235
- "🤖 AI Assistant",
236
- "📥 Export Results"
237
- ])
238
-
239
- # ===== Tab 1: Data and analysis =====
240
- with tab1:
241
- if st.session_state.data is not None:
242
- st.markdown("### 📋 Data Preview")
243
-
244
- # Show the data
245
- col1, col2 = st.columns([2, 1])
246
-
247
- with col1:
248
- st.dataframe(st.session_state.data, use_container_width=True)
249
-
250
- with col2:
251
- st.markdown("**Data Summary**")
252
- st.metric("Total Types", len(st.session_state.data))
253
- st.metric("Total Battles (Control)", st.session_state.data['nc'].sum())
254
- st.metric("Total Battles (Treatment)", st.session_state.data['nt'].sum())
255
 
256
  st.markdown("---")
257
 
258
  # Run-analysis button
259
- col1, col2, col3 = st.columns([1, 1, 2])
260
-
261
- with col1:
262
- if st.button("🚀 Run Analysis", type="primary", use_container_width=True):
263
- with st.spinner("Running Bayesian MCMC sampling... This may take a few minutes."):
264
- try:
265
- # Create the analyzer
266
- analyzer = BayesianSpeedAnalyzer(st.session_state.data)
267
-
268
- # Build the model
269
- analyzer.build_model()
270
-
271
- # Run MCMC
272
- progress_bar = st.progress(0)
273
- status_text = st.empty()
274
-
275
- status_text.text("Building model...")
276
- progress_bar.progress(20)
277
-
278
- status_text.text(f"Sampling {n_samples} iterations...")
279
- trace = analyzer.run_analysis(
280
- samples=n_samples,
281
- tune=n_tune,
282
- target_accept=target_accept
283
- )
284
- progress_bar.progress(80)
285
-
286
- status_text.text("Generating results...")
287
-
288
- # Store the results
289
- st.session_state.analyzer = analyzer
290
- st.session_state.trace = trace
291
- st.session_state.results = analyzer.results
292
-
293
- progress_bar.progress(100)
294
- status_text.empty()
295
- progress_bar.empty()
296
-
297
- st.success("✅ Analysis completed successfully!")
298
- st.rerun()
299
-
300
- except Exception as e:
301
- st.error(f"❌ Analysis failed: {str(e)}")
302
 
303
  with col2:
304
- if st.session_state.results is not None:
305
- if st.button("🔄 Reset Analysis", use_container_width=True):
306
- st.session_state.analyzer = None
307
- st.session_state.results = None
308
- st.session_state.trace = None
309
- st.rerun()
310
 
311
- # Show the results
312
- if st.session_state.results is not None:
313
- st.markdown("---")
314
- st.markdown("### 📊 Analysis Results")
 
315
 
316
- # Key metrics
317
- stats = st.session_state.results['statistics']
318
 
319
- col1, col2, col3, col4 = st.columns(4)
 
320
 
321
- with col1:
322
- st.markdown('<div class="metric-card">', unsafe_allow_html=True)
323
- st.metric(
324
- "Log Odds Ratio (d)",
325
- f"{stats['d_mean']:.3f}",
326
- delta=f"HDI: [{stats['d_hdi_lower']:.3f}, {stats['d_hdi_upper']:.3f}]"
327
- )
328
- st.markdown('</div>', unsafe_allow_html=True)
329
 
330
- with col2:
331
- st.markdown('<div class="metric-card">', unsafe_allow_html=True)
332
- st.metric(
333
- "Odds Ratio (OR)",
334
- f"{stats['or_mean']:.3f}",
335
- delta=f"HDI: [{stats['or_hdi_lower']:.3f}, {stats['or_hdi_upper']:.3f}]"
336
  )
337
- st.markdown('</div>', unsafe_allow_html=True)
338
-
339
- with col3:
340
- st.markdown('<div class="metric-card">', unsafe_allow_html=True)
341
- st.metric(
342
- "Heterogeneity (σ)",
343
- f"{stats['sigma_mean']:.3f}",
344
- delta="Between-type variation"
345
  )
346
- st.markdown('</div>', unsafe_allow_html=True)
347
-
348
- with col4:
349
- st.markdown('<div class="metric-card">', unsafe_allow_html=True)
350
- st.metric(
351
- "Avg Win Rate Increase",
352
- f"{stats['win_rate_increase'].mean():.1f}%",
353
- delta="Percentage points"
 
 
 
 
 
 
 
 
 
 
 
354
  )
355
- st.markdown('</div>', unsafe_allow_html=True)
356
-
357
- # Interpretation
358
- st.markdown("### 💡 Interpretation")
359
- interpretation = st.session_state.analyzer.interpret_results()
360
- st.markdown(interpretation)
361
-
362
- # Detailed results tables
363
- st.markdown("### 📋 Detailed Results")
364
 
365
- col1, col2 = st.columns(2)
366
-
367
- with col1:
368
- st.markdown("**Overall Effect Summary**")
369
- fig_summary = create_results_table(st.session_state.results['summary'])
370
- st.plotly_chart(fig_summary, use_container_width=True)
371
-
372
- with col2:
373
- st.markdown("**Type-Specific Results**")
374
- trial_results = st.session_state.analyzer.get_trial_specific_results()
375
- fig_trial = create_type_results_table(trial_results)
376
- st.plotly_chart(fig_trial, use_container_width=True)
 
 
 
 
 
 
 
 
377
 
378
- # Convergence diagnostics
379
- st.markdown("### 🔍 Convergence Diagnostics")
380
- diagnostics = st.session_state.analyzer.get_convergence_diagnostics()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
381
 
382
- if diagnostics:
383
- col1, col2 = st.columns(2)
 
384
 
385
- with col1:
386
- st.markdown("**R-hat (Convergence)**")
387
- st.write("✅ Good: < 1.01, ⚠️ Check: 1.01-1.05, ❌ Poor: > 1.05")
388
- for param, value in diagnostics['r_hat'].items():
389
- status = "✅" if value < 1.01 else "⚠️" if value < 1.05 else "❌"
390
- st.write(f"{status} {param}: {value:.4f}")
 
 
 
 
 
 
 
391
 
392
- with col2:
393
- st.markdown("**ESS (Effective Sample Size)**")
394
- st.write("✅ Good: > 400, ⚠️ Check: 100-400, ❌ Poor: < 100")
395
- for param, value in diagnostics['ess_bulk'].items():
396
- status = "✅" if value > 400 else "⚠️" if value > 100 else "❌"
397
- st.write(f"{status} {param}: {value:.0f}")
398
-
399
- else:
400
- st.warning("⚠️ Please upload data or enable example data in the sidebar")
 
 
 
 
 
 
 
 
 
 
 
401
 
402
- # ===== Tab 2: Visualizations =====
403
  with tab2:
404
- if st.session_state.trace is not None and st.session_state.results is not None:
405
- st.markdown("### 📈 Visualization Gallery")
406
-
407
- # Trace Plot
408
- with st.expander("🔍 Trace Plot (Convergence Check)", expanded=True):
409
- st.markdown("""
410
- **How to read:**
411
- - Left: Sampling trace should look like a "hairy caterpillar" (stationary)
412
- - Right: Posterior distribution shape
413
- """)
414
- fig_trace = plot_trace(st.session_state.trace, var_names=['d', 'sigma'])
415
- st.plotly_chart(fig_trace, use_container_width=True)
416
-
417
- # Posterior Plot
418
- with st.expander("📊 Posterior Distributions", expanded=True):
419
- st.markdown("""
420
- **How to read:**
421
- - Shaded area: 95% Highest Density Interval (credible interval)
422
- - Red line: Posterior mean
423
- """)
424
- fig_posterior = plot_posterior(st.session_state.trace)
425
- st.plotly_chart(fig_posterior, use_container_width=True)
426
-
427
- # Forest Plot
428
- with st.expander("🌲 Forest Plot (Type-Specific Effects)", expanded=True):
429
- st.markdown("""
430
- **How to read:**
431
- - Each row = one Pokemon type
432
- - Point = mean effect, line = 95% credible interval
433
- - Red dashed line = no effect (δ=0)
434
- - Right of line = speed helps, left = speed hurts
435
- """)
436
- fig_forest = plot_forest(
437
- st.session_state.trace,
438
- st.session_state.results['trial_labels']
439
- )
440
- st.plotly_chart(fig_forest, use_container_width=True)
441
-
442
- # Win Rate Comparison
443
- with st.expander("🏆 Win Rate Comparison", expanded=True):
444
- stats = st.session_state.results['statistics']
445
- fig_winrate = plot_win_rate_comparison(
446
- st.session_state.results['trial_labels'],
447
- stats['pc_mean'],
448
- stats['pt_mean']
449
- )
450
- st.plotly_chart(fig_winrate, use_container_width=True)
451
-
452
- # Heterogeneity
453
- with st.expander("📉 Heterogeneity Analysis"):
454
- st.markdown("""
455
- **Sigma (σ):** Measures variation in speed effects across types
456
- - Low (< 0.2): Effects are similar across types
457
- - Moderate (0.2-0.5): Some type-specific differences
458
- - High (> 0.5): Large differences between types
459
- """)
460
- fig_hetero = plot_heterogeneity(st.session_state.trace)
461
- st.plotly_chart(fig_hetero, use_container_width=True)
462
 
 
 
 
 
463
  else:
464
- st.info("ℹ️ Run analysis first to view visualizations")
465
-
466
- # ===== Tab 3: AI assistant =====
467
- with tab3:
468
- st.markdown("### 🤖 AI Assistant")
469
-
470
- if not api_key:
471
- st.warning("⚠️ Please enter your OpenAI API Key in the sidebar to use AI features")
472
-
473
- elif st.session_state.llm_assistant is not None:
474
- # Quick-question buttons
475
- st.markdown("**Quick Questions:**")
476
 
477
- col1, col2, col3 = st.columns(3)
 
478
 
479
- with col1:
480
- if st.button("📝 Generate Summary", use_container_width=True):
481
- if st.session_state.results:
482
- with st.spinner("Generating summary..."):
483
- response = st.session_state.llm_assistant.generate_summary(
484
- st.session_state.results
485
- )
486
- st.session_state.chat_history.append({
487
- 'role': 'assistant',
488
- 'content': response
489
- })
490
- else:
491
- st.warning("Run analysis first")
492
 
493
- with col2:
494
- if st.button("📊 Explain Results", use_container_width=True):
495
- if st.session_state.results:
496
- with st.spinner("Explaining..."):
 
 
 
 
 
 
 
 
 
 
 
497
  response = st.session_state.llm_assistant.get_response(
498
- "Please explain the key findings from this analysis in simple terms.",
499
- st.session_state.results
500
- )
501
- st.session_state.chat_history.append({
502
- 'role': 'assistant',
503
- 'content': response
504
- })
505
- else:
506
- st.warning("Run analysis first")
507
-
508
- with col3:
509
- if st.button("💡 Suggest Improvements", use_container_width=True):
510
- if st.session_state.results:
511
- with st.spinner("Thinking..."):
512
- response = st.session_state.llm_assistant.suggest_improvements(
513
- st.session_state.results
514
  )
515
- st.session_state.chat_history.append({
516
- 'role': 'assistant',
517
- 'content': response
518
- })
519
- else:
520
- st.warning("Run analysis first")
521
-
522
- # Concept-explanation buttons
523
- st.markdown("**Explain Concepts:**")
 
 
524
 
525
- col1, col2, col3, col4 = st.columns(4)
526
 
527
- concepts = [
528
- ('Log Odds Ratio', 'log_odds_ratio'),
529
- ('Odds Ratio', 'odds_ratio'),
530
- ('HDI', 'hdi'),
531
- ('Heterogeneity', 'heterogeneity')
 
 
 
 
 
 
 
532
  ]
533
 
534
- for i, (label, concept_key) in enumerate(concepts):
535
- with [col1, col2, col3, col4][i]:
536
- if st.button(label, use_container_width=True):
537
- with st.spinner(f"Explaining {label}..."):
538
- response = st.session_state.llm_assistant.explain_concept(
539
- concept_key,
540
- st.session_state.results
541
- )
542
- st.session_state.chat_history.append({
543
- 'role': 'assistant',
544
- 'content': response
545
- })
546
 
 
547
  st.markdown("---")
548
-
549
- # Chat interface
550
- st.markdown("**Chat with AI Assistant:**")
551
-
552
- # Show the message history
553
- for msg in st.session_state.chat_history:
554
- if msg['role'] == 'user':
555
- st.markdown(f"**You:** {msg['content']}")
556
- else:
557
- st.markdown(f"**AI:** {msg['content']}")
558
- st.markdown("---")
559
-
560
- # Input box
561
- user_input = st.text_area(
562
- "Ask a question about the analysis:",
563
- height=100,
564
- placeholder="e.g., Which Pokemon type benefits most from speed?"
565
- )
566
-
567
- col1, col2 = st.columns([1, 5])
568
-
569
- with col1:
570
- if st.button("Send", type="primary"):
571
- if user_input:
572
- # Add the user's message
573
- st.session_state.chat_history.append({
574
- 'role': 'user',
575
- 'content': user_input
576
- })
577
-
578
- # Get the AI response
579
- with st.spinner("Thinking..."):
580
- response = st.session_state.llm_assistant.get_response(
581
- user_input,
582
- st.session_state.results
583
- )
584
- st.session_state.chat_history.append({
585
- 'role': 'assistant',
586
- 'content': response
587
- })
588
-
589
- st.rerun()
590
-
591
- with col2:
592
- if st.button("Clear Chat"):
593
- st.session_state.chat_history = []
594
- st.session_state.llm_assistant.reset_conversation()
595
- st.rerun()
596
 
597
- # ===== Tab 4: Export Results =====
598
- with tab4:
599
- st.markdown("### 📥 Export Results")
600
-
601
- if st.session_state.results is not None:
602
- # Prepare the export data
603
- export_data = {
604
- 'timestamp': st.session_state.results['timestamp'],
605
- 'overall_statistics': {
606
- 'd_mean': float(st.session_state.results['statistics']['d_mean']),
607
- 'd_hdi': [
608
- float(st.session_state.results['statistics']['d_hdi_lower']),
609
- float(st.session_state.results['statistics']['d_hdi_upper'])
610
- ],
611
- 'or_mean': float(st.session_state.results['statistics']['or_mean']),
612
- 'or_hdi': [
613
- float(st.session_state.results['statistics']['or_hdi_lower']),
614
- float(st.session_state.results['statistics']['or_hdi_upper'])
615
- ],
616
- 'sigma_mean': float(st.session_state.results['statistics']['sigma_mean'])
617
- },
618
- 'type_results': st.session_state.analyzer.get_trial_specific_results().to_dict('records')
619
- }
620
-
621
- # JSON download
622
- st.markdown("**Download as JSON:**")
623
- json_str = json.dumps(export_data, indent=2)
624
- st.download_button(
625
- label="📄 Download JSON",
626
- data=json_str,
627
- file_name=f"pokemon_speed_analysis_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json",
628
- mime="application/json"
629
- )
630
-
631
- # CSV download
632
- st.markdown("**Download Type Results as CSV:**")
633
- csv_buffer = io.StringIO()
634
- st.session_state.analyzer.get_trial_specific_results().to_csv(csv_buffer, index=False)
635
- st.download_button(
636
- label="📊 Download CSV",
637
- data=csv_buffer.getvalue(),
638
- file_name=f"pokemon_type_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv",
639
- mime="text/csv"
640
- )
641
-
642
- # Show a preview of the export
643
  st.markdown("---")
644
- st.markdown("### 📋 Export Preview")
645
- st.json(export_data)
646
-
647
- else:
648
- st.info("ℹ️ Run analysis first to export results")
649
 
650
- # ===== Footer =====
651
  st.markdown("---")
652
- st.markdown("""
653
- <div style='text-align: center; color: #666; font-size: 0.9rem;'>
654
- <p>Pokemon Speed Bayesian Analysis System | Powered by PyMC, ArviZ, GPT-4, and Streamlit</p>
655
- <p>⚡ Analyzing the impact of speed on win rates across Pokemon types </p>
656
- </div>
657
- """, unsafe_allow_html=True)
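The removed export tab above assembles a dictionary of overall statistics and per-type rows, serializes it with `json.dumps`, and names the download with a timestamp. A minimal, standard-library-only sketch of that pattern follows. The function `build_export`, its parameters, and the sample numbers are hypothetical and are not part of the app.

```python
import json
from datetime import datetime


def build_export(stats, type_rows, now=None):
    """Return (file_name, json_text) for a results export.

    `stats` and `type_rows` are hypothetical stand-ins for the analyzer output.
    """
    now = now or datetime.now()
    payload = {
        "timestamp": now.strftime("%Y-%m-%d %H:%M:%S"),
        # float() converts each numeric value to a plain Python float
        "overall_statistics": {key: float(value) for key, value in stats.items()},
        "type_results": list(type_rows),
    }
    file_name = f"pokemon_speed_analysis_{now.strftime('%Y%m%d_%H%M%S')}.json"
    return file_name, json.dumps(payload, indent=2)


# Made-up values, for illustration only
name, text = build_export(
    {"d_mean": 0.42, "sigma_mean": 0.15},
    [{"Trial_Type": "Water", "Effect_Size": 0.5}],
    now=datetime(2024, 1, 2, 3, 4, 5),
)
print(name)
```

Passing `now` explicitly keeps the file name and the embedded timestamp consistent with each other and makes the helper easy to test.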
 
 
 
 
 
 
 
 
 
1
  import streamlit as st
2
  import pandas as pd
3
+ import uuid
4
+ from datetime import datetime, timedelta
5
+ import atexit
6
+ import os
7
+ import base64
 
 
 
 
 
 
 
 
8
 
9
+ # Page configuration
10
  st.set_page_config(
11
+ page_title="Bayesian Hierarchical Model - Pokémon Speed Analysis",
12
  page_icon="⚡",
13
  layout="wide",
14
  initial_sidebar_state="expanded"
15
  )
16
 
17
+ # Custom CSS
18
  st.markdown("""
19
  <style>
20
+ .streamlit-expanderHeader {
21
+ background-color: #e8f1f8;
22
+ border: 1px solid #b0cfe8;
23
+ border-radius: 5px;
24
+ font-weight: 600;
25
+ color: #1b4f72;
26
+ }
27
+ .streamlit-expanderHeader:hover {
28
+ background-color: #d0e7f8;
29
+ }
30
+ .stMetric {
31
+ background-color: #f8fbff;
32
+ padding: 10px;
33
+ border-radius: 5px;
34
+ border: 1px solid #d0e4f5;
35
  }
36
+ .stButton > button {
37
+ width: 100%;
38
+ border-radius: 20px;
39
+ font-weight: 600;
40
+ transition: all 0.3s ease;
41
  }
42
+ .stButton > button:hover {
43
+ transform: translateY(-2px);
44
+ box-shadow: 0 4px 8px rgba(0,0,0,0.2);
 
 
45
  }
46
+ .success-box {
47
+ background-color: #d4edda;
48
+ border: 1px solid #c3e6cb;
49
+ border-radius: 5px;
50
+ padding: 10px;
51
+ margin: 10px 0;
52
+ }
53
+ .warning-box {
54
+ background-color: #fff3cd;
55
+ border: 1px solid #ffeaa7;
56
+ border-radius: 5px;
57
+ padding: 10px;
58
+ margin: 10px 0;
59
+ }
60
+ .info-box {
61
+ background-color: #d1ecf1;
62
+ border: 1px solid #bee5eb;
63
+ border-radius: 5px;
64
+ padding: 10px;
65
+ margin: 10px 0;
66
  }
67
  </style>
68
  """, unsafe_allow_html=True)
69
 
70
+ # Import the project's own modules
71
+ from bayesian_core import BayesianHierarchicalAnalyzer
72
+ from bayesian_llm_assistant import BayesianLLMAssistant
73
+
74
+ # Cleanup function
75
+ def cleanup_old_sessions():
76
+ """Remove sessions that are more than 1 hour old."""
77
+ current_time = datetime.now()
78
+ for session_id in list(BayesianHierarchicalAnalyzer._session_results.keys()):
79
+ result = BayesianHierarchicalAnalyzer._session_results.get(session_id)
80
+ if result:
81
+ result_time = datetime.fromisoformat(result['timestamp'])
82
+ if current_time - result_time > timedelta(hours=1):
83
+ BayesianHierarchicalAnalyzer.clear_session_results(session_id)
84
+
85
+ # Register the cleanup function to run at interpreter exit
86
+ atexit.register(cleanup_old_sessions)
87
+
88
+ # Initialize session state
89
+ if 'session_id' not in st.session_state:
90
+ st.session_state.session_id = str(uuid.uuid4())
91
+ if 'analysis_results' not in st.session_state:
92
+ st.session_state.analysis_results = None
93
  if 'chat_history' not in st.session_state:
94
  st.session_state.chat_history = []
95
+ if 'analyzer' not in st.session_state:
96
+ st.session_state.analyzer = None
97
+
98
+ # Title
99
+ st.title("⚡ Bayesian Hierarchical Model Analysis")
100
+ st.markdown("### 寶可夢速度對勝率影響的階層貝氏分析")
101
+ st.markdown("---")
102
 
103
+ # Sidebar
104
  with st.sidebar:
105
+ st.header("⚙️ 配置設定")
106
 
107
+ # Google Gemini API Key
108
  api_key = st.text_input(
109
+ "Google Gemini API Key",
110
  type="password",
111
+ help="輸入您的 Google Gemini API Key 以使用 AI 助手"
112
  )
113
 
114
  if api_key:
115
+ st.session_state.api_key = api_key
116
+ st.success("✅ API Key 已載入")
 
 
 
 
 
117
 
118
  st.markdown("---")
119
 
120
+ # Cleanup button
121
+ if st.button("🧹 清理過期資料"):
122
+ cleanup_old_sessions()
123
+ st.success("✅ 清理完成")
124
+ st.rerun()
 
 
 
 
 
 
125
 
126
  st.markdown("---")
127
 
128
+ # Data source selection
129
+ st.subheader("📊 資料來源")
130
+ data_source = st.radio(
131
+ "選擇資料來源:",
132
+ ["使用預設資料集", "上傳您的資料"]
 
 
 
 
 
133
  )
134
 
135
+ uploaded_file = None
136
+ if data_source == "上傳您的資料":
137
+ uploaded_file = st.file_uploader(
138
+ "上傳 CSV 檔案",
139
+ type=['csv'],
140
+ help="上傳寶可夢速度分析資料"
141
+ )
142
+
143
+ with st.expander("📖 資料格式說明"):
144
+ st.markdown("""
145
+ **必要欄位格式:**
146
+ - `Trial_Type`: 寶可夢屬性(如 Water, Fire, Grass)
147
+ - `rc`: 控制組(速度慢)的勝場數
148
+ - `nc`: 控制組的總場數
149
+ - `rt`: 實驗組(速度快)的勝場數
150
+ - `nt`: 實驗組的總場數
151
+
152
+ **範例:**
153
+ ```
154
+ Trial_Type, rc, nc, rt, nt
155
+ Water, 45, 100, 60, 100
156
+ Fire, 38, 100, 55, 100
157
+ Grass, 42, 100, 58, 100
158
+ ```
159
+ """)
160
 
161
  st.markdown("---")
162
 
163
+ # MCMC sampling parameters
164
+ st.subheader("🎲 MCMC 抽樣參數")
165
+
166
+ with st.expander("⚙️ 進階設定"):
167
+ n_samples = st.slider(
168
+ "抽樣數 (Samples)",
169
+ min_value=500,
170
+ max_value=5000,
171
+ value=2000,
172
+ step=500,
173
+ help="更多樣本 = 更準確,但更慢"
174
+ )
175
 
176
+ n_tune = st.slider(
177
+ "調整期樣本 (Tuning)",
178
+ min_value=500,
179
+ max_value=2000,
180
+ value=1000,
181
+ step=100,
182
+ help="調整期用於優化抽樣器"
183
+ )
184
 
185
+ n_chains = st.selectbox(
186
+ "鏈數 (Chains)",
187
+ options=[1, 2, 4],
188
+ index=0,
189
+ help="多條鏈可以檢測收斂問題"
190
+ )
191
 
192
+ target_accept = st.slider(
193
+ "目標接受率",
194
+ min_value=0.80,
195
+ max_value=0.99,
196
+ value=0.95,
197
+ step=0.01,
198
+ help="更高的接受率 = 更準確,但更慢"
199
+ )
200
+
201
+ st.markdown("---")
202
+
203
+ # About this system
204
+ with st.expander("ℹ️ 關於此系統"):
205
+ st.markdown("""
206
+ **貝氏階層模型分析系統**
207
+
208
+ 本系統使用貝氏階層模型來分析速度對不同屬性寶可夢勝率的影響。
209
+
210
+ **主要功能:**
211
+ - 🔬 貝氏推論與 MCMC 抽樣
212
+ - 📊 階層模型(跨屬性資訊借用)
213
+ - 📈 完整視覺化(4 個圖表)
214
+ - 💬 AI 助手解釋
215
+ - 🎮 對戰策略建議
216
+
217
+ **模型優勢:**
218
+ - 量化不確定性
219
+ - 處理小樣本
220
+ - 估計屬性間異質性
221
+ - 穩健的統計推論
222
  """)
223
 
224
+ # Main content area: two tabs
225
+ tab1, tab2 = st.tabs(["📊 貝氏分析", "💬 AI 助手"])
 
226
 
227
+ # Tab 1: Bayesian analysis
228
+ with tab1:
229
+ st.header("📊 貝氏階層模型分析")
230
 
231
+ # Load the data
232
+ if data_source == "使用預設資料集":
233
+ # Check whether the default dataset exists
234
+ default_data_path = "pokemon_speed_meta_results.csv"
235
+ if os.path.exists(default_data_path):
236
+ df = pd.read_csv(default_data_path)
237
+ st.success(f"✅ 已載入預設資料集({len(df)} 個屬性)")
238
+ else:
239
+ st.warning("⚠️ 找不到預設資料集,請上傳您的資料")
240
+ df = None
241
+ else:
242
+ if uploaded_file is not None:
243
  df = pd.read_csv(uploaded_file)
244
+ st.success(f"✅ 已載入資料({len(df)} 個屬性)")
245
+ else:
246
+ df = None
247
+ st.info("📁 請在左側上傳 CSV 檔案")
 
 
 
 
 
 
 
 
 
 
 
248
 
249
+ if df is not None:
250
+ # Show a preview of the data
251
+ with st.expander("👀 資料預覽"):
252
+ st.dataframe(df, use_container_width=True)
253
 
254
  st.markdown("---")
255
 
256
  # 執行分析按鈕
257
+ col1, col2, col3 = st.columns([2, 1, 2])
258
 
259
  with col2:
260
+ analyze_button = st.button("🔬 開始貝氏分析", type="primary", use_container_width=True)
 
 
 
 
 
261
 
262
+ # Run the analysis
263
+ if analyze_button:
264
+ # Initialize the analyzer
265
+ if st.session_state.analyzer is None:
266
+ st.session_state.analyzer = BayesianHierarchicalAnalyzer(st.session_state.session_id)
267
 
268
+ try:
269
+ st.session_state.analyzer.load_data(df)
270
+
271
+ # Progress bar
272
+ progress_bar = st.progress(0)
273
+ status_text = st.empty()
274
+
275
+ def update_progress(message, percent):
276
+ status_text.text(message)
277
+ progress_bar.progress(percent / 100)
278
+
279
+ # Run the analysis
280
+ with st.spinner("正在執行貝氏分析..."):
281
+ results = st.session_state.analyzer.run_analysis(
282
+ n_samples=n_samples,
283
+ n_tune=n_tune,
284
+ n_chains=n_chains,
285
+ target_accept=target_accept,
286
+ progress_callback=update_progress
287
+ )
288
+ st.session_state.analysis_results = results
289
+
290
+ progress_bar.empty()
291
+ status_text.empty()
292
+ st.success("✅ 分析完成!")
293
+ st.balloons()
294
+
295
+ except Exception as e:
296
+ st.error(f"❌ 分析失敗: {str(e)}")
297
+
298
+ # Show the results
299
+ if st.session_state.analysis_results is not None:
300
+ results = st.session_state.analysis_results
301
 
302
+ st.markdown("---")
303
+ st.markdown("## 📈 分析結果")
304
 
305
+ # Create 4 sub-tabs
306
+ result_tabs = st.tabs(["📊 概覽", "📉 Trace Plot", "🎯 Posterior", "🌲 Forest Plot"])
 
 
 
 
 
 
307
 
308
+ # Tab: Overview
309
+ with result_tabs[0]:
310
+ st.markdown("### 🎯 關鍵指標")
311
+
312
+ # Show the key metrics
313
+ col1, col2, col3 = st.columns(3)
314
+
315
+ with col1:
316
+ st.metric(
317
+ label="整體效應 (d)",
318
+ value=f"{results['d_mean']:.4f}",
319
+ delta=f"HDI: [{results['d_hdi_lower']:.3f}, {results['d_hdi_upper']:.3f}]"
320
+ )
321
+
322
+ with col2:
323
+ st.metric(
324
+ label="屬性間變異 (sigma)",
325
+ value=f"{results['sigma_mean']:.4f}",
326
+ delta=f"SD: {results['sigma_sd']:.4f}"
327
+ )
328
+
329
+ with col3:
330
+ st.metric(
331
+ label="速度勝算比 (OR)",
332
+ value=f"{results['or_speed_mean']:.3f}",
333
+ delta=f"HDI: [{results['or_speed_hdi_lower']:.3f}, {results['or_speed_hdi_upper']:.3f}]"
334
+ )
335
+
336
+ st.markdown("---")
337
+
338
+ # Significance check
339
+ if results['is_significant']:
340
+ st.markdown("""
341
+ <div class="success-box">
342
+ <h4>✅ 結果顯著</h4>
343
+ <p>速度對勝率有<strong>顯著影響</strong>(95% HDI 不包含 0)</p>
344
+ </div>
345
+ """, unsafe_allow_html=True)
346
+ else:
347
+ st.markdown("""
348
+ <div class="warning-box">
349
+ <h4>⚠️ 結果不顯著</h4>
350
+ <p>速度對勝率<strong>無顯著影響</strong>(95% HDI 包含 0)</p>
351
+ </div>
352
+ """, unsafe_allow_html=True)
353
+
354
+ st.markdown("---")
355
+
356
+ # Text summary
357
+ st.markdown("### 📋 統計摘要")
358
+ st.text_area(
359
+ "Summary Statistics",
360
+ results['summary_text'],
361
+ height=300
362
  )
363
+
364
+ # Download the summary
365
+ st.download_button(
366
+ label="📥 下載統計摘要 (.txt)",
367
+ data=results['summary_text'],
368
+ file_name=f"bayesian_summary_{results['timestamp'][:10]}.txt",
369
+ mime="text/plain"
 
370
  )
371
+
372
+ st.markdown("---")
373
+
374
+ # Detailed results for each type
375
+ st.markdown("### 🎮 各屬性詳細結果")
376
+
377
+ delta_df = pd.DataFrame(results['delta_results'])
378
+ delta_df['Significant'] = delta_df['is_significant'].apply(lambda x: '★' if x else '')
379
+ delta_df = delta_df[['trial_type', 'delta_mean', 'delta_sd', 'delta_hdi_lower', 'delta_hdi_upper', 'Significant']]
380
+ delta_df.columns = ['屬性', 'Delta 平均', 'Delta 標準差', 'HDI 下界', 'HDI 上界', '顯著']
381
+
382
+ st.dataframe(
383
+ delta_df.style.format({
384
+ 'Delta 平均': '{:.4f}',
385
+ 'Delta 標準差': '{:.4f}',
386
+ 'HDI 下界': '{:.4f}',
387
+ 'HDI 上界': '{:.4f}'
388
+ }),
389
+ use_container_width=True
390
  )
 
 
 
 
 
 
 
 
 
391
 
392
+ # Tab: Trace Plot
393
+ with result_tabs[1]:
394
+ st.markdown("### 📉 Trace Plot - 收斂診斷")
395
+
396
+ st.markdown("""
397
+ <div class="info-box">
398
+ <h4>📖 如何解讀 Trace Plot:</h4>
399
+ <ul>
400
+ <li><strong>左欄</strong>:MCMC 抽樣軌跡(應該像「毛毛蟲」,平穩無趨勢)</li>
401
+ <li><strong>右欄</strong>:後驗分佈密度圖</li>
402
+ <li><strong>良好收斂</strong>:軌跡圖混合良好,無明顯趨勢或週期</li>
403
+ <li><strong>問題跡象</strong>:軌跡圖有趨勢、卡住、或未混合</li>
404
+ </ul>
405
+ </div>
406
+ """, unsafe_allow_html=True)
407
+
408
+ if results['trace_plot']:
409
+ st.image(f"data:image/png;base64,{results['trace_plot']}", use_column_width=True)
410
+ else:
411
+ st.warning("⚠️ Trace Plot 未生成")
412
 
413
+ # Tab: Posterior Plot
414
+ with result_tabs[2]:
415
+ st.markdown("### 🎯 Posterior Distributions - 後驗分佈")
416
+
417
+ st.markdown("""
418
+ <div class="info-box">
419
+ <h4>📖 如何解讀 Posterior Plot:</h4>
420
+ <ul>
421
+ <li><strong>d</strong>:整體平均效應(log odds ratio)</li>
422
+ <li><strong>sigma</strong>:屬性間變異(越大表示屬性間差異越大)</li>
423
+ <li><strong>or_speed</strong>:速度勝算比(exp(d))</li>
424
+ <li><strong>95% HDI</strong>:最高密度區間(類似信賴區間)</li>
425
+ <li><strong>顯著性</strong>:HDI 不包含 0(d)或 1(or_speed)即為顯著</li>
426
+ </ul>
427
+ </div>
428
+ """, unsafe_allow_html=True)
429
+
430
+ if results['posterior_plot']:
431
+ st.image(f"data:image/png;base64,{results['posterior_plot']}", use_column_width=True)
432
+ else:
433
+ st.warning("⚠️ Posterior Plot 未生成")
434
 
435
+ # Tab: Forest Plot
436
+ with result_tabs[3]:
437
+ st.markdown("### 🌲 Forest Plot - 各屬性效應")
438
 
439
+ st.markdown("""
440
+ <div class="info-box">
441
+ <h4>📖 如何解讀 Forest Plot:</h4>
442
+ <ul>
443
+ <li><strong>點</strong>:各屬性的平均效應(delta)</li>
444
+ <li><strong>橫線</strong>:95% 信賴區間</li>
445
+ <li><strong>紅虛線</strong>:無效應參考線(delta = 0)</li>
446
+ <li><strong>星號 ★</strong>:該屬性效應顯著</li>
447
+ <li><strong>右側</strong>:速度快有利於該屬性</li>
448
+ <li><strong>左側</strong>:速度慢有利於該屬性(罕見)</li>
449
+ </ul>
450
+ </div>
451
+ """, unsafe_allow_html=True)
452
 
453
+ if results['forest_plot']:
454
+ st.image(f"data:image/png;base64,{results['forest_plot']}", use_column_width=True)
455
+ else:
456
+ st.warning("⚠️ Forest Plot 未生成")
457
+
458
+ st.markdown("---")
459
+
460
+ # Summary of types with a significant effect
461
+ significant_types = [dr for dr in results['delta_results'] if dr['is_significant']]
462
+
463
+ if significant_types:
464
+ st.markdown(f"### ⭐ 顯著屬性總結 ({len(significant_types)}/{results['n_trials']})")
465
+
466
+ for dr in significant_types:
467
+ if dr['delta_mean'] > 0:
468
+ st.success(f"**{dr['trial_type']}**: 速度快有顯著優勢 (Delta = {dr['delta_mean']:.3f})")
469
+ else:
470
+ st.warning(f"**{dr['trial_type']}**: 速度慢有顯著優勢 (Delta = {dr['delta_mean']:.3f})")
471
+ else:
472
+ st.info("沒有屬性顯示顯著的速度效應")
473
 
474
+ # Tab 2: AI assistant
475
  with tab2:
476
+ st.header("💬 AI 分析助手")
477
 
478
+ if not st.session_state.get('api_key'):
479
+ st.warning("⚠️ 請在左側輸入您的 Google Gemini API Key 以使用 AI 助手")
480
+ elif st.session_state.analysis_results is None:
481
+ st.info("ℹ️ 請先在「貝氏分析」頁面執行分析")
482
  else:
483
+ # Initialize the LLM assistant
484
+ if 'llm_assistant' not in st.session_state:
485
+ st.session_state.llm_assistant = BayesianLLMAssistant(
486
+ api_key=st.session_state.api_key,
487
+ session_id=st.session_state.session_id
488
+ )
 
 
 
 
 
 
489
 
490
+ # Chat container
491
+ chat_container = st.container()
492
 
493
+ with chat_container:
494
+ for message in st.session_state.chat_history:
495
+ with st.chat_message(message["role"]):
496
+ st.markdown(message["content"])
 
 
 
 
 
 
 
 
 
497
 
498
+ # User input
499
+ if prompt := st.chat_input("詢問關於分析結果的任何問題..."):
500
+ # Add the user's message
501
+ st.session_state.chat_history.append({
502
+ "role": "user",
503
+ "content": prompt
504
+ })
505
+
506
+ with st.chat_message("user"):
507
+ st.markdown(prompt)
508
+
509
+ # AI response
510
+ with st.chat_message("assistant"):
511
+ with st.spinner("思考中..."):
512
+ try:
513
  response = st.session_state.llm_assistant.get_response(
514
+ user_message=prompt,
515
+ analysis_results=st.session_state.analysis_results
 
 
 
 
 
 
 
 
 
 
 
 
 
 
516
  )
517
+ st.markdown(response)
518
+ except Exception as e:
519
+ error_msg = f"❌ 錯誤: {str(e)}\n\n請檢查 API key 或重新表達問題。"
520
+ st.error(error_msg)
521
+ response = error_msg
522
+
523
+ # Add the assistant's response
524
+ st.session_state.chat_history.append({
525
+ "role": "assistant",
526
+ "content": response
527
+ })
528
 
529
+ st.markdown("---")
530
 
531
+ # Quick-question buttons
532
+ st.subheader("💡 快速問題")
533
+
534
+ quick_questions = [
535
+ "📊 給我分析總結",
536
+ "🎯 解釋 d 參數",
537
+ "🔍 解釋 sigma",
538
+ "📖 什麼是貝氏統計?",
539
+ "🏗️ 什麼是階層模型?",
540
+ "📉 如何看 Trace Plot?",
541
+ "🎮 比較各屬性",
542
+ "⚔️ 對戰策略建議"
543
  ]
544
 
545
+ cols = st.columns(4)
546
+ for idx, question in enumerate(quick_questions):
547
+ col_idx = idx % 4
548
+ if cols[col_idx].button(question, key=f"quick_{idx}", use_container_width=True):
549
+ # Choose the assistant method that matches the question
550
+ if "總結" in question:
551
+ response = st.session_state.llm_assistant.generate_summary(
552
+ st.session_state.analysis_results
553
+ )
554
+ elif "d 參數" in question:
555
+ response = st.session_state.llm_assistant.explain_metric(
556
+ 'd',
557
+ st.session_state.analysis_results
558
+ )
559
+ elif "sigma" in question:
560
+ response = st.session_state.llm_assistant.explain_metric(
561
+ 'sigma',
562
+ st.session_state.analysis_results
563
+ )
564
+ elif "貝氏統計" in question:
565
+ response = st.session_state.llm_assistant.explain_bayesian_concepts()
566
+ elif "階層模型" in question:
567
+ response = st.session_state.llm_assistant.explain_hierarchical_model()
568
+ elif "Trace Plot" in question:
569
+ response = st.session_state.llm_assistant.explain_convergence()
570
+ elif "比較" in question:
571
+ response = st.session_state.llm_assistant.compare_types(
572
+ st.session_state.analysis_results
573
+ )
574
+ elif "策略" in question:
575
+ response = st.session_state.llm_assistant.battle_strategy_advice(
576
+ st.session_state.analysis_results
577
+ )
578
+ else:
579
+ response = st.session_state.llm_assistant.get_response(
580
+ question,
581
+ st.session_state.analysis_results
582
+ )
583
+
584
+ st.session_state.chat_history.append({
585
+ "role": "user",
586
+ "content": question
587
+ })
588
+
589
+ st.session_state.chat_history.append({
590
+ "role": "assistant",
591
+ "content": response
592
+ })
593
+
594
+ st.rerun()
595
 
596
+ # Reset-conversation button
597
  st.markdown("---")
598
+ if st.button("🔄 重置對話"):
599
+ st.session_state.llm_assistant.reset_conversation()
600
+ st.session_state.chat_history = []
601
+ st.success("✅ 對話已重置")
602
+ st.rerun()
603
 
604
+ # DAG plot (if available, shown at the bottom of the sidebar)
605
+ if st.session_state.analysis_results and st.session_state.analysis_results.get('dag_plot'):
606
+ with st.sidebar:
607
  st.markdown("---")
608
+ with st.expander("🔀 DAG 模型結構圖"):
609
+ st.image(f"data:image/png;base64,{st.session_state.analysis_results['dag_plot']}")
 
 
 
610
 
611
+ # Footer
612
  st.markdown("---")
613
+ st.markdown(
614
+ f"""
615
+ <div style='text-align: center'>
616
+ <p>⚡ Bayesian Hierarchical Model for Pokémon Speed Analysis | Built with PyMC & Streamlit</p>
617
+ <p>Session ID: {st.session_state.session_id[:8]} | Powered by Google Gemini</p>
618
+ </div>
619
+ """,
620
+ unsafe_allow_html=True
621
+ )
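The help text in the Posterior tab describes `d` as a log odds ratio, `or_speed` as `exp(d)`, and calls an effect significant when the 95% HDI excludes 0 (for `d`) or 1 (for `or_speed`). The sketch below illustrates those two rules with the standard library only. The helper names and the interval endpoints are hypothetical, chosen for illustration.

```python
import math


def odds_ratio(d):
    """Convert a log odds ratio d into an odds ratio, as in or_speed = exp(d)."""
    return math.exp(d)


def hdi_excludes(lower, upper, null_value):
    """True when the interval [lower, upper] lies entirely on one side of null_value."""
    return lower > null_value or upper < null_value


# Made-up interval for d, for illustration only
d_lower, d_upper = 0.10, 0.70

# exp is increasing, so an interval above 0 on the log scale maps above 1
or_lower, or_upper = odds_ratio(d_lower), odds_ratio(d_upper)

print(hdi_excludes(d_lower, d_upper, 0.0))
print(hdi_excludes(or_lower, or_upper, 1.0))
```

Because the HDI is not invariant under nonlinear transformations, the interval that ArviZ reports for `or_speed` need not equal the exponentiated endpoints of the interval for `d`. The exponentiated endpoints still bracket the same posterior mass, so the two significance checks agree in this sketch.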
bayesian_core.py CHANGED
@@ -1,264 +1,311 @@
1
- """
2
- Bayesian Meta-Analysis Core for Pokemon Speed Analysis
3
- Using PyMC for hierarchical Bayesian modeling
4
- """
5
-
6
  import pymc as pm
7
  import numpy as np
8
  import pandas as pd
9
  import arviz as az
 
 
 
10
  from datetime import datetime
 
11
 
12
-
13
- class BayesianSpeedAnalyzer:
14
  """
15
- Bayesian hierarchical analyzer
16
- 分析速度對不同屬性寶可夢勝率的影響
17
  """
18
-
19
- def __init__(self, data):
 
 
 
 
 
 
20
  """
21
  初始化分析器
22
-
23
  Args:
24
- data: DataFrame 包含欄位:
25
- - Trial_Type: 屬性名稱
26
- - rc: 控制組勝場數
27
- - nc: 控制組總場數
28
- - rt: 實驗組勝場數
29
- - nt: 實驗組總場數
30
  """
31
- self.data = data
32
- self.trial_labels = data['Trial_Type'].values
33
- self.num_trials = len(data)
34
  self.model = None
35
  self.trace = None
36
- self.results = None
37
-
38
- def build_model(self):
39
- """建立貝葉斯階層式模型"""
40
 
41
- with pm.Model() as model:
42
- # ===== 先驗分佈 (Priors) =====
43
- # d: 整體速度效應 (log odds ratio)
44
- d = pm.Normal('d', mu=0, sigma=10)
45
-
46
- # tau: 精度參數 (控制屬性間變異)
47
- tau = pm.Gamma('tau', alpha=0.001, beta=0.001)
48
-
49
- # sigma: 標準差 (由 tau 導出)
50
- sigma = pm.Deterministic('sigma', 1 / pm.math.sqrt(tau))
51
-
52
- # ===== 各屬性特定參數 =====
53
- # mu: 各屬性基準勝率 (logit scale)
54
- mu = pm.Normal('mu', mu=0, sigma=10, shape=self.num_trials)
55
-
56
- # delta: 各屬性的速度效應
57
- delta = pm.Normal(
58
- 'delta',
59
- mu=d,
60
- sigma=1 / pm.math.sqrt(tau),
61
- shape=self.num_trials
62
- )
63
-
64
- # ===== 轉換與似然函數 =====
65
- # pc: 控制組(慢速)勝率
66
- pc = pm.Deterministic('pc', pm.math.invlogit(mu))
67
-
68
- # pt: 實驗組(快速)勝率
69
- pt = pm.Deterministic('pt', pm.math.invlogit(mu + delta))
70
-
71
- # 觀測資料的似然函數
72
- rc_obs = pm.Binomial(
73
- 'rc_obs',
74
- n=self.data['nc'].values,
75
- p=pc,
76
- observed=self.data['rc'].values
77
- )
78
-
79
- rt_obs = pm.Binomial(
80
- 'rt_obs',
81
- n=self.data['nt'].values,
82
- p=pt,
83
- observed=self.data['rt'].values
84
- )
85
-
86
- # ===== 導出統計量 =====
87
- # 預測新屬性的效應
88
- delta_new = pm.Normal('delta_new', mu=d, sigma=1 / pm.math.sqrt(tau))
89
-
90
- # 勝率比 (Odds Ratio)
91
- or_speed = pm.Deterministic('or_speed', pm.math.exp(d))
92
-
93
- self.model = model
94
- return model
95
-
96
- def run_analysis(self, samples=2000, tune=1000, chains=1, target_accept=0.95, progress_callback=None):
97
  """
98
- 執行 MCMC 抽樣
99
-
 
 
 
 
 
 
 
 
 
 
 
 
 
 
100
  Args:
101
- samples: number of posterior draws
102
- tune: number of warm-up (tuning) iterations
103
- chains: 鏈數
104
  target_accept: 目標接受率
105
- progress_callback: 進度回調函數 (可選)
106
-
107
  Returns:
108
- trace: InferenceData 物件
109
  """
110
- if self.model is None:
111
- self.build_model()
112
-
113
- with self.model:
114
- self.trace = pm.sample(
115
- samples,
116
- tune=tune,
117
- chains=chains,
118
- target_accept=target_accept,
119
- return_inferencedata=True,
120
- progressbar=False # Streamlit 中關閉進度條
121
- )
122
-
123
- # 生成分析結果
124
- self._generate_results()
125
 
126
- return self.trace
127
-
128
- def _generate_results(self):
129
- """生成分析結果摘要"""
 
 
130
 
131
- # 主要參數摘要
132
- summary = az.summary(
133
- self.trace,
134
- var_names=['d', 'sigma', 'or_speed'],
135
- hdi_prob=0.95
136
- )
137
-
138
- # 各屬性效應摘要
139
- delta_summary = az.summary(
140
- self.trace,
141
- var_names=['delta'],
142
- hdi_prob=0.95
143
- )
144
- delta_summary['Trial_Type'] = self.trial_labels
145
-
146
- # 提取關鍵統計量
147
- d_mean = summary.loc['d', 'mean']
148
- d_hdi_lower = summary.loc['d', 'hdi_2.5%']
149
- d_hdi_upper = summary.loc['d', 'hdi_97.5%']
150
 
151
- or_mean = summary.loc['or_speed', 'mean']
152
- or_hdi_lower = summary.loc['or_speed', 'hdi_2.5%']
153
- or_hdi_upper = summary.loc['or_speed', 'hdi_97.5%']
 
 
 
154
 
155
- sigma_mean = summary.loc['sigma', 'mean']
156
-
157
- # 計算各屬性勝率變化
158
- delta_values = self.trace.posterior['delta'].values.reshape(-1, self.num_trials)
159
- mu_values = self.trace.posterior['mu'].values.reshape(-1, self.num_trials)
 
 
160
 
161
- pc_mean = 1 / (1 + np.exp(-mu_values.mean(axis=0))) # 控制組平均勝率
162
- pt_mean = 1 / (1 + np.exp(-(mu_values.mean(axis=0) + delta_values.mean(axis=0)))) # 實驗組平均勝率
163
 
164
- win_rate_increase = (pt_mean - pc_mean) * 100 # 勝率提升百分點
165
-
166
- self.results = {
167
- 'summary': summary,
168
- 'delta_summary': delta_summary,
169
- 'statistics': {
170
- 'd_mean': d_mean,
171
- 'd_hdi_lower': d_hdi_lower,
172
- 'd_hdi_upper': d_hdi_upper,
173
- 'or_mean': or_mean,
174
- 'or_hdi_lower': or_hdi_lower,
175
- 'or_hdi_upper': or_hdi_upper,
176
- 'sigma_mean': sigma_mean,
177
- 'pc_mean': pc_mean,
178
- 'pt_mean': pt_mean,
179
- 'win_rate_increase': win_rate_increase
180
- },
181
- 'trial_labels': self.trial_labels,
182
- 'num_trials': self.num_trials,
183
- 'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
184
- }
185
-
186
- def get_convergence_diagnostics(self):
187
- """獲取收斂診斷指標"""
188
 
189
- if self.trace is None:
190
- return None
191
-
192
- summary = az.summary(self.trace, var_names=['d', 'sigma', 'or_speed'])
193
 
194
- diagnostics = {
195
- 'r_hat': {
196
- 'd': summary.loc['d', 'r_hat'] if 'r_hat' in summary.columns else 1.0,
197
- 'sigma': summary.loc['sigma', 'r_hat'] if 'r_hat' in summary.columns else 1.0,
198
- 'or_speed': summary.loc['or_speed', 'r_hat'] if 'r_hat' in summary.columns else 1.0
199
- },
200
- 'ess_bulk': {
201
- 'd': summary.loc['d', 'ess_bulk'] if 'ess_bulk' in summary.columns else 2000,
202
- 'sigma': summary.loc['sigma', 'ess_bulk'] if 'ess_bulk' in summary.columns else 2000,
203
- 'or_speed': summary.loc['or_speed', 'ess_bulk'] if 'ess_bulk' in summary.columns else 2000
204
- }
205
- }
206
-
207
- return diagnostics
208
-
209
- def interpret_results(self):
210
- """解釋分析結果"""
211
 
212
- if self.results is None:
213
- return "尚未執行分析"
214
-
215
- stats = self.results['statistics']
216
 
217
- # 判斷速度效應顯著性
218
- if stats['d_hdi_lower'] > 0:
219
- significance = "顯著正向"
220
- direction = "速度快明顯提升勝率"
221
- elif stats['d_hdi_upper'] < 0:
222
- significance = "顯著負向"
223
- direction = "速度快反而降低勝率"
224
- else:
225
- significance = "不顯著"
226
- direction = "速度效應不明確"
227
-
228
- interpretation = f"""
229
- ### 🎯 整體結論
230
-
231
- **速度效應**: {significance} ({direction})
232
-
233
- - **對數勝率比 (d)**: {stats['d_mean']:.3f} (95% HDI: [{stats['d_hdi_lower']:.3f}, {stats['d_hdi_upper']:.3f}])
234
- - **勝率比 (OR)**: {stats['or_mean']:.3f} (95% HDI: [{stats['or_hdi_lower']:.3f}, {stats['or_hdi_upper']:.3f}])
235
- - **異質性 (σ)**: {stats['sigma_mean']:.3f}
236
-
237
- ### 📊 實際意義
238
-
239
- 速度快的寶可夢勝率約為速度慢的 **{stats['or_mean']:.2f} 倍**。
240
-
241
- 平均而言,速度快可使勝率提升約 **{stats['win_rate_increase'].mean():.1f} 個百分點**。
242
- """
243
 
244
- return interpretation
245
-
246
- def get_trial_specific_results(self):
247
- """獲取各屬性的詳細結果"""
 
 
 
 
248
 
249
- if self.results is None:
250
- return None
251
-
252
- stats = self.results['statistics']
 
 
253
 
254
- trial_results = []
255
- for i, trial in enumerate(self.trial_labels):
256
- trial_results.append({
257
- 'Trial_Type': trial,
258
- 'Control_Win_Rate': f"{stats['pc_mean'][i]:.1%}",
259
- 'Treatment_Win_Rate': f"{stats['pt_mean'][i]:.1%}",
260
- 'Win_Rate_Increase': f"{stats['win_rate_increase'][i]:+.1f}%",
261
- 'Effect_Size': self.results['delta_summary'].iloc[i]['mean']
262
- })
263
-
264
- return pd.DataFrame(trial_results)
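The removed analyzer above models the control win rate as `invlogit(mu)` and the treatment win rate as `invlogit(mu + delta)`, then reports the difference in percentage points. A minimal sketch of that transformation follows, using plain floats instead of posterior samples. The function names and input values are hypothetical.

```python
import math


def invlogit(x):
    """Inverse logit: map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def win_rates(mu, delta):
    """Return (control rate, treatment rate, percentage-point gain)."""
    pc = invlogit(mu)          # slow (control) group win rate
    pt = invlogit(mu + delta)  # fast (treatment) group win rate
    return pc, pt, (pt - pc) * 100.0


# Made-up inputs: baseline log-odds 0 and a speed effect of 0.5
pc, pt, gain = win_rates(mu=0.0, delta=0.5)
print(f"{pc:.1%} -> {pt:.1%} ({gain:+.1f} points)")
```

Note that applying `invlogit` to posterior means, as the removed `_generate_results` does, is not the same as averaging `invlogit` over the posterior samples. The two agree only approximately, because the transformation is nonlinear.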
 
 
 
 
 
 
 
 
 
1
+ import os
 
 
 
 
2
  import pymc as pm
3
  import numpy as np
4
  import pandas as pd
5
  import arviz as az
6
+ import matplotlib.pyplot as plt
7
+ import io
8
+ import base64
9
  from datetime import datetime
10
+ import threading
11
 
12
+ class BayesianHierarchicalAnalyzer:
 
13
  """
14
+ Bayesian hierarchical model analyzer
15
+ Analyzes how speed affects Pokémon win rates, stratified by type
16
  """
17
+
18
+ # Class-level lock for thread safety
19
+ _lock = threading.Lock()
20
+
21
+ # Analysis results stored per session
22
+ _session_results = {}
23
+
24
+ def __init__(self, session_id):
25
  """
26
  初始化分析器
27
+
28
  Args:
29
+ session_id: unique session identifier
 
 
 
 
 
30
  """
31
+ self.session_id = session_id
32
+ self.df = None
 
33
  self.model = None
34
  self.trace = None
35
+
36
+ def load_data(self, csv_path_or_df):
37
+ """
38
+ Load the data
39
 
40
+ Args:
41
+ csv_path_or_df: path to a CSV file, or a DataFrame
42
  """
43
+ if isinstance(csv_path_or_df, str):
44
+ self.df = pd.read_csv(csv_path_or_df)
45
+ else:
46
+ self.df = csv_path_or_df.copy()
47
+
48
+ # Validate the required columns
49
+ required_cols = ['Trial_Type', 'rc', 'nc', 'rt', 'nt']
50
+ missing_cols = [col for col in required_cols if col not in self.df.columns]
51
+
52
+ if missing_cols:
53
+ raise ValueError(f"資料缺少必要欄位: {missing_cols}")
54
+
55
+ def run_analysis(self, n_samples=2000, n_tune=1000, n_chains=1, target_accept=0.95, progress_callback=None):
56
+ """
57
+ Run the Bayesian hierarchical model analysis
58
+
59
  Args:
60
+ n_samples: number of MCMC draws
61
+ n_tune: number of tuning samples
62
+ n_chains: number of chains
63
  target_accept: 目標接受率
64
+ progress_callback: progress callback function
65
+
66
  Returns:
67
+ dict: dictionary containing all analysis results
68
  """
69
+ with self._lock:
70
+ try:
71
+ if self.df is None:
72
+ raise ValueError("請先載入資料")
73
+
74
+ if progress_callback:
75
+ progress_callback("建立貝氏模型...", 10)
76
+
77
+ # Prepare the data
78
+ trial_labels = self.df['Trial_Type'].values
79
+ Num = len(self.df)
80
+
81
+ # Build the Bayesian model
82
+ with pm.Model() as model:
83
+ # Priors
84
+ d = pm.Normal('d', mu=0, sigma=10)
85
+ tau = pm.Gamma('tau', alpha=0.001, beta=0.001)
86
+ sigma = pm.Deterministic('sigma', 1 / pm.math.sqrt(tau))
87
+
88
+ # Type-specific effects
89
+ mu = pm.Normal('mu', mu=0, sigma=10, shape=Num)
90
+ delta = pm.Normal('delta', mu=d, sigma=1 / pm.math.sqrt(tau), shape=Num)
91
+
92
+ # Transformations and likelihood
93
+ pc = pm.Deterministic('pc', pm.math.invlogit(mu))
94
+ pt = pm.Deterministic('pt', pm.math.invlogit(mu + delta))
95
+ rc_obs = pm.Binomial('rc_obs', n=self.df['nc'].values, p=pc, observed=self.df['rc'].values)
96
+ rt_obs = pm.Binomial('rt_obs', n=self.df['nt'].values, p=pt, observed=self.df['rt'].values)
97
+
98
+ # Other derived quantities
99
+ delta_new = pm.Normal('delta_new', mu=d, sigma=1 / pm.math.sqrt(tau))
100
+ or_speed = pm.Deterministic('or_speed', pm.math.exp(d))
101
+
+                     # Generate the DAG (model graph)
+                     if progress_callback:
+                         progress_callback("生成 DAG 模型圖...", 20)  # "Generating the DAG model graph..."
+
+                     try:
+                         dag_img = self._generate_dag(model)
+                     except Exception as e:
+                         print(f"DAG 生成失敗: {e}")  # "DAG generation failed"
+                         dag_img = None
+
+                     # Run MCMC sampling (must happen inside the model context)
+                     if progress_callback:
+                         progress_callback("執行貝氏抽樣(這可能需要幾分鐘)...", 30)  # "Running Bayesian sampling (may take a few minutes)..."
+
+                     trace = pm.sample(
+                         n_samples,
+                         tune=n_tune,
+                         chains=n_chains,
+                         target_accept=target_accept,
+                         return_inferencedata=True,
+                         progressbar=False
+                     )
+
+                 self.model = model
+                 self.trace = trace
+
+                 if progress_callback:
+                     progress_callback("生成統計摘要...", 60)  # "Generating the statistical summary..."
+
+                 # Build the text summary
+                 summary = az.summary(trace, var_names=['d', 'sigma', 'or_speed'], hdi_prob=0.95)
+                 summary_text = self._format_summary(summary)
+
+                 if progress_callback:
+                     progress_callback("生成視覺化圖表...", 70)  # "Generating plots..."
+
+                 # Generate the plots
+                 trace_plot = self._generate_trace_plot(trace)
+                 posterior_plot = self._generate_posterior_plot(trace)
+                 forest_plot = self._generate_forest_plot(trace, trial_labels, Num)
+
+                 if progress_callback:
+                     progress_callback("整理結果...", 90)  # "Collecting results..."
+
+                 # Collect the results
+                 results = {
+                     'trial_labels': trial_labels.tolist(),
+                     'n_trials': Num,
+                     'summary_table': summary.to_dict(),
+                     'summary_text': summary_text,
+                     'd_mean': float(summary.loc['d', 'mean']),
+                     'd_sd': float(summary.loc['d', 'sd']),
+                     'd_hdi_lower': float(summary.loc['d', 'hdi_2.5%']),
+                     'd_hdi_upper': float(summary.loc['d', 'hdi_97.5%']),
+                     'sigma_mean': float(summary.loc['sigma', 'mean']),
+                     'sigma_sd': float(summary.loc['sigma', 'sd']),
+                     'or_speed_mean': float(summary.loc['or_speed', 'mean']),
+                     'or_speed_sd': float(summary.loc['or_speed', 'sd']),
+                     'or_speed_hdi_lower': float(summary.loc['or_speed', 'hdi_2.5%']),
+                     'or_speed_hdi_upper': float(summary.loc['or_speed', 'hdi_97.5%']),
+                     # "Significant" here means the 95% HDI excludes 0; cast to a plain bool for JSON export
+                     'is_significant': bool(summary.loc['d', 'hdi_2.5%'] > 0 or summary.loc['d', 'hdi_97.5%'] < 0),
+                     'dag_plot': dag_img,
+                     'trace_plot': trace_plot,
+                     'posterior_plot': posterior_plot,
+                     'forest_plot': forest_plot,
+                     'timestamp': datetime.now().isoformat(),
+                     'sampling_params': {
+                         'n_samples': n_samples,
+                         'n_tune': n_tune,
+                         'n_chains': n_chains,
+                         'target_accept': target_accept
+                     }
+                 }
+
+                 # Add the detailed per-type results
+                 delta_summary = az.summary(trace, var_names=['delta'], hdi_prob=0.95)
+                 results['delta_results'] = []
+                 for i, trial_type in enumerate(trial_labels):
+                     results['delta_results'].append({
+                         'trial_type': trial_type,
+                         'delta_mean': float(delta_summary.iloc[i]['mean']),
+                         'delta_sd': float(delta_summary.iloc[i]['sd']),
+                         'delta_hdi_lower': float(delta_summary.iloc[i]['hdi_2.5%']),
+                         'delta_hdi_upper': float(delta_summary.iloc[i]['hdi_97.5%']),
+                         'is_significant': bool(delta_summary.iloc[i]['hdi_2.5%'] > 0 or delta_summary.iloc[i]['hdi_97.5%'] < 0)
+                     })
+
+                 # Store in the per-session results
+                 self._session_results[self.session_id] = results
+
+                 if progress_callback:
+                     progress_callback("分析完成!", 100)  # "Analysis complete!"
+
+                 return results
+
+             except Exception as e:
+                 # Chain the original exception so the traceback is preserved ("Analysis failed")
+                 raise Exception(f"分析失敗: {str(e)}") from e
+
+     def _generate_dag(self, model):
+         """Generate the DAG (model graph) as a base64-encoded PNG."""
+         try:
+             gv = pm.model_to_graphviz(model)
+             # Render to PNG (needs the Graphviz executables installed) and encode as base64
+             png_data = gv.pipe(format='png')
+             return base64.b64encode(png_data).decode()
+         except Exception as e:
+             print(f"DAG 生成失敗: {e}")  # "DAG generation failed"
+             return None
+
+     def _generate_trace_plot(self, trace):
+         """Generate the trace plot as a base64-encoded PNG."""
+         fig, axes = plt.subplots(2, 2, figsize=(14, 8))
+         az.plot_trace(trace, var_names=['d', 'sigma'], axes=axes)
+         plt.tight_layout()
 
+         # Convert to base64
+         buf = io.BytesIO()
+         plt.savefig(buf, format='png', dpi=150, bbox_inches='tight')
+         buf.seek(0)
+         img_base64 = base64.b64encode(buf.read()).decode()
+         plt.close(fig)
 
+         return img_base64
+
+     def _generate_posterior_plot(self, trace):
+         """Generate the posterior plot as a base64-encoded PNG."""
+         az.plot_posterior(trace, var_names=['d', 'sigma', 'or_speed'], hdi_prob=0.95)
 
+         # Convert to base64
+         buf = io.BytesIO()
+         plt.savefig(buf, format='png', dpi=150, bbox_inches='tight')
+         buf.seek(0)
+         img_base64 = base64.b64encode(buf.read()).decode()
+         plt.close()
 
+         return img_base64
+
+     def _generate_forest_plot(self, trace, trial_labels, Num):
+         """Generate the forest plot as a base64-encoded PNG."""
+         delta_posterior = trace.posterior['delta'].values.reshape(-1, Num)
+         delta_mean = delta_posterior.mean(axis=0)
+         delta_hdi = az.hdi(trace, var_names=['delta'], hdi_prob=0.95)['delta'].values
 
+         fig, ax = plt.subplots(figsize=(12, max(10, Num * 0.4)))
+         y_pos = np.arange(Num)
 
+         # Draw the 95% HDI (credible interval) for each type
+         ax.hlines(y_pos, delta_hdi[:, 0], delta_hdi[:, 1], color='steelblue', linewidth=3)
+         # Draw the posterior means
+         ax.scatter(delta_mean, y_pos, color='darkblue', s=120, zorder=3, edgecolors='white', linewidth=1.5)
 
+         # Mark the types with a significant positive effect
+         for i, (mean, hdi) in enumerate(zip(delta_mean, delta_hdi)):
+             if hdi[0] > 0:  # HDI lies entirely above 0
+                 ax.text(mean + 0.05, i, '★', fontsize=15, ha='left', color='gold', va='center')
 
+         # Configure the axes
+         ax.set_yticks(y_pos)
+         ax.set_yticklabels(trial_labels, fontsize=11)
+         ax.invert_yaxis()
+         ax.axvline(0, color='red', linestyle='--', linewidth=2, label='No Effect (δ=0)')
+         ax.set_xlabel('Delta (Log Odds Ratio)', fontsize=13)
+         ax.set_title('Effect of Speed on Win Rate by Type', fontsize=15, fontweight='bold', pad=20)
+         ax.legend(loc='lower right')
+         ax.grid(axis='x', alpha=0.3)
 
+         plt.tight_layout()
 
+         # Convert to base64
+         buf = io.BytesIO()
+         plt.savefig(buf, format='png', dpi=150, bbox_inches='tight')
+         buf.seek(0)
+         img_base64 = base64.b64encode(buf.read()).decode()
+         plt.close(fig)
 
+         return img_base64
+
+     def _format_summary(self, summary):
+         """Format the summary table as plain text."""
+         text = "="*70 + "\n"
+         text += "貝氏階層模型分析結果摘要\n"
+         text += "Bayesian Hierarchical Model Analysis Summary\n"
+         text += "="*70 + "\n\n"
 
+         for var in ['d', 'sigma', 'or_speed']:
+             row = summary.loc[var]
+             text += f"{var:12} | "
+             text += f"Mean: {row['mean']:7.4f} | "
+             text += f"SD: {row['sd']:7.4f} | "
+             text += f"95% HDI: [{row['hdi_2.5%']:7.4f}, {row['hdi_97.5%']:7.4f}]\n"
 
+         text += "\n" + "="*70 + "\n"
+         text += "參數說明 (Parameter Descriptions):\n"
+         text += " d : 整體平均效應 (Overall mean effect)\n"
+         text += " sigma : 屬性間變異 (Between-type variability)\n"
+         text += " or_speed : 速度勝算比 (Speed odds ratio = exp(d))\n"
+         text += "="*70 + "\n"
+
+         return text
+
+     @classmethod
+     def get_session_results(cls, session_id):
+         """Return the results stored for a given session."""
+         return cls._session_results.get(session_id)
+
+     @classmethod
+     def clear_session_results(cls, session_id):
+         """Clear the results stored for a given session."""
+         if session_id in cls._session_results:
+             del cls._session_results[session_id]
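For reference, the model that `run_analysis` builds can be written out as below. This is only a restatement of the PyMC code in this file, with $i$ indexing the rows of the input table. Reading the `t` columns as the faster group and the `c` columns as the slower group is an assumption based on the `or_speed` naming; the code itself does not say so.

```latex
\begin{aligned}
r^{c}_{i} &\sim \mathrm{Binomial}(n^{c}_{i},\, p^{c}_{i}), & \operatorname{logit}(p^{c}_{i}) &= \mu_i,\\
r^{t}_{i} &\sim \mathrm{Binomial}(n^{t}_{i},\, p^{t}_{i}), & \operatorname{logit}(p^{t}_{i}) &= \mu_i + \delta_i,\\
\mu_i &\sim \mathcal{N}(0,\, 10^2), & \delta_i &\sim \mathcal{N}(d,\, \sigma^2),\\
d &\sim \mathcal{N}(0,\, 10^2), & \tau &\sim \mathrm{Gamma}(0.001,\, 0.001), \quad \sigma = 1/\sqrt{\tau},\\
\delta_{\text{new}} &\sim \mathcal{N}(d,\, \sigma^2), & \mathrm{OR}_{\text{speed}} &= e^{d}.
\end{aligned}
```

Each $\delta_i$ is a per-type log odds ratio drawn around the shared mean $d$, which is what lets types with few battles borrow strength from the rest.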
bayesian_llm_assistant.py ADDED
@@ -0,0 +1,362 @@
+ import google.generativeai as genai
+
+ class BayesianLLMAssistant:
+     """
+     LLM question-answering assistant for the Bayesian hierarchical model.
+     Helps users understand the Bayesian analysis results.
+     """
+
+     def __init__(self, api_key, session_id):
+         """
+         Initialize the LLM assistant.
+
+         Args:
+             api_key: Google Gemini API key
+             session_id: Unique session identifier
+         """
+         genai.configure(api_key=api_key)
+         self.model = genai.GenerativeModel('gemini-2.0-flash-exp')
+         self.session_id = session_id
+         self.conversation_history = []
+
+         # System prompt (bilingual: English and Traditional Chinese)
+         self.system_prompt = """You are an expert Bayesian statistician specializing in hierarchical models and meta-analysis, particularly in the context of Pokémon battle statistics.
+
+ **IMPORTANT - Language Instruction:**
+ - Always respond in the SAME language as the user's question
+ - If user asks in Traditional Chinese (繁體中文), respond in Traditional Chinese
+ - If user asks in English, respond in English
+ - Maintain language consistency throughout the conversation
+
+ 你是一位精通貝氏統計和階層模型的專家,特別專注於寶可夢速度對戰分析。
+
+ Your role is to help users understand Bayesian hierarchical model results for analyzing how Speed affects win rates across different Pokémon types.
+ 你的角色是幫助使用者理解貝氏階層模型的結果,分析速度如何影響不同屬性寶可夢的勝率。
+
+ You should:
+ 1. Explain Bayesian concepts in simple, accessible terms (prior, posterior, credible intervals)
+ 2. Interpret hierarchical modeling and why it's useful (borrowing strength, shrinkage)
+ 3. Explain what parameters mean (d, delta, sigma, tau)
+ 4. Discuss posterior distributions and HDI (Highest Density Interval)
+ 5. Help users understand convergence diagnostics (trace plots, R-hat)
+ 6. Explain the difference between Bayesian and frequentist approaches
+ 7. Provide battle strategy insights based on posterior estimates
+ 8. Discuss uncertainty quantification and practical significance
+
+ 你應該:
+ 1. 用簡單易懂的方式解釋貝氏概念(先驗、後驗、可信區間)
+ 2. 詮釋階層模型及其優勢(資訊借用、收縮效應)
+ 3. 解釋參數的意義(d、delta、sigma、tau)
+ 4. 討論後驗分佈和 HDI(最高密度區間)
+ 5. 幫助使用者理解收斂診斷(trace plot、R-hat)
+ 6. 解釋貝氏與頻率論方法的差異
+ 7. 根據後驗估計提供對戰策略見解
+ 8. 討論不確定性量化和實際顯著性
+
+ Key concepts to explain when relevant:
+ 重要概念解釋(當相關時):
+
+ **Bayesian Framework | 貝氏框架:**
+ - **Prior**: Initial belief before seeing data | 先驗:觀察資料前的初始信念
+ - **Likelihood**: Probability of data given parameters | 似然:給定參數下資料的機率
+ - **Posterior**: Updated belief after seeing data | 後驗:觀察資料後更新的信念
+ - **HDI**: 95% highest density interval (Bayesian credible interval) | HDI:95% 最高密度區間(貝氏可信區間)
+
+ **Hierarchical Model Parameters | 階層模型參數:**
+ - **d**: Overall mean effect across all types | d:所有屬性的整體平均效應
+ - **delta[i]**: Type-specific effect for type i | delta[i]:第 i 個屬性的特定效應
+ - **sigma**: Between-type variability | sigma:屬性間的變異性
+ - **tau**: Precision parameter (1/sigma²) | tau:精確度參數(1/sigma²)
+ - **or_speed**: Odds ratio = exp(d) | or_speed:勝算比 = exp(d)
+
+ **Model Advantages | 模型優勢:**
+ - Borrows information across types (partial pooling) | 跨屬性資訊借用(部分池化)
+ - Quantifies uncertainty properly | 正確量化不確定性
+ - Shrinks unreliable estimates toward overall mean | 將不可靠估計收縮至整體平均
+ - Handles small sample sizes better | 更好處理小樣本
+
+ **Interpretation Guidelines | 解讀指引:**
+ - HDI not crossing 0 → significant effect | HDI 不跨越 0 → 效應顯著
+ - or_speed > 1 → faster Pokémon more likely to win | or_speed > 1 → 速度快的更容易獲勝
+ - Large sigma → high variability between types | sigma 大 → 屬性間差異大
+ - Trace plots should look like "hairy caterpillar" | Trace 圖應該像「毛毛蟲」
+
+ When discussing Pokémon battles:
+ 討論寶可夢對戰時:
+ - Explain why Speed matters (turn order, priority moves) | 解釋速度的重要性(回合順序、先制技能)
+ - Connect type-specific effects to battle mechanics | 將屬性特定效應連結到對戰機制
+ - Discuss practical implications for team building | 討論組隊的實際意涵
+ - Consider exceptions (Trick Room, priority moves) | 考慮例外情況(戲法空間、先制招式)
+
+ Always be clear, educational, and engaging. Use examples when helpful.
+ Format responses with proper markdown for better readability.
+
+ 請務必清晰、具教育性、引人入勝。適時使用範例說明。使用適當的 Markdown 格式以提升可讀性。"""
+
+     def get_response(self, user_message, analysis_results=None):
+         """
+         Get a response from the AI assistant.
+
+         Args:
+             user_message: The user's message
+             analysis_results: Dictionary of analysis results (optional)
+
+         Returns:
+             str: The assistant's reply
+         """
+         # Prepare the context information
+         context = ""
+         if analysis_results:
+             context = self._prepare_context(analysis_results)
+
+         # Append the user message to the history
+         self.conversation_history.append({
+             "role": "user",
+             "content": user_message
+         })
+
+         try:
+             # Build the full prompt
+             full_prompt = self.system_prompt
+
+             if context:
+                 full_prompt += f"\n\n## Current Analysis Context:\n{context}"
+
+             # Render the earlier turns as text (the current message is appended separately below)
+             conversation_text = "\n\n## Conversation History:\n"
+             for msg in self.conversation_history[:-1]:
+                 role = "User" if msg["role"] == "user" else "Assistant"
+                 conversation_text += f"\n{role}: {msg['content']}\n"
+
+             # Assemble the final prompt
+             final_prompt = full_prompt + conversation_text + f"\nUser: {user_message}\n\nAssistant:"
+
+             # Call the Gemini API
+             response = self.model.generate_content(
+                 final_prompt,
+                 generation_config=genai.types.GenerationConfig(
+                     temperature=1.0,
+                     max_output_tokens=4000,
+                 )
+             )
+
+             assistant_message = response.text
+
+             # Append the assistant reply to the history
+             self.conversation_history.append({
+                 "role": "assistant",
+                 "content": assistant_message
+             })
+
+             return assistant_message
+
+         except Exception as e:
+             return f"❌ Error: {str(e)}\n\nPlease check your API key and try again."
+
+     def _prepare_context(self, results):
+         """Prepare the analysis results as context text for the prompt."""
+
+         if not results:
+             return "目前尚無分析結果。No analysis results available yet."
+
+         # Determine the direction of the effect
+         if results['d_mean'] > 0:
+             effect_direction = "faster Pokémon have HIGHER win rates | 速度快的寶可夢有更高的勝率"
+         else:
+             effect_direction = "slower Pokémon have HIGHER win rates | 速度慢的寶可夢有更高的勝率"
+
+         # Determine significance (95% HDI excludes 0)
+         if results['is_significant']:
+             significance = "YES - The effect is significant | 是 - 效應顯著"
+         else:
+             significance = "NO - The effect is not significant | 否 - 效應不顯著"
+
+         context = f"""
+ ## Current Bayesian Hierarchical Model Analysis | 目前的貝氏階層模型分析
+
+ ### Dataset Information | 資料集資訊
+ - Number of Pokémon Types Analyzed | 分析的屬性數量: {results['n_trials']}
+ - Types | 屬性: {', '.join(results['trial_labels'])}
+
+ ### Overall Effect (All Types Combined) | 整體效應(所有屬性合併)
+
+ **d (Overall Mean Effect | 整體平均效應):**
+ - Mean | 平均值: {results['d_mean']:.4f}
+ - SD | 標準差: {results['d_sd']:.4f}
+ - 95% HDI | 95% 最高密度區間: [{results['d_hdi_lower']:.4f}, {results['d_hdi_upper']:.4f}]
+ - **Interpretation | 解讀**: {effect_direction}
+ - **Is Significant? | 是否顯著?**: {significance}
+
+ **sigma (Between-Type Variability | 屬性間變異):**
+ - Mean | 平均值: {results['sigma_mean']:.4f}
+ - SD | 標準差: {results['sigma_sd']:.4f}
+ - **Interpretation | 解讀**: {"High variability between types | 屬性間差異大" if results['sigma_mean'] > 0.5 else "Moderate variability between types | 屬性間差異中等" if results['sigma_mean'] > 0.2 else "Low variability between types | 屬性間差異小"}
+
+ **or_speed (Speed Odds Ratio | 速度勝算比):**
+ - Mean | 平均值: {results['or_speed_mean']:.4f}
+ - SD | 標準差: {results['or_speed_sd']:.4f}
+ - 95% HDI | 95% 最高密度區間: [{results['or_speed_hdi_lower']:.4f}, {results['or_speed_hdi_upper']:.4f}]
+ - **Interpretation | 解讀**: {
+     f"Faster Pokémon have {results['or_speed_mean']:.2f} times the odds of winning | 速度快的寶可夢獲勝的勝算是慢的 {results['or_speed_mean']:.2f} 倍"
+     if results['or_speed_mean'] > 1
+     else f"Slower Pokémon have {1/results['or_speed_mean']:.2f} times the odds of winning | 速度慢的寶可夢獲勝的勝算是快的 {1/results['or_speed_mean']:.2f} 倍"
+ }
+
+ ### Type-Specific Effects | 屬性特定效應
+
+ """
+
+         # Add the detailed per-type results
+         for delta_result in results['delta_results']:
+             significant_marker = "★" if delta_result['is_significant'] else " "
+             context += f"\n**{delta_result['trial_type']} {significant_marker}:**\n"
+             context += f" - Delta Mean | 平均效應: {delta_result['delta_mean']:.4f}\n"
+             context += f" - 95% HDI: [{delta_result['delta_hdi_lower']:.4f}, {delta_result['delta_hdi_upper']:.4f}]\n"
+             context += f" - Significant? | 顯著?: {'Yes 是' if delta_result['is_significant'] else 'No 否'}\n"
+
+         context += f"""
+ ### Model Fitting Information | 模型擬合資訊
+ - Samples | 樣本數: {results['sampling_params']['n_samples']}
+ - Tuning samples | 調整樣本數: {results['sampling_params']['n_tune']}
+ - Chains | 鏈數: {results['sampling_params']['n_chains']}
+ - Target accept rate | 目標接受率: {results['sampling_params']['target_accept']}
+
+ ### Key Insights | 關鍵洞察
+ 1. **Overall Pattern | 整體模式**: {effect_direction}
+ 2. **Heterogeneity | 異質性**: {"Different types show different responses to speed" if results['sigma_mean'] > 0.3 else "Types respond similarly to speed"}
+ 3. **Significant Types | 顯著屬性**: {sum(1 for dr in results['delta_results'] if dr['is_significant'])} out of {results['n_trials']} types show significant speed effects
+ """
+
+         return context
+
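+     def generate_summary(self, analysis_results):
+         """Automatically generate a summary of the analysis results."""
+
+         # Prompt (Traditional Chinese): asks for a full summary report covering the
+         # purpose of the analysis, overall findings (d, significance, odds ratio),
+         # between-type differences (sigma, sensitive and exceptional types),
+         # battle implications, and recommendations for trainers.
+         summary_prompt = """請根據提供的貝氏階層模型分析結果生成一份完整的總結報告,包含:
+
+ 1. **分析目的**:這個模型在研究什麼?
+ 2. **整體發現**:
+    - 速度對勝率的整體影響(d 參數)
+    - 是否具有統計顯著性?
+    - 勝算比告訴我們什麼?
+ 3. **屬性間差異**:
+    - sigma 參數顯示什麼?
+    - 哪些屬性對速度特別敏感?
+    - 哪些屬性例外?
+ 4. **對戰意涵**:這對實戰有什麼啟示?
+ 5. **建議**:訓練師該如何運用這些資訊?
+
+ 請用清楚的繁體中文 Markdown 格式撰寫,包含適當的章節標題。"""
+
+         return self.get_response(summary_prompt, analysis_results)
+
+     def explain_bayesian_concepts(self):
+         """Explain the basic concepts of Bayesian statistics."""
+
+         # Prompt (Traditional Chinese): asks for a simple explanation of Bayesian
+         # statistics in this Pokémon speed setting: how it differs from classical
+         # statistics, prior/likelihood/posterior, HDI versus confidence intervals,
+         # why a Bayesian approach is used here, and how to read a posterior.
+         explain_prompt = """請用簡單的方式解釋貝氏統計,特別是在這個寶可夢速度分析的情境下。
+
+ 請涵蓋:
+ 1. 什麼是貝氏統計?與傳統統計有何不同?
+ 2. 什麼是先驗、似然、後驗?
+ 3. 什麼是 HDI(最高密度區間)?與信賴區間有何不同?
+ 4. 為什麼用貝氏方法分析這個問題?
+ 5. 如何解讀後驗分佈?
+
+ 請用寶可夢的實際例子讓說明更具體易懂,全程使用繁體中文。"""
+
+         return self.get_response(explain_prompt, None)
+
+     def explain_hierarchical_model(self):
+         """Explain the concept of a hierarchical model."""
+
+         # Prompt (Traditional Chinese): asks what a hierarchical model is and why it
+         # suits per-type analysis: hierarchical structure, borrowing strength,
+         # shrinkage, how the model helps here, and the meaning of d, delta, sigma.
+         explain_prompt = """請解釋什麼是階層模型(Hierarchical Model),以及為什麼用它來分析不同屬性的寶可夢。
+
+ 請涵蓋:
+ 1. 什麼是階層結構?
+ 2. 什麼是「資訊借用」(borrowing strength)?
+ 3. 什麼是「收縮效應」(shrinkage)?為什麼這很重要?
+ 4. 在這個分析中,階層模型如何幫助我們?
+ 5. d、delta、sigma 參數分別代表什麼?
+
+ 請用具體的寶可夢例子說明,使用繁體中文。"""
+
+         return self.get_response(explain_prompt, None)
+
+     def explain_convergence(self):
+         """Explain convergence diagnostics."""
+
+         # Prompt (Traditional Chinese): asks how to judge MCMC convergence and read a
+         # trace plot: what MCMC sampling is, what convergence means and why it matters,
+         # the "hairy caterpillar" look, and the consequences of non-convergence.
+         explain_prompt = """請解釋如何判斷 MCMC 抽樣是否收斂,以及 Trace Plot 該如何解讀。
+
+ 請涵蓋:
+ 1. 什麼是 MCMC 抽樣?
+ 2. 什麼是收斂?為什麼重要?
+ 3. Trace Plot 該如何解讀?
+ 4. 什麼是「毛毛蟲圖」?
+ 5. 如果沒有收斂會怎樣?
+
+ 請用簡單的語言解釋,使用繁體中文。"""
+
+         return self.get_response(explain_prompt, None)
+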
+     def compare_types(self, analysis_results):
+         """Compare the different types."""
+
+         # Prompt (Traditional Chinese): asks which types are most and least sensitive
+         # to speed based on their delta values: the top 5 and bottom 5 types,
+         # possible reasons from battle mechanics, and team-building advice.
+         compare_prompt = """根據各屬性的 delta 值,請分析哪些寶可夢屬性對速度最敏感,哪些最不敏感。
+
+ 請提供:
+ 1. 速度效應最大的前 5 個屬性
+ 2. 速度效應最小的前 5 個屬性
+ 3. 可能的原因(從對戰機制角度)
+ 4. 組隊建議
+
+ 請用繁體中文回答。"""
+
+         return self.get_response(compare_prompt, analysis_results)
+
+     def battle_strategy_advice(self, analysis_results):
+         """Provide battle strategy advice."""
+
+         # Prompt (Traditional Chinese): asks for practical strategy advice based on the
+         # results: how much to value speed, which types need it most, which can trade
+         # it away, exceptions such as Trick Room teams, and competitive implications.
+         strategy_prompt = """根據這個貝氏階層模型的分析結果,請為寶可夢訓練師提供實際的對戰策略建議。
+
+ 請考慮:
+ 1. 在組建隊伍時應該多重視速度?
+ 2. 哪些屬性的寶可夢特別需要速度?
+ 3. 哪些屬性可以犧牲速度換取其他能力?
+ 4. 有什麼例外情況(如戲法空間隊伍)?
+ 5. 對競技對戰的影響?
+
+ 請具體且可操作,使用繁體中文回答。"""
+
+         return self.get_response(strategy_prompt, analysis_results)
+
+     def explain_metric(self, metric_name, analysis_results):
+         """Explain a specific metric."""
+
+         metric_explanations = {
+             'd': 'Overall Mean Effect (d) | 整體平均效應',
+             'sigma': 'Between-Type Variability (sigma) | 屬性間變異',
+             'or_speed': 'Speed Odds Ratio (or_speed) | 速度勝算比',
+             'delta': 'Type-Specific Effects (delta) | 屬性特定效應',
+             'hdi': '95% HDI (Highest Density Interval) | 95% 最高密度區間'
+         }
+
+         metric_display = metric_explanations.get(metric_name, metric_name)
+
+         # Prompt (Traditional Chinese): asks for an explanation of the chosen metric in
+         # the context of this analysis: what it measures, its value here, how to read
+         # it in battle terms, what it says about speed, and any caveats.
+         explain_prompt = f"""請在這次貝氏階層模型分析的脈絡下,解釋以下指標:
+
+ 指標:{metric_display}
+
+ 請包含:
+ 1. 這個指標一般來說測量什麼?
+ 2. 在本次分析中得到的數值是多少?
+ 3. 如何從寶可夢對戰的角度詮釋這個數值?
+ 4. 這告訴我們速度的重要性如何?
+ 5. 有什麼需要注意的限制或注意事項?
+
+ 請用繁體中文回答。"""
+
+         return self.get_response(explain_prompt, analysis_results)
+
+     def reset_conversation(self):
+         """Reset the conversation history."""
+         self.conversation_history = []
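The way `get_response` flattens the system prompt, the analysis context, and the chat history into a single string can be sketched in isolation. The helper below, `build_prompt`, is a hypothetical standalone function and not part of this commit; it mirrors the concatenation order used in the class so the resulting prompt can be inspected without calling the Gemini API.

```python
def build_prompt(system_prompt, context, history, user_message):
    """Assemble one text prompt (hypothetical helper, not in the commit).

    history is a list of {"role": ..., "content": ...} dicts holding the
    turns that came before user_message.
    """
    parts = [system_prompt]
    if context:
        parts.append(f"\n\n## Current Analysis Context:\n{context}")
    parts.append("\n\n## Conversation History:\n")
    for msg in history:
        role = "User" if msg["role"] == "user" else "Assistant"
        parts.append(f"\n{role}: {msg['content']}\n")
    # The current message goes last, followed by the cue for the reply
    parts.append(f"\nUser: {user_message}\n\nAssistant:")
    return "".join(parts)


history = [
    {"role": "user", "content": "What is d?"},
    {"role": "assistant", "content": "The overall mean effect."},
]
prompt = build_prompt("SYSTEM", "", history, "And sigma?")
print(prompt)
```

Because the whole history is resent on every call, the prompt grows with each turn; a long session may eventually need the older turns trimmed or summarized.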
bayesian_requirements.txt ADDED
@@ -0,0 +1,8 @@
+ streamlit==1.31.0
+ pandas==2.1.4
+ numpy==1.26.3
+ pymc==5.10.0
+ arviz==0.17.0
+ matplotlib==3.8.2
+ google-generativeai>=0.3.0
+ graphviz
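One caveat about the last entry: the `graphviz` package on PyPI is only a Python wrapper, and rendering (as `gv.pipe(format='png')` does in `_generate_dag`) also needs the Graphviz `dot` executable installed on the system. A minimal check, assuming `dot` is expected on `PATH`; the helper name `graphviz_available` is made up for this sketch:

```python
import shutil


def graphviz_available(executable="dot"):
    """Return True if the Graphviz executable can be found on PATH."""
    return shutil.which(executable) is not None


if not graphviz_available():
    # _generate_dag already falls back to None, so this is informational only
    print("Graphviz 'dot' not found; the DAG image will be skipped.")
```

Since `_generate_dag` catches its own exceptions and returns `None`, a missing executable degrades the report rather than breaking the analysis.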
pokemon_speed_meta_results.csv ADDED
@@ -0,0 +1,19 @@
+ Trial_Type,rt,nt,rc,nc
+ Bug,2229,3142,800,3660
+ Dark,1559,2083,369,931
+ Drago,1264,1715,298,889
+ Elect,1935,2499,373,1174
+ Fairy,310,432,309,1320
+ Fight,800,1134,402,1458
+ Fire,2547,3530,487,1535
+ Flyin,102,107,39,110
+ Ghost,639,937,331,1259
+ Grass,1591,2196,1418,4598
+ Groun,1100,1529,529,1574
+ Ice,826,1288,354,1296
+ Norma,4258,5748,1107,3989
+ Poiso,997,1571,431,1411
+ Psych,2002,2747,334,1926
+ Rock,864,1255,998,3392
+ Steel,609,804,428,1584
+ Water,3601,5492,1814,5793
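As a sanity check on the model's inputs, the raw win rates and the unpooled log odds ratio per type can be computed directly from these counts. The sketch below uses only the standard library and three rows copied from the table. Reading `rt`/`nt` as wins/battles in the faster group and `rc`/`nc` as the slower group is an assumption based on how the model uses the columns; `raw_effects` is a name invented for this example.

```python
import csv
import io
import math

SAMPLE = """Trial_Type,rt,nt,rc,nc
Bug,2229,3142,800,3660
Flyin,102,107,39,110
Water,3601,5492,1814,5793
"""


def raw_effects(csv_text):
    """Return {type: (rate_t, rate_c, log_odds_ratio)} without any pooling."""
    out = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        rt, nt, rc, nc = (int(row[k]) for k in ("rt", "nt", "rc", "nc"))
        odds_t = rt / (nt - rt)  # wins : losses in the t group
        odds_c = rc / (nc - rc)  # wins : losses in the c group
        out[row["Trial_Type"]] = (rt / nt, rc / nc, math.log(odds_t / odds_c))
    return out


effects = raw_effects(SAMPLE)
for name, (p_t, p_c, log_or) in effects.items():
    print(f"{name:6} p_t={p_t:.3f} p_c={p_c:.3f} log OR={log_or:.3f}")
```

Rows with few battles, such as `Flyin` (`nt` = 107), give the most extreme raw estimates, which is the situation where the hierarchical model's shrinkage toward `d` matters most.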