Commit · 740e5d3
docs: add project README and MIT license section
Files changed:
- .gitignore         +83 -0
- CODE_OF_CONDUCT.md +18 -0
- CONTRIBUTING.md    +15 -0
- LICENSE            +21 -0
- README.md          +81 -0
- STYLEGUIDE.md      +22 -0
- app.py            +155 -0
- requirements.txt    +5 -0
.gitignore ADDED
@@ -0,0 +1,83 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# Virtual environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# PyInstaller
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# PyCharm / VS Code
+.idea/
+.vscode/
+*.swp
+
+# mypy / pytype / pyright
+.mypy_cache/
+.dmypy.json
+dmypy.json
+.pytype/
+.pyright/
+
+# Cython debug symbols
+cython_debug/
+
+# Local configs
+*.env
+*.local
+
+install_ngork.sh
CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,18 @@
+# CODE OF CONDUCT
+
+This project follows the Contributor Covenant code of conduct (simplified version).
+
+## Our Pledge
+- Interact with respect and inclusiveness.
+- Give feedback constructively and avoid personal attacks.
+- Keep the community safe and friendly.
+
+## Unacceptable Behavior
+- Discrimination, harassment, or offensive language or behavior.
+- Maliciously spreading misinformation or disrupting collaboration.
+
+## Responsibilities
+Project maintainers may remove or reject inappropriate contributions and, when necessary, ban violators from participating.
+
+## Contact
+If you have any concerns, please contact the moderators through the **official Twinkle AI community channels**.
CONTRIBUTING.md ADDED
@@ -0,0 +1,15 @@
+# CONTRIBUTING
+
+Thank you for contributing to this project!
+
+## Basic Workflow
+1. Fork the project and create a branch (e.g. `feat/add-lesson-xyz`).
+2. Create a new course folder under `courses/` (format: `YYYY-MM-course-slug/`).
+3. Each course must include at least a `README.md` and a set of `.ipynb` notebooks (notebook-first, fully executable).
+4. Verify that the notebooks run to completion from scratch on **Google Colab**.
+5. Open a Pull Request that briefly describes your changes and how you tested them.
+
+## Notes
+- Keep the teaching steps complete inside the notebooks; avoid moving code out into external modules.
+- Data must not contain personal or sensitive information.
+- Including a Colab test link in the PR is recommended.
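The folder-naming rule in step 2 can be checked mechanically. Below is a sketch of such a check; the `is_valid_course_dir` helper and its pattern are illustrative and not part of the repository, and the pattern assumes the `YYYY-MM-course-slug` form stated above (lowercase slug words joined by hyphens):

```python
import re

# Hypothetical validator for the `courses/YYYY-MM-course-slug/` convention;
# assumes a four-digit year, two-digit month, and a lowercase hyphenated slug.
COURSE_DIR = re.compile(r"^\d{4}-\d{2}-[a-z0-9]+(-[a-z0-9]+)*/?$")

def is_valid_course_dir(name: str) -> bool:
    # fullmatch so partial matches (e.g. trailing garbage) are rejected
    return COURSE_DIR.fullmatch(name) is not None
```

A reviewer could run this over `os.listdir("courses")` before approving a PR.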
LICENSE ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 Hugging Face
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
README.md ADDED
@@ -0,0 +1,81 @@
+# 🌟 Eval Analyzer
+
+An interactive 🎈 **Streamlit** tool for analyzing evaluation files (`.json` / `.jsonl`) in the **[Twinkle Eval](https://github.com/ai-twinkle/Eval)** format.
+
+## 📌 Features
+
+<p align="center">
+  <img src="https://github.com/ai-twinkle/llm-lab/blob/main/courses/2025-0827-llm-eval-with-twinkle/assets/gpt-oss-120b-mmlu-eval-report.png?raw=1" width="100%"/><br/>
+  <em>Figure: preview of gpt-oss-120b results on a subset of MMLU</em>
+</p>
+
+- Upload multiple **Twinkle Eval files** (`json` / `jsonl`).
+- Automatically parse evaluation results and extract:
+  - `dataset`
+  - `category`
+  - `file`
+  - `accuracy_mean`
+  - `source_label` (model name + timestamp)
+- Compute overall averages, filling in missing values automatically.
+- Visualization:
+  - Bar charts per category (grouped by model for comparison).
+  - Selectable sort order (average high→low, average low→high, alphabetical).
+  - Paged display (configurable number of categories per page).
+  - Metrics shown either as raw values or on a 0–100 scale.
+- **CSV export** (download each page of results).
+
+## 🚀 Usage
+
+### 1. Set up the environment
+A virtual environment (such as `venv` or `conda`) is recommended:
+
+```bash
+pip install -r requirements.txt
+```
+### 2. Launch the app
+```bash
+streamlit run app.py
+```
+
+### 3. Workflow
+1. Upload one or more **Twinkle Eval files** in the left sidebar.
+2. Select the dataset to inspect.
+3. Choose the sort order, page size, and display scale (0–1 or 0–100).
+4. Inspect the charts and tables, and download CSVs as needed.
+
+## 📂 File Format Requirements
+Each json / jsonl file must follow the Twinkle Eval format and contain at least the following fields:
+
+```json
+{
+  "timestamp": "2025-08-20T10:00:00",
+  "config": {
+    "model": { "name": "my-model" }
+  },
+  "dataset_results": {
+    "datasets/my_dataset": {
+      "average_accuracy": 0.85,
+      "results": [
+        {
+          "file": "category1.json",
+          "accuracy_mean": 0.9
+        },
+        {
+          "file": "category2.json",
+          "accuracy_mean": 0.8
+        }
+      ]
+    }
+  }
+}
+```
+Alternatively, download examples from the Twinkle AI [Eval logs](https://huggingface.co/collections/twinkle-ai/eval-logs-6811a657da5ce4cbd75dbf50) collection.
+
+## 📊 Output Examples
+
+- **Charts**: compare each model's accuracy_mean across categories.
+- **Table**: a pivot table with categories as rows, models as columns, and accuracy as values.
+- **Download**: each page of results can be exported as CSV.
+
+## 📄 License
+MIT
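The schema in the README needs only the standard library to consume. The sketch below is illustrative (the `dataset_averages` helper is not part of `app.py`); it reproduces the "fill in missing averages" behaviour from the features list against a sample document following the documented fields:

```python
def dataset_averages(doc: dict) -> dict:
    """Return {dataset_path: average_accuracy}, computing the mean over
    per-file results whenever the precomputed value is missing."""
    out = {}
    for ds_path, payload in doc.get("dataset_results", {}).items():
        avg = payload.get("average_accuracy")
        if avg is None:
            vals = [r["accuracy_mean"] for r in payload.get("results", [])
                    if "accuracy_mean" in r]
            avg = sum(vals) / len(vals) if vals else None
        out[ds_path] = avg
    return out

# Sample following the documented schema, with average_accuracy omitted
# so the fallback path is exercised.
sample = {
    "timestamp": "2025-08-20T10:00:00",
    "config": {"model": {"name": "my-model"}},
    "dataset_results": {
        "datasets/my_dataset": {
            "results": [
                {"file": "category1.json", "accuracy_mean": 0.9},
                {"file": "category2.json", "accuracy_mean": 0.8},
            ]
        }
    },
}
```

With this sample, the missing dataset average comes out to the mean of the two per-file values, 0.85 (up to floating-point rounding).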
STYLEGUIDE.md ADDED
@@ -0,0 +1,22 @@
+# STYLEGUIDE
+
+## Language and Wording
+- Course content and comments are primarily in **Traditional Chinese**.
+- Avoid the phrase "the Taiwan area"; use "Taiwan" directly.
+
+## Notebook Structure
+Each notebook should include the following sections:
+1. Learning objectives
+2. Key points
+3. Hands-on steps (with complete, runnable code blocks)
+4. Exercises or assignments
+5. Further reading
+
+## Code Style
+- Prioritize **teaching clarity**; repeating code within a notebook is allowed (so learners can review).
+- Use meaningful variable names, and add comments for context where needed.
+- Follow [PEP8](https://peps.python.org/pep-0008/) as the baseline for Python.
+
+## File Naming
+- Prefix notebook filenames with `00_`, `01_`, `02_`, … to keep execution order consistent.
+- Use snake_case for dataset files, e.g. `dialogues_raw.jsonl`, `sft.jsonl`.
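The zero-padded prefix rule works because padding makes lexicographic order match execution order; a quick illustration (filenames here are made up):

```python
# With zero-padded prefixes, plain sorted() yields the execution order.
notebooks = ["02_eval.ipynb", "00_setup.ipynb", "01_data.ipynb"]
ordered = sorted(notebooks)  # no numeric parsing needed

# Without padding, string order diverges from numeric order:
# "10_..." sorts before "2_..." because "1" < "2" character-wise.
unpadded = sorted(["10_advanced.ipynb", "2_basics.ipynb"])
```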
app.py ADDED
@@ -0,0 +1,155 @@
+import json
+from typing import Dict, Tuple
+from pathlib import PurePosixPath
+
+import pandas as pd
+import numpy as np
+import altair as alt
+import streamlit as st
+
+st.set_page_config(page_title="Twinkle Eval Analyzer", page_icon=":star2:", layout="wide")
+
+st.title("✨ Twinkle Eval Analyzer (.json / .jsonl)")
+
+# ----------------- Helpers -----------------
+
+def _decode_bytes_to_text(b: bytes) -> str:
+    for enc in ("utf-8", "utf-16", "utf-16le", "utf-16be", "big5", "cp950"):
+        try:
+            return b.decode(enc)
+        except Exception:
+            continue
+    return b.decode("utf-8", errors="ignore")
+
+def read_twinkle_doc(file) -> Dict:
+    raw = file.read()
+    if isinstance(raw, bytes):
+        text = _decode_bytes_to_text(raw)
+    else:
+        text = raw
+    text = text.strip()
+    obj = None
+    try:
+        obj = json.loads(text)
+    except Exception:
+        for line in text.splitlines():
+            line = line.strip().rstrip(",")
+            if not line:
+                continue
+            try:
+                obj = json.loads(line)
+                break
+            except Exception:
+                continue
+    if not isinstance(obj, dict):
+        raise ValueError("The file is not a valid Twinkle Eval JSON object.")
+    if "timestamp" not in obj or "config" not in obj or "dataset_results" not in obj:
+        raise ValueError("Missing required fields")
+    return obj
+
+def extract_records(doc: Dict) -> Tuple[pd.DataFrame, Dict[str, float]]:
+    model = doc.get("config", {}).get("model", {}).get("name", "<unknown>")
+    timestamp = doc.get("timestamp", "<no-ts>")
+    source_label = f"{model} @ {timestamp}"
+    rows = []
+    avg_map = {}
+    for ds_path, ds_payload in doc.get("dataset_results", {}).items():
+        ds_name = ds_path.split("datasets/")[-1].strip("/") if ds_path.startswith("datasets/") else ds_path
+        avg_meta = ds_payload.get("average_accuracy") if isinstance(ds_payload, dict) else None
+        results = ds_payload.get("results", []) if isinstance(ds_payload, dict) else []
+        for item in results:
+            if not isinstance(item, dict):
+                continue
+            file_path = item.get("file")
+            acc_mean = item.get("accuracy_mean")
+            if file_path is None or acc_mean is None:
+                continue
+            fname = PurePosixPath(file_path).name
+            category = fname.rsplit(".", 1)[0]
+            rows.append({
+                "dataset": ds_name,
+                "category": category,
+                "file": fname,
+                "accuracy_mean": float(acc_mean),
+                "source_label": source_label
+            })
+        if avg_meta is None and results:
+            vals = [float(it.get("accuracy_mean", np.nan)) for it in results if isinstance(it, dict) and "accuracy_mean" in it]
+            if vals:
+                avg_meta = float(np.mean(vals))
+        if avg_meta is not None:
+            avg_map[ds_name] = avg_meta
+    return pd.DataFrame(rows), avg_map
+
+def load_all(files) -> Tuple[pd.DataFrame, Dict[str, Dict[str, float]]]:
+    frames = []
+    meta = {}
+    for f in files or []:
+        try:
+            doc = read_twinkle_doc(f)
+        except Exception as e:
+            st.error(f"❌ Could not read {getattr(f, 'name', 'file')}: {e}")
+            continue
+        df, avg_map = extract_records(doc)
+        if not df.empty:
+            frames.append(df)
+            src = df["source_label"].iloc[0]
+            meta[src] = avg_map
+    if not frames:
+        return pd.DataFrame(columns=["dataset", "category", "file", "accuracy_mean", "source_label"]), {}
+    return pd.concat(frames, ignore_index=True), meta
+
+# ----------------- Sidebar -----------------
+
+with st.sidebar:
+    files = st.file_uploader("Select Twinkle Eval files", type=["json", "jsonl"], accept_multiple_files=True)
+    df_all, meta_all = load_all(files)
+    normalize_0_100 = st.checkbox("Show as 0–100", value=False)
+    page_size = st.selectbox("Categories per chart", [10, 20, 30, 50, 100], index=1)
+    sort_mode = st.selectbox("Sort order", ["Overall average, high to low", "Overall average, low to high", "Alphabetical"])
+
+if df_all.empty:
+    st.info("Please upload Twinkle Eval files")
+    st.stop()
+
+all_datasets = sorted(df_all["dataset"].unique().tolist())
+selected_dataset = st.selectbox("Select dataset", options=all_datasets)
+work = df_all[df_all["dataset"] == selected_dataset].copy()
+metric_plot = "accuracy_mean" + (" (x100)" if normalize_0_100 else "")
+work[metric_plot] = work["accuracy_mean"] * (100.0 if normalize_0_100 else 1.0)
+
+order_df = work.groupby("category")[metric_plot].mean().reset_index()
+if sort_mode == "Overall average, high to low":
+    order_df = order_df.sort_values(metric_plot, ascending=False)
+elif sort_mode == "Overall average, low to high":
+    order_df = order_df.sort_values(metric_plot, ascending=True)
+else:
+    order_df = order_df.sort_values("category", ascending=True)
+
+cat_order = order_df["category"].tolist()
+work["category"] = pd.Categorical(work["category"], categories=cat_order, ordered=True)
+
+n = len(cat_order)
+pages = int(np.ceil(n / page_size))
+
+for p in range(pages):
+    start, end = p * page_size, min((p + 1) * page_size, n)
+    subset_cats = cat_order[start:end]
+    sub = work[work["category"].isin(subset_cats)]
+    st.subheader(f"📊 {selected_dataset} | categories {start+1}-{end} / {n}")
+    base = alt.Chart(sub).encode(
+        x=alt.X("category:N", sort=subset_cats),
+        y=alt.Y(f"{metric_plot}:Q"),
+        color=alt.Color("source_label:N"),
+        tooltip=["source_label", "file", alt.Tooltip(metric_plot, format=".3f")]
+    )
+    bars = base.mark_bar().encode(xOffset="source_label")
+    st.altair_chart(bars.properties(height=420), use_container_width=True)
+    pivot = sub.pivot_table(index="category", columns="source_label", values=metric_plot)
+    st.dataframe(pivot, use_container_width=True)
+    st.download_button(
+        label=f"Download this page as CSV ({start+1}-{end})",
+        data=pivot.reset_index().to_csv(index=False).encode("utf-8"),
+        file_name=f"twinkle_{selected_dataset}_{start+1}_{end}.csv",
+        mime="text/csv"
+    )
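The paging in `app.py` boils down to the slice arithmetic in its final loop (`pages = ceil(n / page_size)`, then `start, end = p * page_size, min((p + 1) * page_size, n)`). A self-contained sketch of that arithmetic, with the `page_bounds` helper introduced here for illustration:

```python
import math

def page_bounds(n: int, page_size: int):
    """Yield (start, end) index pairs that cover n items in pages of
    at most page_size, mirroring the pagination loop in app.py."""
    pages = math.ceil(n / page_size)
    for p in range(pages):
        # The final page is clamped to n, so it may be shorter.
        yield p * page_size, min((p + 1) * page_size, n)
```

Isolating it this way makes the edge cases (last short page, `n == 0`) easy to check without running Streamlit.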
requirements.txt ADDED
@@ -0,0 +1,5 @@
+pandas
+altair
+streamlit
+
+