---
title: Eval Suite Visualization
emoji: π
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---

# Eval Suite Visualization

A static web app for visualizing LLM evaluation scores. Data is loaded directly in the browser from a Hugging Face dataset ([ellamind/eval-scores](https://huggingface.co/datasets/ellamind/eval-scores)) using DuckDB-WASM, so no preprocessing step or backend is required.

## Features

- **Hierarchical task selection**: eval suite → task group → individual benchmark, with aggregate views
- **Multiple metrics**: `acc`, `acc_norm`, `bits_per_byte`, `exact_match`, `pass@1`, etc.
- **Model comparison**: toggle models on/off; separate checkpoint runs from baselines
- **Auto chart type**: line charts for training runs (tokens trained on x-axis), bar charts for single-point comparisons
- **Multi-panel layout**: add multiple independent panels side by side
- **Smoothing**: configurable moving average for line charts
- **Export**: download charts as PNG or SVG
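
The smoothing and auto chart type features can be illustrated with a small Python sketch. This is not the app's actual JavaScript. It assumes a *trailing* moving average (the exact window semantics used by the app are not documented here), and `choose_chart_type` encodes a hypothetical rule: use a line chart when any selected model has more than one `(tokens_trained, score)` point, otherwise a bar chart.

```python
from typing import Dict, List, Sequence, Tuple


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average: each output point is the mean of the current
    value and up to `window - 1` preceding values (shorter at the start)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed: List[float] = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Remove the value that just slid out of the window.
            running_sum -= values[i - window]
        count = min(i + 1, window)
        smoothed.append(running_sum / count)
    return smoothed


def choose_chart_type(points_per_model: Dict[str, Sequence[Tuple[float, float]]]) -> str:
    """Hypothetical rule: "line" if any model has several (tokens, score)
    points, i.e. a training run with checkpoints; otherwise "bar"."""
    if any(len(points) > 1 for points in points_per_model.values()):
        return "line"
    return "bar"
```

For example, `moving_average([1, 2, 3, 4], 2)` returns `[1.0, 1.5, 2.5, 3.5]`. A trailing window keeps each smoothed point at its original x value (tokens trained), at the cost of lagging slightly behind the raw curve.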

## Quick Start

Serve the app from the repository root with any static file server, for example:

```bash
python3 -m http.server 8080
```

Then open `http://localhost:8080`. On load, the app fetches the Parquet data directly from Hugging Face.

## Project Structure

```text
index.html   # Single-file web app (HTML + CSS + JS)
config.yaml  # Model color overrides
README.md    # HF Spaces metadata + docs
```

## Configuration

Model colors can be customized in `config.yaml`:

```yaml
model_colors:
  "D01": "#4361ee"
  "Qwen3 1.7B": "#6F53D1"
```

When resolving a model's color, exact name matches are checked first, then prefix matches, so a key such as `"D01"` can also cover models whose names start with `D01`. Models without a configured color are assigned one from a default palette.
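
The lookup order above can be sketched in Python as follows. This is a minimal illustration, not the app's code. `DEFAULT_PALETTE` is a hypothetical stand-in for the app's real palette. When several prefix keys match, the code assumes the longest key wins. Palette colors are assumed to be handed out in the order unmatched models appear. The model names `D01-step5000` and `my-baseline` are hypothetical.

```python
from typing import Dict, List

# Hypothetical fallback palette; the app's actual default palette may differ.
DEFAULT_PALETTE: List[str] = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def resolve_colors(models: List[str], model_colors: Dict[str, str]) -> Dict[str, str]:
    """Assign each model a color: exact match, then prefix match, then palette."""
    resolved: Dict[str, str] = {}
    palette_index = 0
    for model in models:
        # 1. Exact match on the full model name.
        if model in model_colors:
            resolved[model] = model_colors[model]
            continue
        # 2. Prefix match (assumption: the longest matching key wins).
        prefixes = [key for key in model_colors if model.startswith(key)]
        if prefixes:
            resolved[model] = model_colors[max(prefixes, key=len)]
            continue
        # 3. Fallback: next palette color, cycling when the palette runs out.
        resolved[model] = DEFAULT_PALETTE[palette_index % len(DEFAULT_PALETTE)]
        palette_index += 1
    return resolved


# Mirrors the config.yaml example above.
config = {"D01": "#4361ee", "Qwen3 1.7B": "#6F53D1"}
print(resolve_colors(["Qwen3 1.7B", "D01-step5000", "my-baseline"], config))
# {'Qwen3 1.7B': '#6F53D1', 'D01-step5000': '#4361ee', 'my-baseline': '#1f77b4'}
```

Checking exact matches first lets a specific entry such as `"Qwen3 1.7B"` override a broader prefix key that would otherwise also match it.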

## Deployment

This app is deployed as a [Static HTML Space](https://huggingface.co/docs/hub/spaces-sdks-static) on Hugging Face. To deploy, upload the repository contents to the Space (this requires being logged in, e.g. via `huggingface-cli login`):

```bash
huggingface-cli upload ellamind/eval-suite-visualization . . --repo-type space
```