---
title: Eval Suite Visualization
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---

# Eval Suite Visualization

A static web app for visualizing LLM evaluation scores. Data is loaded directly from a HuggingFace dataset ([ellamind/eval-scores](https://huggingface.co/datasets/ellamind/eval-scores)) using DuckDB-WASM, with no preprocessing or backend required.

## Features

- **Hierarchical task selection**: eval suite → task group → individual benchmark, with aggregate views
- **Multiple metrics**: `acc`, `acc_norm`, `bits_per_byte`, `exact_match`, `pass@1`, etc.
- **Model comparison**: toggle models on/off; separate checkpoint runs from baselines
- **Auto chart type**: line charts for training runs (tokens trained on the x-axis), bar charts for single-point comparisons
- **Multi-panel layout**: add multiple independent panels side by side
- **Smoothing**: configurable moving average for line charts
- **Export**: download charts as PNG or SVG

## Quick Start

Serve the app with any static file server:

```bash
python3 -m http.server 8080
```

Then open `http://localhost:8080`. The app fetches the parquet data directly from HuggingFace on load.

## Project Structure

```
index.html          # Single-file web app (HTML + CSS + JS)
config.yaml         # Model color overrides
README.md           # HF Spaces metadata + docs
```

## Configuration

Model colors can be customized in `config.yaml`:

```yaml
model_colors:
  "D01": "#4361ee"
  "Qwen3 1.7B": "#6F53D1"
```

Exact matches are checked first, then prefix matches. Models without a configured color get assigned one from a default palette.

## Deployment

This app is deployed as a [Static HTML Space](https://huggingface.co/docs/hub/spaces-sdks-static) on Hugging Face. To deploy:

```bash
huggingface-cli upload ellamind/eval-suite-visualization . . --repo-type space
```
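
## Example: Color Lookup

The color lookup order described under Configuration (exact match, then prefix match, then default palette) can be illustrated with a small sketch. This is not the app's actual code, which lives in `index.html` as JavaScript; it is a Python illustration under stated assumptions. The function `resolve_color`, the `DEFAULT_PALETTE` values, and the model names `D01-step-1000` and `Llama 3 8B` are hypothetical. The sketch also assumes that a prefix match means the configured key is a prefix of the model name, and that the longest matching key wins when several match.

```python
from typing import Dict, List

# Hypothetical fallback palette; the app's real default colors may differ.
DEFAULT_PALETTE: List[str] = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def resolve_color(model: str, model_colors: Dict[str, str], assigned: Dict[str, str]) -> str:
    """Return a color for `model`: exact match, then prefix match, then palette."""
    # 1. Exact match on the configured name.
    if model in model_colors:
        return model_colors[model]
    # 2. Prefix match: a configured key that the model name starts with.
    #    Assumption: the longest matching key wins when several match.
    prefixes = [key for key in model_colors if model.startswith(key)]
    if prefixes:
        return model_colors[max(prefixes, key=len)]
    # 3. Fallback: hand out palette colors in order of first appearance,
    #    wrapping around when the palette is exhausted.
    if model not in assigned:
        assigned[model] = DEFAULT_PALETTE[len(assigned) % len(DEFAULT_PALETTE)]
    return assigned[model]


# Same entries as the config.yaml example above.
model_colors = {"D01": "#4361ee", "Qwen3 1.7B": "#6F53D1"}
assigned: Dict[str, str] = {}

print(resolve_color("Qwen3 1.7B", model_colors, assigned))     # exact match
print(resolve_color("D01-step-1000", model_colors, assigned))  # prefix match on "D01"
print(resolve_color("Llama 3 8B", model_colors, assigned))     # fallback palette
```

Keeping the fallback assignments in a separate `assigned` dictionary means an unconfigured model keeps the same color for as long as that dictionary is reused, which is one plausible way to keep colors consistent across panels.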