metadata
title: Eval Suite Visualization
emoji: π
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
Eval Suite Visualization
A static web app for visualizing LLM evaluation scores. Data is loaded directly from a Hugging Face dataset (ellamind/eval-scores) using DuckDB-WASM, so no preprocessing or backend is required.
Features
- Hierarchical task selection: eval suite → task group → individual benchmark, with aggregate views
- Multiple metrics: acc, acc_norm, bits_per_byte, exact_match, pass@1, etc.
- Model comparison: toggle models on/off; separate checkpoint runs from baselines
- Auto chart type: line charts for training runs (tokens trained on x-axis), bar charts for single-point comparisons
- Multi-panel layout: add multiple independent panels side by side
- Smoothing: configurable moving average for line charts
- Export: download charts as PNG or SVG
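The auto chart type and smoothing features can be illustrated with a minimal Python sketch. This is not the code in index.html (the app itself runs in the browser); the function names, the "more than one checkpoint means line chart" rule, and the trailing-window form of the moving average are assumptions made for illustration.

```python
from typing import Sequence


def choose_chart_type(num_checkpoints: int) -> str:
    """Assumed rule: several checkpoints form a training run (line chart);
    a single point is a one-off comparison (bar chart)."""
    return "line" if num_checkpoints > 1 else "bar"


def moving_average(values: Sequence[float], window: int) -> list[float]:
    """Trailing moving average over the last `window` points.

    Early points are averaged over however many values are available,
    so the output has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

With a window of 1 the series is returned unchanged, which is a convenient way to model "smoothing off".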
Quick Start
Serve the app with any static file server:
python3 -m http.server 8080
Then open http://localhost:8080. The app fetches the Parquet data directly from Hugging Face on load.
Project Structure
index.html # Single-file web app (HTML + CSS + JS)
config.yaml # Model color overrides
README.md # HF Spaces metadata + docs
Configuration
Model colors can be customized in config.yaml:
model_colors:
  "D01": "#4361ee"
  "Qwen3 1.7B": "#6F53D1"
Exact matches are checked first, then prefix matches. Models without a configured color get assigned one from a default palette.
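The lookup order described above can be sketched in Python as follows. This is an illustrative sketch, not the app's actual JavaScript: `resolve_color`, `DEFAULT_PALETTE`, and the palette values are hypothetical, and it assumes that a "prefix match" means the model name starts with a configured key, with the longest matching key winning.

```python
# Placeholder palette; the app's real default colors are not documented here.
DEFAULT_PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def resolve_color(model: str, model_colors: dict[str, str], fallback_index: int = 0) -> str:
    """Return a color for `model`: exact match, then prefix match, then palette."""
    # 1. An exact match on the configured key wins.
    if model in model_colors:
        return model_colors[model]
    # 2. Otherwise use the longest configured key that the model name starts with.
    prefixes = [key for key in model_colors if model.startswith(key)]
    if prefixes:
        return model_colors[max(prefixes, key=len)]
    # 3. Fall back to the default palette, cycling if there are many models.
    return DEFAULT_PALETTE[fallback_index % len(DEFAULT_PALETTE)]


# Same entries as the config.yaml example.
colors = {"D01": "#4361ee", "Qwen3 1.7B": "#6F53D1"}
print(resolve_color("D01", colors))             # exact match
print(resolve_color("D01-step-1000", colors))   # prefix match on "D01"
print(resolve_color("Some other model", colors, fallback_index=1))  # palette fallback
```

Under these assumptions, a single "D01" entry colors every checkpoint whose name begins with "D01", while a more specific key can still override it.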
Deployment
This app is deployed as a Static HTML Space on Hugging Face. To deploy:
huggingface-cli upload ellamind/eval-suite-visualization . . --repo-type space