maxidl

Upload README.md with huggingface_hub

2c80f80 verified 14 days ago

metadata

title: Eval Suite Visualization
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false

Eval Suite Visualization

A static web app for visualizing LLM evaluation scores. Data is loaded directly from a Hugging Face dataset (ellamind/eval-scores) using DuckDB-WASM, so no preprocessing or backend is required.

Features

Hierarchical task selection: eval suite → task group → individual benchmark, with aggregate views
Multiple metrics: acc, acc_norm, bits_per_byte, exact_match, pass@1, etc.
Model comparison: toggle models on/off; separate checkpoint runs from baselines
Auto chart type: line charts for training runs (tokens trained on x-axis), bar charts for single-point comparisons
Multi-panel layout: add multiple independent panels side by side
Smoothing: configurable moving average for line charts
Export: download charts as PNG or SVG
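
To illustrate the smoothing feature, here is a minimal Python sketch of a configurable moving average. The app itself is implemented in JavaScript inside index.html, and its actual windowing is not documented here. The function name `moving_average`, the trailing (rather than centered) window, and the handling of the first few points are all assumptions made for illustration.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing simple moving average (assumed variant, not taken from the app).

    Each output point is the mean of the current value and up to
    `window - 1` preceding values, so early points average over fewer samples.
    """
    if window < 1:
        raise ValueError("window must be >= 1")

    smoothed: List[float] = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        # Drop the value that just fell out of the window.
        if i >= window:
            running_sum -= values[i - window]
        count = min(i + 1, window)
        smoothed.append(running_sum / count)
    return smoothed


if __name__ == "__main__":
    # Hypothetical scores from four consecutive checkpoints.
    print(moving_average([1.0, 2.0, 3.0, 4.0], window=2))
```

A window of 1 leaves the series unchanged, which is a convenient way to express "smoothing off" with the same code path.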

Quick Start

Serve the app with any static file server:

python3 -m http.server 8080

Then open http://localhost:8080 in a browser. The app fetches the Parquet data directly from Hugging Face on load.

Project Structure

index.html    # Single-file web app (HTML + CSS + JS)
config.yaml   # Model color overrides
README.md     # HF Spaces metadata + docs

Configuration

Model colors can be customized in config.yaml:

model_colors:
  "D01": "#4361ee"
  "Qwen3 1.7B": "#6F53D1"

Color lookup checks for an exact match on the model name first, then falls back to a prefix match. Models without a configured color are assigned one from a default palette.
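
The lookup order can be sketched in Python as follows. This is not the app's code: the function `resolve_color`, the palette values, the longest-prefix tie-break, and the way the palette is cycled are hypothetical choices made for illustration. The sketch also assumes that a "prefix match" means a configured key is a prefix of the model name.

```python
from typing import Mapping, Sequence

# Hypothetical fallback palette; the app's real default palette is not listed here.
DEFAULT_PALETTE = ("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728")


def resolve_color(
    model: str,
    model_colors: Mapping[str, str],
    palette: Sequence[str] = DEFAULT_PALETTE,
    fallback_index: int = 0,
) -> str:
    """Return a color for `model`: exact match, then prefix match, then palette."""
    # 1. Exact match on the configured key.
    if model in model_colors:
        return model_colors[model]

    # 2. Prefix match: a configured key that the model name starts with.
    #    Preferring the longest matching key is an assumption.
    prefix_hits = [key for key in model_colors if model.startswith(key)]
    if prefix_hits:
        return model_colors[max(prefix_hits, key=len)]

    # 3. No configured color: pick from the palette, cycling if needed.
    return palette[fallback_index % len(palette)]


if __name__ == "__main__":
    colors = {"D01": "#4361ee", "Qwen3 1.7B": "#6F53D1"}
    print(resolve_color("Qwen3 1.7B", colors))    # exact match
    print(resolve_color("D01-step-1000", colors))  # prefix match on "D01"
    print(resolve_color("OtherModel", colors, fallback_index=1))
```

With prefix matching, a single entry such as "D01" can color every checkpoint of one training run consistently, which fits the checkpoint-versus-baseline distinction in the feature list.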

Deployment

This app is deployed as a Static HTML Space on Hugging Face. To deploy:

huggingface-cli upload ellamind/eval-suite-visualization . . --repo-type space