---
title: Eval Suite Visualization
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---

# Eval Suite Visualization

A static web app for visualizing LLM evaluation scores. Data is loaded directly from a HuggingFace dataset ([ellamind/eval-scores](https://huggingface.co/datasets/ellamind/eval-scores)) using DuckDB-WASM, so no preprocessing or backend is required.

## Features

- **Hierarchical task selection**: eval suite → task group → individual benchmark, with aggregate views
- **Multiple metrics**: `acc`, `acc_norm`, `bits_per_byte`, `exact_match`, `pass@1`, etc.
- **Model comparison**: toggle models on/off; separate checkpoint runs from baselines
- **Auto chart type**: line charts for training runs (tokens trained on x-axis), bar charts for single-point comparisons
- **Multi-panel layout**: add multiple independent panels side by side
- **Smoothing**: configurable moving average for line charts
- **Export**: download charts as PNG or SVG
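
To illustrate what the smoothing option does, here is a minimal Python sketch of a moving average over a series of scores. It is not the app's actual code (that lives in `index.html` as JavaScript); the function name `moving_average` and the choice of a trailing window that shrinks at the start of the series are assumptions made for this example.

```python
def moving_average(values, window):
    """Smooth a series with a trailing moving average.

    Assumption for this sketch: the window looks backwards only, and the
    first few points are averaged over however many values exist so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        # Drop the value that just fell out of the window.
        if i >= window:
            running_sum -= values[i - window]
        count = min(i + 1, window)
        smoothed.append(running_sum / count)
    return smoothed


scores = [1, 2, 3, 4]
print(moving_average(scores, 2))  # [1.0, 1.5, 2.5, 3.5]
```

A window of 1 leaves the series unchanged, which corresponds to smoothing being switched off.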

## Quick Start

Serve the app with any static file server:

```bash
python3 -m http.server 8080
```

Then open `http://localhost:8080`. The app fetches the parquet data directly from HuggingFace on load.

## Project Structure

```
index.html    # Single-file web app (HTML + CSS + JS)
config.yaml   # Model color overrides
README.md     # HF Spaces metadata + docs
```

## Configuration

Model colors can be customized in `config.yaml`:

```yaml
model_colors:
  "D01": "#4361ee"
  "Qwen3 1.7B": "#6F53D1"
```

Exact matches are checked first, then prefix matches. Models without a configured color get assigned one from a default palette.
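
The lookup order described above can be sketched in Python as follows. This is an illustration rather than the app's code: `resolve_color`, `DEFAULT_PALETTE`, its colors, and the `fallback_index` parameter are hypothetical, and preferring the longest matching prefix when several keys match is an assumption.

```python
# Hypothetical default palette; the app's real palette is not shown here.
DEFAULT_PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def resolve_color(model, model_colors, palette=DEFAULT_PALETTE, fallback_index=0):
    """Return a color for `model`: exact match, then prefix match, then palette."""
    # 1. Exact match on the configured key.
    if model in model_colors:
        return model_colors[model]

    # 2. Prefix match: a configured key that the model name starts with.
    #    Assumption: if several keys match, the longest one wins.
    prefixes = [key for key in model_colors if model.startswith(key)]
    if prefixes:
        return model_colors[max(prefixes, key=len)]

    # 3. No configured color: pick one from the default palette.
    return palette[fallback_index % len(palette)]


model_colors = {"D01": "#4361ee", "Qwen3 1.7B": "#6F53D1"}

print(resolve_color("D01", model_colors))              # #4361ee (exact match)
print(resolve_color("Qwen3 1.7B Base", model_colors))  # #6F53D1 (prefix match)
print(resolve_color("Llama", model_colors))            # #1f77b4 (palette fallback)
```

With prefix matching, a single entry such as `"Qwen3 1.7B"` can cover every variant whose name begins with that string.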

## Deployment

This app is deployed as a [Static HTML Space](https://huggingface.co/docs/hub/spaces-sdks-static) on Hugging Face. To deploy:

```bash
huggingface-cli upload ellamind/eval-suite-visualization . . --repo-type space
```