	---
	title: Eval Suite Visualization
	emoji: 📊
	colorFrom: blue
	colorTo: indigo
	sdk: static
	pinned: false
	---

	# Eval Suite Visualization

	A static web app for visualizing LLM evaluation scores. Data is loaded directly from a Hugging Face dataset ([ellamind/eval-scores](https://huggingface.co/datasets/ellamind/eval-scores)) using DuckDB-WASM, so no preprocessing or backend is required.

	## Features

	- Hierarchical task selection: eval suite → task group → individual benchmark, with aggregate views
	- Multiple metrics: `acc`, `acc_norm`, `bits_per_byte`, `exact_match`, `pass@1`, etc.
	- Model comparison: toggle models on/off; separate checkpoint runs from baselines
	- Auto chart type: line charts for training runs (tokens trained on x-axis), bar charts for single-point comparisons
	- Multi-panel layout: add multiple independent panels side by side
	- Smoothing: configurable moving average for line charts
	- Export: download charts as PNG or SVG
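
As a rough illustration of the smoothing and auto chart type features listed above, the logic could look something like the sketch below. It is not taken from `index.html`: the function names, the `tokensTrained` field, the trailing-window choice, and the "more than one distinct token count means a training run" rule are all assumptions made for the example.

```javascript
// Illustrative sketch only; names and rules here are assumptions, not the app's code.

// Trailing moving average: each output point is the mean of the current
// value and up to (windowSize - 1) values before it.
function movingAverage(values, windowSize) {
  if (!Number.isInteger(windowSize) || windowSize < 1) {
    throw new RangeError("windowSize must be a positive integer");
  }
  const smoothed = [];
  let runningSum = 0;
  for (let i = 0; i < values.length; i++) {
    runningSum += values[i];
    if (i >= windowSize) {
      runningSum -= values[i - windowSize]; // drop the value that left the window
    }
    const count = Math.min(i + 1, windowSize); // shorter window at the start
    smoothed.push(runningSum / count);
  }
  return smoothed;
}

// Assumed rule: a series with more than one distinct token count is treated
// as a training run (line chart); anything else is a single-point comparison
// (bar chart). `tokensTrained` is a hypothetical field name.
function pickChartType(points) {
  const distinctTokenCounts = new Set(points.map((p) => p.tokensTrained));
  return distinctTokenCounts.size > 1 ? "line" : "bar";
}
```

With a window of 1 the series is returned unchanged, which is a convenient way to model "smoothing off".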

	## Quick Start

	Serve the app with any static file server:

	```bash
	python3 -m http.server 8080
	```

	Then open `http://localhost:8080`. The app fetches the Parquet data directly from Hugging Face on load.

	## Project Structure

	```
	index.html    # Single-file web app (HTML + CSS + JS)
	config.yaml   # Model color overrides
	README.md     # HF Spaces metadata + docs
	```

	## Configuration

	Model colors can be customized in `config.yaml`:

	```yaml
	model_colors:
	  "D01": "#4361ee"
	  "Qwen3 1.7B": "#6F53D1"
	```

	Exact matches are checked first, then prefix matches. Models without a configured color get assigned one from a default palette.
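
The lookup order described above can be sketched roughly as follows. This is an illustration rather than the app's actual code: `createColorResolver`, `DEFAULT_PALETTE`, and its placeholder colors are hypothetical, and the sketch assumes that "prefix match" means a configured key is a prefix of the model name, with the longest matching key winning.

```javascript
// Illustrative sketch only; function names and palette values are hypothetical.
const DEFAULT_PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]; // placeholder colors

function createColorResolver(modelColors, palette = DEFAULT_PALETTE) {
  const assigned = new Map(); // remembers palette picks so a model keeps its color
  let nextPaletteIndex = 0;

  return function resolveColor(modelName) {
    // 1. Exact match on a configured key.
    if (Object.prototype.hasOwnProperty.call(modelColors, modelName)) {
      return modelColors[modelName];
    }
    // 2. Prefix match (assumption: a configured key is a prefix of the
    //    model name; the longest matching key wins).
    const prefixKey = Object.keys(modelColors)
      .filter((key) => modelName.startsWith(key))
      .sort((a, b) => b.length - a.length)[0];
    if (prefixKey !== undefined) {
      return modelColors[prefixKey];
    }
    // 3. Fallback: cycle through the default palette.
    if (!assigned.has(modelName)) {
      assigned.set(modelName, palette[nextPaletteIndex % palette.length]);
      nextPaletteIndex += 1;
    }
    return assigned.get(modelName);
  };
}

// Usage with the model_colors mapping from config.yaml.
const resolveColor = createColorResolver({
  "D01": "#4361ee",
  "Qwen3 1.7B": "#6F53D1",
});
```

Under these assumptions, a checkpoint named something like `D01-step-1000` would pick up the `D01` color through the prefix rule, while an unconfigured model would get the next palette entry.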

	## Deployment

	This app is deployed as a [Static HTML Space](https://huggingface.co/docs/hub/spaces-sdks-static) on Hugging Face. To deploy:

	```bash
	huggingface-cli upload ellamind/eval-suite-visualization . . --repo-type space
	```