
Last commit: cdf803d "small tweak" by Zayne Rea Sprague (9 days ago)


Managing AGG_VIS_PRESETS Programmatically

Overview

The agg_visualizer stores presets in the HuggingFace dataset repo reasoning-degeneration-dev/AGG_VIS_PRESETS. Each visualizer type has its own JSON file:

| Type | File | Extra Fields |
|---|---|---|
| `model` | `model_presets.json` | `column` (default: `"model_responses"`) |
| `arena` | `arena_presets.json` | none |
| `rlm` | `rlm_presets.json` | `config` (default: `"rlm_call_traces"`) |
| `harbor` | `harbor_presets.json` | none |

Preset Schema

Every preset has these base fields:

{
  "id": "8-char hex",
  "name": "Human-readable name",
  "repo": "org/dataset-name",
  "split": "train"
}

Plus type-specific fields listed above.
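You can run a quick structural check before uploading. The sketch below is hypothetical: `validate_preset` is not part of the agg_visualizer. It encodes only the schema and type table above. It deliberately flags any field outside the base and type-specific set, which is stricter than the visualizer itself may be.

```python
import re

# Base fields every preset carries (from the schema above).
BASE_FIELDS = ("id", "name", "repo", "split")

# Optional type-specific fields per visualizer type (from the type table).
EXTRA_FIELD_NAMES = {
    "model": ("column",),
    "arena": (),
    "rlm": ("config",),
    "harbor": (),
}

HEX8 = re.compile(r"[0-9a-f]{8}")         # same shape as uuid4().hex[:8]
REPO_ID = re.compile(r"[\w.-]+/[\w.-]+")  # "org/dataset-name"


def validate_preset(preset, vis_type):
    """Return a list of problems; an empty list means the preset looks well-formed."""
    if vis_type not in EXTRA_FIELD_NAMES:
        return [f"unknown visualizer type: {vis_type!r}"]
    problems = [f"missing field: {k}" for k in BASE_FIELDS if k not in preset]
    if "id" in preset and not HEX8.fullmatch(str(preset["id"])):
        problems.append(f"id is not 8 lowercase hex chars: {preset['id']!r}")
    if "repo" in preset and not REPO_ID.fullmatch(str(preset["repo"])):
        problems.append(f"repo is not org/name: {preset['repo']!r}")
    # Strict assumption: anything outside base + type-specific fields is flagged.
    allowed = set(BASE_FIELDS) | set(EXTRA_FIELD_NAMES[vis_type])
    problems += [f"unexpected field for {vis_type}: {k}" for k in preset if k not in allowed]
    return problems
```

For example, calling it on a model preset that accidentally carries `config` reports that field. A preset with all four base fields and a valid `column` returns `[]`.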

How to Add Presets from Experiment Markdown Files

Step 1: Identify repos and their visualizer type

Read the experiment markdown file(s) and extract all HuggingFace repo links. Categorize each:

Countdown / MuSR datasets (model response traces) → model type, set column: "response"
FrozenLake / arena datasets (game episodes) → arena type
Harbor / SWE-bench datasets → harbor type
RLM call traces → rlm type, set config: "rlm_call_traces"
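The extraction and categorization in Step 1 can be roughly automated. This is a minimal sketch built on two assumptions. First, links appear in the form `https://huggingface.co/datasets/<org>/<name>`. Second, repo names contain a keyword such as `countdown` or `frozenlake`. Both helper names are hypothetical, and every guess should still be checked by hand against the list above.

```python
import re
from typing import List, Optional

# Assumed link form: https://huggingface.co/datasets/<org>/<name>[/...]
HF_DATASET_URL = re.compile(r"https?://huggingface\.co/datasets/([\w.-]+/[\w.-]+)")

# Keyword heuristics mirroring the categories above; first match wins.
# The keywords are assumptions, not a rule used by the visualizer.
TYPE_KEYWORDS = [
    ("rlm", ("rlm",)),
    ("harbor", ("harbor", "swe-bench", "swebench")),
    ("arena", ("frozenlake", "arena")),
    ("model", ("countdown", "musr")),
]


def extract_hf_repos(markdown: str) -> List[str]:
    """Return unique dataset repo ids in first-seen order."""
    # rstrip(".") drops a sentence-ending period captured after a bare URL.
    repos = (m.group(1).rstrip(".") for m in HF_DATASET_URL.finditer(markdown))
    return list(dict.fromkeys(repos))


def guess_vis_type(repo: str) -> Optional[str]:
    """Guess a visualizer type from the repo name; None means decide by hand."""
    name = repo.lower()
    for vis_type, keywords in TYPE_KEYWORDS:
        if any(k in name for k in keywords):
            return vis_type
    return None
```

A repo that matches no keyword comes back as `None` rather than defaulting to `model`. This avoids silently pointing the wrong visualizer at a dataset.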

Step 2: Download existing presets from HF

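from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError
import json

PRESETS_REPO = "reasoning-degeneration-dev/AGG_VIS_PRESETS"
VIS_TYPES = ["model", "arena", "rlm", "harbor"]

def load_hf_presets(vis_type):
    # A missing file just means there are no presets of this type yet.
    # Other errors (network, auth) are re-raised: returning [] for them would
    # make Step 4 overwrite the remote file and drop the existing presets.
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
    except EntryNotFoundError:
        return []
    with open(path) as f:
        return json.load(f)

existing = {vis_type: load_hf_presets(vis_type) for vis_type in VIS_TYPES}
existing_model = existing["model"]
existing_arena = existing["arena"]

# Build the set of repos already present under any visualizer type
existing_repos = {p["repo"] for presets_list in existing.values() for p in presets_list}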

Step 3: Build new presets, skipping duplicates

import uuid

new_presets = []  # list of (vis_type, name, repo)

# Example: adding strategy compliance countdown presets
new_presets.append(("model", "SC Countdown K2-Inst TreeSearch",
    "reasoning-degeneration-dev/t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst"))

# ... add all repos from the markdown ...

# Filter out repos that already exist (or repeat within new_presets)
to_add = {"model": [], "arena": [], "rlm": [], "harbor": []}
for vis_type, name, repo in new_presets:
    if repo in existing_repos:
        continue  # skip duplicates
    preset = {
        "id": uuid.uuid4().hex[:8],  # 8-char hex id, per the schema
        "name": name,
        "repo": repo,
        "split": "train",
    }
    if vis_type == "model":
        preset["column"] = "response"  # SC / Wingdings datasets, see Step 1
    elif vis_type == "rlm":
        preset["config"] = "rlm_call_traces"
    to_add[vis_type].append(preset)
    existing_repos.add(repo)  # a repo listed twice above is only added once
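Truncating a UUID to 8 hex characters leaves 2^32 possible ids. A collision with an id already in the file is unlikely, but it is cheap to rule out. The sketch below uses a hypothetical helper, `make_preset`, which is not part of the visualizer. It wraps the preset construction from Step 3 and retries until it finds an unused id.

```python
import uuid

# Type-specific fields, as set in Step 3 above.
TYPE_FIELDS = {
    "model": {"column": "response"},
    "rlm": {"config": "rlm_call_traces"},
}


def make_preset(vis_type, name, repo, taken_ids, split="train"):
    """Build a preset dict whose 8-char hex id is not in `taken_ids`.

    The new id is added to `taken_ids` so later calls cannot reuse it.
    """
    while True:
        preset_id = uuid.uuid4().hex[:8]
        if preset_id not in taken_ids:
            break
    taken_ids.add(preset_id)
    preset = {"id": preset_id, "name": name, "repo": repo, "split": split}
    preset.update(TYPE_FIELDS.get(vis_type, {}))
    return preset
```

Seed `taken_ids` from the downloaded files, for example `{p["id"] for p in existing_model + existing_arena}`. Then call `make_preset` in place of the inline dict.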

Step 4: Merge and upload to HF

import json
from huggingface_hub import HfApi

api = HfApi()  # needs a token with write access (e.g. via `huggingface-cli login`)

# Merge new presets with existing
final_model = existing_model + to_add["model"]
final_arena = existing_arena + to_add["arena"]

for vis_type, presets in [("model", final_model), ("arena", final_arena)]:
    if not to_add[vis_type]:
        continue  # nothing new for this type; leave the remote file untouched
    # upload_file accepts raw bytes, so no temporary file is needed
    api.upload_file(
        path_or_fileobj=json.dumps(presets, indent=2).encode("utf-8"),
        path_in_repo=f"{vis_type}_presets.json",
        repo_id=PRESETS_REPO,
        repo_type="dataset",
        commit_message=f"Add {len(to_add[vis_type])} {vis_type} presets",
    )
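Re-running the upload should be harmless, so the merge itself should be idempotent. The sketch below uses two hypothetical helpers, `merge_presets` and `to_payload`. The first keeps existing presets, appends only new repos, and reports what it added. The second serializes the list the same way the upload step does, so you can inspect the payload before anything is sent.

```python
import json


def merge_presets(existing, new):
    """Append presets from `new` whose repo is not already present.

    Returns (merged, added). `existing` is not modified, and running the
    merge again with the same `new` list adds nothing.
    """
    seen = {p["repo"] for p in existing}
    merged, added = list(existing), []
    for preset in new:
        if preset["repo"] in seen:
            continue
        seen.add(preset["repo"])
        merged.append(preset)
        added.append(preset)
    return merged, added


def to_payload(presets):
    """Serialize presets like the upload step (indent=2, UTF-8 bytes)."""
    return json.dumps(presets, indent=2).encode("utf-8")
```

For a dry run, call `merged, added = merge_presets(existing_model, to_add["model"])` and print `to_payload(merged).decode()`. Upload only if `added` is non-empty. `HfApi.upload_file` accepts bytes for `path_or_fileobj`, so the payload can be passed directly.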

Step 5: Sync the deployed HF Space

After uploading to the HF dataset, tell the running Space to re-download presets:

curl -X POST "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"

This forces the Space to re-download all preset files from AGG_VIS_PRESETS without needing a restart or redeployment.
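If you would rather stay in Python than shell out to curl, the same call can be made with the standard library. This is a minimal sketch, and the helper names are hypothetical. Like the curl command, it sends an empty POST with no credentials. It does not parse the response, because the endpoint's response format is not documented here.

```python
import urllib.request

# Endpoint from the curl command above.
SYNC_URL = "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"


def build_sync_request(url=SYNC_URL):
    """A body-less POST, matching `curl -X POST` with no data."""
    return urllib.request.Request(url, method="POST")


def sync_space(url=SYNC_URL, timeout=30.0):
    """Send the sync request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(build_sync_request(url), timeout=timeout) as resp:
        return resp.status
```

Call `sync_space()` right after the upload loop in Step 4. That way the Space is refreshed in the same script run.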

Step 6: Sync local preset files

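import os
import shutil
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError

# Local checkout of the visualizer backend; adjust to your own path
local_dir = "/Users/rs2020/Research/tools/visualizers/agg_visualizer/backend/presets"
for vis_type in ["model", "arena", "rlm", "harbor"]:
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
    except EntryNotFoundError:
        continue  # no remote file for this type yet
    shutil.copy2(path, os.path.join(local_dir, f"{vis_type}_presets.json"))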
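To confirm that the local copy now matches HF, compare the two lists by preset id. The sketch below uses two hypothetical helpers, `load_json_list` and `diff_presets`. It reports names that exist on only one side, or whose fields differ between the two sides.

```python
import json
from pathlib import Path


def load_json_list(path):
    """Read a presets file, treating a missing file as an empty list."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else []


def diff_presets(local, remote):
    """Compare two preset lists by id and report names that differ."""
    local_by_id = {p["id"]: p for p in local}
    remote_by_id = {p["id"]: p for p in remote}
    shared = local_by_id.keys() & remote_by_id.keys()
    return {
        "only_local": sorted(local_by_id[i]["name"] for i in local_by_id.keys() - remote_by_id.keys()),
        "only_remote": sorted(remote_by_id[i]["name"] for i in remote_by_id.keys() - local_by_id.keys()),
        "changed": sorted(remote_by_id[i]["name"] for i in shared if local_by_id[i] != remote_by_id[i]),
    }
```

After the sync, running `diff_presets(load_json_list(f"{local_dir}/model_presets.json"), load_hf_presets("model"))` should return three empty lists.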

Naming Convention

Preset names follow this pattern to be descriptive and avoid future conflicts:

{Experiment} {Task} {Model} {Variant}

Experiment prefixes

SC — Strategy Compliance
Wing — Wingdings Compliance

Model abbreviations

K2-Inst — Kimi-K2-Instruct (RLHF)
K2-Think — Kimi-K2-Thinking (RLVR)
Q3-Inst — Qwen3-Next-80B Instruct (RLHF)
Q3-Think — Qwen3-Next-80B Thinking (RLVR)

Task names

Countdown — 8-arg arithmetic countdown
MuSR — MuSR murder mysteries
FrozenLake — FrozenLake grid navigation

Variant names (strategy compliance only)

TreeSearch / Baseline / Anti — countdown tree search experiment
CritFirst / Anti-CritFirst — criterion-first cross-cutting analysis
Counterfactual / Anti-Counterfactual — counterfactual hypothesis testing
BackChain — backward chaining (FrozenLake)

Examples

SC Countdown K2-Inst TreeSearch       # Strategy compliance, countdown, Kimi instruct, tree search variant
SC MuSR Q3-Think Counterfactual       # Strategy compliance, MuSR, Qwen thinking, counterfactual variant
SC FrozenLake K2-Think BackChain      # Strategy compliance, FrozenLake, Kimi thinking, backward chaining
Wing Countdown Q3-Inst                # Wingdings, countdown, Qwen instruct (no variant — wingdings has one condition)
Wing MuSR K2-Think                    # Wingdings, MuSR, Kimi thinking
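Building names from a fixed vocabulary helps keep them consistent across scripts. The sketch below uses a hypothetical helper, `build_preset_name`. It copies the prefixes, tasks, models, and variants listed above and rejects anything else, including a variant on a Wingdings preset. Extend the sets when new experiments, tasks, or models appear.

```python
# Vocabulary copied from the naming convention above.
EXPERIMENTS = {"SC": "Strategy Compliance", "Wing": "Wingdings Compliance"}
TASKS = {"Countdown", "MuSR", "FrozenLake"}
MODELS = {"K2-Inst", "K2-Think", "Q3-Inst", "Q3-Think"}
SC_VARIANTS = {
    "TreeSearch", "Baseline", "Anti",
    "CritFirst", "Anti-CritFirst",
    "Counterfactual", "Anti-Counterfactual",
    "BackChain",
}


def build_preset_name(experiment, task, model, variant=None):
    """Assemble "{Experiment} {Task} {Model} {Variant}", rejecting unknown parts."""
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment prefix: {experiment!r}")
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if model not in MODELS:
        raise ValueError(f"unknown model abbreviation: {model!r}")
    parts = [experiment, task, model]
    if variant is not None:
        # Variants exist for strategy compliance only.
        if experiment != "SC" or variant not in SC_VARIANTS:
            raise ValueError(f"variant {variant!r} not valid for {experiment}")
        parts.append(variant)
    return " ".join(parts)
```

For example, `build_preset_name("SC", "Countdown", "K2-Inst", "TreeSearch")` produces the first example name above.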

Important Notes

Always check for existing repos before adding. The script above uses the existing_repos set to skip duplicates.
The column field matters for model presets. Strategy compliance and wingdings datasets use "response" as the response column, not the default "model_responses".
Local files are a fallback cache. The agg_visualizer downloads presets from HF on startup and caches them locally. After uploading to HF, either sync the local files or hit the /api/presets/sync endpoint so the running app picks up the changes without a restart.
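Don't modify rlm or harbor presets unless you are adding datasets of those types. The upload loop in Step 4 only touches model and arena. Any rlm or harbor presets collected in Step 3 are not uploaded until you add those types to that loop.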
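The duplicate and column rules above can be checked mechanically before uploading a model presets list. The sketch below uses a hypothetical helper, `lint_model_presets`. It assumes the naming convention holds, meaning SC and Wingdings presets are exactly the ones whose names start with "SC " or "Wing ". It treats a missing `column` as the table default, `"model_responses"`.

```python
from collections import Counter


def lint_model_presets(presets):
    """Flag duplicate repos and SC/Wing presets not using the "response" column."""
    problems = []
    for repo, count in Counter(p.get("repo") for p in presets).items():
        if count > 1:
            problems.append(f"{repo}: appears {count} times")
    for p in presets:
        name = p.get("name", "")
        column = p.get("column", "model_responses")  # default from the type table
        # Assumption: the experiment prefix identifies SC / Wingdings datasets.
        if name.split(" ", 1)[0] in {"SC", "Wing"} and column != "response":
            problems.append(f"{name!r}: column is {column!r}, expected 'response'")
    return problems
```

Running it on `final_model` just before the upload loop surfaces both kinds of mistake while they are still cheap to fix.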