Zayne Rea Sprague

Managing AGG_VIS_PRESETS Programmatically

Overview

The agg_visualizer stores presets in the Hugging Face dataset repo reasoning-degeneration-dev/AGG_VIS_PRESETS. Each visualizer type has its own JSON file at the root of that repo:

Type      File                   Extra fields
model     model_presets.json     column (default: "model_responses")
arena     arena_presets.json     none
rlm       rlm_presets.json       config (default: "rlm_call_traces")
harbor    harbor_presets.json    none

Preset Schema

Every preset has these base fields:

{
  "id": "8-char hex",
  "name": "Human-readable name",
  "repo": "org/dataset-name",
  "split": "train"
}

Plus the type-specific extra fields from the table above (column for model, config for rlm).
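A quick pre-upload check can catch malformed presets before they reach HF. The sketch below is a hypothetical helper (`preset_problems` is not part of agg_visualizer) and enforces only what this page states: the four base string fields, the 8-char hex id, the org/dataset-name repo shape, and one of the four visualizer types. Requiring lowercase hex is an assumption based on the `uuid4().hex` ids used in Step 3; whether the app itself rejects malformed presets is not documented here.

```python
import re

# Extra field (if any) each visualizer type accepts, per the table above.
EXTRA_FIELDS = {"model": "column", "arena": None, "rlm": "config", "harbor": None}
BASE_FIELDS = ("id", "name", "repo", "split")
ID_PATTERN = re.compile(r"[0-9a-f]{8}")  # assumption: lowercase, as uuid4().hex produces


def preset_problems(preset: dict, vis_type: str) -> list[str]:
    """Return human-readable problems with `preset`; an empty list means it looks valid."""
    if vis_type not in EXTRA_FIELDS:
        return [f"unknown visualizer type: {vis_type!r}"]
    problems = []
    for field in BASE_FIELDS:
        value = preset.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty field: {field}")
    preset_id = preset.get("id")
    if isinstance(preset_id, str) and preset_id and not ID_PATTERN.fullmatch(preset_id):
        problems.append("id is not 8 lowercase hex characters")
    repo = preset.get("repo")
    if isinstance(repo, str) and repo and repo.count("/") != 1:
        problems.append("repo should look like org/dataset-name")
    # Extra fields are optional (they have defaults) but must be strings if set.
    extra = EXTRA_FIELDS[vis_type]
    if extra is not None and extra in preset and not isinstance(preset[extra], str):
        problems.append(f"{extra} must be a string when present")
    return problems
```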

How to Add Presets from Experiment Markdown Files

Step 1: Identify repos and their visualizer type

Read the experiment markdown file(s), extract every Hugging Face dataset repo link, and categorize each one:

  • Countdown / MuSR datasets (model response traces) → model type, set column: "response"
  • FrozenLake / arena datasets (game episodes) → arena type
  • Harbor / SWE-bench datasets → harbor type
  • RLM call traces → rlm type, set config: "rlm_call_traces"
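The extraction and categorization can be partly automated. A minimal sketch, assuming the notes link datasets as `https://huggingface.co/datasets/<org>/<name>`; the keyword table is a hypothetical heuristic mirroring the categories above, since repo names are not guaranteed to contain these words, so review the guesses by hand.

```python
import re
from typing import Optional

# Assumption: dataset links look like https://huggingface.co/datasets/<org>/<name>
HF_DATASET_URL = re.compile(r"huggingface\.co/datasets/([\w.-]+/[\w.-]+)")

# Hypothetical keyword heuristics, checked in order; first match wins.
TYPE_KEYWORDS = [
    ("rlm", ("rlm",)),
    ("harbor", ("harbor", "swe-bench", "swebench")),
    ("arena", ("frozenlake", "frozen-lake", "arena")),
    ("model", ("countdown", "musr")),
]


def extract_repos(markdown: str) -> list[str]:
    """Return unique repo ids in the order they first appear in `markdown`."""
    repos = []
    for match in HF_DATASET_URL.findall(markdown):
        repo = match.rstrip(".")  # drop a sentence-ending period caught by the pattern
        if repo not in repos:
            repos.append(repo)
    return repos


def guess_vis_type(repo: str) -> Optional[str]:
    """Guess the visualizer type from the repo name; None means categorize manually."""
    name = repo.lower()
    for vis_type, keywords in TYPE_KEYWORDS:
        if any(keyword in name for keyword in keywords):
            return vis_type
    return None
```

The resulting (type, repo) pairs still need a human-readable name before they go into Step 3's `new_presets` list.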

Step 2: Download existing presets from HF

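from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError
import json

PRESETS_REPO = "reasoning-degeneration-dev/AGG_VIS_PRESETS"

def load_hf_presets(vis_type):
    """Return the preset list for vis_type, or [] if its file doesn't exist on HF yet."""
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
    except EntryNotFoundError:
        return []
    # Let network/auth errors propagate: returning [] on those would make
    # Step 4 overwrite the remote file with only the new presets.
    with open(path) as f:
        return json.load(f)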

existing_model = load_hf_presets("model")
existing_arena = load_hf_presets("arena")
# ... etc for rlm, harbor

# Build set of repos already present
existing_repos = set()
for presets_list in [existing_model, existing_arena]:
    for p in presets_list:
        existing_repos.add(p["repo"])

Step 3: Build new presets, skipping duplicates

import uuid

new_presets = []  # list of (vis_type, name, repo)

# Example: adding strategy compliance countdown presets
new_presets.append(("model", "SC Countdown K2-Inst TreeSearch",
    "reasoning-degeneration-dev/t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst"))

# ... add all repos from the markdown ...

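# Filter out repos that already have a preset (or were queued earlier in this list)
to_add = {"model": [], "arena": [], "rlm": [], "harbor": []}
for vis_type, name, repo in new_presets:
    if repo in existing_repos:
        continue  # skip duplicates
    preset = {
        "id": uuid.uuid4().hex[:8],  # 8-char hex id
        "name": name,
        "repo": repo,
        "split": "train",
    }
    if vis_type == "model":
        preset["column"] = "response"  # SC/Wing datasets use "response", not the default
    elif vis_type == "rlm":
        preset["config"] = "rlm_call_traces"
    to_add[vis_type].append(preset)
    existing_repos.add(repo)  # a repo listed twice in new_presets is added only once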
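Step 3 takes the first 8 hex characters of a UUID4, i.e. 32 random bits. A collision with an existing id is unlikely at the size of a preset file but not impossible, and this page doesn't say how the visualizer handles duplicate ids. A hypothetical helper (`new_preset_id` is illustrative, not part of the codebase) that redraws until the id is unused:

```python
import uuid


def new_preset_id(taken: set[str]) -> str:
    """Return an 8-char hex id not already in `taken`, and record it there."""
    while True:
        candidate = uuid.uuid4().hex[:8]
        if candidate not in taken:
            taken.add(candidate)  # so later calls in the same run also avoid it
            return candidate
```

Seed `taken` with the ids already on HF, e.g. `{p["id"] for p in existing_model + existing_arena}`, and call `new_preset_id(taken)` in place of `uuid.uuid4().hex[:8]`.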

Step 4: Merge and upload to HF

import json
from huggingface_hub import HfApi

api = HfApi()

# Merge new presets with existing (only model and arena files are written here)
final_model = existing_model + to_add["model"]
final_arena = existing_arena + to_add["arena"]

for vis_type, presets in [("model", final_model), ("arena", final_arena)]:
    if not to_add[vis_type]:
        continue  # nothing new for this type, so skip the commit
    # upload_file accepts raw bytes, so no temporary file is needed
    api.upload_file(
        path_or_fileobj=json.dumps(presets, indent=2).encode("utf-8"),
        path_in_repo=f"{vis_type}_presets.json",
        repo_id=PRESETS_REPO,
        repo_type="dataset",
        commit_message=f"Add {len(to_add[vis_type])} {vis_type} preset(s)",
    )
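Plain list concatenation works in Step 4 because Step 3 already filtered against existing repos. If you want the merge to be safe on its own (for example when re-running with a `to_add` built from a stale download), a minimal sketch of a repo-keyed merge; `merge_presets` is a hypothetical name:

```python
def merge_presets(existing: list[dict], new: list[dict]) -> list[dict]:
    """Append presets from `new` whose repo isn't present yet; existing order is kept."""
    merged = list(existing)  # copy so the caller's list is not mutated
    seen = {p["repo"] for p in existing}
    for preset in new:
        if preset["repo"] not in seen:
            merged.append(preset)
            seen.add(preset["repo"])
    return merged
```

Used as `final_model = merge_presets(existing_model, to_add["model"])`, the result is never shorter than what was downloaded, which is a cheap sanity check to assert before uploading.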

Step 5: Sync the deployed HF Space

After uploading to the HF dataset, tell the running Space to re-download presets:

curl -X POST "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"

This forces the Space to re-download all preset files from AGG_VIS_PRESETS without needing a restart or redeployment.
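To trigger the sync from the same Python script instead of the shell, a stdlib-only sketch of the same empty-body POST. `build_sync_request` and `sync_space` are hypothetical helper names; the endpoint path comes from the curl command above. The response body isn't documented here, so the sketch only returns the HTTP status (`urlopen` raises `urllib.error.HTTPError` on 4xx/5xx).

```python
import urllib.request

SPACE_URL = "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space"


def build_sync_request(space_url: str = SPACE_URL) -> urllib.request.Request:
    """Build the POST to /api/presets/sync (same call as the curl command)."""
    url = space_url.rstrip("/") + "/api/presets/sync"
    return urllib.request.Request(url, data=b"", method="POST")


def sync_space(space_url: str = SPACE_URL, timeout: float = 30.0) -> int:
    """Send the sync request and return the HTTP status code."""
    with urllib.request.urlopen(build_sync_request(space_url), timeout=timeout) as resp:
        return resp.status
```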

Step 6: Sync local preset files

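import os
import shutil
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError

# Example path from the original setup; point this at your agg_visualizer/backend/presets
local_dir = "/Users/rs2020/Research/tools/visualizers/agg_visualizer/backend/presets"
for vis_type in ["model", "arena", "rlm", "harbor"]:
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
    except EntryNotFoundError:
        continue  # this type has no preset file on HF yet
    shutil.copy2(path, os.path.join(local_dir, f"{vis_type}_presets.json"))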

Naming Convention

Preset names follow this pattern to be descriptive and avoid future conflicts:

{Experiment} {Task} {Model} {Variant}
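Because the parts are space-separated, each part must itself be free of spaces (the abbreviations below use hyphens, e.g. K2-Inst). A minimal sketch of assembling a name under that assumption; `build_preset_name` is a hypothetical helper, and the variant is optional because wingdings presets have none.

```python
from typing import Optional


def build_preset_name(experiment: str, task: str, model: str, variant: Optional[str] = None) -> str:
    """Join parts as '{Experiment} {Task} {Model} {Variant}', omitting a missing variant."""
    parts = [experiment, task, model] + ([variant] if variant else [])
    for part in parts:
        if " " in part:
            raise ValueError(f"name parts must not contain spaces: {part!r}")
    return " ".join(parts)
```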

Experiment prefixes

  • SC: Strategy Compliance
  • Wing: Wingdings Compliance

Model abbreviations

  • K2-Inst: Kimi-K2-Instruct (RLHF)
  • K2-Think: Kimi-K2-Thinking (RLVR)
  • Q3-Inst: Qwen3-Next-80B Instruct (RLHF)
  • Q3-Think: Qwen3-Next-80B Thinking (RLVR)

Task names

  • Countdown: 8-arg arithmetic countdown
  • MuSR: MuSR murder mysteries
  • FrozenLake: FrozenLake grid navigation

Variant names (strategy compliance only)

  • TreeSearch / Baseline / Anti: countdown tree search experiment
  • CritFirst / Anti-CritFirst: criterion-first cross-cutting analysis
  • Counterfactual / Anti-Counterfactual: counterfactual hypothesis testing
  • BackChain: backward chaining (FrozenLake)

Examples

SC Countdown K2-Inst TreeSearch       # Strategy compliance, countdown, Kimi instruct, tree search variant
SC MuSR Q3-Think Counterfactual       # Strategy compliance, MuSR, Qwen thinking, counterfactual variant
SC FrozenLake K2-Think BackChain      # Strategy compliance, FrozenLake, Kimi thinking, backward chaining
Wing Countdown Q3-Inst                # Wingdings, countdown, Qwen instruct (no variant; wingdings has one condition)
Wing MuSR K2-Think                    # Wingdings, MuSR, Kimi thinking
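Names can be checked against the vocabularies listed above before uploading. A minimal sketch with a hypothetical `check_preset_name` helper; it assumes every SC preset carries one of the listed variants and Wing presets carry none, as in the examples, so new experiments, models, or variants must be added to the sets first.

```python
EXPERIMENTS = {"SC", "Wing"}
TASKS = {"Countdown", "MuSR", "FrozenLake"}
MODELS = {"K2-Inst", "K2-Think", "Q3-Inst", "Q3-Think"}
SC_VARIANTS = {
    "TreeSearch", "Baseline", "Anti",
    "CritFirst", "Anti-CritFirst",
    "Counterfactual", "Anti-Counterfactual",
    "BackChain",
}


def check_preset_name(name: str) -> list[str]:
    """Return problems with `name`; an empty list means it follows the convention."""
    parts = name.split()
    if len(parts) not in (3, 4):
        return [f"expected 3 or 4 space-separated parts, got {len(parts)}"]
    experiment, task, model = parts[:3]
    variant = parts[3] if len(parts) == 4 else None
    problems = []
    if experiment not in EXPERIMENTS:
        problems.append(f"unknown experiment prefix: {experiment}")
    if task not in TASKS:
        problems.append(f"unknown task: {task}")
    if model not in MODELS:
        problems.append(f"unknown model: {model}")
    # Assumption: SC always has a variant, Wing never does.
    if experiment == "SC" and variant not in SC_VARIANTS:
        problems.append(f"SC presets need a known variant, got: {variant}")
    if experiment == "Wing" and variant is not None:
        problems.append("Wing presets take no variant")
    return problems
```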

Important Notes

  • Always check for existing repos before adding. The script above uses the existing_repos set to skip duplicates.
  • The column field matters for model presets. Strategy compliance and wingdings datasets use "response" as the response column, not the default "model_responses".
  • Local files are a fallback cache. The agg_visualizer downloads presets from HF on startup and caches them locally. After uploading to HF, sync the local files so the running app picks up the changes without a restart, or call the /api/presets/sync endpoint.
  • Don't modify rlm or harbor presets unless you are adding datasets of those types. The Step 4 upload loop only writes the model and arena files; extend it if you do add rlm or harbor presets.
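To audit the files already on HF for the duplicates the first note warns about, a hypothetical helper that reports every repo listed more than once, within or across types. It only reads the lists; deciding which copy to keep is left to you.

```python
from collections import defaultdict


def find_duplicate_repos(presets_by_type: dict[str, list[dict]]) -> dict[str, list[str]]:
    """Map each repo that appears more than once to the types it appears under."""
    locations = defaultdict(list)
    for vis_type, presets in presets_by_type.items():
        for preset in presets:
            locations[preset["repo"]].append(vis_type)
    return {repo: types for repo, types in locations.items() if len(types) > 1}
```

For example, `find_duplicate_repos({t: load_hf_presets(t) for t in ["model", "arena", "rlm", "harbor"]})` reuses the loader from Step 2.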