Managing AGG_VIS_PRESETS Programmatically
Overview
The agg_visualizer stores presets in the HuggingFace dataset repo reasoning-degeneration-dev/AGG_VIS_PRESETS. Each visualizer type has its own JSON file:
| Type | File | Extra Fields |
|---|---|---|
| model | model_presets.json | column (default: "model_responses") |
| arena | arena_presets.json | none |
| rlm | rlm_presets.json | config (default: "rlm_call_traces") |
| harbor | harbor_presets.json | none |
Preset Schema
Every preset has these base fields:
{
  "id": "8-char hex",
  "name": "Human-readable name",
  "repo": "org/dataset-name",
  "split": "train"
}
Plus type-specific fields listed above.
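A minimal sketch of a schema check, useful before uploading hand-edited presets. The helper name `validate_preset` is hypothetical and not part of the agg_visualizer. It assumes the base fields are required strings and that the type-specific fields are optional, since the table gives defaults for them.

```python
import re

# Base fields every preset carries, per the schema above.
BASE_FIELDS = ("id", "name", "repo", "split")

# Type-specific extra fields from the Overview table.
# Assumption: they are optional because the table lists defaults.
EXTRA_FIELDS = {
    "model": ("column",),
    "arena": (),
    "rlm": ("config",),
    "harbor": (),
}

def validate_preset(vis_type, preset):
    """Return a list of problems found in one preset dict (empty if none)."""
    if vis_type not in EXTRA_FIELDS:
        return [f"unknown visualizer type: {vis_type!r}"]
    problems = []
    for field in BASE_FIELDS:
        value = preset.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty field: {field}")
    for field in EXTRA_FIELDS[vis_type]:
        if field in preset and not isinstance(preset[field], str):
            problems.append(f"field is not a string: {field}")
    # The schema describes "id" as 8-character hex.
    preset_id = preset.get("id")
    if isinstance(preset_id, str) and preset_id:
        if not re.fullmatch(r"[0-9a-fA-F]{8}", preset_id):
            problems.append(f"id is not 8-char hex: {preset_id!r}")
    return problems
```

An empty return value means the preset matches the schema as described here; the visualizer itself may accept or require more than this sketch checks.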
How to Add Presets from Experiment Markdown Files
Step 1: Identify repos and their visualizer type
Read the experiment markdown file(s) and extract all HuggingFace repo links. Categorize each:
- Countdown / MuSR datasets (model response traces) → model type, set column: "response"
- FrozenLake / arena datasets (game episodes) → arena type
- Harbor / SWE-bench datasets → harbor type
- RLM call traces → rlm type, set config: "rlm_call_traces"
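The extraction and categorization in this step could be sketched as follows. Both helpers are hypothetical. The URL pattern assumes links of the form https://huggingface.co/datasets/org/name, and the keyword table is only a heuristic derived from the categories above, so anything it cannot place should be checked by hand.

```python
import re

# Assumed link shape: https://huggingface.co/datasets/<org>/<name>
HF_DATASET_URL = re.compile(
    r"https://huggingface\.co/datasets/([A-Za-z0-9][\w.-]*/[\w.-]+)"
)

# Heuristic keyword -> visualizer type, following the categories in Step 1.
KEYWORD_TO_TYPE = [
    ("countdown", "model"),
    ("musr", "model"),
    ("frozenlake", "arena"),
    ("harbor", "harbor"),
    ("swe-bench", "harbor"),
    ("rlm", "rlm"),
]

def extract_dataset_repos(markdown_text):
    """Return unique org/name repo ids in order of first appearance."""
    seen = []
    for match in HF_DATASET_URL.finditer(markdown_text):
        repo = match.group(1).rstrip(".")  # drop a sentence-ending period
        if repo not in seen:
            seen.append(repo)
    return seen

def guess_vis_type(repo):
    """Guess a visualizer type from the dataset name, or None if unsure."""
    name = repo.split("/", 1)[-1].lower()  # ignore the org prefix
    for keyword, vis_type in KEYWORD_TO_TYPE:
        if keyword in name:
            return vis_type
    return None
```

Returning None for unrecognized names keeps the guess from silently filing a dataset under the wrong visualizer.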
Step 2: Download existing presets from HF
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError
import json

PRESETS_REPO = "reasoning-degeneration-dev/AGG_VIS_PRESETS"

def load_hf_presets(vis_type):
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
    except EntryNotFoundError:
        # No preset file for this type yet: start from an empty list.
        return []
    # Any other failure (network, auth) propagates, so a failed download is
    # never mistaken for "no presets" and then overwritten in Step 4.
    with open(path) as f:
        return json.load(f)

existing_model = load_hf_presets("model")
existing_arena = load_hf_presets("arena")
# ... etc for rlm, harbor

# Build set of repos already present
existing_repos = set()
for presets_list in [existing_model, existing_arena]:
    for p in presets_list:
        existing_repos.add(p["repo"])
Step 3: Build new presets, skipping duplicates
import uuid

new_presets = []  # list of (vis_type, name, repo)

# Example: adding strategy compliance countdown presets
new_presets.append(("model", "SC Countdown K2-Inst TreeSearch",
                    "reasoning-degeneration-dev/t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst"))
# ... add all repos from the markdown ...

# Filter out existing
to_add = {"model": [], "arena": [], "rlm": [], "harbor": []}
for vis_type, name, repo in new_presets:
    if repo in existing_repos:
        continue  # skip duplicates
    preset = {
        "id": uuid.uuid4().hex[:8],
        "name": name,
        "repo": repo,
        "split": "train",
    }
    if vis_type == "model":
        preset["column"] = "response"
    elif vis_type == "rlm":
        preset["config"] = "rlm_call_traces"
    to_add[vis_type].append(preset)
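An 8-character slice of a UUID has only 32 bits, so a collision with an existing preset id is unlikely but possible. The sketch below draws ids until one is unused. The helper name `new_preset_id` is hypothetical, and whether the visualizer actually requires unique ids is an assumption.

```python
import uuid

def new_preset_id(taken_ids):
    """Return an 8-char hex id not in taken_ids, and record it there."""
    while True:
        candidate = uuid.uuid4().hex[:8]
        if candidate not in taken_ids:
            taken_ids.add(candidate)
            return candidate

# Usage sketch: seed the set with ids from the presets already on HF.
existing = [{"id": "1a2b3c4d"}, {"id": "deadbeef"}]  # illustrative data
taken = {p["id"] for p in existing}
fresh = new_preset_id(taken)
```

Because the helper records each id it hands out, the same set can be reused across every preset in one batch.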
Step 4: Merge and upload to HF
import tempfile, os
from huggingface_hub import HfApi

api = HfApi()

# Merge new presets with existing
final_model = existing_model + to_add["model"]
final_arena = existing_arena + to_add["arena"]

for vis_type, presets in [("model", final_model), ("arena", final_arena)]:
    if not presets:
        continue
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(presets, f, indent=2)
        tmp = f.name
    api.upload_file(
        path_or_fileobj=tmp,
        path_in_repo=f"{vis_type}_presets.json",
        repo_id=PRESETS_REPO,
        repo_type="dataset",
    )
    os.unlink(tmp)
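As an alternative sketch, `upload_file` also accepts raw bytes for `path_or_fileobj`, which avoids the temporary file and its cleanup. The helper names `presets_to_bytes` and `upload_presets` are hypothetical, and the upload still needs write access to the presets repo.

```python
import json

def presets_to_bytes(presets):
    """Serialize a preset list to UTF-8 JSON bytes, using indent=2 as above."""
    return json.dumps(presets, indent=2).encode("utf-8")

def upload_presets(vis_type, presets, repo_id):
    """Upload one preset file from memory (requires HF write access)."""
    # Imported lazily so the serializer works without huggingface_hub installed.
    from huggingface_hub import HfApi

    HfApi().upload_file(
        path_or_fileobj=presets_to_bytes(presets),
        path_in_repo=f"{vis_type}_presets.json",
        repo_id=repo_id,
        repo_type="dataset",
    )
```

Keeping serialization in its own function makes it easy to inspect the exact payload before anything is sent.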
Step 5: Sync the deployed HF Space
After uploading to the HF dataset, tell the running Space to re-download presets:
curl -X POST "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"
This forces the Space to re-download all preset files from AGG_VIS_PRESETS without needing a restart or redeployment.
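For scripts that already run in Python, the same request can be sketched with the standard library. The function names are hypothetical, and the format of the endpoint's response is not documented here, so the sketch returns the status and raw body without interpreting them.

```python
import urllib.request

SYNC_URL = "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"

def build_sync_request(url=SYNC_URL):
    """Build a bodiless POST, like the curl command above."""
    return urllib.request.Request(url, method="POST")

def sync_space(url=SYNC_URL, timeout=30):
    """Send the sync request and return (HTTP status, response body text)."""
    with urllib.request.urlopen(build_sync_request(url), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

`urlopen` raises `urllib.error.HTTPError` on a 4xx or 5xx response, so a failed sync surfaces as an exception instead of passing silently.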
Step 6: Sync local preset files
import shutil
from huggingface_hub import hf_hub_download

local_dir = "/Users/rs2020/Research/tools/visualizers/agg_visualizer/backend/presets"

for vis_type in ["model", "arena", "rlm", "harbor"]:
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
        shutil.copy2(path, f"{local_dir}/{vis_type}_presets.json")
    except Exception:
        pass
Naming Convention
Preset names follow this pattern to be descriptive and avoid future conflicts:
{Experiment} {Task} {Model} {Variant}
Experiment prefixes
- SC: Strategy Compliance
- Wing: Wingdings Compliance
Model abbreviations
- K2-Inst: Kimi-K2-Instruct (RLHF)
- K2-Think: Kimi-K2-Thinking (RLVR)
- Q3-Inst: Qwen3-Next-80B Instruct (RLHF)
- Q3-Think: Qwen3-Next-80B Thinking (RLVR)
Task names
- Countdown: 8-arg arithmetic countdown
- MuSR: MuSR murder mysteries
- FrozenLake: FrozenLake grid navigation
Variant names (strategy compliance only)
- TreeSearch / Baseline / Anti: countdown tree search experiment
- CritFirst / Anti-CritFirst: criterion-first cross-cutting analysis
- Counterfactual / Anti-Counterfactual: counterfactual hypothesis testing
- BackChain: backward chaining (FrozenLake)
Examples
SC Countdown K2-Inst TreeSearch # Strategy compliance, countdown, Kimi instruct, tree search variant
SC MuSR Q3-Think Counterfactual # Strategy compliance, MuSR, Qwen thinking, counterfactual variant
SC FrozenLake K2-Think BackChain # Strategy compliance, FrozenLake, Kimi thinking, backward chaining
Wing Countdown Q3-Inst # Wingdings, countdown, Qwen instruct (no variant: wingdings has one condition)
Wing MuSR K2-Think # Wingdings, MuSR, Kimi thinking
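The pattern can be enforced with a small helper, sketched below. The function name `make_preset_name` is hypothetical. The allowed sets contain only the abbreviations listed in this section and would need extending for new experiments, tasks, or models. Variants are not validated because they differ per experiment.

```python
# Abbreviations taken from the lists in this section.
EXPERIMENTS = {"SC", "Wing"}
TASKS = {"Countdown", "MuSR", "FrozenLake"}
MODELS = {"K2-Inst", "K2-Think", "Q3-Inst", "Q3-Think"}

def make_preset_name(experiment, task, model, variant=None):
    """Compose '{Experiment} {Task} {Model} {Variant}', omitting an empty variant."""
    for label, value, allowed in (
        ("experiment", experiment, EXPERIMENTS),
        ("task", task, TASKS),
        ("model", model, MODELS),
    ):
        if value not in allowed:
            raise ValueError(f"unknown {label}: {value!r}")
    parts = [experiment, task, model]
    if variant:
        parts.append(variant)
    return " ".join(parts)
```

Calling `make_preset_name("SC", "Countdown", "K2-Inst", "TreeSearch")` reproduces the first example above, and leaving out the variant gives the shorter Wingdings form.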
Important Notes
- Always check for existing repos before adding. The script above uses the existing_repos set to skip duplicates.
- The column field matters for model presets. Strategy compliance and wingdings datasets use "response" as the response column, not the default "model_responses".
- Local files are a fallback cache. The agg_visualizer downloads from HF on startup and caches locally. After uploading to HF, sync the local files so the running app picks up changes without restart (or hit the /api/presets/sync endpoint).
- Don't modify rlm or harbor presets unless adding datasets of those types. The script above only touches model and arena.