# Managing AGG_VIS_PRESETS Programmatically
## Overview
The agg_visualizer stores presets in the HuggingFace dataset repo `reasoning-degeneration-dev/AGG_VIS_PRESETS`. Each visualizer type has its own JSON file:
| Type | File | Extra Fields |
|------|------|-------------|
| `model` | `model_presets.json` | `column` (default: `"model_responses"`) |
| `arena` | `arena_presets.json` | none |
| `rlm` | `rlm_presets.json` | `config` (default: `"rlm_call_traces"`) |
| `harbor` | `harbor_presets.json` | none |
## Preset Schema
Every preset has these base fields:
```json
{
"id": "8-char hex",
"name": "Human-readable name",
"repo": "org/dataset-name",
"split": "train"
}
```
In addition, each type accepts the type-specific fields listed in the table above.
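The schema can be checked mechanically before anything is uploaded. The following is a minimal sketch, not part of agg_visualizer: `validate_preset`, `BASE_FIELDS`, and `EXTRA_FIELDS` are hypothetical names, and it assumes the extra fields are optional strings (since the table gives defaults) and that `id` is exactly eight hex characters.

```python
import re

BASE_FIELDS = ("id", "name", "repo", "split")
# Optional type-specific fields, taken from the table in the Overview.
EXTRA_FIELDS = {"model": ("column",), "arena": (), "rlm": ("config",), "harbor": ()}


def validate_preset(preset, vis_type):
    """Return a list of problems; an empty list means the preset looks valid."""
    if vis_type not in EXTRA_FIELDS:
        return [f"unknown visualizer type: {vis_type!r}"]
    problems = []
    # Every base field must be a non-empty string.
    for field in BASE_FIELDS:
        value = preset.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty field: {field}")
    # Assumption: "8-char hex" means exactly eight hex digits, either case.
    preset_id = preset.get("id")
    if isinstance(preset_id, str) and preset_id and not re.fullmatch(r"[0-9a-fA-F]{8}", preset_id):
        problems.append("id is not 8 hex characters")
    # Extra fields are optional, but must be strings when present.
    for field in EXTRA_FIELDS[vis_type]:
        if field in preset and not isinstance(preset[field], str):
            problems.append(f"{field} must be a string")
    return problems


example = {"id": "a1b2c3d4", "name": "Example preset", "repo": "org/dataset-name", "split": "train"}
print(validate_preset(example, "model"))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a batch of presets at once.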
## How to Add Presets from Experiment Markdown Files
### Step 1: Identify repos and their visualizer type
Read the experiment markdown file(s) and extract all HuggingFace repo links. Categorize each:
- **Countdown / MuSR datasets** (model response traces) → `model` type, set `column: "response"`
- **FrozenLake / arena datasets** (game episodes) → `arena` type
- **Harbor / SWE-bench datasets** → `harbor` type
- **RLM call traces** → `rlm` type, set `config: "rlm_call_traces"`
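The extraction and categorization above can be partly automated. The sketch below is an illustration under stated assumptions: it assumes links have the form `https://huggingface.co/datasets/<org>/<name>`, and that repo names contain the keywords from the list above, which will not always hold. The helper names and the `example-org/frozenlake-episodes` repo are hypothetical. Treat the type guess as a starting point and review it by hand.

```python
import re

# Assumes links look like https://huggingface.co/datasets/<org>/<name>.
HF_DATASET_LINK = re.compile(r"https://huggingface\.co/datasets/([\w.-]+/[\w.-]+)")

# Keyword heuristic based on the categories above (an assumption about repo naming).
TYPE_KEYWORDS = [
    ("rlm", ("rlm",)),
    ("harbor", ("harbor", "swe-bench")),
    ("arena", ("frozenlake", "arena")),
    ("model", ("countdown", "musr")),
]


def extract_dataset_repos(markdown_text):
    """Return unique dataset repo ids in order of first appearance."""
    seen = []
    for repo in HF_DATASET_LINK.findall(markdown_text):
        repo = repo.rstrip(".")  # drop a sentence-ending period
        if repo not in seen:
            seen.append(repo)
    return seen


def guess_vis_type(repo):
    """Return a visualizer type guess, or None when no keyword matches."""
    name = repo.lower()
    for vis_type, keywords in TYPE_KEYWORDS:
        if any(keyword in name for keyword in keywords):
            return vis_type
    return None


sample = (
    "Traces: https://huggingface.co/datasets/reasoning-degeneration-dev/"
    "t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst\n"
    "See also [episodes](https://huggingface.co/datasets/example-org/frozenlake-episodes)."
)
for repo in extract_dataset_repos(sample):
    print(guess_vis_type(repo), repo)
```

Repos for which `guess_vis_type` returns `None` need to be categorized manually.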
### Step 2: Download existing presets from HF
```python
import json

from huggingface_hub import hf_hub_download

PRESETS_REPO = "reasoning-degeneration-dev/AGG_VIS_PRESETS"


def load_hf_presets(vis_type):
    """Return the preset list for one visualizer type, or [] if it cannot be loaded."""
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
        with open(path) as f:
            return json.load(f)
    except Exception:
        # Any failure (missing file, network error) yields an empty list, so
        # check the result before uploading to avoid overwriting existing presets.
        return []


existing_model = load_hf_presets("model")
existing_arena = load_hf_presets("arena")
# ... etc for rlm, harbor

# Build set of repos already present
existing_repos = set()
for presets_list in [existing_model, existing_arena]:
    for p in presets_list:
        existing_repos.add(p["repo"])
```
### Step 3: Build new presets, skipping duplicates
```python
import uuid

new_presets = []  # list of (vis_type, name, repo)

# Example: adding strategy compliance countdown presets
new_presets.append((
    "model",
    "SC Countdown K2-Inst TreeSearch",
    "reasoning-degeneration-dev/t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst",
))
# ... add all repos from the markdown ...

# Filter out existing
to_add = {"model": [], "arena": [], "rlm": [], "harbor": []}
for vis_type, name, repo in new_presets:
    if repo in existing_repos:
        continue  # skip duplicates
    existing_repos.add(repo)  # also skip repeats within new_presets
    preset = {
        "id": uuid.uuid4().hex[:8],
        "name": name,
        "repo": repo,
        "split": "train",
    }
    if vis_type == "model":
        preset["column"] = "response"
    elif vis_type == "rlm":
        preset["config"] = "rlm_call_traces"
    to_add[vis_type].append(preset)
```
### Step 4: Merge and upload to HF
```python
import json
import os
import tempfile

from huggingface_hub import HfApi

api = HfApi()

# Merge new presets with existing
final_model = existing_model + to_add["model"]
final_arena = existing_arena + to_add["arena"]

for vis_type, presets in [("model", final_model), ("arena", final_arena)]:
    if not presets:
        continue
    # Write the merged list to a temporary file; close it before uploading.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(presets, f, indent=2)
        tmp = f.name
    try:
        api.upload_file(
            path_or_fileobj=tmp,
            path_in_repo=f"{vis_type}_presets.json",
            repo_id=PRESETS_REPO,
            repo_type="dataset",
        )
    finally:
        os.unlink(tmp)  # remove the temporary file even if the upload fails
```
### Step 5: Sync the deployed HF Space
After uploading to the HF dataset, tell the running Space to re-download presets:
```bash
curl -X POST "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"
```
This forces the Space to re-download all preset files from `AGG_VIS_PRESETS` without needing a restart or redeployment.
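When the rest of the workflow runs as a Python script, the same sync call can be made without shelling out to `curl`. This is a minimal sketch using only the standard library; `build_sync_request` and `sync_space_presets` are hypothetical helper names, and the response format of the endpoint is not documented here, so only the HTTP status code is returned.

```python
import urllib.request

SYNC_URL = "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"


def build_sync_request(url=SYNC_URL):
    """Build a POST request with an empty body for the sync endpoint."""
    return urllib.request.Request(url, data=b"", method="POST")


def sync_space_presets(url=SYNC_URL, timeout=30):
    """Send the sync request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(build_sync_request(url), timeout=timeout) as resp:
        return resp.status


# Usage (performs a real network call against the Space):
# status = sync_space_presets()
```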
### Step 6: Sync local preset files
```python
import shutil

from huggingface_hub import hf_hub_download

# PRESETS_REPO is defined in Step 2.
local_dir = "/Users/rs2020/Research/tools/visualizers/agg_visualizer/backend/presets"

for vis_type in ["model", "arena", "rlm", "harbor"]:
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
        shutil.copy2(path, f"{local_dir}/{vis_type}_presets.json")
    except Exception as exc:
        # A type without a preset file on HF is skipped rather than treated as fatal.
        print(f"Skipping {vis_type}: {exc}")
```
## Naming Convention
Preset names follow this pattern to be descriptive and avoid future conflicts:
```
{Experiment} {Task} {Model} {Variant}
```
### Experiment prefixes
- `SC`: Strategy Compliance
- `Wing`: Wingdings Compliance
### Model abbreviations
- `K2-Inst`: Kimi-K2-Instruct (RLHF)
- `K2-Think`: Kimi-K2-Thinking (RLVR)
- `Q3-Inst`: Qwen3-Next-80B Instruct (RLHF)
- `Q3-Think`: Qwen3-Next-80B Thinking (RLVR)
### Task names
- `Countdown`: 8-arg arithmetic countdown
- `MuSR`: MuSR murder mysteries
- `FrozenLake`: FrozenLake grid navigation
### Variant names (strategy compliance only)
- `TreeSearch` / `Baseline` / `Anti`: countdown tree search experiment
- `CritFirst` / `Anti-CritFirst`: criterion-first cross-cutting analysis
- `Counterfactual` / `Anti-Counterfactual`: counterfactual hypothesis testing
- `BackChain`: backward chaining (FrozenLake)
### Examples
```
SC Countdown K2-Inst TreeSearch # Strategy compliance, countdown, Kimi instruct, tree search variant
SC MuSR Q3-Think Counterfactual # Strategy compliance, MuSR, Qwen thinking, counterfactual variant
SC FrozenLake K2-Think BackChain # Strategy compliance, FrozenLake, Kimi thinking, backward chaining
Wing Countdown Q3-Inst # Wingdings, countdown, Qwen instruct (no variant: wingdings has one condition)
Wing MuSR K2-Think # Wingdings, MuSR, Kimi thinking
```
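The pattern can be enforced with a small helper so that names stay consistent across experiments. This is a sketch only: `build_preset_name` is a hypothetical function, and the allowed prefixes, tasks, and model abbreviations are copied from the lists above, so they would need extending when new experiments or models are added. Variants are left free-form because they differ per experiment.

```python
# Allowed values copied from the lists in this section; extend as needed.
EXPERIMENTS = {"SC", "Wing"}
TASKS = {"Countdown", "MuSR", "FrozenLake"}
MODELS = {"K2-Inst", "K2-Think", "Q3-Inst", "Q3-Think"}


def build_preset_name(experiment, task, model, variant=None):
    """Join the parts as '{Experiment} {Task} {Model} {Variant}', omitting an empty variant."""
    for label, value, allowed in [
        ("experiment", experiment, EXPERIMENTS),
        ("task", task, TASKS),
        ("model", model, MODELS),
    ]:
        if value not in allowed:
            raise ValueError(f"unknown {label}: {value!r}")
    parts = [experiment, task, model]
    if variant:
        parts.append(variant)
    return " ".join(parts)


print(build_preset_name("SC", "Countdown", "K2-Inst", "TreeSearch"))
print(build_preset_name("Wing", "MuSR", "K2-Think"))
```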
## Important Notes
- **Always check for existing repos** before adding. The script above uses the `existing_repos` set to skip duplicates.
- **The `column` field matters for model presets.** Strategy compliance and wingdings datasets use `"response"` as the response column, not the default `"model_responses"`.
- **Local files are a fallback cache.** The agg_visualizer downloads from HF on startup and caches locally. After uploading to HF, sync the local files so the running app picks up changes without a restart (or hit the `/api/presets/sync` endpoint).
- **Don't modify rlm or harbor presets** unless adding datasets of those types. The script above only touches model and arena.
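
As a final safeguard for the first note, the merged list can be checked for repeated values before it is uploaded. This is a minimal sketch with a hypothetical helper, `find_duplicates`; the sample presets are made up for illustration. Checking `id` as well as `repo` is a precaution, since an 8-character id is a truncated UUID and could in principle collide.

```python
from collections import Counter


def find_duplicates(presets, key):
    """Return the sorted values of `key` that appear in more than one preset."""
    counts = Counter(p[key] for p in presets)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up sample data for illustration only.
merged = [
    {"id": "a1b2c3d4", "name": "First", "repo": "org/dataset-a", "split": "train"},
    {"id": "0f9e8d7c", "name": "Second", "repo": "org/dataset-b", "split": "train"},
    {"id": "11223344", "name": "Repeat", "repo": "org/dataset-a", "split": "train"},
]
print(find_duplicates(merged, "repo"))
print(find_duplicates(merged, "id"))
```

If either call returns a non-empty list, resolve the duplicates before running the upload step.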