
agg-trace-visualizer / docs /managing_presets.md

Zayne Rea Sprague

small tweak

cdf803d 10 days ago

	# Managing AGG_VIS_PRESETS Programmatically

	## Overview

	The agg_visualizer stores presets in the HuggingFace dataset repo `reasoning-degeneration-dev/AGG_VIS_PRESETS`. Each visualizer type has its own JSON file:

	| Type | File | Extra Fields |
	|------|------|--------------|
	| `model` | `model_presets.json` | `column` (default: `"model_responses"`) |
	| `arena` | `arena_presets.json` | none |
	| `rlm` | `rlm_presets.json` | `config` (default: `"rlm_call_traces"`) |
	| `harbor` | `harbor_presets.json` | none |

	## Preset Schema

	Every preset has these base fields:

	```json
	{
	"id": "8-char hex",
	"name": "Human-readable name",
	"repo": "org/dataset-name",
	"split": "train"
	}
	```

	Each preset also carries the type-specific fields listed in the Overview table.
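
As a sanity check before uploading, a preset can be validated against the fields described above. The following is a minimal sketch and not part of agg_visualizer: `validate_preset`, `BASE_FIELDS`, and `EXTRA_FIELDS` are hypothetical names, and the 8-character lowercase hex check simply mirrors the `id` description in the schema. Extra fields are treated as optional because the table gives them defaults.

```python
import re

# Base fields every preset carries, per the schema above.
BASE_FIELDS = ("id", "name", "repo", "split")

# Optional type-specific fields, per the Overview table.
EXTRA_FIELDS = {
    "model": ("column",),
    "arena": (),
    "rlm": ("config",),
    "harbor": (),
}


def validate_preset(preset, vis_type):
    """Return a list of problems found in `preset`; an empty list means it looks fine."""
    problems = []
    for field in BASE_FIELDS:
        value = preset.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty field: {field}")
    preset_id = preset.get("id")
    if isinstance(preset_id, str) and preset_id and not re.fullmatch(r"[0-9a-f]{8}", preset_id):
        problems.append("id is not 8 lowercase hex characters")
    # Extra fields may be omitted (defaults apply), but must be strings when present.
    for field in EXTRA_FIELDS[vis_type]:
        if field in preset and not isinstance(preset[field], str):
            problems.append(f"field {field} must be a string")
    return problems
```

Running this over every entry in `to_add` before Step 4 would catch malformed entries before they reach the shared preset files.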

	## How to Add Presets from Experiment Markdown Files

	### Step 1: Identify repos and their visualizer type

	Read the experiment markdown file(s) and extract all HuggingFace repo links. Categorize each:

	- Countdown / MuSR datasets (model response traces) → `model` type, set `column: "response"`
	- FrozenLake / arena datasets (game episodes) → `arena` type
	- Harbor / SWE-bench datasets → `harbor` type
	- RLM call traces → `rlm` type, set `config: "rlm_call_traces"`
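
The extraction and categorization in this step can be partly automated. The sketch below assumes the markdown links to datasets as `huggingface.co/datasets/<org>/<name>` and that the repo name contains a telling keyword; both are assumptions, and `extract_repos`, `guess_vis_type`, and `KEYWORD_RULES` are hypothetical helpers. Treat the guessed type as a suggestion to review by hand.

```python
import re

# Matches dataset links such as https://huggingface.co/datasets/org/name
HF_DATASET_LINK = re.compile(r"huggingface\.co/datasets/([\w.-]+/[\w.-]+)")

# Keyword -> visualizer type, checked in order. These keywords are assumptions
# derived from the categories listed above; adjust them to your repo naming.
KEYWORD_RULES = [
    ("rlm", "rlm"),
    ("harbor", "harbor"),
    ("swe-bench", "harbor"),
    ("frozenlake", "arena"),
    ("arena", "arena"),
    ("countdown", "model"),
    ("musr", "model"),
]


def extract_repos(markdown_text):
    """Return unique dataset repo ids in order of first appearance."""
    seen = []
    for repo in HF_DATASET_LINK.findall(markdown_text):
        repo = repo.rstrip(".")  # drop a sentence-ending period caught by the pattern
        if repo not in seen:
            seen.append(repo)
    return seen


def guess_vis_type(repo):
    """Guess the visualizer type from the dataset name, or None if no keyword matches."""
    name = repo.split("/")[-1].lower()  # ignore the org part of the repo id
    for keyword, vis_type in KEYWORD_RULES:
        if keyword in name:
            return vis_type
    return None
```

Repos for which `guess_vis_type` returns `None` need to be categorized manually.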

	### Step 2: Download existing presets from HF

	```python
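	from huggingface_hub import hf_hub_download
	from huggingface_hub.utils import EntryNotFoundError
	import json

	PRESETS_REPO = "reasoning-degeneration-dev/AGG_VIS_PRESETS"

	def load_hf_presets(vis_type):
	    """Return the preset list for one visualizer type, or [] if its file does not exist yet."""
	    try:
	        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
	    except EntryNotFoundError:
	        # Only a missing file means "no presets". Other errors (network, auth)
	        # propagate, so Step 4 never overwrites a file with a partial list.
	        return []
	    with open(path) as f:
	        return json.load(f)

	existing_model = load_hf_presets("model")
	existing_arena = load_hf_presets("arena")
	# ... etc for rlm, harbor

	# Build set of repos already present
	existing_repos = set()
	for presets_list in [existing_model, existing_arena]:
	    for p in presets_list:
	        existing_repos.add(p["repo"])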
	```

	### Step 3: Build new presets, skipping duplicates

	```python
	import uuid

	new_presets = []  # list of (vis_type, name, repo)

	# Example: adding strategy compliance countdown presets
	new_presets.append(("model", "SC Countdown K2-Inst TreeSearch",
	    "reasoning-degeneration-dev/t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst"))

	# ... add all repos from the markdown ...

	# Filter out existing
	to_add = {"model": [], "arena": [], "rlm": [], "harbor": []}
	for vis_type, name, repo in new_presets:
	    if repo in existing_repos:
	        continue  # skip duplicates
	    preset = {
	        "id": uuid.uuid4().hex[:8],  # 8-char hex id
	        "name": name,
	        "repo": repo,
	        "split": "train",
	    }
	    if vis_type == "model":
	        preset["column"] = "response"
	    elif vis_type == "rlm":
	        preset["config"] = "rlm_call_traces"
	    to_add[vis_type].append(preset)
	    existing_repos.add(repo)  # also skip repeats within new_presets
	```

	### Step 4: Merge and upload to HF

	```python
	import tempfile, os
	from huggingface_hub import HfApi

	api = HfApi()

	# Merge new presets with existing
	final_model = existing_model + to_add["model"]
	final_arena = existing_arena + to_add["arena"]

	for vis_type, presets in [("model", final_model), ("arena", final_arena)]:
	    if not presets:
	        continue
	    # Write to a temp file and close it before uploading
	    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
	        json.dump(presets, f, indent=2)
	        tmp = f.name
	    api.upload_file(
	        path_or_fileobj=tmp,
	        path_in_repo=f"{vis_type}_presets.json",
	        repo_id=PRESETS_REPO,
	        repo_type="dataset",
	    )
	    os.unlink(tmp)
	```
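
As an alternative to the temp file, `upload_file` also accepts raw bytes for `path_or_fileobj`. The sketch below wraps that in a hypothetical helper, `upload_presets`, which is not part of agg_visualizer; the commit message text is likewise only an illustration. It takes the `HfApi` instance as an argument so it can be exercised with a stand-in object.

```python
import json


def upload_presets(api, vis_type, presets,
                   repo_id="reasoning-degeneration-dev/AGG_VIS_PRESETS"):
    """Serialize `presets` and upload them as `{vis_type}_presets.json`.

    `api` is expected to be a `huggingface_hub.HfApi` instance.
    """
    payload = json.dumps(presets, indent=2).encode("utf-8")
    return api.upload_file(
        path_or_fileobj=payload,  # raw bytes, so no temp file is needed
        path_in_repo=f"{vis_type}_presets.json",
        repo_id=repo_id,
        repo_type="dataset",
        commit_message=f"Update {vis_type} presets ({len(presets)} total)",
    )
```

With the variables from the steps above, a call might look like `upload_presets(HfApi(), "model", existing_model + to_add["model"])`.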

	### Step 5: Sync the deployed HF Space

	After uploading to the HF dataset, tell the running Space to re-download presets:

	```bash
	curl -X POST "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"
	```

	This forces the Space to re-download all preset files from `AGG_VIS_PRESETS` without needing a restart or redeployment.
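
When the sync is triggered from the same Python script as the upload, the `curl` call can be replaced with the standard library. This is a sketch under assumptions: the endpoint's response format is not documented here, so the helper just returns the status code and raw body, and `build_sync_request` and `sync_space` are hypothetical names.

```python
import urllib.request

SYNC_URL = "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"


def build_sync_request(url=SYNC_URL):
    """Build the POST request; kept separate so it can be inspected without network access."""
    # An empty body makes urllib send Content-Length: 0 with the POST.
    return urllib.request.Request(url, data=b"", method="POST")


def sync_space(url=SYNC_URL, timeout=30):
    """POST to the sync endpoint and return (HTTP status, response body as text)."""
    with urllib.request.urlopen(build_sync_request(url), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

`urlopen` raises `urllib.error.HTTPError` on 4xx/5xx responses, so a failed sync surfaces as an exception rather than passing silently.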

	### Step 6: Sync local preset files

	```python
	import shutil
	from huggingface_hub import hf_hub_download

	PRESETS_REPO = "reasoning-degeneration-dev/AGG_VIS_PRESETS"  # same repo as in Step 2

	local_dir = "/Users/rs2020/Research/tools/visualizers/agg_visualizer/backend/presets"
	for vis_type in ["model", "arena", "rlm", "harbor"]:
	    try:
	        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
	        shutil.copy2(path, f"{local_dir}/{vis_type}_presets.json")
	    except Exception:
	        # Best effort: a preset file that is missing on HF is simply skipped
	        pass
	```

	## Naming Convention

	Preset names follow this pattern to be descriptive and avoid future conflicts:

	```
	{Experiment} {Task} {Model} {Variant}
	```

	### Experiment prefixes
	- `SC` — Strategy Compliance
	- `Wing` — Wingdings Compliance

	### Model abbreviations
	- `K2-Inst` — Kimi-K2-Instruct (RLHF)
	- `K2-Think` — Kimi-K2-Thinking (RLVR)
	- `Q3-Inst` — Qwen3-Next-80B Instruct (RLHF)
	- `Q3-Think` — Qwen3-Next-80B Thinking (RLVR)

	### Task names
	- `Countdown` — 8-arg arithmetic countdown
	- `MuSR` — MuSR murder mysteries
	- `FrozenLake` — FrozenLake grid navigation

	### Variant names (strategy compliance only)
	- `TreeSearch` / `Baseline` / `Anti` — countdown tree search experiment
	- `CritFirst` / `Anti-CritFirst` — criterion-first cross-cutting analysis
	- `Counterfactual` / `Anti-Counterfactual` — counterfactual hypothesis testing
	- `BackChain` — backward chaining (FrozenLake)

	### Examples

	```
	SC Countdown K2-Inst TreeSearch # Strategy compliance, countdown, Kimi instruct, tree search variant
	SC MuSR Q3-Think Counterfactual # Strategy compliance, MuSR, Qwen thinking, counterfactual variant
	SC FrozenLake K2-Think BackChain # Strategy compliance, FrozenLake, Kimi thinking, backward chaining
	Wing Countdown Q3-Inst # Wingdings, countdown, Qwen instruct (no variant — wingdings has one condition)
	Wing MuSR K2-Think # Wingdings, MuSR, Kimi thinking
	```
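
The pattern and vocabularies above can be encoded in a small helper so that names stay consistent. This is a sketch: `build_preset_name` is a hypothetical function, the allowed values are copied from the lists above and would need extending for new experiments, and the rule that `Wing` names take no variant follows the "strategy compliance only" note.

```python
EXPERIMENTS = {"SC", "Wing"}
TASKS = {"Countdown", "MuSR", "FrozenLake"}
MODELS = {"K2-Inst", "K2-Think", "Q3-Inst", "Q3-Think"}


def build_preset_name(experiment, task, model, variant=None):
    """Assemble '{Experiment} {Task} {Model} {Variant}', validating the known vocabularies."""
    checks = (
        ("experiment", experiment, EXPERIMENTS),
        ("task", task, TASKS),
        ("model", model, MODELS),
    )
    for label, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"unknown {label}: {value!r}")
    if experiment == "Wing" and variant:
        raise ValueError("Wing presets have a single condition and take no variant")
    parts = [experiment, task, model]
    if variant:
        parts.append(variant)
    return " ".join(parts)
```

For example, `build_preset_name("SC", "Countdown", "K2-Inst", "TreeSearch")` produces the first name in the examples above.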

	## Important Notes

	- Always check for existing repos before adding. The script above uses the `existing_repos` set, built from the model and arena presets, to skip duplicates.
	- The `column` field matters for model presets. Strategy compliance and wingdings datasets use `"response"` as the response column, not the default `"model_responses"`.
	- Local files are a fallback cache. The agg_visualizer downloads from HF on startup and caches the files locally. After uploading to HF, sync the local files so the running app picks up the changes without a restart (or hit the `/api/presets/sync` endpoint).
	- Don't modify rlm or harbor presets unless adding datasets of those types. The script above only touches model and arena.
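
To double-check the first note after a batch of additions, the preset lists can be scanned for values that appear more than once. A minimal sketch, with `find_duplicates` as a hypothetical helper that expects a dict mapping each visualizer type to its list of presets:

```python
def find_duplicates(presets_by_type, key="repo"):
    """Return {value: [vis_type, ...]} for every `key` value that appears more than once."""
    locations = {}
    for vis_type, presets in presets_by_type.items():
        for preset in presets:
            locations.setdefault(preset[key], []).append(vis_type)
    return {value: types for value, types in locations.items() if len(types) > 1}
```

Calling it with `key="id"` checks the 8-character ids for collisions in the same way.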