# Managing AGG_VIS_PRESETS Programmatically

## Overview

The agg_visualizer stores presets in the HuggingFace dataset repo `reasoning-degeneration-dev/AGG_VIS_PRESETS`. Each visualizer type has its own JSON file:

| Type | File | Extra Fields |
|------|------|-------------|
| `model` | `model_presets.json` | `column` (default: `"model_responses"`) |
| `arena` | `arena_presets.json` | none |
| `rlm` | `rlm_presets.json` | `config` (default: `"rlm_call_traces"`) |
| `harbor` | `harbor_presets.json` | none |

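
The table can be mirrored as a small lookup so scripts do not hard-code file names. This is a minimal sketch: `PRESET_TYPES`, `preset_filename`, and `with_default_extra` are hypothetical names, and how the visualizer itself applies the defaults is an assumption.

```python
# Visualizer type -> (preset file, extra field name, default value),
# copied from the table. The names here are illustrative, not the app's own.
PRESET_TYPES = {
    "model": ("model_presets.json", "column", "model_responses"),
    "arena": ("arena_presets.json", None, None),
    "rlm": ("rlm_presets.json", "config", "rlm_call_traces"),
    "harbor": ("harbor_presets.json", None, None),
}


def preset_filename(vis_type):
    """Return the JSON file name that holds presets of this type."""
    return PRESET_TYPES[vis_type][0]


def with_default_extra(preset, vis_type):
    """Return a copy of the preset with the type's extra field filled in if absent."""
    _, field, default = PRESET_TYPES[vis_type]
    result = dict(preset)
    if field is not None:
        result.setdefault(field, default)
    return result
```

An unknown type raises `KeyError`, which surfaces typos early.
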
## Preset Schema

Every preset has these base fields:

```json
{
  "id": "8-char hex",
  "name": "Human-readable name",
  "repo": "org/dataset-name",
  "split": "train"
}
```

Plus the type-specific fields listed in the table above.

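
Before uploading, it can help to check each preset against the base schema. The sketch below is an assumption-laden helper (`validate_preset` is a hypothetical name): it treats the `id` as 8 lowercase hex characters, which matches what `uuid.uuid4().hex[:8]` produces, and does not check the optional type-specific fields.

```python
import re

BASE_FIELDS = ("id", "name", "repo", "split")


def validate_preset(preset):
    """Return a list of problems; an empty list means the base fields look fine."""
    problems = []
    for field in BASE_FIELDS:
        value = preset.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty field: {field}")
    preset_id = preset.get("id")
    # Assumption: ids are lowercase hex, as produced by uuid4().hex[:8].
    if isinstance(preset_id, str) and preset_id and not re.fullmatch(r"[0-9a-f]{8}", preset_id):
        problems.append("id is not 8 lowercase hex characters")
    repo = preset.get("repo")
    if isinstance(repo, str) and repo and repo.count("/") != 1:
        problems.append("repo is not of the form org/dataset-name")
    return problems
```
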
## How to Add Presets from Experiment Markdown Files

### Step 1: Identify repos and their visualizer type

Read the experiment markdown file(s) and extract all HuggingFace repo links. Categorize each:

- **Countdown / MuSR datasets** (model response traces) → `model` type, set `column: "response"`
- **FrozenLake / arena datasets** (game episodes) → `arena` type
- **Harbor / SWE-bench datasets** → `harbor` type
- **RLM call traces** → `rlm` type, set `config: "rlm_call_traces"`

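
Extracting the repo links can be scripted. The sketch below assumes the markdown links to datasets as `https://huggingface.co/datasets/<org>/<name>`; `extract_dataset_repos` is a hypothetical helper, and categorizing each repo by type still needs a human decision.

```python
import re

# Assumed link shape: https://huggingface.co/datasets/<org>/<name>
HF_DATASET_LINK = re.compile(r"https://huggingface\.co/datasets/([\w.-]+/[\w.-]+)")


def extract_dataset_repos(markdown_text):
    """Return unique dataset repo ids in order of first appearance."""
    seen = {}
    for match in HF_DATASET_LINK.finditer(markdown_text):
        # Drop a trailing period picked up from sentence punctuation.
        repo = match.group(1).rstrip(".")
        seen.setdefault(repo, None)
    return list(seen)
```

Usage: `extract_dataset_repos(open("experiment.md").read())`, where `experiment.md` stands in for the experiment file being processed.
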
### Step 2: Download existing presets from HF

```python
import json

from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError

PRESETS_REPO = "reasoning-degeneration-dev/AGG_VIS_PRESETS"


def load_hf_presets(vis_type):
    """Download and parse the preset list for one visualizer type."""
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
    except EntryNotFoundError:
        # The file does not exist yet, so there are no presets of this type.
        # Other errors (network, auth) propagate: returning [] for them would
        # let Step 4 overwrite the remote file with only the new presets.
        return []
    with open(path) as f:
        return json.load(f)


existing_model = load_hf_presets("model")
existing_arena = load_hf_presets("arena")
# ... etc. for rlm, harbor

# Build the set of repos already present
existing_repos = set()
for presets_list in [existing_model, existing_arena]:
    for p in presets_list:
        existing_repos.add(p["repo"])
```

### Step 3: Build new presets, skipping duplicates

```python
import uuid

new_presets = []  # list of (vis_type, name, repo)

# Example: adding strategy compliance countdown presets
new_presets.append(("model", "SC Countdown K2-Inst TreeSearch",
    "reasoning-degeneration-dev/t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst"))
# ... add all repos from the markdown ...

# Filter out existing
to_add = {"model": [], "arena": [], "rlm": [], "harbor": []}
for vis_type, name, repo in new_presets:
    if repo in existing_repos:
        continue  # skip duplicates
    preset = {
        "id": uuid.uuid4().hex[:8],
        "name": name,
        "repo": repo,
        "split": "train",
    }
    if vis_type == "model":
        preset["column"] = "response"
    elif vis_type == "rlm":
        preset["config"] = "rlm_call_traces"
    to_add[vis_type].append(preset)
```

### Step 4: Merge and upload to HF

```python
import os
import tempfile

from huggingface_hub import HfApi

api = HfApi()

# Merge new presets with existing
final_model = existing_model + to_add["model"]
final_arena = existing_arena + to_add["arena"]

for vis_type, presets in [("model", final_model), ("arena", final_arena)]:
    if not presets:
        continue
    # Write to a temp file first; the upload runs after the file is closed
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(presets, f, indent=2)
        tmp = f.name
    api.upload_file(
        path_or_fileobj=tmp,
        path_in_repo=f"{vis_type}_presets.json",
        repo_id=PRESETS_REPO,
        repo_type="dataset",
    )
    os.unlink(tmp)
```

### Step 5: Sync the deployed HF Space

After uploading to the HF dataset, tell the running Space to re-download presets:

```bash
curl -X POST "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"
```

This forces the Space to re-download all preset files from `AGG_VIS_PRESETS` without needing a restart or redeployment.

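
To trigger the sync from the same Python script instead of the shell, a standard-library sketch like the following should work. It sends an empty-body POST to the URL from the curl command; the helper names are hypothetical, and the shape of the response body is not documented here, so it is returned as raw text.

```python
import urllib.request

SYNC_URL = "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"


def build_sync_request(url=SYNC_URL):
    """Build an empty-body POST request for the sync endpoint."""
    return urllib.request.Request(url, data=b"", method="POST")


def sync_space(url=SYNC_URL, timeout=30):
    """POST to the sync endpoint and return (status code, raw body text)."""
    with urllib.request.urlopen(build_sync_request(url), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

Calling `sync_space()` raises `urllib.error.HTTPError` on a non-2xx response, so a failed sync is not silently ignored.
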
### Step 6: Sync local preset files

```python
import shutil

from huggingface_hub import hf_hub_download

# PRESETS_REPO is defined in Step 2
local_dir = "/Users/rs2020/Research/tools/visualizers/agg_visualizer/backend/presets"

for vis_type in ["model", "arena", "rlm", "harbor"]:
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
        shutil.copy2(path, f"{local_dir}/{vis_type}_presets.json")
    except Exception as exc:
        # Keep going, but report the failure instead of silently ignoring it
        print(f"Could not sync {vis_type}_presets.json: {exc}")
```

## Naming Convention

Preset names follow this pattern so that they are descriptive and avoid future conflicts:

```
{Experiment} {Task} {Model} {Variant}
```

### Experiment prefixes

- `SC`: Strategy Compliance
- `Wing`: Wingdings Compliance

### Model abbreviations

- `K2-Inst`: Kimi-K2-Instruct (RLHF)
- `K2-Think`: Kimi-K2-Thinking (RLVR)
- `Q3-Inst`: Qwen3-Next-80B Instruct (RLHF)
- `Q3-Think`: Qwen3-Next-80B Thinking (RLVR)

### Task names

- `Countdown`: 8-arg arithmetic countdown
- `MuSR`: MuSR murder mysteries
- `FrozenLake`: FrozenLake grid navigation

### Variant names (strategy compliance only)

- `TreeSearch` / `Baseline` / `Anti`: countdown tree search experiment
- `CritFirst` / `Anti-CritFirst`: criterion-first cross-cutting analysis
- `Counterfactual` / `Anti-Counterfactual`: counterfactual hypothesis testing
- `BackChain`: backward chaining (FrozenLake)

### Examples

```
SC Countdown K2-Inst TreeSearch     # Strategy compliance, countdown, Kimi instruct, tree search variant
SC MuSR Q3-Think Counterfactual     # Strategy compliance, MuSR, Qwen thinking, counterfactual variant
SC FrozenLake K2-Think BackChain    # Strategy compliance, FrozenLake, Kimi thinking, backward chaining
Wing Countdown Q3-Inst              # Wingdings, countdown, Qwen instruct (no variant; Wingdings has one condition)
Wing MuSR K2-Think                  # Wingdings, MuSR, Kimi thinking
```

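
The naming pattern can be enforced with a small builder so that typos in prefixes or abbreviations fail early. This is a sketch only: `make_preset_name` is a hypothetical helper, the allowed values are copied from the lists in this section, and variant names are left unchecked.

```python
# Allowed values copied from the naming convention lists in this document.
EXPERIMENTS = {"SC", "Wing"}
TASKS = {"Countdown", "MuSR", "FrozenLake"}
MODELS = {"K2-Inst", "K2-Think", "Q3-Inst", "Q3-Think"}


def make_preset_name(experiment, task, model, variant=None):
    """Join the parts as '{Experiment} {Task} {Model} {Variant}'; variant is optional."""
    for value, allowed, label in [
        (experiment, EXPERIMENTS, "experiment"),
        (task, TASKS, "task"),
        (model, MODELS, "model"),
    ]:
        if value not in allowed:
            raise ValueError(f"unknown {label}: {value!r}")
    parts = [experiment, task, model]
    if variant:
        parts.append(variant)
    return " ".join(parts)
```

For example, `make_preset_name("SC", "Countdown", "K2-Inst", "TreeSearch")` gives `SC Countdown K2-Inst TreeSearch`.
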
## Important Notes

- **Always check for existing repos** before adding. The script above uses the `existing_repos` set to skip duplicates.
- **The `column` field matters for model presets.** Strategy compliance and wingdings datasets use `"response"` as the response column, not the default `"model_responses"`.
- **Local files are a fallback cache.** The agg_visualizer downloads from HF on startup and caches locally. After uploading to HF, sync the local files so the running app picks up changes without a restart (or hit the `/api/presets/sync` endpoint).
- **Don't modify rlm or harbor presets** unless adding datasets of those types. The script above only touches model and arena.

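
As a last check before uploading, the merged list can be scanned for repos that appear more than once. This is a sketch under the assumption that each repo should appear at most once per preset file; `find_duplicate_repos` is a hypothetical helper.

```python
from collections import Counter


def find_duplicate_repos(presets):
    """Return the sorted repo ids that occur more than once in a preset list."""
    counts = Counter(p["repo"] for p in presets)
    return sorted(repo for repo, n in counts.items() if n > 1)
```

An empty result for a merged list such as `final_model` means no repo was added twice.
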