# Managing AGG_VIS_PRESETS Programmatically

## Overview

The agg_visualizer stores presets in the HuggingFace dataset repo `reasoning-degeneration-dev/AGG_VIS_PRESETS`. Each visualizer type has its own JSON file:

| Type | File | Extra Fields |
|------|------|-------------|
| `model` | `model_presets.json` | `column` (default: `"model_responses"`) |
| `arena` | `arena_presets.json` | none |
| `rlm` | `rlm_presets.json` | `config` (default: `"rlm_call_traces"`) |
| `harbor` | `harbor_presets.json` | none |

## Preset Schema

Every preset has these base fields:

```json
{
  "id": "8-char hex",
  "name": "Human-readable name",
  "repo": "org/dataset-name",
  "split": "train"
}
```

Each preset also carries the type-specific fields listed in the table above.
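For instance, a complete `model` preset adds the `column` field on top of the base schema (the `id` value here is illustrative; the repo name is one added later in this guide):

```json
{
  "id": "a1b2c3d4",
  "name": "SC Countdown K2-Inst TreeSearch",
  "repo": "reasoning-degeneration-dev/t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst",
  "split": "train",
  "column": "response"
}
```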

## How to Add Presets from Experiment Markdown Files

### Step 1: Identify repos and their visualizer type

Read the experiment markdown file(s) and extract all HuggingFace repo links. Categorize each:

- **Countdown / MuSR datasets** (model response traces) β†’ `model` type, set `column: "response"`
- **FrozenLake / arena datasets** (game episodes) β†’ `arena` type
- **Harbor / SWE-bench datasets** β†’ `harbor` type
- **RLM call traces** β†’ `rlm` type, set `config: "rlm_call_traces"`
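The categorization above can be sketched as a keyword match on the repo name. This is a heuristic for bulk triage, not part of the visualizer itself, and the keyword list is an assumption based on the dataset families listed; always spot-check the result against the markdown file:

```python
def categorize_repo(repo: str):
    """Guess (vis_type, extra_fields) from a HuggingFace repo name (heuristic)."""
    name = repo.lower()
    if "frozenlake" in name or "arena" in name:
        return ("arena", {})
    if "harbor" in name or "swe-bench" in name or "swebench" in name:
        return ("harbor", {})
    if "rlm" in name:
        return ("rlm", {"config": "rlm_call_traces"})
    # Countdown / MuSR model-response traces fall through to the model type
    return ("model", {"column": "response"})
```

Anything that matches none of the keywords is treated as a `model` repo, which matches the Countdown/MuSR naming in these experiments but should be verified for new dataset families.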

### Step 2: Download existing presets from HF

```python
from huggingface_hub import hf_hub_download
import json

PRESETS_REPO = "reasoning-degeneration-dev/AGG_VIS_PRESETS"

def load_hf_presets(vis_type):
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
        with open(path) as f:
            return json.load(f)
    except Exception:
        return []

existing_model = load_hf_presets("model")
existing_arena = load_hf_presets("arena")
# ... etc for rlm, harbor

# Build set of repos already present
existing_repos = set()
for presets_list in [existing_model, existing_arena]:
    for p in presets_list:
        existing_repos.add(p["repo"])
```

### Step 3: Build new presets, skipping duplicates

```python
import uuid

new_presets = []  # list of (vis_type, name, repo)

# Example: adding strategy compliance countdown presets
new_presets.append(("model", "SC Countdown K2-Inst TreeSearch",
    "reasoning-degeneration-dev/t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst"))

# ... add all repos from the markdown ...

# Filter out existing
to_add = {"model": [], "arena": [], "rlm": [], "harbor": []}
for vis_type, name, repo in new_presets:
    if repo in existing_repos:
        continue  # skip duplicates
    preset = {
        "id": uuid.uuid4().hex[:8],
        "name": name,
        "repo": repo,
        "split": "train",
    }
    if vis_type == "model":
        preset["column"] = "response"
    elif vis_type == "rlm":
        preset["config"] = "rlm_call_traces"
    to_add[vis_type].append(preset)
```

### Step 4: Merge and upload to HF

```python
import json, os, tempfile
from huggingface_hub import HfApi

api = HfApi()

# Merge new presets with existing
final_model = existing_model + to_add["model"]
final_arena = existing_arena + to_add["arena"]

for vis_type, presets in [("model", final_model), ("arena", final_arena)]:
    if not presets:
        continue
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(presets, f, indent=2)
        tmp = f.name
    api.upload_file(
        path_or_fileobj=tmp,
        path_in_repo=f"{vis_type}_presets.json",
        repo_id=PRESETS_REPO,
        repo_type="dataset",
    )
    os.unlink(tmp)
```

### Step 5: Sync the deployed HF Space

After uploading to the HF dataset, tell the running Space to re-download presets:

```bash
curl -X POST "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"
```

This forces the Space to re-download all preset files from `AGG_VIS_PRESETS` without needing a restart or redeployment.

### Step 6: Sync local preset files

```python
import shutil
from huggingface_hub import hf_hub_download

local_dir = "/Users/rs2020/Research/tools/visualizers/agg_visualizer/backend/presets"
for vis_type in ["model", "arena", "rlm", "harbor"]:
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
        shutil.copy2(path, f"{local_dir}/{vis_type}_presets.json")
    except Exception:
        pass
```

## Naming Convention

Preset names follow this pattern to be descriptive and avoid future conflicts:

```
{Experiment} {Task} {Model} {Variant}
```

### Experiment prefixes
- `SC` β€” Strategy Compliance
- `Wing` β€” Wingdings Compliance

### Model abbreviations
- `K2-Inst` β€” Kimi-K2-Instruct (RLHF)
- `K2-Think` β€” Kimi-K2-Thinking (RLVR)
- `Q3-Inst` β€” Qwen3-Next-80B Instruct (RLHF)
- `Q3-Think` β€” Qwen3-Next-80B Thinking (RLVR)

### Task names
- `Countdown` β€” 8-arg arithmetic countdown
- `MuSR` β€” MuSR murder mysteries
- `FrozenLake` β€” FrozenLake grid navigation

### Variant names (strategy compliance only)
- `TreeSearch` / `Baseline` / `Anti` β€” countdown tree search experiment
- `CritFirst` / `Anti-CritFirst` β€” criterion-first cross-cutting analysis
- `Counterfactual` / `Anti-Counterfactual` β€” counterfactual hypothesis testing
- `BackChain` β€” backward chaining (FrozenLake)

### Examples

```
SC Countdown K2-Inst TreeSearch       # Strategy compliance, countdown, Kimi instruct, tree search variant
SC MuSR Q3-Think Counterfactual       # Strategy compliance, MuSR, Qwen thinking, counterfactual variant
SC FrozenLake K2-Think BackChain      # Strategy compliance, FrozenLake, Kimi thinking, backward chaining
Wing Countdown Q3-Inst                # Wingdings, countdown, Qwen instruct (no variant β€” wingdings has one condition)
Wing MuSR K2-Think                    # Wingdings, MuSR, Kimi thinking
```
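A small helper (hypothetical, not part of the repo) can assemble names in this pattern and keep them consistent across scripts:

```python
def preset_name(experiment: str, task: str, model: str, variant: str = "") -> str:
    """Join the naming-convention parts, omitting the variant when absent."""
    parts = [experiment, task, model]
    if variant:  # wingdings presets have no variant
        parts.append(variant)
    return " ".join(parts)
```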

## Important Notes

- **Always check for existing repos** before adding. The script above uses the `existing_repos` set to skip duplicates.
- **The `column` field matters for model presets.** Strategy-compliance and wingdings datasets store responses in a `"response"` column, not the default `"model_responses"`.
- **Local files are a fallback cache.** The agg_visualizer downloads presets from HF on startup and caches them locally. After uploading to HF, sync the local files so a locally running app picks up the changes without a restart (or hit the `/api/presets/sync` endpoint for the deployed Space).
- **Don't modify rlm or harbor presets** unless you are adding datasets of those types. The script above only touches model and arena.