# Managing AGG_VIS_PRESETS Programmatically

## Overview

The agg_visualizer stores presets in the HuggingFace dataset repo `reasoning-degeneration-dev/AGG_VIS_PRESETS`. Each visualizer type has its own JSON file:

| Type | File | Extra Fields |
|------|------|-------------|
| `model` | `model_presets.json` | `column` (default: `"model_responses"`) |
| `arena` | `arena_presets.json` | none |
| `rlm` | `rlm_presets.json` | `config` (default: `"rlm_call_traces"`) |
| `harbor` | `harbor_presets.json` | none |

## Preset Schema

Every preset has these base fields:

```json
{
  "id": "8-char hex",
  "name": "Human-readable name",
  "repo": "org/dataset-name",
  "split": "train"
}
```

Each preset also carries the type-specific fields listed in the table above.
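For instance, a complete `model` preset adds the `column` field on top of the base schema (the `id` value here is illustrative; the repo name is one added later in this guide):

```json
{
  "id": "a1b2c3d4",
  "name": "SC Countdown K2-Inst TreeSearch",
  "repo": "reasoning-degeneration-dev/t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst",
  "split": "train",
  "column": "response"
}
```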

## How to Add Presets from Experiment Markdown Files

### Step 1: Identify repos and their visualizer type

Read the experiment markdown file(s) and extract all HuggingFace repo links. Categorize each:

- **Countdown / MuSR datasets** (model response traces) β†’ `model` type, set `column: "response"`
- **FrozenLake / arena datasets** (game episodes) β†’ `arena` type
- **Harbor / SWE-bench datasets** β†’ `harbor` type
- **RLM call traces** β†’ `rlm` type, set `config: "rlm_call_traces"`
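The categorization above can be sketched as a keyword match on the repo name. This is a heuristic for bulk triage, not part of the visualizer itself, and the keyword list is an assumption based on the dataset families listed; always spot-check the result against the markdown file:

```python
def categorize_repo(repo: str):
    """Guess (vis_type, extra_fields) from a HuggingFace repo name (heuristic)."""
    name = repo.lower()
    if "frozenlake" in name or "arena" in name:
        return ("arena", {})
    if "harbor" in name or "swe-bench" in name or "swebench" in name:
        return ("harbor", {})
    if "rlm" in name:
        return ("rlm", {"config": "rlm_call_traces"})
    # Countdown / MuSR model-response traces fall through to the model type
    return ("model", {"column": "response"})
```

Anything that matches none of the keywords is treated as a `model` repo, which matches the Countdown/MuSR naming in these experiments but should be verified for new dataset families.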

### Step 2: Download existing presets from HF

```python
from huggingface_hub import hf_hub_download
import json

PRESETS_REPO = "reasoning-degeneration-dev/AGG_VIS_PRESETS"

def load_hf_presets(vis_type):
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
        with open(path) as f:
            return json.load(f)
    except Exception:
        return []

existing_model = load_hf_presets("model")
existing_arena = load_hf_presets("arena")
# ... etc for rlm, harbor

# Build set of repos already present
existing_repos = set()
for presets_list in [existing_model, existing_arena]:
    for p in presets_list:
        existing_repos.add(p["repo"])
```

### Step 3: Build new presets, skipping duplicates

```python
import uuid

new_presets = []  # list of (vis_type, name, repo)

# Example: adding strategy compliance countdown presets
new_presets.append(("model", "SC Countdown K2-Inst TreeSearch",
    "reasoning-degeneration-dev/t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst"))

# ... add all repos from the markdown ...

# Filter out existing
to_add = {"model": [], "arena": [], "rlm": [], "harbor": []}
for vis_type, name, repo in new_presets:
    if repo in existing_repos:
        continue  # skip duplicates
    preset = {
        "id": uuid.uuid4().hex[:8],
        "name": name,
        "repo": repo,
        "split": "train",
    }
    if vis_type == "model":
        preset["column"] = "response"
    elif vis_type == "rlm":
        preset["config"] = "rlm_call_traces"
    to_add[vis_type].append(preset)
```

### Step 4: Merge and upload to HF

```python
import json, os, tempfile
from huggingface_hub import HfApi

api = HfApi()

# Merge new presets with existing
final_model = existing_model + to_add["model"]
final_arena = existing_arena + to_add["arena"]

for vis_type, presets in [("model", final_model), ("arena", final_arena)]:
    if not presets:
        continue
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(presets, f, indent=2)
        tmp = f.name
    api.upload_file(
        path_or_fileobj=tmp,
        path_in_repo=f"{vis_type}_presets.json",
        repo_id=PRESETS_REPO,
        repo_type="dataset",
    )
    os.unlink(tmp)
```

### Step 5: Sync the deployed HF Space

After uploading to the HF dataset, tell the running Space to re-download presets:

```bash
curl -X POST "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"
```

This forces the Space to re-download all preset files from `AGG_VIS_PRESETS` without needing a restart or redeployment.

### Step 6: Sync local preset files

```python
import shutil
from huggingface_hub import hf_hub_download

local_dir = "/Users/rs2020/Research/tools/visualizers/agg_visualizer/backend/presets"
for vis_type in ["model", "arena", "rlm", "harbor"]:
    try:
        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
        shutil.copy2(path, f"{local_dir}/{vis_type}_presets.json")
    except Exception:
        pass
```

## Naming Convention

Preset names follow this pattern to be descriptive and avoid future conflicts:

```
{Experiment} {Task} {Model} {Variant}
```

### Experiment prefixes
- `SC` β€” Strategy Compliance
- `Wing` β€” Wingdings Compliance

### Model abbreviations
- `K2-Inst` β€” Kimi-K2-Instruct (RLHF)
- `K2-Think` β€” Kimi-K2-Thinking (RLVR)
- `Q3-Inst` β€” Qwen3-Next-80B Instruct (RLHF)
- `Q3-Think` β€” Qwen3-Next-80B Thinking (RLVR)

### Task names
- `Countdown` β€” 8-arg arithmetic countdown
- `MuSR` β€” MuSR murder mysteries
- `FrozenLake` β€” FrozenLake grid navigation

### Variant names (strategy compliance only)
- `TreeSearch` / `Baseline` / `Anti` β€” countdown tree search experiment
- `CritFirst` / `Anti-CritFirst` β€” criterion-first cross-cutting analysis
- `Counterfactual` / `Anti-Counterfactual` β€” counterfactual hypothesis testing
- `BackChain` β€” backward chaining (FrozenLake)

### Examples

```
SC Countdown K2-Inst TreeSearch       # Strategy compliance, countdown, Kimi instruct, tree search variant
SC MuSR Q3-Think Counterfactual       # Strategy compliance, MuSR, Qwen thinking, counterfactual variant
SC FrozenLake K2-Think BackChain      # Strategy compliance, FrozenLake, Kimi thinking, backward chaining
Wing Countdown Q3-Inst                # Wingdings, countdown, Qwen instruct (no variant β€” wingdings has one condition)
Wing MuSR K2-Think                    # Wingdings, MuSR, Kimi thinking
```
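A small helper (hypothetical, not part of the repo) can assemble names in this pattern and keep them consistent across scripts:

```python
def preset_name(experiment: str, task: str, model: str, variant: str = "") -> str:
    """Join the naming-convention parts, omitting the variant when absent."""
    parts = [experiment, task, model]
    if variant:  # wingdings presets have no variant
        parts.append(variant)
    return " ".join(parts)
```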

## Important Notes

- **Always check for existing repos** before adding. The script above uses the `existing_repos` set to skip duplicates.
- **The `column` field matters for model presets.** Strategy-compliance and wingdings datasets store responses in a `"response"` column, not the default `"model_responses"`.
- **Local files are a fallback cache.** The agg_visualizer downloads presets from HF on startup and caches them locally. After uploading to HF, sync the local files so a locally running app picks up the changes without a restart (or hit the `/api/presets/sync` endpoint for the deployed Space).
- **Don't modify rlm or harbor presets** unless you are adding datasets of those types. The script above only touches model and arena.