Zayne Rea Sprague commited on
Commit
cdf803d
Β·
1 Parent(s): 8b41737

small tweak

Browse files
backend/api/model_datasets.py CHANGED
@@ -242,9 +242,15 @@ def get_question(ds_id, idx):
242
  prompt_text = str(val)
243
 
244
  question = ""
245
- for qcol in ["question", "prompt", "input", "formatted_prompt"]:
246
  if qcol in row:
247
- question = row[qcol] or ""
 
 
 
 
 
 
248
  break
249
 
250
  eval_correct = []
 
242
  prompt_text = str(val)
243
 
244
  question = ""
245
+ for qcol in ["question", "prompt", "input", "problem", "formatted_prompt"]:
246
  if qcol in row:
247
+ val = row[qcol] or ""
248
+ if isinstance(val, str):
249
+ question = val
250
+ elif isinstance(val, list):
251
+ question = json.dumps(val)
252
+ else:
253
+ question = str(val)
254
  break
255
 
256
  eval_correct = []
docs/managing_presets.md ADDED
@@ -0,0 +1,194 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Managing AGG_VIS_PRESETS Programmatically
2
+
3
+ ## Overview
4
+
5
+ The agg_visualizer stores presets in the HuggingFace dataset repo `reasoning-degeneration-dev/AGG_VIS_PRESETS`. Each visualizer type has its own JSON file:
6
+
7
+ | Type | File | Extra Fields |
8
+ |------|------|-------------|
9
+ | `model` | `model_presets.json` | `column` (default: `"model_responses"`) |
10
+ | `arena` | `arena_presets.json` | none |
11
+ | `rlm` | `rlm_presets.json` | `config` (default: `"rlm_call_traces"`) |
12
+ | `harbor` | `harbor_presets.json` | none |
13
+
14
+ ## Preset Schema
15
+
16
+ Every preset has these base fields:
17
+
18
+ ```json
19
+ {
20
+ "id": "8-char hex",
21
+ "name": "Human-readable name",
22
+ "repo": "org/dataset-name",
23
+ "split": "train"
24
+ }
25
+ ```
26
+
27
+ Plus type-specific fields listed above.
28
+
29
+ ## How to Add Presets from Experiment Markdown Files
30
+
31
+ ### Step 1: Identify repos and their visualizer type
32
+
33
+ Read the experiment markdown file(s) and extract all HuggingFace repo links. Categorize each:
34
+
35
+ - **Countdown / MuSR datasets** (model response traces) β†’ `model` type, set `column: "response"`
36
+ - **FrozenLake / arena datasets** (game episodes) β†’ `arena` type
37
+ - **Harbor / SWE-bench datasets** β†’ `harbor` type
38
+ - **RLM call traces** β†’ `rlm` type, set `config: "rlm_call_traces"`
39
+
40
+ ### Step 2: Download existing presets from HF
41
+
42
+ ```python
43
+ from huggingface_hub import hf_hub_download
44
+ import json
45
+
46
+ PRESETS_REPO = "reasoning-degeneration-dev/AGG_VIS_PRESETS"
47
+
48
+ def load_hf_presets(vis_type):
49
+ try:
50
+ path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
51
+ with open(path) as f:
52
+ return json.load(f)
53
+ except Exception:
54
+ return []
55
+
56
+ existing_model = load_hf_presets("model")
57
+ existing_arena = load_hf_presets("arena")
58
+ # ... etc for rlm, harbor
59
+
60
+ # Build set of repos already present
61
+ existing_repos = set()
62
+ for presets_list in [existing_model, existing_arena]:
63
+ for p in presets_list:
64
+ existing_repos.add(p["repo"])
65
+ ```
66
+
67
+ ### Step 3: Build new presets, skipping duplicates
68
+
69
+ ```python
70
+ import uuid
71
+
72
+ new_presets = [] # list of (vis_type, name, repo)
73
+
74
+ # Example: adding strategy compliance countdown presets
75
+ new_presets.append(("model", "SC Countdown K2-Inst TreeSearch",
76
+ "reasoning-degeneration-dev/t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst"))
77
+
78
+ # ... add all repos from the markdown ...
79
+
80
+ # Filter out existing
81
+ to_add = {"model": [], "arena": [], "rlm": [], "harbor": []}
82
+ for vis_type, name, repo in new_presets:
83
+ if repo in existing_repos:
84
+ continue # skip duplicates
85
+ preset = {
86
+ "id": uuid.uuid4().hex[:8],
87
+ "name": name,
88
+ "repo": repo,
89
+ "split": "train",
90
+ }
91
+ if vis_type == "model":
92
+ preset["column"] = "response"
93
+ elif vis_type == "rlm":
94
+ preset["config"] = "rlm_call_traces"
95
+ to_add[vis_type].append(preset)
96
+ ```
97
+
98
+ ### Step 4: Merge and upload to HF
99
+
100
+ ```python
101
+ import tempfile, os
102
+ from huggingface_hub import HfApi
103
+
104
+ api = HfApi()
105
+
106
+ # Merge new presets with existing
107
+ final_model = existing_model + to_add["model"]
108
+ final_arena = existing_arena + to_add["arena"]
109
+
110
+ for vis_type, presets in [("model", final_model), ("arena", final_arena)]:
111
+ if not presets:
112
+ continue
113
+ with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
114
+ json.dump(presets, f, indent=2)
115
+ tmp = f.name
116
+ api.upload_file(
117
+ path_or_fileobj=tmp,
118
+ path_in_repo=f"{vis_type}_presets.json",
119
+ repo_id=PRESETS_REPO,
120
+ repo_type="dataset",
121
+ )
122
+ os.unlink(tmp)
123
+ ```
124
+
125
+ ### Step 5: Sync the deployed HF Space
126
+
127
+ After uploading to the HF dataset, tell the running Space to re-download presets:
128
+
129
+ ```bash
130
+ curl -X POST "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"
131
+ ```
132
+
133
+ This forces the Space to re-download all preset files from `AGG_VIS_PRESETS` without needing a restart or redeployment.
134
+
135
+ ### Step 6: Sync local preset files
136
+
137
+ ```python
138
+ import shutil
139
+ from huggingface_hub import hf_hub_download
140
+
141
+ local_dir = "/Users/rs2020/Research/tools/visualizers/agg_visualizer/backend/presets"
142
+ for vis_type in ["model", "arena", "rlm", "harbor"]:
143
+ try:
144
+ path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
145
+ shutil.copy2(path, f"{local_dir}/{vis_type}_presets.json")
146
+ except Exception:
147
+ pass
148
+ ```
149
+
150
+ ## Naming Convention
151
+
152
+ Preset names follow this pattern to be descriptive and avoid future conflicts:
153
+
154
+ ```
155
+ {Experiment} {Task} {Model} {Variant}
156
+ ```
157
+
158
+ ### Experiment prefixes
159
+ - `SC` β€” Strategy Compliance
160
+ - `Wing` β€” Wingdings Compliance
161
+
162
+ ### Model abbreviations
163
+ - `K2-Inst` β€” Kimi-K2-Instruct (RLHF)
164
+ - `K2-Think` β€” Kimi-K2-Thinking (RLVR)
165
+ - `Q3-Inst` β€” Qwen3-Next-80B Instruct (RLHF)
166
+ - `Q3-Think` β€” Qwen3-Next-80B Thinking (RLVR)
167
+
168
+ ### Task names
169
+ - `Countdown` β€” 8-arg arithmetic countdown
170
+ - `MuSR` β€” MuSR murder mysteries
171
+ - `FrozenLake` β€” FrozenLake grid navigation
172
+
173
+ ### Variant names (strategy compliance only)
174
+ - `TreeSearch` / `Baseline` / `Anti` β€” countdown tree search experiment
175
+ - `CritFirst` / `Anti-CritFirst` β€” criterion-first cross-cutting analysis
176
+ - `Counterfactual` / `Anti-Counterfactual` β€” counterfactual hypothesis testing
177
+ - `BackChain` β€” backward chaining (FrozenLake)
178
+
179
+ ### Examples
180
+
181
+ ```
182
+ SC Countdown K2-Inst TreeSearch # Strategy compliance, countdown, Kimi instruct, tree search variant
183
+ SC MuSR Q3-Think Counterfactual # Strategy compliance, MuSR, Qwen thinking, counterfactual variant
184
+ SC FrozenLake K2-Think BackChain # Strategy compliance, FrozenLake, Kimi thinking, backward chaining
185
+ Wing Countdown Q3-Inst # Wingdings, countdown, Qwen instruct (no variant β€” wingdings has one condition)
186
+ Wing MuSR K2-Think # Wingdings, MuSR, Kimi thinking
187
+ ```
188
+
189
+ ## Important Notes
190
+
191
+ - **Always check for existing repos** before adding. The script above uses `existing_repos` set to skip duplicates.
192
+ - **The `column` field matters for model presets.** Strategy compliance and wingdings datasets use `"response"` as the response column, not the default `"model_responses"`.
193
+ - **Local files are fallback cache.** The agg_visualizer downloads from HF on startup and caches locally. After uploading to HF, sync the local files so the running app picks up changes without restart (or hit the `/api/presets/sync` endpoint).
194
+ - **Don't modify rlm or harbor presets** unless adding datasets of those types. The script above only touches model and arena.