Space: reasoning-degeneration-dev/agg-trace-visualizer


Zayne Rea Sprague committed 10 days ago

Commit cdf803d (1 parent: 8b41737)

small tweak


Files changed (2)

backend/api/model_datasets.py +8 -2
docs/managing_presets.md +194 -0

backend/api/model_datasets.py CHANGED

@@ -242,9 +242,15 @@ def get_question(ds_id, idx):
             prompt_text = str(val)
     question = ""
-    for qcol in ["question", "prompt", "input", "formatted_prompt"]:
+    for qcol in ["question", "prompt", "input", "problem", "formatted_prompt"]:
         if qcol in row:
-            question = row[qcol] or ""
+            val = row[qcol] or ""
+            if isinstance(val, str):
+                question = val
+            elif isinstance(val, list):
+                question = json.dumps(val)
+            else:
+                question = str(val)
             break
     eval_correct = []
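
The coercion in the updated loop above can be read in isolation as a small helper. This is only an illustrative sketch: `coerce_question` and `QUESTION_COLUMNS` are hypothetical names that do not appear in `model_datasets.py`, and `row` is assumed to behave like a plain dict.

```python
import json

# Column order mirrors the list in the updated loop above.
QUESTION_COLUMNS = ["question", "prompt", "input", "problem", "formatted_prompt"]


def coerce_question(row):
    """Return the first matching question column of `row` as a string."""
    for qcol in QUESTION_COLUMNS:
        if qcol in row:
            # Falsy values (None, "", [], 0) collapse to "" as in the diff.
            val = row[qcol] or ""
            if isinstance(val, str):
                return val
            if isinstance(val, list):
                # For example, a list of chat messages becomes a JSON string.
                return json.dumps(val)
            return str(val)
    # No known question column is present.
    return ""
```

As in the hunk, the first column that exists wins even when its value is empty, so a row with `question=None` and a populated `prompt` still yields an empty string.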

docs/managing_presets.md ADDED

	@@ -0,0 +1,194 @@

+# Managing AGG_VIS_PRESETS Programmatically
+## Overview
+The agg_visualizer stores presets in the HuggingFace dataset repo `reasoning-degeneration-dev/AGG_VIS_PRESETS`. Each visualizer type has its own JSON file:
+| Type | File | Extra Fields |
+|------|------|-------------|
+| `model` | `model_presets.json` | `column` (default: `"model_responses"`) |
+| `arena` | `arena_presets.json` | none |
+| `rlm` | `rlm_presets.json` | `config` (default: `"rlm_call_traces"`) |
+| `harbor` | `harbor_presets.json` | none |
+## Preset Schema
+Every preset has these base fields:
+```json
+{
+  "id": "8-char hex",
+  "name": "Human-readable name",
+  "repo": "org/dataset-name",
+  "split": "train"
+}
+```
+Plus type-specific fields listed above.
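
A quick sanity check of a preset against this schema might look like the sketch below. `check_preset` is a hypothetical helper, not part of the visualizer. It also makes two assumptions: that "8-char hex" means eight lowercase hex digits, and that `repo` contains exactly one slash.

```python
import re

BASE_FIELDS = ("id", "name", "repo", "split")
# Type-specific extra field per visualizer type, taken from the table above.
EXTRA_FIELD = {"model": "column", "arena": None, "rlm": "config", "harbor": None}


def check_preset(vis_type, preset):
    """Return a list of problems found in one preset dict (empty means OK)."""
    if vis_type not in EXTRA_FIELD:
        return [f"unknown visualizer type: {vis_type}"]
    problems = [f"missing field: {key}" for key in BASE_FIELDS if key not in preset]
    # Assumption: ids are eight lowercase hex digits, as uuid4().hex[:8] yields.
    if "id" in preset and not re.fullmatch(r"[0-9a-f]{8}", str(preset["id"])):
        problems.append("id is not 8 lowercase hex characters")
    # Assumption: repo looks like "org/dataset-name" with exactly one slash.
    if "repo" in preset and str(preset["repo"]).count("/") != 1:
        problems.append("repo is not of the form org/dataset-name")
    # The extra field has a default, so it is optional, but must be a string.
    extra = EXTRA_FIELD[vis_type]
    if extra and extra in preset and not isinstance(preset[extra], str):
        problems.append(f"{extra} must be a string")
    return problems
```

Running it over each list before uploading would catch a malformed entry early.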
+## How to Add Presets from Experiment Markdown Files
+### Step 1: Identify repos and their visualizer type
+Read the experiment markdown file(s) and extract all HuggingFace repo links. Categorize each:
+- **Countdown / MuSR datasets** (model response traces) → `model` type, set `column: "response"`
+- **FrozenLake / arena datasets** (game episodes) → `arena` type
+- **Harbor / SWE-bench datasets** → `harbor` type
+- **RLM call traces** → `rlm` type, set `config: "rlm_call_traces"`
+### Step 2: Download existing presets from HF
+```python
+from huggingface_hub import hf_hub_download
+import json
+PRESETS_REPO = "reasoning-degeneration-dev/AGG_VIS_PRESETS"
+def load_hf_presets(vis_type):
+    try:
+        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
+        with open(path) as f:
+            return json.load(f)
+    except Exception:
+        return []
+existing_model = load_hf_presets("model")
+existing_arena = load_hf_presets("arena")
+# ... etc for rlm, harbor
+# Build the set of repos already present (only model and arena here; add the rlm and harbor lists if you load them)
+existing_repos = set()
+for presets_list in [existing_model, existing_arena]:
+    for p in presets_list:
+        existing_repos.add(p["repo"])
+```
+### Step 3: Build new presets, skipping duplicates
+```python
+import uuid
+new_presets = []  # list of (vis_type, name, repo)
+# Example: adding strategy compliance countdown presets
+new_presets.append(("model", "SC Countdown K2-Inst TreeSearch",
+    "reasoning-degeneration-dev/t1-strategy-countdown-treesearch-kimi-k2-instruct-kimi-inst"))
+# ... add all repos from the markdown ...
+# Filter out existing
+to_add = {"model": [], "arena": [], "rlm": [], "harbor": []}
+for vis_type, name, repo in new_presets:
+    if repo in existing_repos:
+        continue  # skip duplicates
+    preset = {
+        "id": uuid.uuid4().hex[:8],
+        "name": name,
+        "repo": repo,
+        "split": "train",
+    }
+    if vis_type == "model":
+        preset["column"] = "response"
+    elif vis_type == "rlm":
+        preset["config"] = "rlm_call_traces"
+    to_add[vis_type].append(preset)
+```
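
The loop above skips repos that already exist on HF, but it would still add the same repo twice if it appears twice in `new_presets`. One possible variant is sketched below as a function; `build_presets` is a hypothetical name, and tracking a `seen` set is an addition that the original script does not have.

```python
import uuid


def build_presets(new_presets, existing_repos):
    """Group new presets by visualizer type, skipping repos already seen."""
    to_add = {"model": [], "arena": [], "rlm": [], "harbor": []}
    seen = set(existing_repos)  # copy so the caller's set is not mutated
    for vis_type, name, repo in new_presets:
        if repo in seen:
            continue  # already on HF, or repeated earlier in this batch
        seen.add(repo)
        preset = {
            "id": uuid.uuid4().hex[:8],
            "name": name,
            "repo": repo,
            "split": "train",
        }
        if vis_type == "model":
            preset["column"] = "response"
        elif vis_type == "rlm":
            preset["config"] = "rlm_call_traces"
        to_add[vis_type].append(preset)
    return to_add
```

Called as `to_add = build_presets(new_presets, existing_repos)`, it returns the same `to_add` structure that Step 4 consumes.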
+### Step 4: Merge and upload to HF
+```python
+import tempfile, os
+from huggingface_hub import HfApi
+api = HfApi()
+# Merge new presets with existing
+final_model = existing_model + to_add["model"]
+final_arena = existing_arena + to_add["arena"]
+for vis_type, presets in [("model", final_model), ("arena", final_arena)]:
+    if not presets:
+        continue
+    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
+        json.dump(presets, f, indent=2)
+        tmp = f.name
+    api.upload_file(
+        path_or_fileobj=tmp,
+        path_in_repo=f"{vis_type}_presets.json",
+        repo_id=PRESETS_REPO,
+        repo_type="dataset",
+    )
+    os.unlink(tmp)
+```
+### Step 5: Sync the deployed HF Space
+After uploading to the HF dataset, tell the running Space to re-download presets:
+```bash
+curl -X POST "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"
+```
+This forces the Space to re-download all preset files from `AGG_VIS_PRESETS` without needing a restart or redeployment.
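
If the sync has to run from the same Python script as the upload, the `curl` call could be mirrored with the standard library, roughly as below. `build_sync_request` and `sync_space` are hypothetical helper names. The endpoint's response format is not documented here, so the sketch only returns the HTTP status code.

```python
import urllib.request

SYNC_URL = "https://reasoning-degeneration-dev-agg-trace-visualizer.hf.space/api/presets/sync"


def build_sync_request(url=SYNC_URL):
    """Build a body-less POST request, like `curl -X POST <url>`."""
    return urllib.request.Request(url, method="POST")


def sync_space(url=SYNC_URL, timeout=30):
    """Send the sync request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    returned value means the Space accepted the request.
    """
    with urllib.request.urlopen(build_sync_request(url), timeout=timeout) as resp:
        return resp.status
```

Keeping request construction separate from sending makes it possible to inspect the request without touching the network.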
+### Step 6: Sync local preset files
+```python
+import shutil
+from huggingface_hub import hf_hub_download
+local_dir = "/Users/rs2020/Research/tools/visualizers/agg_visualizer/backend/presets"
+for vis_type in ["model", "arena", "rlm", "harbor"]:
+    try:
+        path = hf_hub_download(PRESETS_REPO, f"{vis_type}_presets.json", repo_type="dataset")
+        shutil.copy2(path, f"{local_dir}/{vis_type}_presets.json")
+    except Exception:
+        pass
+```
+## Naming Convention
+Preset names follow this pattern to be descriptive and avoid future conflicts:
+```
+{Experiment} {Task} {Model} {Variant}
+```
+### Experiment prefixes
+- `SC` — Strategy Compliance
+- `Wing` — Wingdings Compliance
+### Model abbreviations
+- `K2-Inst` — Kimi-K2-Instruct (RLHF)
+- `K2-Think` — Kimi-K2-Thinking (RLVR)
+- `Q3-Inst` — Qwen3-Next-80B Instruct (RLHF)
+- `Q3-Think` — Qwen3-Next-80B Thinking (RLVR)
+### Task names
+- `Countdown` — 8-arg arithmetic countdown
+- `MuSR` — MuSR murder mysteries
+- `FrozenLake` — FrozenLake grid navigation
+### Variant names (strategy compliance only)
+- `TreeSearch` / `Baseline` / `Anti` — countdown tree search experiment
+- `CritFirst` / `Anti-CritFirst` — criterion-first cross-cutting analysis
+- `Counterfactual` / `Anti-Counterfactual` — counterfactual hypothesis testing
+- `BackChain` — backward chaining (FrozenLake)
+### Examples
+```
+SC Countdown K2-Inst TreeSearch       # Strategy compliance, countdown, Kimi instruct, tree search variant
+SC MuSR Q3-Think Counterfactual       # Strategy compliance, MuSR, Qwen thinking, counterfactual variant
+SC FrozenLake K2-Think BackChain      # Strategy compliance, FrozenLake, Kimi thinking, backward chaining
+Wing Countdown Q3-Inst                # Wingdings, countdown, Qwen instruct (no variant — wingdings has one condition)
+Wing MuSR K2-Think                    # Wingdings, MuSR, Kimi thinking
+```
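
The naming pattern can be enforced with a small helper so that names stay consistent across batches. This is a sketch: `preset_name` is a hypothetical function, and the allowed values are copied from the lists above, which may not be exhaustive.

```python
# Known values copied from the naming convention lists above.
EXPERIMENTS = {"SC", "Wing"}
TASKS = {"Countdown", "MuSR", "FrozenLake"}
MODELS = {"K2-Inst", "K2-Think", "Q3-Inst", "Q3-Think"}


def preset_name(experiment, task, model, variant=None):
    """Join the parts as '{Experiment} {Task} {Model} {Variant}'."""
    for label, value, allowed in [
        ("experiment", experiment, EXPERIMENTS),
        ("task", task, TASKS),
        ("model", model, MODELS),
    ]:
        if value not in allowed:
            raise ValueError(f"unknown {label}: {value!r}")
    parts = [experiment, task, model]
    if variant:  # Wingdings presets have no variant
        parts.append(variant)
    return " ".join(parts)
```

For example, `preset_name("SC", "Countdown", "K2-Inst", "TreeSearch")` yields the first name in the examples above.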
+## Important Notes
+- **Always check for existing repos** before adding. The script above uses the `existing_repos` set to skip duplicates.
+- **The `column` field matters for model presets.** Strategy compliance and wingdings datasets use `"response"` as the response column, not the default `"model_responses"`.
+- **Local files are a fallback cache.** The agg_visualizer downloads presets from HF on startup and caches them locally. After uploading to HF, either sync the local files or hit the `/api/presets/sync` endpoint so the running app picks up the changes without a restart.
+- **Don't modify rlm or harbor presets** unless adding datasets of those types. The script above only touches model and arena.