feat: V₀=0.5 baseline + det ceilings, prompt-pill UI, 60 experiment rollouts
This view is limited to 50 files because it contains too many changes.
- dashboard/backend/app.py +5 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__api_returns_500__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__api_returns_500__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__async_await_runtime__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__async_await_runtime__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__big_o_notation__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__big_o_notation__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__cap_theorem_one_example__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__cap_theorem_one_example__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__closures_what_they_capture__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__closures_what_they_capture__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__cors_error__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__cors_error__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__css_centering_flexbox__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__css_centering_flexbox__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__docker_container_exits__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__docker_container_exits__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__garbage_collection_mark_sweep__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__garbage_collection_mark_sweep__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__git_committed_to_main__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__git_committed_to_main__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__git_merge_conflict_markers__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__git_merge_conflict_markers__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__halting_problem__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__halting_problem__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__hashmap_average_o1__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__hashmap_average_o1__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__intro_python_hello_world__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__intro_python_hello_world__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__js_undefined_react_data__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__js_undefined_react_data__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__npm_install_fails__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__npm_install_fails__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__p_vs_np_plain_english__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__p_vs_np_plain_english__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__password_hashing_vs_encryption__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__password_hashing_vs_encryption__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__pointers_vs_references__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__pointers_vs_references__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__postgres_syntax_error__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__postgres_syntax_error__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__python_circular_import__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__python_circular_import__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__python_keyerror_dict__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__python_keyerror_dict__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__python_script_slow__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__python_script_slow__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__race_conditions_counter__gpt-5.4-mini/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__race_conditions_counter__gpt-5.4-nano/results.jsonl +0 -0
- dashboard/backend/seed_runs/batch_realtime_round1__cs__recursion_base_cases__gpt-5.4-mini/results.jsonl +0 -0
dashboard/backend/app.py
CHANGED
@@ -162,6 +162,11 @@ def get_run(run_id: str) -> dict[str, Any]:
     "reward": rec.get("reward"),
     "judge_breakdown": rec.get("judge_breakdown"),
     "trajectory": trajectory,
+    # Teacher system prompt — surface BOTH the registry name (for the
+    # title pill) and the raw text (for the expandable markdown view).
+    # Empty / "default" both render as the unexpandable "Default" chip.
+    "tutor_system_prompt": info.get("tutor_system_prompt", "") or "",
+    "tutor_system_prompt_name": info.get("tutor_system_prompt_name") or "default",
     "metrics": rec.get("metrics"),
     "stop_condition": rec.get("stop_condition"),
     "is_completed": rec.get("is_completed"),
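The two added expressions rely on Python's `or` fallback, so a missing key, a `None` value, and an empty string all collapse to the same default. The following is a minimal sketch of that behavior in isolation; the `prompt_fields` helper and the sample `info` dicts are hypothetical and are not part of `app.py`.

```python
from typing import Any


def prompt_fields(info: dict[str, Any]) -> dict[str, str]:
    """Hypothetical helper: applies the same two expressions as the diff."""
    return {
        # Missing, None, or "" all become "" (no raw text to expand).
        "tutor_system_prompt": info.get("tutor_system_prompt", "") or "",
        # Missing, None, or "" all become the name "default".
        "tutor_system_prompt_name": info.get("tutor_system_prompt_name") or "default",
    }


# Illustrative inputs (assumed shapes, not taken from the seed runs).
print(prompt_fields({}))
print(prompt_fields({"tutor_system_prompt": None, "tutor_system_prompt_name": ""}))
print(prompt_fields({"tutor_system_prompt": "Ask before telling.",
                     "tutor_system_prompt_name": "socratic"}))
```

The `or` is what handles a stored `None` or empty string: `info.get(key, "")` only substitutes its default when the key is absent.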
dashboard/backend/seed_runs/ (the 49 results.jsonl files listed above)
CHANGED
The diff for each of these files is too large to render.