TheUnicat committed on
Commit 2cd2802 · verified · 1 Parent(s): e7cdebc

feat: V₀=0.5 baseline + det ceilings, prompt-pill UI, 60 experiment rollouts

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. dashboard/backend/app.py +5 -0
  2. dashboard/backend/seed_runs/batch_realtime_round1__cs__api_returns_500__gpt-5.4-mini/results.jsonl +0 -0
  3. dashboard/backend/seed_runs/batch_realtime_round1__cs__api_returns_500__gpt-5.4-nano/results.jsonl +0 -0
  4. dashboard/backend/seed_runs/batch_realtime_round1__cs__async_await_runtime__gpt-5.4-mini/results.jsonl +0 -0
  5. dashboard/backend/seed_runs/batch_realtime_round1__cs__async_await_runtime__gpt-5.4-nano/results.jsonl +0 -0
  6. dashboard/backend/seed_runs/batch_realtime_round1__cs__big_o_notation__gpt-5.4-mini/results.jsonl +0 -0
  7. dashboard/backend/seed_runs/batch_realtime_round1__cs__big_o_notation__gpt-5.4-nano/results.jsonl +0 -0
  8. dashboard/backend/seed_runs/batch_realtime_round1__cs__cap_theorem_one_example__gpt-5.4-mini/results.jsonl +0 -0
  9. dashboard/backend/seed_runs/batch_realtime_round1__cs__cap_theorem_one_example__gpt-5.4-nano/results.jsonl +0 -0
  10. dashboard/backend/seed_runs/batch_realtime_round1__cs__closures_what_they_capture__gpt-5.4-mini/results.jsonl +0 -0
  11. dashboard/backend/seed_runs/batch_realtime_round1__cs__closures_what_they_capture__gpt-5.4-nano/results.jsonl +0 -0
  12. dashboard/backend/seed_runs/batch_realtime_round1__cs__cors_error__gpt-5.4-mini/results.jsonl +0 -0
  13. dashboard/backend/seed_runs/batch_realtime_round1__cs__cors_error__gpt-5.4-nano/results.jsonl +0 -0
  14. dashboard/backend/seed_runs/batch_realtime_round1__cs__css_centering_flexbox__gpt-5.4-mini/results.jsonl +0 -0
  15. dashboard/backend/seed_runs/batch_realtime_round1__cs__css_centering_flexbox__gpt-5.4-nano/results.jsonl +0 -0
  16. dashboard/backend/seed_runs/batch_realtime_round1__cs__docker_container_exits__gpt-5.4-mini/results.jsonl +0 -0
  17. dashboard/backend/seed_runs/batch_realtime_round1__cs__docker_container_exits__gpt-5.4-nano/results.jsonl +0 -0
  18. dashboard/backend/seed_runs/batch_realtime_round1__cs__garbage_collection_mark_sweep__gpt-5.4-mini/results.jsonl +0 -0
  19. dashboard/backend/seed_runs/batch_realtime_round1__cs__garbage_collection_mark_sweep__gpt-5.4-nano/results.jsonl +0 -0
  20. dashboard/backend/seed_runs/batch_realtime_round1__cs__git_committed_to_main__gpt-5.4-mini/results.jsonl +0 -0
  21. dashboard/backend/seed_runs/batch_realtime_round1__cs__git_committed_to_main__gpt-5.4-nano/results.jsonl +0 -0
  22. dashboard/backend/seed_runs/batch_realtime_round1__cs__git_merge_conflict_markers__gpt-5.4-mini/results.jsonl +0 -0
  23. dashboard/backend/seed_runs/batch_realtime_round1__cs__git_merge_conflict_markers__gpt-5.4-nano/results.jsonl +0 -0
  24. dashboard/backend/seed_runs/batch_realtime_round1__cs__halting_problem__gpt-5.4-mini/results.jsonl +0 -0
  25. dashboard/backend/seed_runs/batch_realtime_round1__cs__halting_problem__gpt-5.4-nano/results.jsonl +0 -0
  26. dashboard/backend/seed_runs/batch_realtime_round1__cs__hashmap_average_o1__gpt-5.4-mini/results.jsonl +0 -0
  27. dashboard/backend/seed_runs/batch_realtime_round1__cs__hashmap_average_o1__gpt-5.4-nano/results.jsonl +0 -0
  28. dashboard/backend/seed_runs/batch_realtime_round1__cs__intro_python_hello_world__gpt-5.4-mini/results.jsonl +0 -0
  29. dashboard/backend/seed_runs/batch_realtime_round1__cs__intro_python_hello_world__gpt-5.4-nano/results.jsonl +0 -0
  30. dashboard/backend/seed_runs/batch_realtime_round1__cs__js_undefined_react_data__gpt-5.4-mini/results.jsonl +0 -0
  31. dashboard/backend/seed_runs/batch_realtime_round1__cs__js_undefined_react_data__gpt-5.4-nano/results.jsonl +0 -0
  32. dashboard/backend/seed_runs/batch_realtime_round1__cs__npm_install_fails__gpt-5.4-mini/results.jsonl +0 -0
  33. dashboard/backend/seed_runs/batch_realtime_round1__cs__npm_install_fails__gpt-5.4-nano/results.jsonl +0 -0
  34. dashboard/backend/seed_runs/batch_realtime_round1__cs__p_vs_np_plain_english__gpt-5.4-mini/results.jsonl +0 -0
  35. dashboard/backend/seed_runs/batch_realtime_round1__cs__p_vs_np_plain_english__gpt-5.4-nano/results.jsonl +0 -0
  36. dashboard/backend/seed_runs/batch_realtime_round1__cs__password_hashing_vs_encryption__gpt-5.4-mini/results.jsonl +0 -0
  37. dashboard/backend/seed_runs/batch_realtime_round1__cs__password_hashing_vs_encryption__gpt-5.4-nano/results.jsonl +0 -0
  38. dashboard/backend/seed_runs/batch_realtime_round1__cs__pointers_vs_references__gpt-5.4-mini/results.jsonl +0 -0
  39. dashboard/backend/seed_runs/batch_realtime_round1__cs__pointers_vs_references__gpt-5.4-nano/results.jsonl +0 -0
  40. dashboard/backend/seed_runs/batch_realtime_round1__cs__postgres_syntax_error__gpt-5.4-mini/results.jsonl +0 -0
  41. dashboard/backend/seed_runs/batch_realtime_round1__cs__postgres_syntax_error__gpt-5.4-nano/results.jsonl +0 -0
  42. dashboard/backend/seed_runs/batch_realtime_round1__cs__python_circular_import__gpt-5.4-mini/results.jsonl +0 -0
  43. dashboard/backend/seed_runs/batch_realtime_round1__cs__python_circular_import__gpt-5.4-nano/results.jsonl +0 -0
  44. dashboard/backend/seed_runs/batch_realtime_round1__cs__python_keyerror_dict__gpt-5.4-mini/results.jsonl +0 -0
  45. dashboard/backend/seed_runs/batch_realtime_round1__cs__python_keyerror_dict__gpt-5.4-nano/results.jsonl +0 -0
  46. dashboard/backend/seed_runs/batch_realtime_round1__cs__python_script_slow__gpt-5.4-mini/results.jsonl +0 -0
  47. dashboard/backend/seed_runs/batch_realtime_round1__cs__python_script_slow__gpt-5.4-nano/results.jsonl +0 -0
  48. dashboard/backend/seed_runs/batch_realtime_round1__cs__race_conditions_counter__gpt-5.4-mini/results.jsonl +0 -0
  49. dashboard/backend/seed_runs/batch_realtime_round1__cs__race_conditions_counter__gpt-5.4-nano/results.jsonl +0 -0
  50. dashboard/backend/seed_runs/batch_realtime_round1__cs__recursion_base_cases__gpt-5.4-mini/results.jsonl +0 -0
dashboard/backend/app.py CHANGED
@@ -162,6 +162,11 @@ def get_run(run_id: str) -> dict[str, Any]:
  "reward": rec.get("reward"),
  "judge_breakdown": rec.get("judge_breakdown"),
  "trajectory": trajectory,
+ # Teacher system prompt — surface BOTH the registry name (for the
+ # title pill) and the raw text (for the expandable markdown view).
+ # Empty / "default" both render as the unexpandable "Default" chip.
+ "tutor_system_prompt": info.get("tutor_system_prompt", "") or "",
+ "tutor_system_prompt_name": info.get("tutor_system_prompt_name") or "default",
  "metrics": rec.get("metrics"),
  "stop_condition": rec.get("stop_condition"),
  "is_completed": rec.get("is_completed"),
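
The two added fields rely on Python's `or` fallback, so a missing key, an explicit `None`, and an empty string all collapse to the same value. The sketch below isolates that behavior. `prompt_fields` is a hypothetical helper written for illustration, not a function from `app.py`, and the shape of `info` is assumed to be a plain dict.

```python
from typing import Any


def prompt_fields(info: dict[str, Any]) -> dict[str, str]:
    """Hypothetical helper mirroring the two lines added to get_run.

    `x or fallback` returns the fallback whenever x is falsy, which covers
    a missing key, None, and "" in a single expression.
    """
    return {
        # Raw prompt text: never None, so the UI can treat "" as "nothing to expand".
        "tutor_system_prompt": info.get("tutor_system_prompt", "") or "",
        # Registry name: an absent or empty name is normalized to "default".
        "tutor_system_prompt_name": info.get("tutor_system_prompt_name") or "default",
    }


if __name__ == "__main__":
    # Missing keys and explicit None/"" values normalize identically.
    print(prompt_fields({}))
    print(prompt_fields({"tutor_system_prompt": None, "tutor_system_prompt_name": ""}))
    # A named prompt passes through unchanged ("concise" is an invented example name).
    print(prompt_fields({"tutor_system_prompt": "Be brief.", "tutor_system_prompt_name": "concise"}))
```

One reading of the in-diff comment is that this normalization is why an empty name and a literal `"default"` both end up as the same "Default" chip: by the time the frontend sees the payload, both carry the name `"default"`.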
dashboard/backend/seed_runs/ (files 2 through 50 in the list above, each a results.jsonl) CHANGED
The diff for each of these files is too large to render.