Refresh output artifacts: eval, replay, charts with correct UI format
- eval_pre.json / eval_post.json now cover all 3 tasks (task1, task2, task3)
with the format useSentinel.ts expects: summary.{random,heuristic,
oracle_lite,trained} + by_task.{task1,task2,task3}
- evaluation_results.json overwritten from eval_post.json; now contains
summary.trained (avg_score 0.788, heuristic-fallback until Colab run)
- trained_policy_replay.jsonl expanded from seeds 0-29 to 0-59 so the
UI default seed=42 and profile-swaps up to seed 59 get real replay rows
instead of falling back to replayMiss=true
- All 12 charts in outputs/charts/ regenerated against the fresh eval data
Made-with: Cursor
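
The shape named in the message above (summary.{random,heuristic,oracle_lite,trained} + by_task.{task1,task2,task3}) can be checked with a short script before the files are committed. This is only a sketch: the key names come from the commit message, while the helper name `missing_eval_keys` and the assumption that both sections are plain JSON objects are hypothetical and not taken from the repository.

```python
import json

# Key names are taken from the commit message; everything else is an assumption.
EXPECTED_POLICIES = ("random", "heuristic", "oracle_lite", "trained")
EXPECTED_TASKS = ("task1", "task2", "task3")


def missing_eval_keys(payload: dict) -> list:
    """Return dotted paths of expected keys that are absent from an eval payload."""
    missing = []
    for section, keys in (("summary", EXPECTED_POLICIES), ("by_task", EXPECTED_TASKS)):
        block = payload.get(section)
        if not isinstance(block, dict):
            # The whole section is absent or not an object: report it once.
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in block)
    return missing


if __name__ == "__main__":
    # Hypothetical payload that has only part of the expected shape.
    sample = json.loads('{"summary": {"random": {}, "heuristic": {}}, "by_task": {"task1": {}}}')
    print(missing_eval_keys(sample))
```

An empty list means every key the message lists is present; it says nothing about the values stored under those keys.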
- outputs/charts/ablation.png +2 -2
- outputs/charts/baseline_delta_lines.png +2 -2
- outputs/charts/baseline_grouped_bars.png +2 -2
- outputs/charts/cluster_health_policy_lines.png +2 -2
- outputs/charts/cluster_health_timeline.png +2 -2
- outputs/charts/detection_vs_poisoning.png +2 -2
- outputs/charts/failure_fishbone_map.png +2 -2
- outputs/charts/grpo_reward_curve.png +2 -2
- outputs/charts/reward_component_stacked_area.png +2 -2
- outputs/charts/task_radar.png +2 -2
- outputs/charts/trust_evolution.png +2 -2
- outputs/charts/trust_gap_over_time.png +2 -2
- outputs/evaluation_results.json +0 -0
- outputs/trained_policy_replay.jsonl +0 -0
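
The seed coverage that the commit message describes for trained_policy_replay.jsonl (0-59 instead of 0-29) could be verified along these lines. This is a sketch under stated assumptions: the row field name `seed` and the helper `missing_seeds` are hypothetical, since the replay row layout is not shown in this commit.

```python
import json


def missing_seeds(jsonl_lines, expected=range(60), seed_field="seed"):
    """Return the expected seeds that have no row in the given JSONL lines.

    `seed_field` is an assumed field name; adjust it to the real row layout.
    """
    seen = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        row = json.loads(line)
        if seed_field in row:
            seen.add(row[seed_field])
    return [seed for seed in expected if seed not in seen]


if __name__ == "__main__":
    # Simulate the old file, which covered only seeds 0-29.
    old_rows = [json.dumps({"seed": s}) for s in range(30)]
    gaps = missing_seeds(old_rows)
    print(42 in gaps)  # the UI default seed had no row in the old range
```

With rows for seeds 0-59 the function returns an empty list, which corresponds to the case where the UI no longer needs to fall back to replayMiss=true for those seeds.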
The 12 PNG files under outputs/charts/ are stored in Git LFS. The diffs for outputs/evaluation_results.json and outputs/trained_policy_replay.jsonl are too large to render.