suite,model,model_slug,source_kind,label,eval,artifact_path,screenshot_desktop_path,screenshot_mobile_path,screenshot_deep_path,screenshot_mobile_deep_path,artifact_bytes,generation_ok,generation_duration_s,input_tokens,output_tokens,total_tokens,billing_tokens,reasoning_tokens,tool_use_tokens,cache_read_tokens,cache_write_tokens,cache_hit_tokens,total_cache_tokens,effective_input_tokens,display_input_tokens,usage_event_count,tool_calls,turn_count,self_check_attempted,self_check_ran,self_check_succeeded,self_check_runs,self_check_failed_runs,self_check_successful_runs,self_correction_edits,self_corrected_after_checker,self_correction_verified,assistant_turns_trace,self_check_mode,self_check_evidence,deterministic_failures,deterministic_warnings,vlm_failures,vlm_warnings,deterministic_failure_units,deterministic_warning_units,vlm_failure_units,vlm_warning_units,desktop_failures,desktop_warnings,mobile_failures,mobile_warnings,deep_failures,deep_warnings,mobile_deep_failures,mobile_deep_warnings,artifact_present,artifact_score_100,task_score,task_score_max,quality_score,quality_cap_reason,quality_class
publish,codexresponses.gpt-5.4-mini,codexresponses-gpt-5-4-mini,clean-final,skill-with-shell-codexresponses-gpt-5-4-mini-publication-final,numeric-data,results/publish/models/codexresponses-gpt-5-4-mini/artifacts/numeric-data.html,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/numeric-data-desktop.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/numeric-data-mobile.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/numeric-data-deep.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/numeric-data-mobile-deep.png,41655,True,233.57,257043,19565,276608,276608,13843,0,0,0,236032,236032,21011,257043,12,16,12,True,True,True,2,1,1,0,False,True,12,run-checker-cli,ran checker CLI: python /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexresponses-gpt-5-4-mini-publica,0,2,0,0,0,1,0,0,0,1,0,0,0,1,0,0,True,99,19.8,20,99,,warn
publish,codexresponses.gpt-5.4-mini,codexresponses-gpt-5-4-mini,clean-final,skill-with-shell-codexresponses-gpt-5-4-mini-publication-final,code-review,results/publish/models/codexresponses-gpt-5-4-mini/artifacts/code-review.html,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/code-review-desktop.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/code-review-mobile.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/code-review-deep.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/code-review-mobile-deep.png,40247,True,251.091,1602209,16541,1618750,1618750,10735,0,0,0,1516544,1516544,85665,1602209,24,39,24,True,True,True,3,1,2,0,False,True,24,"checker-cli-error,run-checker-cli","ran checker CLI: cd /home/shaun/source/birch-html && uv run python skill/scripts/check_birch_renderings.py --help | sed -n '1,220p' | checker CLI usage error | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-codexresponses-gpt-5-4-mini-publicatio | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexres",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,codexresponses.gpt-5.4-mini,codexresponses-gpt-5-4-mini,clean-final,skill-with-shell-codexresponses-gpt-5-4-mini-publication-final,module-explainer,results/publish/models/codexresponses-gpt-5-4-mini/artifacts/module-explainer.html,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/module-explainer-desktop.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/module-explainer-mobile.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/module-explainer-deep.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/module-explainer-mobile-deep.png,51503,True,228.357,538144,20613,558757,558757,12973,0,0,0,489472,489472,48672,538144,14,29,14,True,True,True,2,0,2,0,False,False,14,"checker-shell-reference,read-checker,run-checker-cli","read /home/shaun/source/birch-html/scripts/check_birch_renderings.py | shell referenced checker: rg -n ""^def (contract_findings|compare_stats|screenshot_findings|artifact_screenshot_findings|geometry_findings|render_markdown|capture|find_chrome|capture_height_for_viewport|css_ | ran checker CLI: mkdir -p /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexresponses-gpt-5-4-mini-publication-final && cat > /home/shaun/source/birch-html/eval-runs/skill-with-shell-co | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-codexresponses-gpt-5-4-mini-publication-fina",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,codexresponses.gpt-5.4-mini,codexresponses-gpt-5-4-mini,clean-final,skill-with-shell-codexresponses-gpt-5-4-mini-publication-final,implementation-plan,results/publish/models/codexresponses-gpt-5-4-mini/artifacts/implementation-plan.html,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/implementation-plan-desktop.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/implementation-plan-mobile.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/implementation-plan-deep.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/implementation-plan-mobile-deep.png,48838,True,249.193,122451,13529,135980,135980,8129,0,0,0,103936,103936,18515,122451,8,11,8,True,True,True,2,1,1,0,False,True,8,run-checker-cli,"ran checker CLI: cat > /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexresponses-gpt-5-4-mini-publication-final/implementation-plan.html <<'EOF'
<!doctype html>
<html lang=""en"">
<head | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-codexresponses-gpt-5-4-mini-publicatio | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexres",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,codexresponses.gpt-5.4-mini,codexresponses-gpt-5-4-mini,clean-final,skill-with-shell-codexresponses-gpt-5-4-mini-publication-final,benchmark-comparison,results/publish/models/codexresponses-gpt-5-4-mini/artifacts/benchmark-comparison.html,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/codexresponses-gpt-5-4-mini/reports/screenshots/benchmark-comparison-mobile-deep.png,55271,True,193.592,280048,17564,297612,297612,9912,0,0,0,261120,261120,18928,280048,14,18,14,True,True,True,4,3,1,0,False,True,14,run-checker-cli,"ran checker CLI: cd /home/shaun/source/birch-html && mkdir -p eval-runs/skill-with-shell-codexresponses-gpt-5-4-mini-publication-final && uv run --with matplotlib python - <<'PY'
from pathlib impor | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-codexresponses-gpt-5-4-mini-publicatio | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexres | ran checker CLI: python3 - <<'PY'
from pathlib import Path
path = Path('/home/shaun/source/birch-html/eval-runs/skill-with-shell-codexresponses-gpt-5-4-mini-publication-final/benchmark-comparison.h | ran checker CLI: python3 - <<'PY'
from pathlib import Path
import re
path = Path('/home/shaun/source/birch-html/eval-runs/skill-with-shell-codexresponses-gpt-5-4-mini-publication-final/benchmark-co",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,codexresponses.gpt-5.5,codexresponses-gpt-5-5,clean-final,skill-with-shell-codexresponses-gpt-5-5-publication-final,numeric-data,results/publish/models/codexresponses-gpt-5-5/artifacts/numeric-data.html,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/numeric-data-desktop.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/numeric-data-mobile.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/numeric-data-deep.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/numeric-data-mobile-deep.png,41967,True,118.283,95354,5337,100691,100691,402,0,0,0,78848,78848,16506,95354,10,10,10,True,True,True,2,1,1,1,True,True,10,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-codexresponses-gpt-5-5-publication-fin | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexres,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,codexresponses.gpt-5.5,codexresponses-gpt-5-5,clean-final,skill-with-shell-codexresponses-gpt-5-5-publication-final,code-review,results/publish/models/codexresponses-gpt-5-5/artifacts/code-review.html,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/code-review-desktop.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/code-review-mobile.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/code-review-deep.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/code-review-mobile-deep.png,44204,True,164.43,388756,7268,396024,396024,2335,0,0,0,346624,346624,42132,388756,16,22,16,True,True,True,3,0,3,0,False,False,16,"checker-cli-error,run-checker-cli",ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-codexresponses-gpt-5-5-publication-fin | checker CLI usage error | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexres,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,codexresponses.gpt-5.5,codexresponses-gpt-5-5,clean-final,skill-with-shell-codexresponses-gpt-5-5-publication-final,module-explainer,results/publish/models/codexresponses-gpt-5-5/artifacts/module-explainer.html,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/module-explainer-desktop.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/module-explainer-mobile.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/module-explainer-deep.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/module-explainer-mobile-deep.png,57189,True,178.972,450726,9063,459789,459789,477,0,0,0,400896,400896,49830,450726,14,25,14,True,True,True,2,0,2,0,False,False,14,"read-checker,run-checker-cli","read /home/shaun/source/birch-html/scripts/check_birch_renderings.py | ran checker CLI: cat > /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexresponses-gpt-5-5-publication-final/module-explainer.html <<'EOF'
<!doctype html>
<html lang=""en"">
<head>
<met | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-codexresponses-gpt-5-5-publication-final/mod | ran checker CLI: python3 - <<'PY'
from pathlib import Path
p=Path('/home/shaun/source/birch-html/eval-runs/skill-with-shell-codexresponses-gpt-5-5-publication-final/module-explainer.html')
s=p.read",0,0,1,1,0,0,1,1,0,0,0,0,0,0,0,0,True,91,18.2,20,91,,fail
publish,codexresponses.gpt-5.5,codexresponses-gpt-5-5,clean-final,skill-with-shell-codexresponses-gpt-5-5-publication-final,implementation-plan,results/publish/models/codexresponses-gpt-5-5/artifacts/implementation-plan.html,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/implementation-plan-desktop.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/implementation-plan-mobile.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/implementation-plan-deep.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/implementation-plan-mobile-deep.png,49708,True,144.313,129170,6893,136063,136063,369,0,0,0,91136,91136,38034,129170,11,13,11,True,True,True,3,1,2,0,False,True,11,run-checker-cli,"ran checker CLI: mkdir -p /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexresponses-gpt-5-5-publication-final && cat > /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexre | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-codexresponses-gpt-5-5-publication-fin | ran checker CLI: cd /home/shaun/source/birch-html && python3 - <<'PY'
from pathlib import Path
p=Path('eval-runs/skill-with-shell-codexresponses-gpt-5-5-publication-final/implementation-plan.html')",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,codexresponses.gpt-5.5,codexresponses-gpt-5-5,clean-final,skill-with-shell-codexresponses-gpt-5-5-publication-final,benchmark-comparison,results/publish/models/codexresponses-gpt-5-5/artifacts/benchmark-comparison.html,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/codexresponses-gpt-5-5/reports/screenshots/benchmark-comparison-mobile-deep.png,53609,True,142.604,126650,6524,133174,133174,491,0,0,0,101376,101376,25274,126650,11,13,11,True,True,True,1,0,1,0,False,False,11,run-checker-cli,"ran checker CLI: cd /home/shaun/source/birch-html && python3 - <<'PY'
from pathlib import Path
p=Path('eval-runs/skill-with-shell-codexresponses-gpt-5-5-publication-final/benchmark-comparison.html' | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexres",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,codexspark,codexspark,clean-final,skill-with-shell-codexspark-publication-final,numeric-data,results/publish/models/codexspark/artifacts/numeric-data.html,results/publish/models/codexspark/reports/screenshots/numeric-data-desktop.png,results/publish/models/codexspark/reports/screenshots/numeric-data-mobile.png,results/publish/models/codexspark/reports/screenshots/numeric-data-deep.png,results/publish/models/codexspark/reports/screenshots/numeric-data-mobile-deep.png,17281,True,82.34,825347,23923,849270,849270,13374,0,0,0,770688,770688,54659,825347,32,31,32,False,False,False,0,0,0,0,False,False,32,,,6,2,1,0,2,1,1,0,1,1,2,0,1,1,2,0,True,35.0,7.0,20,35.0,missing_birch_css,fail
publish,codexspark,codexspark,clean-final,skill-with-shell-codexspark-publication-final,code-review,results/publish/models/codexspark/artifacts/code-review.html,results/publish/models/codexspark/reports/screenshots/code-review-desktop.png,results/publish/models/codexspark/reports/screenshots/code-review-mobile.png,results/publish/models/codexspark/reports/screenshots/code-review-deep.png,results/publish/models/codexspark/reports/screenshots/code-review-mobile-deep.png,9658,False,60.395,1737615,21291,1758906,1758906,17081,0,0,0,1702656,1702656,86941,1789597,41,32,26,True,True,True,3,0,3,0,False,False,41,"checker-shell-reference,read-checker","read /home/shaun/source/birch-html/scripts/check_birch_renderings.py | shell referenced checker: nl -ba /home/shaun/source/birch-html/scripts/check_birch_renderings.py | sed -n '1,260p' | shell referenced checker: nl -ba /home/shaun/source/birch-html/scripts/check_birch_renderings.py | sed -n '260,560p' | shell referenced checker: nl -ba /home/shaun/source/birch-html/scripts/check_birch_renderings.py | sed -n '560,920p' | shell referenced checker: nl -ba /home/shaun/source/birch-html/scripts/check_birch_renderings.py | sed -n '920,1320p'",8,0,0,0,2,0,0,0,2,0,2,0,2,0,2,0,True,35.0,7.0,20,35.0,missing_birch_css,fail
publish,codexspark,codexspark,clean-final,skill-with-shell-codexspark-publication-final,module-explainer,results/publish/models/codexspark/artifacts/module-explainer.html,results/publish/models/codexspark/reports/screenshots/module-explainer-desktop.png,results/publish/models/codexspark/reports/screenshots/module-explainer-mobile.png,results/publish/models/codexspark/reports/screenshots/module-explainer-deep.png,results/publish/models/codexspark/reports/screenshots/module-explainer-mobile-deep.png,16366,False,87.747,2740590,27049,2767639,2767639,15704,0,0,0,2024320,2024320,202803,2227123,35,51,42,True,False,False,0,0,0,0,False,False,35,"checker-shell-reference,read-checker,run-checker-cli","read /home/shaun/source/birch-html/scripts/check_birch_renderings.py | shell referenced checker: cd /home/shaun/source/birch-html && rg -n ""class\s*=\""(flow-node|flow-edge|flow-list|flow-step|metric-row|chart-panel|finding|code-block|copyable|timeline)"" styles/birch-system.css | shell referenced checker: cd /home/shaun/source/birch-html && wc -l scripts/check_birch_renderings.py | shell referenced checker: cd /home/shaun/source/birch-html && mkdir -p eval-runs/skill-with-shell-codexspark-publication-final && cat > eval-runs/skill-with-shell-codexspark-publication-final/module-explain | ran checker CLI: cd /home/shaun/source/birch-html && cat > eval-runs/skill-with-shell-codexspark-publication-final/module-explainer.html <<'EOF'
<!doctype html>
<html lang=""en"">
<head>
<meta char",0,4,4,0,0,1,1,0,0,1,0,1,0,1,0,1,True,91,18.2,20,91,,fail
publish,codexspark,codexspark,clean-final,skill-with-shell-codexspark-publication-final,implementation-plan,results/publish/models/codexspark/artifacts/implementation-plan.html,results/publish/models/codexspark/reports/screenshots/implementation-plan-desktop.png,results/publish/models/codexspark/reports/screenshots/implementation-plan-mobile.png,results/publish/models/codexspark/reports/screenshots/implementation-plan-deep.png,results/publish/models/codexspark/reports/screenshots/implementation-plan-mobile-deep.png,46864,True,91.953,1108319,14746,1123065,1123065,8043,0,0,0,1055232,1055232,53087,1108319,35,37,35,True,True,True,2,0,2,0,False,False,35,"checker-cli-error,checker-shell-reference,read-checker,run-checker-cli","read /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py | ran checker CLI: cat > /home/shaun/source/birch-html/eval-runs/skill-with-shell-codexspark-publication-final/implementation-plan.html <<'EOF'
<!doctype html>
<html lang=""en"">
<head>
<meta charset | ran checker CLI: python3 /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py --help | head -n 120 | checker CLI usage error | ran checker CLI: cd /home/shaun/source/birch-html && uv run skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-codexspark-publication-final/implementation-plan.html --no- | ran checker CLI: python - <<'PY'
from pathlib import Path
from inspect import getsourcelines
import importlib.util
p=Path('/home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py')
te",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,codexspark,codexspark,clean-final,skill-with-shell-codexspark-publication-final,benchmark-comparison,results/publish/models/codexspark/artifacts/benchmark-comparison.html,results/publish/models/codexspark/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/codexspark/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/codexspark/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/codexspark/reports/screenshots/benchmark-comparison-mobile-deep.png,55786,True,41.038,681289,5651,686940,686940,4100,0,0,0,628224,628224,53065,681289,24,23,24,False,False,False,0,0,0,0,False,False,24,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,deepseek,deepseek,clean-final,skill-with-shell-deepseek-publication-final,numeric-data,results/publish/models/deepseek/artifacts/numeric-data.html,results/publish/models/deepseek/reports/screenshots/numeric-data-desktop.png,results/publish/models/deepseek/reports/screenshots/numeric-data-mobile.png,results/publish/models/deepseek/reports/screenshots/numeric-data-deep.png,results/publish/models/deepseek/reports/screenshots/numeric-data-mobile-deep.png,62489,True,280.24,594128,18097,612225,612225,0,0,0,0,560512,560512,33616,594128,18,20,18,True,True,True,2,1,1,0,False,True,18,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-deepseek-publication-final/numeric-dat | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-deepseek,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,deepseek,deepseek,clean-final,skill-with-shell-deepseek-publication-final,code-review,results/publish/models/deepseek/artifacts/code-review.html,results/publish/models/deepseek/reports/screenshots/code-review-desktop.png,results/publish/models/deepseek/reports/screenshots/code-review-mobile.png,results/publish/models/deepseek/reports/screenshots/code-review-deep.png,results/publish/models/deepseek/reports/screenshots/code-review-mobile-deep.png,62789,True,294.1,784186,14634,798820,798820,0,0,0,0,749440,749440,34746,784186,26,30,26,True,True,True,3,1,2,0,False,True,26,"checker-shell-reference,run-checker-cli","ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-deepseek-publication-final/code-review | shell referenced checker: cd /home/shaun/source/birch-html && head -30 skill/scripts/check_birch_renderings.py | grep -A5 ""add_argument"" | shell referenced checker: cd /home/shaun/source/birch-html && grep -n ""artifact\|--artifact"" skill/scripts/check_birch_renderings.py | head -10 | ran checker CLI: cd /home/shaun/source/birch-html && rm -f skill/reports/birch-rendering-check.json skill/reports/birch-rendering-check.md && uv run --with pillow python skill/scripts/check_birch_r | shell referenced checker: cd /home/shaun/source/birch-html && grep -n ""ROOT\s*="" skill/scripts/check_birch_renderings.py | head -3 | ran checker CLI: cd /home/shaun/source/birch-html && rm -f skill/reports/birch-rendering-check-code-review.json && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /ho",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,deepseek,deepseek,clean-final,skill-with-shell-deepseek-publication-final,module-explainer,results/publish/models/deepseek/artifacts/module-explainer.html,results/publish/models/deepseek/reports/screenshots/module-explainer-desktop.png,results/publish/models/deepseek/reports/screenshots/module-explainer-mobile.png,results/publish/models/deepseek/reports/screenshots/module-explainer-deep.png,results/publish/models/deepseek/reports/screenshots/module-explainer-mobile-deep.png,31473,False,177.334,215656,9938,225594,225594,0,0,0,0,449920,449920,48511,498431,10,10,6,True,True,True,2,1,1,0,False,True,10,read-checker,read /home/shaun/source/birch-html/scripts/check_birch_renderings.py,8,1,7,0,3,1,2,0,1,1,3,0,1,0,3,0,True,20.0,4.0,20,20.0,missing_birch_css_and_visibly_unstyled,fail
publish,deepseek,deepseek,clean-final,skill-with-shell-deepseek-publication-final,implementation-plan,results/publish/models/deepseek/artifacts/implementation-plan.html,results/publish/models/deepseek/reports/screenshots/implementation-plan-desktop.png,results/publish/models/deepseek/reports/screenshots/implementation-plan-mobile.png,results/publish/models/deepseek/reports/screenshots/implementation-plan-deep.png,results/publish/models/deepseek/reports/screenshots/implementation-plan-mobile-deep.png,52099,True,112.544,173739,6911,180650,180650,0,0,0,0,160128,160128,13611,173739,12,15,12,True,True,True,1,0,1,0,False,False,12,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-deepseek-publication-final/implementat,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,deepseek,deepseek,clean-final,skill-with-shell-deepseek-publication-final,benchmark-comparison,results/publish/models/deepseek/artifacts/benchmark-comparison.html,results/publish/models/deepseek/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/deepseek/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/deepseek/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/deepseek/reports/screenshots/benchmark-comparison-mobile-deep.png,78962,True,378.136,767427,27984,795411,795411,0,0,0,0,717696,717696,49731,767427,18,22,18,True,False,False,0,0,0,0,False,False,18,checker-shell-reference,"shell referenced checker: cd /home/shaun/source/birch-html && ls skill/scripts/check_birch_renderings.py 2>&1 && echo ""---"" && head -5 eval-runs/skill-with-shell-deepseek-publication-final/benchmark-compari",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,gemini35flash,gemini35flash,clean-final,skill-with-shell-gemini35flash-publication-final,numeric-data,results/publish/models/gemini35flash/artifacts/numeric-data.html,results/publish/models/gemini35flash/reports/screenshots/numeric-data-desktop.png,results/publish/models/gemini35flash/reports/screenshots/numeric-data-mobile.png,results/publish/models/gemini35flash/reports/screenshots/numeric-data-deep.png,results/publish/models/gemini35flash/reports/screenshots/numeric-data-mobile-deep.png,53215,True,114.216,1371616,5260,1376876,1376876,12418,0,0,0,1116684,1116684,254932,1371616,29,28,29,True,True,True,2,1,1,0,False,True,29,run-checker-cli,ran checker CLI: uv run --with pillow python3 skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-gemini35flash-publication-final/numeric-data.html | ran checker CLI: uv run --with pillow python3 skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-gemini35flash-publication-final/numeric-dat,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,gemini35flash,gemini35flash,clean-final,skill-with-shell-gemini35flash-publication-final,code-review,results/publish/models/gemini35flash/artifacts/code-review.html,results/publish/models/gemini35flash/reports/screenshots/code-review-desktop.png,results/publish/models/gemini35flash/reports/screenshots/code-review-mobile.png,results/publish/models/gemini35flash/reports/screenshots/code-review-deep.png,results/publish/models/gemini35flash/reports/screenshots/code-review-mobile-deep.png,53047,True,193.238,1684136,6902,1691038,1691038,23273,0,0,0,1424691,1424691,259445,1684136,34,33,34,True,True,True,3,1,2,0,False,True,34,"checker-cli-error,run-checker-cli",ran checker CLI: python3 /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py --help | checker CLI usage error | ran checker CLI: python3 /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-gemini35flash-publication-final/co | ran checker CLI: python3 /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py --no-capture --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-gemini35flash-publica,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,gemini35flash,gemini35flash,clean-final,skill-with-shell-gemini35flash-publication-final,module-explainer,results/publish/models/gemini35flash/artifacts/module-explainer.html,results/publish/models/gemini35flash/reports/screenshots/module-explainer-desktop.png,results/publish/models/gemini35flash/reports/screenshots/module-explainer-mobile.png,results/publish/models/gemini35flash/reports/screenshots/module-explainer-deep.png,results/publish/models/gemini35flash/reports/screenshots/module-explainer-mobile-deep.png,57420,True,203.178,2196880,10222,2207102,2207102,22501,0,0,0,1965131,1965131,231749,2196880,33,32,33,True,True,False,2,2,0,0,False,False,33,"read-checker,run-checker-cli",read scripts/check_birch_renderings.py | ran checker CLI: python3 scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-gemini35flash-publication-final/module-explainer.html,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,gemini35flash,gemini35flash,clean-final,skill-with-shell-gemini35flash-publication-final,implementation-plan,results/publish/models/gemini35flash/artifacts/implementation-plan.html,results/publish/models/gemini35flash/reports/screenshots/implementation-plan-desktop.png,results/publish/models/gemini35flash/reports/screenshots/implementation-plan-mobile.png,results/publish/models/gemini35flash/reports/screenshots/implementation-plan-deep.png,results/publish/models/gemini35flash/reports/screenshots/implementation-plan-mobile-deep.png,49628,True,201.715,2346900,9173,2356073,2356073,15150,0,0,0,2043078,2043078,303822,2346900,34,33,34,True,True,True,5,4,1,0,False,False,34,"checker-cli-error,run-checker-cli",ran checker CLI: python3 skill/scripts/check_birch_renderings.py --help | checker CLI usage error | ran checker CLI: python3 skill/scripts/check_birch_renderings.py --artifact temp_plan.html | ran checker CLI: python3 skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/temp_plan.html | ran checker CLI: python3 skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-gemini35flash-publication-final/implementation-plan.html,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,gemini35flash,gemini35flash,clean-final,skill-with-shell-gemini35flash-publication-final,benchmark-comparison,results/publish/models/gemini35flash/artifacts/benchmark-comparison.html,results/publish/models/gemini35flash/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/gemini35flash/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/gemini35flash/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/gemini35flash/reports/screenshots/benchmark-comparison-mobile-deep.png,97390,True,62.077,495825,829,496654,496654,4961,0,0,0,387138,387138,108687,495825,17,16,17,True,True,False,1,1,0,0,False,False,17,run-checker-cli,ran checker CLI: python3 /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-gemini35flash-publication-final/be,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,glm51,glm51,clean-final,skill-with-shell-glm51-publication-final,numeric-data,results/publish/models/glm51/artifacts/numeric-data.html,results/publish/models/glm51/reports/screenshots/numeric-data-desktop.png,results/publish/models/glm51/reports/screenshots/numeric-data-mobile.png,results/publish/models/glm51/reports/screenshots/numeric-data-deep.png,results/publish/models/glm51/reports/screenshots/numeric-data-mobile-deep.png,62971,True,300.114,459899,16275,476174,476174,0,0,0,0,369152,369152,90747,459899,15,16,15,True,True,False,1,1,0,0,False,False,15,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-glm51-publication-final/numeric-data.h,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,True,99,19.8,20,99,,warn
publish,glm51,glm51,clean-final,skill-with-shell-glm51-publication-final,code-review,results/publish/models/glm51/artifacts/code-review.html,results/publish/models/glm51/reports/screenshots/code-review-desktop.png,results/publish/models/glm51/reports/screenshots/code-review-mobile.png,results/publish/models/glm51/reports/screenshots/code-review-deep.png,results/publish/models/glm51/reports/screenshots/code-review-mobile-deep.png,48933,True,133.324,254816,8008,262824,262824,0,0,0,0,202560,202560,52256,254816,11,13,11,True,True,True,1,0,1,0,False,False,11,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-glm51-publication-final/code-review.ht,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,0,True,92,18.4,20,92,,fail
publish,glm51,glm51,clean-final,skill-with-shell-glm51-publication-final,module-explainer,results/publish/models/glm51/artifacts/module-explainer.html,results/publish/models/glm51/reports/screenshots/module-explainer-desktop.png,results/publish/models/glm51/reports/screenshots/module-explainer-mobile.png,results/publish/models/glm51/reports/screenshots/module-explainer-deep.png,results/publish/models/glm51/reports/screenshots/module-explainer-mobile-deep.png,54229,True,94.822,358438,6652,365090,365090,0,0,0,0,254656,254656,103782,358438,9,15,9,True,True,True,1,0,1,0,False,False,9,"read-checker,run-checker-cli",read /home/shaun/source/birch-html/scripts/check_birch_renderings.py | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-glm51-publication-final/module-explainer.htm,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,glm51,glm51,clean-final,skill-with-shell-glm51-publication-final,implementation-plan,results/publish/models/glm51/artifacts/implementation-plan.html,results/publish/models/glm51/reports/screenshots/implementation-plan-desktop.png,results/publish/models/glm51/reports/screenshots/implementation-plan-mobile.png,results/publish/models/glm51/reports/screenshots/implementation-plan-deep.png,results/publish/models/glm51/reports/screenshots/implementation-plan-mobile-deep.png,60535,True,90.03,210191,7574,217765,217765,0,0,0,0,180736,180736,29455,210191,15,16,15,True,True,True,2,0,2,0,False,False,15,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-glm51-publication-final/implementation,2,0,0,2,1,0,0,1,0,0,1,0,0,0,1,0,True,93,18.6,20,93,,fail
publish,glm51,glm51,clean-final,skill-with-shell-glm51-publication-final,benchmark-comparison,results/publish/models/glm51/artifacts/benchmark-comparison.html,results/publish/models/glm51/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/glm51/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/glm51/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/glm51/reports/screenshots/benchmark-comparison-mobile-deep.png,64863,True,149.159,274201,14416,288617,288617,0,0,0,0,214336,214336,59865,274201,12,14,12,True,True,True,1,0,1,0,False,False,12,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-glm51-publication-final/benchmark-comp,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,glm52,glm52,clean-final,skill-with-shell-glm52-publication-final,numeric-data,results/publish/models/glm52/artifacts/numeric-data.html,results/publish/models/glm52/reports/screenshots/numeric-data-desktop.png,results/publish/models/glm52/reports/screenshots/numeric-data-mobile.png,results/publish/models/glm52/reports/screenshots/numeric-data-deep.png,results/publish/models/glm52/reports/screenshots/numeric-data-mobile-deep.png,51395,True,274.73,271862,16133,287995,287995,0,0,0,0,239936,239936,31926,271862,14,17,14,True,True,True,1,0,1,0,False,False,14,run-checker-cli,"ran checker CLI: cd /home/shaun/source/birch-html && uv run skill/scripts/finish_birch_html.py eval-runs/skill-with-shell-glm52-publication-final/numeric-data.html && echo ""=== finished ==="" && uv | ran checker CLI: cd /home/shaun/source/birch-html && F=eval-runs/skill-with-shell-glm52-publication-final/numeric-data.html
echo ""=== my local style block (after system block) ===""
python3 - ""$F"" <",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,glm52,glm52,clean-final,skill-with-shell-glm52-publication-final,code-review,results/publish/models/glm52/artifacts/code-review.html,results/publish/models/glm52/reports/screenshots/code-review-desktop.png,results/publish/models/glm52/reports/screenshots/code-review-mobile.png,results/publish/models/glm52/reports/screenshots/code-review-deep.png,results/publish/models/glm52/reports/screenshots/code-review-mobile-deep.png,50845,True,771.097,1249523,43260,1292783,1292783,0,0,0,0,1177792,1177792,71731,1249523,25,32,25,True,True,True,3,0,3,0,False,False,25,"checker-cli-error,run-checker-cli","ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-glm52-publication-final/code-review.ht | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact ""$(pwd)/eval-runs/skill-with-shell-glm52-publication-final/code-r | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --help 2>&1 | rg -i 'viewport|artifact|mobile|width' | head; echo ""=== run m | checker CLI usage error",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,glm52,glm52,clean-final,skill-with-shell-glm52-publication-final,module-explainer,results/publish/models/glm52/artifacts/module-explainer.html,results/publish/models/glm52/reports/screenshots/module-explainer-desktop.png,results/publish/models/glm52/reports/screenshots/module-explainer-mobile.png,results/publish/models/glm52/reports/screenshots/module-explainer-deep.png,results/publish/models/glm52/reports/screenshots/module-explainer-mobile-deep.png,59952,True,751.331,1204327,40435,1244762,1244762,0,0,0,0,1118592,1118592,85735,1204327,22,32,22,True,True,True,2,0,2,0,False,False,22,"read-checker,run-checker-cli",read /home/shaun/source/birch-html/scripts/check_birch_renderings.py | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-glm52-publication-final/module-explainer.htm,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,glm52,glm52,clean-final,skill-with-shell-glm52-publication-final,implementation-plan,results/publish/models/glm52/artifacts/implementation-plan.html,results/publish/models/glm52/reports/screenshots/implementation-plan-desktop.png,results/publish/models/glm52/reports/screenshots/implementation-plan-mobile.png,results/publish/models/glm52/reports/screenshots/implementation-plan-deep.png,results/publish/models/glm52/reports/screenshots/implementation-plan-mobile-deep.png,56320,True,456.209,991570,24123,1015693,1015693,0,0,0,0,911168,911168,80402,991570,18,26,18,True,True,True,3,0,3,0,False,False,18,"read-checker,run-checker-cli",read /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-glm52-publication-final/implementation | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-glm52-pu,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,glm52,glm52,clean-final,skill-with-shell-glm52-publication-final,benchmark-comparison,results/publish/models/glm52/artifacts/benchmark-comparison.html,results/publish/models/glm52/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/glm52/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/glm52/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/glm52/reports/screenshots/benchmark-comparison-mobile-deep.png,60487,True,380.184,522022,23534,545556,545556,0,0,0,0,459648,459648,62374,522022,16,19,16,True,True,True,2,0,2,0,False,False,16,run-checker-cli,"ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-glm52-publication-final/benchmark-comp | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact ""$(pwd)/eval-runs/skill-with-shell-glm52-publication-final/benchm",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,gpt-5.3-codex,gpt-5-3-codex,clean-final,skill-with-shell-gpt-5-3-codex-publication-final,numeric-data,results/publish/models/gpt-5-3-codex/artifacts/numeric-data.html,results/publish/models/gpt-5-3-codex/reports/screenshots/numeric-data-desktop.png,results/publish/models/gpt-5-3-codex/reports/screenshots/numeric-data-mobile.png,results/publish/models/gpt-5-3-codex/reports/screenshots/numeric-data-deep.png,results/publish/models/gpt-5-3-codex/reports/screenshots/numeric-data-mobile-deep.png,40305,True,63.372,91503,5097,96600,96600,1083,0,0,0,76800,76800,14703,91503,8,11,8,False,False,False,0,0,0,0,False,False,8,,,2,2,0,0,1,1,0,0,0,1,1,0,0,1,1,0,True,93,18.6,20,93,,fail
publish,gpt-5.3-codex,gpt-5-3-codex,clean-final,skill-with-shell-gpt-5-3-codex-publication-final,code-review,results/publish/models/gpt-5-3-codex/artifacts/code-review.html,results/publish/models/gpt-5-3-codex/reports/screenshots/code-review-desktop.png,results/publish/models/gpt-5-3-codex/reports/screenshots/code-review-mobile.png,results/publish/models/gpt-5-3-codex/reports/screenshots/code-review-deep.png,results/publish/models/gpt-5-3-codex/reports/screenshots/code-review-mobile-deep.png,39494,True,94.334,461816,6027,467843,467843,2855,0,0,0,384640,384640,77176,461816,17,18,17,True,True,False,1,1,0,0,False,False,17,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-gpt-5-3-codex-publication-final/code-r,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,gpt-5.3-codex,gpt-5-3-codex,clean-final,skill-with-shell-gpt-5-3-codex-publication-final,module-explainer,results/publish/models/gpt-5-3-codex/artifacts/module-explainer.html,results/publish/models/gpt-5-3-codex/reports/screenshots/module-explainer-desktop.png,results/publish/models/gpt-5-3-codex/reports/screenshots/module-explainer-mobile.png,results/publish/models/gpt-5-3-codex/reports/screenshots/module-explainer-deep.png,results/publish/models/gpt-5-3-codex/reports/screenshots/module-explainer-mobile-deep.png,46290,True,93.641,555669,7177,562846,562846,1701,0,0,0,450304,450304,105365,555669,17,23,17,True,True,True,2,1,1,0,False,True,17,"checker-cli-error,checker-shell-reference,read-checker,run-checker-cli",read /home/shaun/source/birch-html/scripts/check_birch_renderings.py | shell referenced checker: rg '^def ' -n /home/shaun/source/birch-html/scripts/check_birch_renderings.py | ran checker CLI: mkdir -p /home/shaun/source/birch-html/eval-runs/skill-with-shell-gpt-5-3-codex-publication-final && cat > /home/shaun/source/birch-html/eval-runs/skill-with-shell-gpt-5-3-codex-pu | ran checker CLI: uv run --with pillow python /home/shaun/source/birch-html/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-gpt-5-3-codex-publication-final/module-explainer.h | checker CLI usage error,0,0,1,1,0,0,1,1,0,0,0,0,0,0,0,0,True,91,18.2,20,91,,fail
publish,gpt-5.3-codex,gpt-5-3-codex,clean-final,skill-with-shell-gpt-5-3-codex-publication-final,implementation-plan,results/publish/models/gpt-5-3-codex/artifacts/implementation-plan.html,results/publish/models/gpt-5-3-codex/reports/screenshots/implementation-plan-desktop.png,results/publish/models/gpt-5-3-codex/reports/screenshots/implementation-plan-mobile.png,results/publish/models/gpt-5-3-codex/reports/screenshots/implementation-plan-deep.png,results/publish/models/gpt-5-3-codex/reports/screenshots/implementation-plan-mobile-deep.png,45485,True,59.362,90659,4766,95425,95425,589,0,0,0,71168,71168,19491,90659,9,10,9,True,True,True,2,1,1,0,False,True,9,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-gpt-5-3-codex-publication-final/implem | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-gpt-5-3-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,gpt-5.3-codex,gpt-5-3-codex,clean-final,skill-with-shell-gpt-5-3-codex-publication-final,benchmark-comparison,results/publish/models/gpt-5-3-codex/artifacts/benchmark-comparison.html,results/publish/models/gpt-5-3-codex/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/gpt-5-3-codex/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/gpt-5-3-codex/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/gpt-5-3-codex/reports/screenshots/benchmark-comparison-mobile-deep.png,46793,True,61.812,60483,5615,66098,66098,746,0,0,0,53376,53376,7107,60483,7,8,7,False,False,False,0,0,0,0,False,False,7,,,4,0,0,0,2,0,0,0,2,0,0,0,2,0,0,0,True,88,17.6,20,88,,fail
publish,grok-4.3,grok-4-3,clean-final,skill-with-shell-grok-4-3-publication-final,numeric-data,results/publish/models/grok-4-3/artifacts/numeric-data.html,results/publish/models/grok-4-3/reports/screenshots/numeric-data-desktop.png,results/publish/models/grok-4-3/reports/screenshots/numeric-data-mobile.png,results/publish/models/grok-4-3/reports/screenshots/numeric-data-deep.png,results/publish/models/grok-4-3/reports/screenshots/numeric-data-mobile-deep.png,36903,True,49.028,73338,3307,76645,76645,925,0,0,0,62720,62720,10618,73338,10,9,10,False,False,False,0,0,0,0,False,False,10,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,grok-4.3,grok-4-3,clean-final,skill-with-shell-grok-4-3-publication-final,code-review,results/publish/models/grok-4-3/artifacts/code-review.html,results/publish/models/grok-4-3/reports/screenshots/code-review-desktop.png,results/publish/models/grok-4-3/reports/screenshots/code-review-mobile.png,results/publish/models/grok-4-3/reports/screenshots/code-review-deep.png,results/publish/models/grok-4-3/reports/screenshots/code-review-mobile-deep.png,38297,True,55.392,190492,4553,195045,195045,2340,0,0,0,147520,147520,42972,190492,11,10,11,False,False,False,0,0,0,0,False,False,11,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,grok-4.3,grok-4-3,clean-final,skill-with-shell-grok-4-3-publication-final,module-explainer,results/publish/models/grok-4-3/artifacts/module-explainer.html,results/publish/models/grok-4-3/reports/screenshots/module-explainer-desktop.png,results/publish/models/grok-4-3/reports/screenshots/module-explainer-mobile.png,results/publish/models/grok-4-3/reports/screenshots/module-explainer-deep.png,results/publish/models/grok-4-3/reports/screenshots/module-explainer-mobile-deep.png,9279,False,40.052,125766,3826,129592,129592,1202,0,0,0,46784,46784,53433,100217,15,6,7,True,False,False,0,0,0,0,False,False,15,read-checker,read /home/shaun/source/birch-html/scripts/check_birch_renderings.py,8,0,3,0,2,0,2,0,2,0,2,0,2,0,2,0,True,35.0,7.0,20,35.0,missing_birch_css,fail
publish,grok-4.3,grok-4-3,clean-final,skill-with-shell-grok-4-3-publication-final,implementation-plan,results/publish/models/grok-4-3/artifacts/implementation-plan.html,results/publish/models/grok-4-3/reports/screenshots/implementation-plan-desktop.png,results/publish/models/grok-4-3/reports/screenshots/implementation-plan-mobile.png,results/publish/models/grok-4-3/reports/screenshots/implementation-plan-deep.png,results/publish/models/grok-4-3/reports/screenshots/implementation-plan-mobile-deep.png,16152,False,41.596,32235,5236,37471,37471,1207,0,0,0,39488,39488,20479,59967,8,4,5,False,False,False,0,0,0,0,False,False,8,,,4,0,4,0,1,0,1,0,1,0,1,0,1,0,1,0,True,20.0,4.0,20,20.0,missing_birch_css_and_visibly_unstyled,fail
publish,grok-4.3,grok-4-3,clean-final,skill-with-shell-grok-4-3-publication-final,benchmark-comparison,results/publish/models/grok-4-3/artifacts/benchmark-comparison.html,results/publish/models/grok-4-3/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/grok-4-3/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/grok-4-3/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/grok-4-3/reports/screenshots/benchmark-comparison-mobile-deep.png,10364,False,98.19,153411,7388,160799,160799,2517,0,0,0,39488,39488,6645,46133,8,15,16,False,False,False,0,0,0,0,False,False,8,,,4,0,4,1,1,0,1,1,1,0,1,0,1,0,1,0,True,35.0,7.0,20,35.0,missing_birch_css,fail
publish,haiku45,haiku45,clean-final,skill-with-shell-haiku45-publication-final,numeric-data,results/publish/models/haiku45/artifacts/numeric-data.html,results/publish/models/haiku45/reports/screenshots/numeric-data-desktop.png,results/publish/models/haiku45/reports/screenshots/numeric-data-mobile.png,results/publish/models/haiku45/reports/screenshots/numeric-data-deep.png,results/publish/models/haiku45/reports/screenshots/numeric-data-mobile-deep.png,23937,False,67.62,119520,7707,127227,127227,0,0,7297,12081,0,19378,11280,30658,4,9,10,False,False,False,0,0,0,0,False,False,4,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-haiku45-publication-final/numeric-data,16,12,1,0,4,3,1,0,4,3,4,3,4,3,4,3,True,35.0,7.0,20,35.0,missing_birch_css,fail
publish,haiku45,haiku45,clean-final,skill-with-shell-haiku45-publication-final,code-review,results/publish/models/haiku45/artifacts/code-review.html,results/publish/models/haiku45/reports/screenshots/code-review-desktop.png,results/publish/models/haiku45/reports/screenshots/code-review-mobile.png,results/publish/models/haiku45/reports/screenshots/code-review-deep.png,results/publish/models/haiku45/reports/screenshots/code-review-mobile-deep.png,53526,True,94.461,301467,10117,311584,311584,0,0,228528,34499,0,263027,38440,301467,11,11,11,True,True,True,1,0,1,0,False,False,11,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-haiku45-,6,0,0,2,2,0,0,1,1,0,2,0,1,0,2,0,True,87,17.4,20,87,,fail
publish,haiku45,haiku45,clean-final,skill-with-shell-haiku45-publication-final,module-explainer,results/publish/models/haiku45/artifacts/module-explainer.html,results/publish/models/haiku45/reports/screenshots/module-explainer-desktop.png,results/publish/models/haiku45/reports/screenshots/module-explainer-mobile.png,results/publish/models/haiku45/reports/screenshots/module-explainer-deep.png,results/publish/models/haiku45/reports/screenshots/module-explainer-mobile-deep.png,57853,False,75.42,211164,9407,220571,220571,0,0,0,55031,0,55031,80985,136016,3,10,6,True,False,False,0,0,0,0,False,False,3,read-checker,read /home/shaun/source/birch-html/scripts/check_birch_renderings.py,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,haiku45,haiku45,clean-final,skill-with-shell-haiku45-publication-final,implementation-plan,results/publish/models/haiku45/artifacts/implementation-plan.html,results/publish/models/haiku45/reports/screenshots/implementation-plan-desktop.png,results/publish/models/haiku45/reports/screenshots/implementation-plan-mobile.png,results/publish/models/haiku45/reports/screenshots/implementation-plan-deep.png,results/publish/models/haiku45/reports/screenshots/implementation-plan-mobile-deep.png,50641,True,67.418,123711,7166,130877,130877,0,0,91600,16126,0,107726,15985,123711,9,9,9,True,True,True,1,0,1,0,False,False,9,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-haiku45-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,haiku45,haiku45,clean-final,skill-with-shell-haiku45-publication-final,benchmark-comparison,results/publish/models/haiku45/artifacts/benchmark-comparison.html,results/publish/models/haiku45/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/haiku45/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/haiku45/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/haiku45/reports/screenshots/benchmark-comparison-mobile-deep.png,49137,True,65.28,151349,7796,159145,159145,0,0,122743,12640,0,135383,15966,151349,11,10,11,False,False,False,0,0,0,0,False,False,11,,,4,0,0,3,1,0,0,1,1,0,1,0,1,0,1,0,True,93,18.6,20,93,,fail
publish,kimi,kimi,clean-final,skill-with-shell-kimi-publication-final,numeric-data,results/publish/models/kimi/artifacts/numeric-data.html,results/publish/models/kimi/reports/screenshots/numeric-data-desktop.png,results/publish/models/kimi/reports/screenshots/numeric-data-mobile.png,results/publish/models/kimi/reports/screenshots/numeric-data-deep.png,results/publish/models/kimi/reports/screenshots/numeric-data-mobile-deep.png,67620,True,194.344,470039,5317,475356,475356,0,0,0,0,425472,425472,44567,470039,20,23,20,True,True,True,3,1,2,0,False,True,20,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-kimi-publication-final/numeric-data.ht | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-kimi-pub,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,kimi,kimi,clean-final,skill-with-shell-kimi-publication-final,code-review,results/publish/models/kimi/artifacts/code-review.html,results/publish/models/kimi/reports/screenshots/code-review-desktop.png,results/publish/models/kimi/reports/screenshots/code-review-mobile.png,results/publish/models/kimi/reports/screenshots/code-review-deep.png,results/publish/models/kimi/reports/screenshots/code-review-mobile-deep.png,44300,True,627.536,1248543,24596,1273139,1273139,0,0,0,0,1192448,1192448,56095,1248543,33,36,33,True,True,True,2,1,1,0,False,True,33,"checker-shell-reference,read-checker,run-checker-cli","read /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py | shell referenced checker: grep -n ""CANDLE_CLASSES\|BIRCH_CLASSES\|LAYOUT_CLASSES\|SEMANTIC_CLASSES"" /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py | head -20 | shell referenced checker: grep -n ""callout"" /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py | shell referenced checker: grep -n ""eyebrow\|lede\|muted\|caption\|subtle\|note\|entity\|label-cell"" /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py | head -20 | shell referenced checker: grep -n ""code-block"" /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py | head -20 | shell referenced checker: grep -n ""data-tone"" /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py | head -20",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,kimi,kimi,clean-final,skill-with-shell-kimi-publication-final,module-explainer,results/publish/models/kimi/artifacts/module-explainer.html,results/publish/models/kimi/reports/screenshots/module-explainer-desktop.png,results/publish/models/kimi/reports/screenshots/module-explainer-mobile.png,results/publish/models/kimi/reports/screenshots/module-explainer-deep.png,results/publish/models/kimi/reports/screenshots/module-explainer-mobile-deep.png,17730,False,142.653,54919,5427,60346,60346,0,0,0,0,0,0,54919,54919,5,10,5,True,False,False,0,0,0,0,False,False,5,read-checker,read /home/shaun/source/birch-html/scripts/check_birch_renderings.py,6,0,7,1,2,0,3,1,1,0,2,0,1,0,2,0,True,20.0,4.0,20,20.0,missing_birch_css_and_visibly_unstyled,fail
publish,kimi,kimi,clean-final,skill-with-shell-kimi-publication-final,implementation-plan,results/publish/models/kimi/artifacts/implementation-plan.html,results/publish/models/kimi/reports/screenshots/implementation-plan-desktop.png,results/publish/models/kimi/reports/screenshots/implementation-plan-mobile.png,results/publish/models/kimi/reports/screenshots/implementation-plan-deep.png,results/publish/models/kimi/reports/screenshots/implementation-plan-mobile-deep.png,50937,True,372.779,468652,19358,488010,488010,0,0,0,0,415232,415232,53420,468652,15,16,15,True,True,True,1,0,1,0,False,False,15,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-kimi-publication-final/implementation-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,kimi,kimi,clean-final,skill-with-shell-kimi-publication-final,benchmark-comparison,results/publish/models/kimi/artifacts/benchmark-comparison.html,results/publish/models/kimi/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/kimi/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/kimi/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/kimi/reports/screenshots/benchmark-comparison-mobile-deep.png,51725,True,427.336,358341,15297,373638,373638,0,0,0,0,299776,299776,58565,358341,14,14,14,True,True,True,1,0,1,0,False,False,14,run-checker-cli,ran checker CLI: uv run --with pillow python /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-kimi-publicati,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,True,99,19.8,20,99,,warn
publish,kimi27,kimi27,clean-final,skill-with-shell-kimi27-publication-final,numeric-data,results/publish/models/kimi27/artifacts/numeric-data.html,results/publish/models/kimi27/reports/screenshots/numeric-data-desktop.png,results/publish/models/kimi27/reports/screenshots/numeric-data-mobile.png,results/publish/models/kimi27/reports/screenshots/numeric-data-deep.png,results/publish/models/kimi27/reports/screenshots/numeric-data-mobile-deep.png,41967,True,210.371,1978925,17532,1996457,1996457,0,0,0,0,1687898,1687898,291027,1978925,30,32,30,True,True,True,3,0,3,0,False,False,30,"read-checker,run-checker-cli",ran checker CLI: cd /home/shaun/source/birch-html && if [ -f skill/scripts/check_birch_renderings.py ]; then uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs | read /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-kimi27-p | ran checker CLI: cd /home/shaun/source/birch-html && uv run skill/scripts/finish_birch_html.py eval-runs/skill-with-shell-kimi27-publication-final/numeric-data.html && uv run --with pillow python s,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,kimi27,kimi27,clean-final,skill-with-shell-kimi27-publication-final,code-review,results/publish/models/kimi27/artifacts/code-review.html,results/publish/models/kimi27/reports/screenshots/code-review-desktop.png,results/publish/models/kimi27/reports/screenshots/code-review-mobile.png,results/publish/models/kimi27/reports/screenshots/code-review-deep.png,results/publish/models/kimi27/reports/screenshots/code-review-mobile-deep.png,47402,True,253.252,1509119,28034,1537153,1537153,0,0,0,0,1144217,1144217,364902,1509119,25,30,25,True,True,True,2,0,2,0,False,False,25,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-kimi27-publication-final/code-review.html 2> | ran checker CLI: cd /home/shaun/source/birch-html && uv run python skill/scripts/finish_birch_html.py eval-runs/skill-with-shell-kimi27-publication-final/code-review.html && uv run --with pillow py,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,kimi27,kimi27,clean-final,skill-with-shell-kimi27-publication-final,module-explainer,results/publish/models/kimi27/artifacts/module-explainer.html,results/publish/models/kimi27/reports/screenshots/module-explainer-desktop.png,results/publish/models/kimi27/reports/screenshots/module-explainer-mobile.png,results/publish/models/kimi27/reports/screenshots/module-explainer-deep.png,results/publish/models/kimi27/reports/screenshots/module-explainer-mobile-deep.png,52748,True,136.617,582570,12473,595043,595043,0,0,0,0,334281,334281,248289,582570,7,14,7,True,True,True,1,0,1,0,False,False,7,"read-checker,run-checker-cli",read /home/shaun/source/birch-html/scripts/check_birch_renderings.py | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-kimi27-publication-final/module-explainer.ht,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,kimi27,kimi27,clean-final,skill-with-shell-kimi27-publication-final,implementation-plan,results/publish/models/kimi27/artifacts/implementation-plan.html,results/publish/models/kimi27/reports/screenshots/implementation-plan-desktop.png,results/publish/models/kimi27/reports/screenshots/implementation-plan-mobile.png,results/publish/models/kimi27/reports/screenshots/implementation-plan-deep.png,results/publish/models/kimi27/reports/screenshots/implementation-plan-mobile-deep.png,52277,True,72.968,487122,6684,493806,493806,0,0,0,0,332350,332350,154772,487122,9,9,9,True,True,True,2,1,1,0,False,True,9,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-kimi27-publication-final/implementatio | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-kimi27-p,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,kimi27,kimi27,clean-final,skill-with-shell-kimi27-publication-final,benchmark-comparison,results/publish/models/kimi27/artifacts/benchmark-comparison.html,results/publish/models/kimi27/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/kimi27/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/kimi27/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/kimi27/reports/screenshots/benchmark-comparison-mobile-deep.png,59856,True,159.927,1290293,18058,1308351,1308351,0,0,0,0,1169139,1169139,121154,1290293,16,19,16,True,True,True,2,1,1,0,False,True,16,"read-checker,run-checker-cli",read /home/shaun/source/birch-html/skill/scripts/check_birch_renderings.py | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python3 skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-kimi27-publication-final/benchmark-co | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python3 skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-kimi27-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,minimax27,minimax27,clean-final,skill-with-shell-minimax27-publication-final,numeric-data,results/publish/models/minimax27/artifacts/numeric-data.html,results/publish/models/minimax27/reports/screenshots/numeric-data-desktop.png,results/publish/models/minimax27/reports/screenshots/numeric-data-mobile.png,results/publish/models/minimax27/reports/screenshots/numeric-data-deep.png,results/publish/models/minimax27/reports/screenshots/numeric-data-mobile-deep.png,50838,False,160.154,87235,10902,98137,98137,0,0,0,0,116736,116736,81499,198235,12,9,10,True,True,True,2,1,1,0,False,True,12,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,minimax27,minimax27,clean-final,skill-with-shell-minimax27-publication-final,code-review,results/publish/models/minimax27/artifacts/code-review.html,results/publish/models/minimax27/reports/screenshots/code-review-desktop.png,results/publish/models/minimax27/reports/screenshots/code-review-mobile.png,results/publish/models/minimax27/reports/screenshots/code-review-deep.png,results/publish/models/minimax27/reports/screenshots/code-review-mobile-deep.png,43165,True,211.215,444148,7213,451361,451361,0,0,0,0,355328,355328,88820,444148,18,20,18,False,False,False,0,0,0,0,False,False,18,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,minimax27,minimax27,clean-final,skill-with-shell-minimax27-publication-final,module-explainer,results/publish/models/minimax27/artifacts/module-explainer.html,results/publish/models/minimax27/reports/screenshots/module-explainer-desktop.png,results/publish/models/minimax27/reports/screenshots/module-explainer-mobile.png,results/publish/models/minimax27/reports/screenshots/module-explainer-deep.png,results/publish/models/minimax27/reports/screenshots/module-explainer-mobile-deep.png,50511,False,183.748,185140,15068,200208,200208,0,0,0,0,232320,232320,148313,380633,9,9,5,True,False,False,0,0,0,0,False,False,9,read-checker,read /home/shaun/source/birch-html/scripts/check_birch_renderings.py,4,0,4,0,1,0,1,0,1,0,1,0,1,0,1,0,True,20.0,4.0,20,20.0,missing_birch_css_and_visibly_unstyled,fail
publish,minimax27,minimax27,clean-final,skill-with-shell-minimax27-publication-final,implementation-plan,results/publish/models/minimax27/artifacts/implementation-plan.html,results/publish/models/minimax27/reports/screenshots/implementation-plan-desktop.png,results/publish/models/minimax27/reports/screenshots/implementation-plan-mobile.png,results/publish/models/minimax27/reports/screenshots/implementation-plan-deep.png,results/publish/models/minimax27/reports/screenshots/implementation-plan-mobile-deep.png,21904,False,64.763,27146,4563,31709,31709,0,0,0,0,7040,7040,11494,18534,3,3,4,False,False,False,0,0,0,0,False,False,3,,,14,4,0,0,4,1,0,0,3,1,4,1,3,1,4,1,True,35.0,7.0,20,35.0,missing_birch_css,fail
publish,minimax27,minimax27,clean-final,skill-with-shell-minimax27-publication-final,benchmark-comparison,results/publish/models/minimax27/artifacts/benchmark-comparison.html,results/publish/models/minimax27/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/minimax27/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/minimax27/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/minimax27/reports/screenshots/benchmark-comparison-mobile-deep.png,79228,False,420.033,511926,33192,545118,545118,0,0,0,0,129664,129664,154885,284549,7,14,13,True,True,True,1,0,1,0,False,False,7,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-minimax27-publication-final/benchmark-comparison.html 2>&1 ,8,0,0,4,2,0,0,1,2,0,2,0,2,0,2,0,True,35.0,7.0,20,35.0,missing_birch_css,fail
publish,opus47,opus47,clean-final,skill-with-shell-opus47-publication-final,numeric-data,results/publish/models/opus47/artifacts/numeric-data.html,results/publish/models/opus47/reports/screenshots/numeric-data-desktop.png,results/publish/models/opus47/reports/screenshots/numeric-data-mobile.png,results/publish/models/opus47/reports/screenshots/numeric-data-deep.png,results/publish/models/opus47/reports/screenshots/numeric-data-mobile-deep.png,45758,True,106.088,161380,8823,170203,170203,0,0,114642,25769,0,140411,20969,161380,10,12,10,True,True,True,2,0,2,0,False,False,10,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-opus47-publication-final/numeric-data. | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-opus47-p,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,opus47,opus47,clean-final,skill-with-shell-opus47-publication-final,code-review,results/publish/models/opus47/artifacts/code-review.html,results/publish/models/opus47/reports/screenshots/code-review-desktop.png,results/publish/models/opus47/reports/screenshots/code-review-mobile.png,results/publish/models/opus47/reports/screenshots/code-review-deep.png,results/publish/models/opus47/reports/screenshots/code-review-mobile-deep.png,50191,True,268.356,571314,17059,588373,588373,0,0,441950,55976,0,497926,73388,571314,14,18,14,True,True,True,3,0,3,0,False,False,14,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-opus47-publication-final/code-review.h | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-opus47-p,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,opus47,opus47,clean-final,skill-with-shell-opus47-publication-final,module-explainer,results/publish/models/opus47/artifacts/module-explainer.html,results/publish/models/opus47/reports/screenshots/module-explainer-desktop.png,results/publish/models/opus47/reports/screenshots/module-explainer-mobile.png,results/publish/models/opus47/reports/screenshots/module-explainer-deep.png,results/publish/models/opus47/reports/screenshots/module-explainer-mobile-deep.png,58814,True,206.748,653611,15632,669243,669243,0,0,502232,65941,0,568173,85438,653611,13,19,13,True,True,True,1,0,1,0,False,False,13,"read-checker,run-checker-cli",read /home/shaun/source/birch-html/scripts/check_birch_renderings.py | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-opus47-publication-final/module-explainer.ht,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,opus47,opus47,clean-final,skill-with-shell-opus47-publication-final,implementation-plan,results/publish/models/opus47/artifacts/implementation-plan.html,results/publish/models/opus47/reports/screenshots/implementation-plan-desktop.png,results/publish/models/opus47/reports/screenshots/implementation-plan-mobile.png,results/publish/models/opus47/reports/screenshots/implementation-plan-deep.png,results/publish/models/opus47/reports/screenshots/implementation-plan-mobile-deep.png,53012,True,141.632,206186,9414,215600,215600,0,0,160139,23940,0,184079,22107,206186,11,12,11,True,True,True,2,0,2,0,False,False,11,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-opus47-publication-final/implementatio | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-opus47-p,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,opus47,opus47,clean-final,skill-with-shell-opus47-publication-final,benchmark-comparison,results/publish/models/opus47/artifacts/benchmark-comparison.html,results/publish/models/opus47/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/opus47/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/opus47/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/opus47/reports/screenshots/benchmark-comparison-mobile-deep.png,64934,True,150.046,388331,9617,397948,397948,0,0,328368,33477,0,361845,26486,388331,19,22,19,True,True,True,2,0,2,0,False,False,19,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-opus47-publication-final/benchmark-com | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-opus47-p,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,sonnet46,sonnet46,clean-final,skill-with-shell-sonnet46-publication-final,numeric-data,results/publish/models/sonnet46/artifacts/numeric-data.html,results/publish/models/sonnet46/reports/screenshots/numeric-data-desktop.png,results/publish/models/sonnet46/reports/screenshots/numeric-data-mobile.png,results/publish/models/sonnet46/reports/screenshots/numeric-data-deep.png,results/publish/models/sonnet46/reports/screenshots/numeric-data-mobile-deep.png,52394,True,203.959,302149,14758,316907,316907,0,0,234504,38197,0,272701,29448,302149,13,15,13,True,True,True,2,1,1,0,False,True,13,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-sonnet46-publication-final/numeric-dat | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-sonnet46,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,sonnet46,sonnet46,clean-final,skill-with-shell-sonnet46-publication-final,code-review,results/publish/models/sonnet46/artifacts/code-review.html,results/publish/models/sonnet46/reports/screenshots/code-review-desktop.png,results/publish/models/sonnet46/reports/screenshots/code-review-mobile.png,results/publish/models/sonnet46/reports/screenshots/code-review-deep.png,results/publish/models/sonnet46/reports/screenshots/code-review-mobile-deep.png,57805,True,302.047,477280,18427,495707,495707,0,0,368349,44875,0,413224,64056,477280,14,18,14,True,True,True,2,0,2,0,False,False,14,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-sonnet46-publication-final/code-review | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-sonnet46,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,sonnet46,sonnet46,clean-final,skill-with-shell-sonnet46-publication-final,module-explainer,results/publish/models/sonnet46/artifacts/module-explainer.html,results/publish/models/sonnet46/reports/screenshots/module-explainer-desktop.png,results/publish/models/sonnet46/reports/screenshots/module-explainer-mobile.png,results/publish/models/sonnet46/reports/screenshots/module-explainer-deep.png,results/publish/models/sonnet46/reports/screenshots/module-explainer-mobile-deep.png,66525,True,978.64,2649057,62243,2711300,2711300,0,0,2413844,135163,0,2549007,100050,2649057,34,38,34,True,True,True,2,1,1,0,False,True,34,"read-checker,run-checker-cli",read /home/shaun/source/birch-html/scripts/check_birch_renderings.py | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-sonnet46-publication-final/module-explainer. | ran checker CLI: cd /home/shaun/source/birch-html && uv run skill/scripts/finish_birch_html.py eval-runs/skill-with-shell-sonnet46-publication-final/module-explainer.html && uv run --with pillow py,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,sonnet46,sonnet46,clean-final,skill-with-shell-sonnet46-publication-final,implementation-plan,results/publish/models/sonnet46/artifacts/implementation-plan.html,results/publish/models/sonnet46/reports/screenshots/implementation-plan-desktop.png,results/publish/models/sonnet46/reports/screenshots/implementation-plan-mobile.png,results/publish/models/sonnet46/reports/screenshots/implementation-plan-deep.png,results/publish/models/sonnet46/reports/screenshots/implementation-plan-mobile-deep.png,49926,True,196.05,257093,12916,270009,270009,0,0,210864,24527,0,235391,21702,257093,14,15,14,True,True,True,2,0,2,0,False,False,14,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-sonnet46-publication-final/implementat | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-sonnet46,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean
publish,sonnet46,sonnet46,clean-final,skill-with-shell-sonnet46-publication-final,benchmark-comparison,results/publish/models/sonnet46/artifacts/benchmark-comparison.html,results/publish/models/sonnet46/reports/screenshots/benchmark-comparison-desktop.png,results/publish/models/sonnet46/reports/screenshots/benchmark-comparison-mobile.png,results/publish/models/sonnet46/reports/screenshots/benchmark-comparison-deep.png,results/publish/models/sonnet46/reports/screenshots/benchmark-comparison-mobile-deep.png,122208,True,623.147,1192904,48270,1241174,1241174,0,0,987803,129337,0,1117140,75764,1192904,18,22,18,True,True,True,3,0,3,0,False,False,18,run-checker-cli,ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact eval-runs/skill-with-shell-sonnet46-publication-final/benchmark-c | ran checker CLI: cd /home/shaun/source/birch-html && uv run --with pillow python skill/scripts/check_birch_renderings.py --artifact /home/shaun/source/birch-html/eval-runs/skill-with-shell-sonnet46,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,True,100.0,20.0,20,100.0,,clean