# Demo: Phase 10 Toolbox (download, log, compare, verify)
# Shows all 4 new commands working together
log "toolbox_run.txt"
load "Qwen/Qwen3-VL-8B-Instruct" as base
# Download datasets for verification
download "gsm8k" as math_data
download "openai/humaneval" as code_data split test
# Merge in reasoning ability
merge "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" into base using transport strength 0.5
# Compare: does the merged model remember what DeepSeek knew?
compare base vs "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" questions 30 -> compare_results.json
# Verify: are the answers actually correct?
verify base on "gsm8k" questions 50 -> verify_math.json
verify base on "openai/humaneval" questions 25 -> verify_code.json
# Eval and commit if good
eval base -> eval_report.json
commit base