# Demo: Phase 10 Toolbox (download, log, compare, verify)
# Shows all four new commands working together

log "toolbox_run.txt"
load "Qwen/Qwen3-VL-8B-Instruct" as base

# Download the datasets used for verification
download "gsm8k" as math_data
download "openai/humaneval" as code_data split test

# Merge in reasoning ability
merge "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" into base using transport strength 0.5

# Compare: does the merged model retain what the DeepSeek model knew?
compare base vs "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" questions 30 -> compare_results.json

# Verify: are the answers actually correct?
verify base on "gsm8k" questions 50 -> verify_math.json
verify base on "openai/humaneval" questions 25 -> verify_code.json

# Evaluate, then commit if the results look good
eval base -> eval_report.json
commit base