# Demo: Phase 10 Toolbox (download, log, compare, verify)
# Shows all four new commands working together
log "toolbox_run.txt"
load "Qwen/Qwen3-VL-8B-Instruct" as base
# Download the datasets used for verification (math and code)
download "gsm8k" as math_data
download "openai/humaneval" as code_data split test
# Merge in reasoning ability
merge "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" into base using transport strength 0.5
# Compare: does the merged model remember what DeepSeek knew?
compare base vs "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" questions 30 -> compare_results.json
# Verify: are the answers actually correct?
verify base on "gsm8k" questions 50 -> verify_math.json
verify base on "openai/humaneval" questions 25 -> verify_code.json
# Evaluate the merged model, then commit if the results look good
eval base -> eval_report.json
commit base