Fixed code: vocab mismatch fix for cross-arch merging (Llama/Falcon)

5d61448 verified 3 months ago

727 Bytes

	# demo_loop.td — Self-improvement loop (Phase 2)
	# The core TD cycle: diagnose -> synth -> train -> evaluate -> commit

	gate {
	must_pass = [canary, perplexity, thinking_mode]
	}

	budget {
	max_gpu_hours = 10
	max_cost = 40.00
	}

	load "Qwen/Qwen3-VL-8B-Instruct" as base

	# Step 1: Ask the model what it's bad at
	diagnose base -> weaknesses.json

	# Step 2: Generate training data targeting those weaknesses
	synth base from web_curated filter cherry_llm -> synth_data.jsonl

	# Step 3: Train with GRPO (64 steps = sweet spot from test_15)
	train base on "synth_data.jsonl" using grpo steps 64

	# Step 4: Check if it actually got better
	eval base -> post_training_eval.json

	# Step 5: Only save if gates pass
	commit base