Collection: CoT Oracle Paper Ablations And Baselines (all models used for my LessWrong post).
This repo contains the step-500 GRPO checkpoint derived from the final no-DPO CoT Oracle model.
Qwen/Qwen3-8B | ceselder/cot-oracle-qwen3-8b-final-sprint-checkpoint-no-DPO | 1 | [9, 18, 27] | 500

From calibration_grpo/config.yaml:

ceselder/cot-oracle-corpus-v551.081.0250161.1 | google/gemini-3-flash-preview | anthropic/claude-sonnet-4-6 | passes_swap_test=1.0, specific_and_falsifiable=1.0, adds_insight=1.0, not_provably_wrong=3.0, follows_instructions=1.0 | 0.23e-6204110001000.6

This checkpoint is the step_500/ subfolder from ceselder/cot-oracle-grpo-grpo-0320-1849.
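The reward weights listed in calibration_grpo/config.yaml (passes_swap_test, specific_and_falsifiable, adds_insight, not_provably_wrong, follows_instructions) can be illustrated with a small sketch. It assumes each criterion yields a score in [0, 1] and that the scores are combined as a weighted sum; the actual aggregation used in this training run is not stated here. The advantage step is the standard GRPO group normalization (each reward minus its group mean, divided by the group standard deviation). The helper names `combined_reward` and `group_relative_advantages`, and the example scores, are hypothetical.

```python
from statistics import mean, pstdev

# Per-criterion weights as listed in calibration_grpo/config.yaml.
REWARD_WEIGHTS = {
    "passes_swap_test": 1.0,
    "specific_and_falsifiable": 1.0,
    "adds_insight": 1.0,
    "not_provably_wrong": 3.0,
    "follows_instructions": 1.0,
}


def combined_reward(scores: dict[str, float]) -> float:
    """Weighted sum of per-criterion scores.

    ASSUMPTION: scores lie in [0, 1] and are summed with the weights above;
    the real training code may aggregate them differently. Missing criteria
    count as 0.
    """
    return sum(w * scores.get(name, 0.0) for name, w in REWARD_WEIGHTS.items())


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own group.

    A_i = (r_i - mean(r)) / (std(r) + eps), using the population std.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for two completions sampled for the same prompt.
group = [
    {"passes_swap_test": 1.0, "specific_and_falsifiable": 1.0, "adds_insight": 0.0,
     "not_provably_wrong": 1.0, "follows_instructions": 1.0},
    {"passes_swap_test": 0.0, "specific_and_falsifiable": 1.0, "adds_insight": 1.0,
     "not_provably_wrong": 0.0, "follows_instructions": 1.0},
]
rewards = [combined_reward(s) for s in group]  # -> [6.0, 3.0]
advantages = group_relative_advantages(rewards)  # roughly [+1.0, -1.0]
print(rewards, advantages)
```

Under this assumed weighting, not_provably_wrong alone can contribute 3 of the maximum 7 reward points, so failing it costs more than failing any other single criterion. Because the advantages are normalized within each group, scaling all weights by a constant leaves them essentially unchanged; only the relative weights matter.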