CoT Oracle Paper Ablations And Baselines
All models used for my LessWrong post. The generally recommended choices are the latest Adam oracle or the checkpoint confusingly labelled "no DPO".
Note: Adam original AO checkpoint re-upload with a detailed card. Closest documented aggregate stats: `66,469,521` tokens; paper shorthand `~60M`.
ceselder/adam-reupload-qwen3-8b-full-mix-synthetic-qa-v3-replace-lqa
Note: Adam synthetic-QA checkpoint re-upload with a detailed card derived from `ao_config.json`. Exact token count remains undocumented in the source repo.
ceselder/cot-oracle-paper-ablation-adam-recipe-1layer
Note: Paper ablation: Adam recipe inside `cot-oracle`, 1 layer, paper label `17M` logged training tokens.
ceselder/cot-oracle-paper-ablation-ours-1layer
Note: Paper ablation: ours, 1 layer, paper label `22.5M` logged training tokens.
ceselder/cot-oracle-paper-ablation-ours-3layers
Note: Paper ablation: ours, 3 layers, latest recoverable checkpoint from a run labeled `18M` logged training tokens.
ceselder/cot-oracle-paper-ablation-ours-3layers-onpolicy-lens-only
Note: Paper ablation: ours, 3 layers, with the FineWeb lens replaced by extra on-policy lens data. The run later reached `22.3M` logged training tokens before crashing; the repo contains the latest successfully uploaded checkpoint.
ceselder/cot-oracle-qwen3-8b-final-sprint-checkpoint-no-DPO
Note: Final no-DPO CoT Oracle checkpoint with the full task mixture, labeled `100M` training tokens.
ceselder/cot-oracle-grpo-step-500
Note: Best GRPO checkpoint, re-uploaded as a standalone model repo from step `500`.
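
To make the list easier to use from a script, here is a minimal sketch that indexes the repo IDs given above and optionally downloads one of them. The grouping names (`BASELINES`, `ABLATIONS`, `FINAL_NO_DPO`, `GRPO_STEP_500`) and the helper `fetch_checkpoint` are hypothetical and introduced only for illustration. The first entry in the list has no repo ID in the text above, so it is left out. The sketch assumes the `huggingface_hub` package is installed; it only downloads the repo files and makes no assumption about whether a repo holds full weights or an adapter, so check each model card before loading.

```python
# Repo IDs copied verbatim from the list above. The grouping is an
# illustrative assumption, not something the collection itself defines.
BASELINES = [
    "ceselder/adam-reupload-qwen3-8b-full-mix-synthetic-qa-v3-replace-lqa",
]
ABLATIONS = [
    "ceselder/cot-oracle-paper-ablation-adam-recipe-1layer",
    "ceselder/cot-oracle-paper-ablation-ours-1layer",
    "ceselder/cot-oracle-paper-ablation-ours-3layers",
    "ceselder/cot-oracle-paper-ablation-ours-3layers-onpolicy-lens-only",
]
FINAL_NO_DPO = "ceselder/cot-oracle-qwen3-8b-final-sprint-checkpoint-no-DPO"
GRPO_STEP_500 = "ceselder/cot-oracle-grpo-step-500"


def all_repo_ids():
    """Return every repo ID that appears in the list above."""
    return BASELINES + ABLATIONS + [FINAL_NO_DPO, GRPO_STEP_500]


def fetch_checkpoint(repo_id, cache_dir=None):
    """Download one listed repo and return the local snapshot path.

    Hypothetical helper: it rejects IDs that are not in the list so a
    typo fails early instead of triggering a failed network request.
    """
    if repo_id not in all_repo_ids():
        raise ValueError(f"{repo_id!r} is not one of the listed checkpoints")
    # Imported lazily so the index above is usable without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, cache_dir=cache_dir)
```

For example, `fetch_checkpoint(FINAL_NO_DPO)` would fetch the checkpoint labelled "no DPO" into the local Hugging Face cache and return its directory.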