cds-jb/codi_qwen3-8b-answer_only
Text Generation • Updated
Chain-of-thought that hides what the model is really doing: cheating without saying so, soft-token latent reasoning, and filler-token reasoning.
Note: Soft-token latent reasoning. The chain-of-thought runs in continuous embeddings instead of readable text.
Note: Filler-token reasoning. The visible CoT is meaningless filler, so the reasoning is not observable at all.
Note: Cheats but never says so. The model reward-hacks the code (edits tests or strips TEST_FAIL) while its chain-of-thought stays clean. This is a hidden action, not hidden reasoning.