Qwen3-8B · CODI pointer-chase — a strongly load-bearing latent-reasoning organism

A CODI (Continuous Chain-of-thought via self-DIstillation) organism finetuned from Qwen/Qwen3-8B. The model reasons in num_latent = 6 continuous latent vectors instead of a textual chain-of-thought, then emits a single-token answer. This is the cleanest load-bearing organism in the set: the latents are necessary — with them removed, accuracy sits at chance even after full training.

What it does

A 26-symbol pointer chase. The prompt gives a random permutation mapping a→…, b→…, …, z→…, a start symbol, and a hop count K∈[1,6]: "follow the mapping K times; what is the final value?" The answer is a single letter. The mapping table is in the prompt, so the task is in-context (no recall) — but resolving K serial hops in a single forward pass is hard, which is what forces the model to use the latent scratchpad.
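
The task itself can be sketched in a few lines of Python. This is a minimal illustration only, not the generator used to build the dataset: the function names (make_problem, chase, format_prompt) and the exact prompt wording are assumptions.

```python
import random
import string

ALPHABET = string.ascii_lowercase  # the 26 symbols a..z


def make_problem(rng, max_hops=6):
    """Sample one pointer-chase problem (hypothetical format, not the dataset's)."""
    targets = list(ALPHABET)
    rng.shuffle(targets)                     # random permutation of the alphabet
    mapping = dict(zip(ALPHABET, targets))   # a -> targets[0], b -> targets[1], ...
    start = rng.choice(ALPHABET)
    hops = rng.randint(1, max_hops)          # K in [1, 6], inclusive
    return mapping, start, hops


def chase(mapping, start, hops):
    """Follow the mapping `hops` times; each hop depends on the previous result."""
    symbol = start
    for _ in range(hops):
        symbol = mapping[symbol]
    return symbol


def format_prompt(mapping, start, hops):
    """Render an illustrative prompt; the real wording may differ."""
    table = ", ".join(f"{k}→{v}" for k, v in mapping.items())
    return (f"{table}. Start at {start}. "
            f"Follow the mapping {hops} times; what is the final value?")


rng = random.Random(0)
mapping, start, hops = make_problem(rng)
answer = chase(mapping, start, hops)
print(format_prompt(mapping, start, hops))
print("answer:", answer)
```

The loop in chase is the serial dependency the model has to resolve: hop k cannot be looked up until hop k-1 is known.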

Training recipe

Standard CODI self-distillation (teacher reads the worked chase, student generates the latents and is distilled onto the teacher) with the one principled change that makes the organism load-bearing: sft_loss_factor = 0 — the direct question→answer pass is removed, so the answer must route through the latents.

base                 Qwen/Qwen3-8B
adapter              LoRA r=128, α=32 (+ projection, resized embed/lm_head for <|bocot|>/<|eocot|>)
num_latent           6
sft_loss_factor      0
distill_loss_factor  20
optimizer            lr 1e-4, cosine, 4 epochs, bf16, answer_only
dataset              cds-jb/qwen3-8b-codi-multihop-recall-data (ptra26_kmix1-6 split)

Load-bearing controls & results (checkpoint-900, n=300)

[Figure: pointer-chase necessity]

  • Necessity = 0.96. Clean (latent) accuracy is 1.00; ablating the latents (0-latent) drops it to 0.04, which is chance for a 26-way answer (1/26 ≈ 0.038), and it stays there even on the fully trained model. The task cannot be solved in a single forward pass: the latents carry the serial chase.
  • Donor cross-patch ≈ 0.01, shuffle ≈ 0.00. Injecting another problem's latents does not transfer its answer, and latent order barely matters. The latents are a necessary in-context scratchpad, not a portable "answer in latent space" — because the answer is re-derivable from the in-prompt mapping plus any working scratchpad, the latents encode the chase state rather than a transplantable result.
  • Logit-lens is weak here (top-5 ≈ 0.1–0.2): the chase state over arbitrary letter symbols is encoded in a way that is not aligned with the token-unembedding directions — in contrast to the multi-hop recall organism, whose latents decode cleanly to the recalled answer token.

Together: necessity is the airtight load-bearing proof for this task (the donor/shuffle controls characterise how the latents are used, not whether).
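
The headline numbers combine as simple arithmetic. A minimal sketch, assuming necessity is defined as clean accuracy minus 0-latent accuracy (which matches 1.00 − 0.04 = 0.96); the function names are illustrative and not part of the CODI codebase.

```python
def accuracy(predictions, answers):
    """Fraction of exact single-letter matches."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def necessity(clean_acc, ablated_acc):
    """Accuracy lost when the latents are removed (assumed definition)."""
    return clean_acc - ablated_acc


chance = 1 / 26                   # uniform guess over 26 letters
score = necessity(1.00, 0.04)     # reported clean and 0-latent accuracies
print(f"chance = {chance:.3f}, necessity = {score:.2f}")
# prints: chance = 0.038, necessity = 0.96
```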

How to use

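from src.model import CODI   # third_party/CODI

# Load the base model plus the LoRA adapter and projection from this checkpoint.
model = CODI.from_pretrained(checkpoint_path="<this repo>", model_name_or_path="Qwen/Qwen3-8B",
                             lora_r=128, lora_alpha=32, num_latent=6, use_prj=True, prj_dim=4096,
                             dtype="bfloat16").eval().cuda()

# ids, bocot and eocot are not defined in this snippet and must be prepared by the caller:
# ids is the tokenized prompt; bocot/eocot correspond to the <|bocot|>/<|eocot|> control tokens.
# num_latent_iterations=6 runs the latent scratchpad; num_latent_iterations=0 ablates it.
out = model.generate(input_ids=ids, tokenizer=model.tokenizer, num_latent_iterations=6,
                     greedy=True, sot_token=bocot, eot_token=eocot)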

Limitations

A research model organism, not a general assistant. Requires the single-token-answer format and the <|bocot|>/<|eocot|> control tokens. Companion organism: cds-jb/qwen3-8b-codi-multihop-recall.
