Cocoracle: Activation Oracles for Coconut Latent Reasoning

Checkpoints from the Cocoracle experiment -- interpreting what a model "thinks" during latent reasoning.

Combines Coconut (Chain of Continuous Thought) with Activation Oracles to train models that answer natural-language questions about their own latent chain-of-thought hidden states.

Models

GPT-2-large Coconut (all-latent)

stage3_alllatent.pt -- GPT-2-large (774M) fine-tuned with the Coconut curriculum to perform multi-digit addition using entirely latent reasoning.

Task: Multi-digit addition (2-4 digits) with carry propagation
Accuracy: 45.4% (teacher-forced) on all-latent reasoning
Architecture: GPT-2-large + 4 special tokens (<bot>, <sep>, <eot>, <act>)

GPT-2-large Self-Oracle (all-latent)

self_oracle_alllatent.pt -- The Coconut model further fine-tuned to interpret its own latent reasoning activations via norm-matched injection at layer 17.

CoT exact match: 6.9%
CoT token F1: 34.2%
Random baseline: 0%

Usage

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_special_tokens({
    "additional_special_tokens": ["<bot>", "<sep>", "<eot>", "<act>"]
})

model = GPT2LMHeadModel.from_pretrained("gpt2-large")
model.resize_token_embeddings(len(tokenizer))
state = torch.load("stage3_alllatent.pt", map_location="cpu")
model.load_state_dict(state)

See the GitHub repo for full code and an interactive demo (scripts/interactive.py).

Results

Configuration	CoT Exact Match	CoT Token F1	AO Val Loss
Separate AO (GPT-2-small + LoRA)	0%	26.4%	2.92
Self-oracle, GPT-2-small	0%	32.5%	1.98
Self-oracle, GPT-2-large, stage 1	0%	25.6%	1.10
Self-oracle, GPT-2-large, all-latent	6.9%	34.2%	0.55

References

Hao et al., arXiv:2412.06769
Karvonen et al., arXiv:2512.15674

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for syvb/cocoracle

Base model

openai-community/gpt2-large

Finetuned

(132)

this model

Papers for syvb/cocoracle

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

Paper • 2512.15674 • Published Dec 17, 2025

Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 95