---
license: mit
tags:
- coconut
- activation-oracle
- interpretability
- chain-of-thought
- latent-reasoning
- gpt2
datasets:
- synthetic
language:
- en
base_model:
- openai-community/gpt2-large
pipeline_tag: text-generation
---

# Cocoracle: Activation Oracles for Coconut Latent Reasoning

Checkpoints from the [Cocoracle](https://github.com/syvb/cocoracle) experiment -- interpreting what a model "thinks" during latent reasoning.

Cocoracle combines [Coconut](https://arxiv.org/abs/2412.06769) (Chain of Continuous Thought) with [Activation Oracles](https://arxiv.org/abs/2512.15674) to train models that answer natural-language questions about their own latent chain-of-thought hidden states.

## Models

### GPT-2-large Coconut (all-latent)

**`stage3_alllatent.pt`** -- GPT-2-large (774M) fine-tuned with the Coconut curriculum to perform multi-digit addition using entirely latent reasoning.

- **Task**: Multi-digit addition (2-4 digits) with carry propagation
- **Accuracy**: 45.4% (teacher-forced) on all-latent reasoning
- **Architecture**: GPT-2-large + 4 special tokens (`<bot>`, `<sep>`, `<eot>`, `<act>`)

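To make the task concrete, here is a minimal sketch of column-wise addition with carry propagation, the kind of step-by-step reasoning the model is expected to carry out latently. The function name `add_with_carries` and the step format are hypothetical and chosen for illustration; the actual chain-of-thought format used for training is defined in the GitHub repo, not here.

```python
def add_with_carries(a: int, b: int) -> tuple[int, list[str]]:
    """Add two non-negative integers column by column, recording each step.

    Hypothetical helper: the step strings are illustrative only and are
    not the chain-of-thought format used to train the checkpoints.
    """
    # Digits in least-significant-first order.
    da = [int(c) for c in reversed(str(a))]
    db = [int(c) for c in reversed(str(b))]

    steps = []
    result_digits = []
    carry = 0
    for i in range(max(len(da), len(db))):
        x = da[i] if i < len(da) else 0
        y = db[i] if i < len(db) else 0
        total = x + y + carry
        digit, new_carry = total % 10, total // 10
        steps.append(f"{x}+{y}+{carry}={total} -> write {digit}, carry {new_carry}")
        result_digits.append(digit)
        carry = new_carry

    # A leftover carry becomes the most significant digit.
    if carry:
        result_digits.append(carry)

    result = int("".join(str(d) for d in reversed(result_digits)))
    return result, steps


if __name__ == "__main__":
    total, steps = add_with_carries(457, 68)
    print(total)
    for step in steps:
        print(step)
```

Each carry depends on the column before it, so the intermediate steps form a strictly sequential chain; this is the information an oracle would need to recover from the latent states.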
### GPT-2-large Self-Oracle (all-latent)

**`self_oracle_alllatent.pt`** -- The Coconut model further fine-tuned to interpret its own latent reasoning activations via norm-matched injection at layer 17.

- **CoT exact match**: 6.9%
- **CoT token F1**: 34.2%
- **Random baseline**: 0%

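The arithmetic behind norm-matched injection can be sketched as follows. This is one plausible reading, following the Activation Oracles paper: the captured activation is rescaled to the norm of the hidden state at the injection position and then added to it. Whether these checkpoints add or replace, and the exact hook point for "layer 17", are not stated in this card, so the helper `norm_matched_inject` below is a hypothetical illustration on plain Python lists rather than the repo's implementation.

```python
import math


def l2_norm(vec: list[float]) -> float:
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def norm_matched_inject(hidden: list[float], activation: list[float],
                        eps: float = 1e-8) -> list[float]:
    """Return hidden + ||hidden|| * activation / ||activation||.

    Assumption: additive injection with the activation rescaled to the
    hidden state's norm. `eps` guards against a zero-norm activation.
    """
    if len(hidden) != len(activation):
        raise ValueError("hidden and activation must have the same length")
    scale = l2_norm(hidden) / (l2_norm(activation) + eps)
    return [h + scale * a for h, a in zip(hidden, activation)]


if __name__ == "__main__":
    hidden = [3.0, 4.0]        # norm 5
    activation = [0.0, 10.0]   # norm 10, rescaled to norm 5 before adding
    print(norm_matched_inject(hidden, activation))
```

Matching norms keeps the injected vector on the same scale as the residual stream at that layer, regardless of how large the source activation was. In a real model the same operation would run on tensors inside a forward hook at the chosen layer.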
## Usage

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# All GPT-2 sizes share one tokenizer, so "gpt2" also works for gpt2-large.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
# Register the 4 special tokens the checkpoint was trained with.
tokenizer.add_special_tokens({
    "additional_special_tokens": ["<bot>", "<sep>", "<eot>", "<act>"]
})

model = GPT2LMHeadModel.from_pretrained("gpt2-large")
# Grow the embedding matrix to cover the added tokens before loading weights.
model.resize_token_embeddings(len(tokenizer))
state = torch.load("stage3_alllatent.pt", map_location="cpu")
model.load_state_dict(state)
model.eval()  # disable dropout for inference
```

See the [GitHub repo](https://github.com/syvb/cocoracle) for full code and an interactive demo (`scripts/interactive.py`).

## Results

| Configuration | CoT Exact Match | CoT Token F1 | AO Val Loss |
|--------------|----------------|--------------|-------------|
| Separate AO (GPT-2-small + LoRA) | 0% | 26.4% | 2.92 |
| Self-oracle, GPT-2-small | 0% | 32.5% | 1.98 |
| Self-oracle, GPT-2-large, stage 1 | 0% | 25.6% | 1.10 |
| **Self-oracle, GPT-2-large, all-latent** | **6.9%** | **34.2%** | **0.55** |

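The two CoT metrics can differ sharply: a prediction that gets most tokens right but one wrong scores 0 on exact match and still earns partial credit on token F1. The sketch below uses a common SQuAD-style definition (whitespace tokens, multiset overlap). This is an assumption; the repo's evaluation code may tokenize or normalize differently, and the function names are illustrative.

```python
from collections import Counter


def exact_match(pred: str, ref: str) -> bool:
    """True when prediction and reference have identical whitespace tokens."""
    return pred.split() == ref.split()


def token_f1(pred: str, ref: str) -> float:
    """Token-level F1 over whitespace tokens, counting duplicates.

    Assumption: SQuAD-style multiset overlap; not necessarily the exact
    metric implementation behind the numbers in the table.
    """
    p, r = pred.split(), ref.split()
    if not p or not r:
        # Both empty counts as a perfect match; one empty counts as zero.
        return float(p == r)
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred = "7+8=15 carry 1"
    ref = "7+8=15 carry 0"
    print(exact_match(pred, ref), token_f1(pred, ref))
```

Under this definition, a configuration can reach a token F1 in the 25-35% range while its exact match stays at 0%, as in the first three rows of the table.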
## References

- Hao et al., [arXiv:2412.06769](https://arxiv.org/abs/2412.06769)
- Karvonen et al., [arXiv:2512.15674](https://arxiv.org/abs/2512.15674)
