---
library_name: transformers
license: mit
base_model:
- openai-community/gpt2
---
# CODI Model
<div align="center">

[ModalityDance/latent-tts-codi](https://huggingface.co/ModalityDance/latent-tts-codi)

</div>
## Overview
**CODI** (Continuous Chain-of-Thought via Self-Distillation) is a latent reasoning model built on GPT-2. Instead of writing out an explicit chain of thought, it reasons in a continuous latent space, and it extends the base architecture with an optional projector module applied to its hidden states. This checkpoint is released as part of [Parallel Test-Time Scaling for Latent Reasoning Models](https://arxiv.org/abs/2510.07745).
## Model Details
- **Base Architecture**: GPT-2 Language Model
- **Model Class**: `CODIGPT2` (extends `GPT2LMHeadModel`)
- **Special Features**: Optional projector module (a small MLP followed by LayerNorm) applied to hidden states
- **Latent Tokens**: Uses special tokens `<|latent|>`, `<|start-latent|>`, `<|end-latent|>` for latent reasoning
- **Input Format**: The question is followed directly by the `<|start-latent|>` token, with no newline in between
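A minimal sketch of that formatting rule follows. `build_codi_prompt` is a hypothetical helper, not part of the released code; it strips trailing whitespace so that the marker follows the question directly.

```python
START_LATENT = "<|start-latent|>"


def build_codi_prompt(question: str) -> str:
    """Append <|start-latent|> directly after the question (hypothetical helper).

    Trailing whitespace, including newlines, is stripped so that no newline
    separates the question from the start-latent marker.
    """
    return question.rstrip() + START_LATENT


print(build_codi_prompt("What is 2 + 2?\n"))  # What is 2 + 2?<|start-latent|>
```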
## Related Models
Other latent reasoning models are available in the ModalityDance collection:
[ModalityDance/latent-tts](https://huggingface.co/collections/ModalityDance/latent-tts)
## Installation
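The Python examples below import helpers from the `src` package of the latent-tts project code (`src.generation_mixin` and `src.paths`), so that package must be on your Python path. Download the model checkpoint from Hugging Face: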
```bash
huggingface-cli download ModalityDance/latent-tts-codi --local-dir checkpoints/codi
```
## Quick Start
### Basic Usage
```python
from transformers import AutoTokenizer
from src.generation_mixin import LatentGenerationMixin, LatentGenerationConfig
from src.paths import MODELS
# Load tokenizer
model_id = "checkpoints/codi"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# Get latent token IDs
latent_id = tokenizer.convert_tokens_to_ids("<|latent|>")
start_id = tokenizer.convert_tokens_to_ids("<|start-latent|>")
end_id = tokenizer.convert_tokens_to_ids("<|end-latent|>")
# Create model class with generation mixin
class LatentCODI(MODELS["codi"]["class"], LatentGenerationMixin):
    def __init__(self, config):
        super().__init__(config)
# Load model
model = LatentCODI.from_pretrained(
model_id,
latent_id=latent_id,
latent_start_id=start_id,
latent_end_id=end_id,
device_map="auto",
)
# Prepare input (note: no newline before <|start-latent|>)
question = "What is 2 + 2?<|start-latent|>"
inputs = tokenizer(question, return_tensors="pt").to(model.device)
# Configure generation
generation_config = LatentGenerationConfig(
max_new_tokens=512,
latent_length=6,
latent_do_sample=True,
latent_do_sample_by="dropout", # or "noise"
dropout_p=0.1,
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id,
)
# Generate
output = model.generate(
**inputs,
generation_config=generation_config,
num_return_sequences=1,
)
# Decode result
result = tokenizer.decode(output[0], skip_special_tokens=True)
print(result)
```
### Batch Processing
The model supports batched generation. Pad the prompts to a common length when tokenizing:
```python
# Prepare batch inputs
questions = [
"What is 2 + 2?<|start-latent|>",
"What is 5 * 3?<|start-latent|>",
"What is 10 - 4?<|start-latent|>",
]
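# Decoder-only models such as GPT-2 generally need left padding for batched generation
tokenizer.padding_side = "left"
inputs = tokenizer(questions, return_tensors="pt", padding=True).to(model.device)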
# Generate for batch
outputs = model.generate(
**inputs,
generation_config=generation_config,
num_return_sequences=1,
)
# Decode batch results
results = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for result in results:
print(result)
```
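When `num_return_sequences` is greater than 1, Transformers' `generate` normally returns the samples for each prompt contiguously: all of prompt 0's samples, then all of prompt 1's, and so on. Assuming the latent generation mixin keeps that ordering, decoded results can be regrouped per question with a small helper such as the hypothetical `group_by_prompt` below.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_by_prompt(items: Sequence[T], num_return_sequences: int) -> List[List[T]]:
    """Split a flat list of decoded outputs into one list per input prompt.

    Assumes prompt-major ordering: all samples for prompt 0, then all
    samples for prompt 1, and so on (hypothetical helper).
    """
    if num_return_sequences < 1:
        raise ValueError("num_return_sequences must be >= 1")
    if len(items) % num_return_sequences != 0:
        raise ValueError("number of outputs is not a multiple of num_return_sequences")
    return [
        list(items[i : i + num_return_sequences])
        for i in range(0, len(items), num_return_sequences)
    ]


# e.g. 3 questions x 2 samples each
flat = ["q0-a", "q0-b", "q1-a", "q1-b", "q2-a", "q2-b"]
print(group_by_prompt(flat, 2))
# [['q0-a', 'q0-b'], ['q1-a', 'q1-b'], ['q2-a', 'q2-b']]
```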
## Model Architecture
### Projector Module
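CODI includes an optional projector module: a two-layer MLP with a GELU activation and a final LayerNorm that maps each hidden state back to `hidden_size`:
```python
import torch.nn as nn

# Illustrative sizes only; the actual values come from the model config
hidden_size = 768            # GPT-2 (small) hidden size
projector_hidden_size = 768  # placeholder
projector_dropout = 0.0      # placeholder

# Projector configuration (if enabled in model)
projector = nn.Sequential(
    nn.Dropout(projector_dropout),
    nn.Linear(hidden_size, projector_hidden_size),
    nn.GELU(),
    nn.Linear(projector_hidden_size, hidden_size),
    nn.LayerNorm(hidden_size),
)
```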
The projector is used when `output_hidden_states=True` and `config.projector=True`.
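The projector's role in latent reasoning can be sketched with a toy CODI-style loop: at each latent step, the last hidden state is passed through the projector and appended as the next input embedding instead of being decoded into a token. This is an illustrative sketch under assumptions, not the repository's generation code. It uses a tiny, randomly initialised `GPT2Model`, placeholder sizes, and a hypothetical `run_latent_steps` helper, and it recomputes the full sequence at every step rather than using a KV cache.

```python
import torch
import torch.nn as nn
from transformers import GPT2Config, GPT2Model

torch.manual_seed(0)

# Tiny random GPT-2 so the sketch runs quickly (placeholder sizes, not CODI's)
config = GPT2Config(vocab_size=32, n_positions=64, n_embd=16, n_layer=1, n_head=2)
backbone = GPT2Model(config).eval()
hidden_size = config.n_embd

# Same layer layout as the projector above, with placeholder sizes
projector = nn.Sequential(
    nn.Dropout(0.0),
    nn.Linear(hidden_size, hidden_size),
    nn.GELU(),
    nn.Linear(hidden_size, hidden_size),
    nn.LayerNorm(hidden_size),
).eval()


@torch.no_grad()
def run_latent_steps(input_ids: torch.Tensor, latent_length: int = 6) -> torch.Tensor:
    """Hypothetical helper: extend prompt embeddings with `latent_length` continuous thoughts."""
    embeds = backbone.wte(input_ids)  # (batch, seq, hidden)
    for _ in range(latent_length):
        hidden = backbone(inputs_embeds=embeds).last_hidden_state
        latent = projector(hidden[:, -1:, :])  # project the last position only
        embeds = torch.cat([embeds, latent], dim=1)  # feed it back as the next input
    return embeds


prompt_ids = torch.randint(0, config.vocab_size, (2, 5))
print(run_latent_steps(prompt_ids).shape)  # torch.Size([2, 11, 16])
```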
## Generation Parameters
### LatentGenerationConfig
- `max_new_tokens` (int): Maximum number of tokens to generate
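- `latent_length` (int): Number of latent tokens (continuous reasoning steps) (default: 6)
- `latent_do_sample` (bool): Whether to sample the latent reasoning stochastically, using the method set by `latent_do_sample_by`
- `latent_do_sample_by` (str): Sampling method, either `"dropout"` or `"noise"`
- `dropout_p` (float): Dropout probability for Monte Carlo Dropout (e.g., 0.1); used when `latent_do_sample_by="dropout"`
- `noise_std` (float): Standard deviation of the Additive Gaussian Noise (e.g., 0.1); used when `latent_do_sample_by="noise"`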
### Sampling Methods
1. **Monte Carlo Dropout**: Randomly drops activations during forward passes
```python
generation_config = LatentGenerationConfig(
latent_do_sample_by="dropout",
dropout_p=0.1,
# ...
)
```
2. **Additive Gaussian Noise**: Injects noise into latent embeddings
```python
generation_config = LatentGenerationConfig(
latent_do_sample_by="noise",
noise_std=0.1,
# ...
)
```
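Both perturbation schemes can be illustrated on a toy PyTorch module. This is a sketch under assumptions, not the code behind `LatentGenerationConfig`, and both helper names are hypothetical. `enable_mc_dropout` keeps the model in eval mode but switches its `nn.Dropout` layers back to training mode (the usual way to apply Monte Carlo Dropout at inference), while `add_gaussian_noise` adds zero-mean noise with standard deviation `noise_std` to a latent embedding.

```python
import torch
import torch.nn as nn


def enable_mc_dropout(model: nn.Module, dropout_p: float) -> nn.Module:
    """Hypothetical helper: eval mode overall, but Dropout layers stay stochastic."""
    model.eval()
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = dropout_p
            module.train()  # dropout stays active at inference time
    return model


def add_gaussian_noise(latent: torch.Tensor, noise_std: float) -> torch.Tensor:
    """Hypothetical helper: latent + N(0, noise_std^2), applied elementwise."""
    return latent + noise_std * torch.randn_like(latent)


torch.manual_seed(0)
toy = nn.Sequential(nn.Linear(64, 64), nn.Dropout(0.0), nn.Linear(64, 64))
enable_mc_dropout(toy, dropout_p=0.1)

x = torch.ones(1, 64)
with torch.no_grad():
    a, b = toy(x), toy(x)  # two forward passes with independently sampled dropout masks
print(torch.equal(a, b))

noisy = add_gaussian_noise(torch.zeros(1, 64), noise_std=0.1)
```

Running several such stochastic passes on the same question is what yields multiple distinct latent trajectories to compare or aggregate.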
## Answer Extraction
The final answer is a number extracted from the generated text:
```python
from src.paths import extract_answer_number
# Extract answer from generated text
answer = extract_answer_number(result)
print(f"Answer: {answer}")
```
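With `latent_do_sample=True` and `num_return_sequences` greater than 1, each question yields several candidate answers. One simple way to aggregate them is majority voting, sketched below in plain Python. `extract_last_number` is a hypothetical stand-in for `extract_answer_number` (whose exact parsing rules are not documented here) that takes the last number in the text, and majority voting is one common aggregation choice, not necessarily the one used in every experiment in the paper.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Integers or decimals, optionally negative, with optional thousands separators
_NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_last_number(text: str) -> Optional[float]:
    """Hypothetical extractor: return the last number in `text`, or None."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def majority_vote(texts: Iterable[str]) -> Optional[float]:
    """Return the most frequent extracted answer; ties go to the first one seen."""
    answers = [a for a in map(extract_last_number, texts) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


samples = ["The answer is 4", "2 + 2 = 4", "I think it is 5", "1,000 then 4"]
print(majority_vote(samples))  # 4.0
```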
## Evaluation
Run evaluation using the provided scripts:
```bash
# For CODI (GPT-2 based models)
./run_tests.sh
```
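The internals of `run_tests.sh` are not described here. As a rough sketch, accuracy on numeric benchmarks such as GSM8K and MultiArith usually comes down to matching the extracted answer against the gold number. The `numeric_accuracy` helper below is hypothetical and uses a small absolute tolerance to absorb float formatting differences.

```python
import math
from typing import Optional, Sequence


def numeric_accuracy(
    predictions: Sequence[Optional[float]],
    references: Sequence[float],
    tol: float = 1e-6,
) -> float:
    """Hypothetical metric: fraction of predictions within `tol` of the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        p is not None and math.isclose(p, r, rel_tol=0.0, abs_tol=tol)
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


print(numeric_accuracy([4.0, 15.0, None], [4, 15, 6]))  # 0.6666666666666666
```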
## Model Card
- **Paper**: [Parallel Test-Time Scaling for Latent Reasoning Models](https://arxiv.org/abs/2510.07745)
- **HuggingFace**: [ModalityDance/latent-tts-codi](https://huggingface.co/ModalityDance/latent-tts-codi)
- **Benchmarks**: GSM8K Test, GSM8K Hard, MultiArith
## Citation
If you use this model, please cite:
```bibtex
@misc{you2025paralleltesttimescalinglatent,
title={Parallel Test-Time Scaling for Latent Reasoning Models},
author={Runyang You and Yongqi Li and Meng Liu and Wenjie Wang and Liqiang Nie and Wenjie Li},
year={2025},
eprint={2510.07745},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.07745},
}
@misc{shen2025codicompressingchainofthoughtcontinuous,
title={CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation},
author={Zhenyi Shen and Hanqi Yan and Linhai Zhang and Zhanghao Hu and Yali Du and Yulan He},
year={2025},
eprint={2502.21074},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.21074},
}
```