Commit de4c2f7 (verified, parent ca5eaf8) — committed by dd101bb: Update README.md

Files changed (1): README.md (+209, −205)
---
license: mit
base_model:
- openai-community/gpt2
---

# COCONUT Model

<div align="center">

[![HuggingFace](https://img.shields.io/badge/🤗%20HuggingFace-Model-fcc21b?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/ModalityDance/latent-tts-coconut)

</div>

## Overview

**COCONUT** (Chain of Continuous Thought) is a latent reasoning model based on GPT-2 that performs continuous thought generation in latent space. It is part of the [Parallel Test-Time Scaling for Latent Reasoning Models](https://arxiv.org/abs/2510.07745) framework.

## Model Details

- **Base Architecture**: GPT-2 language model
- **Model Class**: `COCONUTGPT2` (extends `GPT2LMHeadModel`)
- **Latent Tokens**: Uses the special tokens `<|latent|>`, `<|start-latent|>`, and `<|end-latent|>` for latent reasoning
- **Input Format**: Requires a newline between the input question and the `<|start-latent|>` token

## Related Models

Other latent reasoning models from the same framework are available in the [ModalityDance/latent-tts](https://huggingface.co/collections/ModalityDance/latent-tts) collection.

## Installation

Download the model from the Hugging Face Hub:

```bash
huggingface-cli download ModalityDance/latent-tts-coconut --local-dir checkpoints/coconut
```

## Quick Start

### Basic Usage

```python
from transformers import AutoTokenizer

from src.generation_mixin import LatentGenerationMixin, LatentGenerationConfig
from src.paths import MODELS

# Load tokenizer
model_id = "checkpoints/coconut"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Get latent token IDs
latent_id = tokenizer.convert_tokens_to_ids("<|latent|>")
start_id = tokenizer.convert_tokens_to_ids("<|start-latent|>")
end_id = tokenizer.convert_tokens_to_ids("<|end-latent|>")

# Create model class with generation mixin
class LatentCOCONUT(MODELS["coconut"]["class"], LatentGenerationMixin):
    def __init__(self, config):
        super().__init__(config)

# Load model
model = LatentCOCONUT.from_pretrained(
    model_id,
    latent_id=latent_id,
    latent_start_id=start_id,
    latent_end_id=end_id,
    device_map="auto",
)

# Prepare input (note: newline before <|start-latent|>)
question = "What is 2 + 2?\n<|start-latent|>"
inputs = tokenizer(question, return_tensors="pt").to(model.device)

# Configure generation
generation_config = LatentGenerationConfig(
    max_new_tokens=512,
    latent_length=6,
    latent_do_sample=True,
    latent_do_sample_by="dropout",  # or "noise"
    dropout_p=0.1,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Generate
output = model.generate(
    **inputs,
    generation_config=generation_config,
    num_return_sequences=1,
)

# Decode result
result = tokenizer.decode(output[0], skip_special_tokens=True)
print(result)
```

### Batch Processing

The model fully supports batch processing:

```python
# Prepare batch inputs
questions = [
    "What is 2 + 2?\n<|start-latent|>",
    "What is 5 * 3?\n<|start-latent|>",
    "What is 10 - 4?\n<|start-latent|>",
]
inputs = tokenizer(questions, return_tensors="pt", padding=True).to(model.device)

# Generate for batch
outputs = model.generate(
    **inputs,
    generation_config=generation_config,
    num_return_sequences=1,
)

# Decode batch results
results = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for result in results:
    print(result)
```

## Generation Parameters

### LatentGenerationConfig

- `max_new_tokens` (int): Maximum number of tokens to generate
- `latent_length` (int): Number of latent tokens (default: 6)
- `latent_do_sample` (bool): Whether to use stochastic sampling
- `latent_do_sample_by` (str): Sampling method, either `"dropout"` or `"noise"`
- `dropout_p` (float): Dropout probability for Monte Carlo Dropout (e.g., 0.1)
- `noise_std` (float): Standard deviation for additive Gaussian noise

### Sampling Methods

1. **Monte Carlo Dropout**: Randomly drops activations during forward passes

   ```python
   generation_config = LatentGenerationConfig(
       latent_do_sample_by="dropout",
       dropout_p=0.1,
       # ...
   )
   ```

2. **Additive Gaussian Noise**: Injects noise into the latent embeddings

   ```python
   generation_config = LatentGenerationConfig(
       latent_do_sample_by="noise",
       noise_std=0.1,
       # ...
   )
   ```
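
Conceptually, both methods perturb the continuous latent state between forward passes so that repeated samples diverge. The following framework-free sketch illustrates the two perturbations on a plain list of activations; it is illustrative only and not the model's actual implementation:

```python
import random

def mc_dropout(hidden, p=0.1):
    # Monte Carlo Dropout: keep dropout active at inference time by zeroing
    # each activation with probability p and rescaling survivors by 1/(1-p).
    return [0.0 if random.random() < p else h / (1.0 - p) for h in hidden]

def gaussian_noise(hidden, std=0.1):
    # Additive Gaussian noise: independently jitter each latent dimension.
    return [h + random.gauss(0.0, std) for h in hidden]

hidden = [0.5, -1.2, 0.3, 0.8]
print(mc_dropout(hidden, p=0.5))   # stochastic: some entries zeroed, rest rescaled
print(gaussian_noise(hidden))      # stochastic: small jitter on each entry
```

Running either function several times on the same latent state yields different perturbed states, which is what makes parallel sampling in latent space possible.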

## Answer Extraction

COCONUT marks the final answer with a `#` separator:

```python
from src.paths import coconut_extract_answer_number

# Extract the numeric answer from the generated text
answer = coconut_extract_answer_number(result)
print(f"Answer: {answer}")
```
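
If the `src.paths` helper is unavailable, the extraction can be approximated with a regular expression. This sketch (function name and pattern are our own, not the repository's) assumes the answer is the last number following a `#` separator, which may not match every output format:

```python
import re

def extract_answer_number(text):
    # Take the last number that appears after a run of '#' characters.
    matches = re.findall(r"#+\s*(-?\d+(?:\.\d+)?)", text)
    return float(matches[-1]) if matches else None

print(extract_answer_number("2 + 2 equals # 4"))  # 4.0
```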

## Evaluation

Run evaluation using the provided scripts:

```bash
# For COCONUT (GPT-2 based models)
./run_tests.sh
```

## Model Card

- **Paper**: [Parallel Test-Time Scaling for Latent Reasoning Models](https://arxiv.org/abs/2510.07745)
- **HuggingFace**: [ModalityDance/latent-tts-coconut](https://huggingface.co/ModalityDance/latent-tts-coconut)
- **Benchmarks**: GSM8K Test, GSM8K Hard, MultiArith

## Citation

If you use this model, please cite:

```bibtex
@misc{you2025paralleltesttimescalinglatent,
  title={Parallel Test-Time Scaling for Latent Reasoning Models},
  author={Runyang You and Yongqi Li and Meng Liu and Wenjie Wang and Liqiang Nie and Wenjie Li},
  year={2025},
  eprint={2510.07745},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.07745},
}

@misc{hao2025traininglargelanguagemodels,
  title={Training Large Language Models to Reason in a Continuous Latent Space},
  author={Shibo Hao and Sainbayar Sukhbaatar and DiJia Su and Xian Li and Zhiting Hu and Jason Weston and Yuandong Tian},
  year={2025},
  eprint={2412.06769},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.06769},
}
```