dd101bb committed
Commit 08cbaac · verified · 1 Parent(s): d0b833a

Update README.md

Files changed (1):
  1. README.md +228 -229
README.md CHANGED

---
library_name: transformers
license: mit
base_model:
- openai-community/gpt2
---
# CODI Model

<div align="center">

[![HuggingFace](https://img.shields.io/badge/🤗%20HuggingFace-Model-fcc21b?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/ModalityDance/latent-tts-codi)

</div>

## Overview

**CODI** (Continuous Output with Discrete Input) is a latent reasoning model based on GPT-2 that extends the base architecture with an optional projector module for enhanced hidden state representations. This model is part of the [Parallel Test-Time Scaling for Latent Reasoning Models](https://arxiv.org/abs/2510.07745) framework.

## Model Details

- **Base Architecture**: GPT-2 Language Model
- **Model Class**: `CODIGPT2` (extends `GPT2LMHeadModel`)
- **Special Features**: Optional projector module for extended hidden states
- **Latent Tokens**: Uses the special tokens `<|latent|>`, `<|start-latent|>`, and `<|end-latent|>` for latent reasoning (see the sketch below)
- **Input Format**: Direct input with no newline before the `<|start-latent|>` token
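
The checkpoint's tokenizer already ships with these special tokens. Purely for orientation, here is a minimal sketch, using only the standard Transformers API, of how such tokens are typically registered on a base GPT-2; this step is not needed when loading this checkpoint:

```python
from transformers import AutoTokenizer, GPT2LMHeadModel

# Sketch only: the released checkpoint already contains these tokens,
# so this registration is not required when loading it.
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|latent|>", "<|start-latent|>", "<|end-latent|>"]}
)

model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")
model.resize_token_embeddings(len(tokenizer))  # make room for the new token IDs
```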

## Related Models

The [ModalityDance/latent-tts collection](https://huggingface.co/collections/ModalityDance/latent-tts) includes other latent reasoning models that you might find useful, such as COCONUT (a GPT-2 based model for continuous thought generation) and CoLaR (a LLaMA based model with a specialized LatentHead module).

## Installation

Download the model from HuggingFace:

```bash
huggingface-cli download ModalityDance/latent-tts-codi --local-dir checkpoints/codi
```
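
Alternatively, download from Python with the standard `huggingface_hub` API:

```python
from huggingface_hub import snapshot_download

# Fetch the full repository snapshot into a local directory
snapshot_download(repo_id="ModalityDance/latent-tts-codi", local_dir="checkpoints/codi")
```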

## Quick Start

### Basic Usage

```python
from transformers import AutoTokenizer
from src.generation_mixin import LatentGenerationMixin, LatentGenerationConfig
from src.paths import MODELS

# Load tokenizer
model_id = "checkpoints/codi"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Get latent token IDs
latent_id = tokenizer.convert_tokens_to_ids("<|latent|>")
start_id = tokenizer.convert_tokens_to_ids("<|start-latent|>")
end_id = tokenizer.convert_tokens_to_ids("<|end-latent|>")

# Create model class with generation mixin
class LatentCODI(MODELS["codi"]["class"], LatentGenerationMixin):
    def __init__(self, config):
        super().__init__(config)

# Load model
model = LatentCODI.from_pretrained(
    model_id,
    latent_id=latent_id,
    latent_start_id=start_id,
    latent_end_id=end_id,
    device_map="auto",
)

# Prepare input (note: no newline before <|start-latent|>)
question = "What is 2 + 2?<|start-latent|>"
inputs = tokenizer(question, return_tensors="pt").to(model.device)

# Configure generation
generation_config = LatentGenerationConfig(
    max_new_tokens=512,
    latent_length=6,
    latent_do_sample=True,
    latent_do_sample_by="dropout",  # or "noise"
    dropout_p=0.1,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Generate
output = model.generate(
    **inputs,
    generation_config=generation_config,
    num_return_sequences=1,
)

# Decode result
result = tokenizer.decode(output[0], skip_special_tokens=True)
print(result)
```

### Batch Processing

The model fully supports batched generation with Transformers:

```python
# Decoder-only models generally need left padding for batched generation
tokenizer.padding_side = "left"

# Prepare batch inputs
questions = [
    "What is 2 + 2?<|start-latent|>",
    "What is 5 * 3?<|start-latent|>",
    "What is 10 - 4?<|start-latent|>",
]
inputs = tokenizer(questions, return_tensors="pt", padding=True).to(model.device)

# Generate for batch
outputs = model.generate(
    **inputs,
    generation_config=generation_config,
    num_return_sequences=1,
)

# Decode batch results
results = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for result in results:
    print(result)
```

## Model Architecture

### Projector Module

CODI includes an optional projector module that extends hidden states:

```python
import torch.nn as nn

# Projector configuration (if enabled in the model); hidden_size,
# projector_hidden_size, and projector_dropout come from the model config
projector = nn.Sequential(
    nn.Dropout(projector_dropout),
    nn.Linear(hidden_size, projector_hidden_size),
    nn.GELU(),
    nn.Linear(projector_hidden_size, hidden_size),
    nn.LayerNorm(hidden_size),
)
```

The projector is applied when `output_hidden_states=True` and `config.projector=True`.
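
As a rough sketch of where the projector fits, assuming it is exposed as a `model.projector` attribute (the attribute name here is illustrative, not confirmed by the released code):

```python
import torch

# Run a forward pass that returns hidden states, then project the last layer.
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

last_hidden = outputs.hidden_states[-1]   # (batch, seq_len, hidden_size)
projected = model.projector(last_hidden)  # same shape, refined representation
```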

## Generation Parameters

### LatentGenerationConfig

- `max_new_tokens` (int): Maximum number of tokens to generate
- `latent_length` (int): Number of latent tokens (default: 6)
- `latent_do_sample` (bool): Whether to use stochastic sampling
- `latent_do_sample_by` (str): Sampling method, either `"dropout"` or `"noise"`
- `dropout_p` (float): Dropout probability for Monte Carlo Dropout (e.g., 0.1)
- `noise_std` (float): Standard deviation for Additive Gaussian Noise

### Sampling Methods

1. **Monte Carlo Dropout**: Randomly drops activations during forward passes

   ```python
   generation_config = LatentGenerationConfig(
       latent_do_sample_by="dropout",
       dropout_p=0.1,
       # ...
   )
   ```

2. **Additive Gaussian Noise**: Injects noise into latent embeddings

   ```python
   generation_config = LatentGenerationConfig(
       latent_do_sample_by="noise",
       noise_std=0.1,
       # ...
   )
   ```
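
Conceptually, the two perturbations act on a latent hidden state as follows. This is a standalone illustration of the underlying operations, not the repo's internal code; the 768-dimensional stand-in matches GPT-2's hidden size:

```python
import torch
import torch.nn.functional as F

h = torch.randn(1, 768)  # stand-in for a latent hidden state

# Monte Carlo Dropout: keep dropout active at inference time
h_dropout = F.dropout(h, p=0.1, training=True)

# Additive Gaussian Noise: perturb the state with scaled white noise
h_noise = h + 0.1 * torch.randn_like(h)
```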

## Answer Extraction

CODI uses standard number extraction from the generated text:

```python
from src.paths import extract_answer_number

# Extract the numeric answer from the generated text
answer = extract_answer_number(result)
print(f"Answer: {answer}")
```
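
Together with stochastic latent sampling, this supports parallel test-time scaling: draw several latent rollouts for the same question and aggregate their answers. A minimal sketch, reusing `inputs`, `generation_config` (with `latent_do_sample=True`), and `extract_answer_number` from above; the vote size of 8 and majority voting are illustrative choices:

```python
from collections import Counter

# Sample several stochastic latent rollouts for the same question
outputs = model.generate(
    **inputs,
    generation_config=generation_config,
    num_return_sequences=8,
)
texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Extract an answer from each rollout and take the majority vote
answers = [extract_answer_number(t) for t in texts]
votes = Counter(a for a in answers if a is not None)
final_answer, count = votes.most_common(1)[0]
print(f"Majority-vote answer: {final_answer} ({count}/{len(answers)} rollouts)")
```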

## Evaluation

Run evaluation using the provided scripts:

```bash
# For CODI (GPT-2 based models)
./run_tests.sh
```

## Model Card

- **Paper**: [Parallel Test-Time Scaling for Latent Reasoning Models](https://arxiv.org/abs/2510.07745)
- **HuggingFace**: [ModalityDance/latent-tts-codi](https://huggingface.co/ModalityDance/latent-tts-codi)
- **Benchmarks**: GSM8K Test, GSM8K Hard, MultiArith

## Citation

If you use this model, please cite:

```bibtex
@misc{you2025paralleltesttimescalinglatent,
      title={Parallel Test-Time Scaling for Latent Reasoning Models},
      author={Runyang You and Yongqi Li and Meng Liu and Wenjie Wang and Liqiang Nie and Wenjie Li},
      year={2025},
      eprint={2510.07745},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.07745},
}

@misc{shen2025codicompressingchainofthoughtcontinuous,
      title={CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation},
      author={Zhenyi Shen and Hanqi Yan and Linhai Zhang and Zhanghao Hu and Yali Du and Yulan He},
      year={2025},
      eprint={2502.21074},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.21074},
}
```