Transformers · Safetensors · gpt2 · text-generation-inference

Commit d0b833a (verified) · committed by dd101bb · 1 parent: 8336dec

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -1,35 +1,35 @@
All 35 Git LFS patterns were removed and re-added unchanged:

```
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
```
README.md CHANGED
@@ -1,3 +1,230 @@
---
library_name: transformers
license: mit
base_model:
- openai-community/gpt2
---

# CODI Model

<div align="center">

[![HuggingFace](https://img.shields.io/badge/🤗%20HuggingFace-Model-fcc21b?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/dd101bb/latent-tts-codi)

</div>

## Overview

**CODI** is a latent reasoning model based on GPT-2 that compresses chain-of-thought reasoning into a continuous latent space via self-distillation and extends the base architecture with an optional projector module for enriched hidden-state representations. This model is part of the [Parallel Test-Time Scaling for Latent Reasoning Models](https://arxiv.org/abs/2510.07745) framework.

## Model Details

- **Base Architecture**: GPT-2 language model
- **Model Class**: `CODIGPT2` (extends `GPT2LMHeadModel`)
- **Special Features**: optional projector module for extended hidden states
- **Latent Tokens**: uses the special tokens `<|latent|>`, `<|start-latent|>`, and `<|end-latent|>` for latent reasoning
- **Input Format**: the prompt is followed directly by the `<|start-latent|>` token, with no newline before it
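
The input format above amounts to plain string concatenation; a minimal illustration (the `build_prompt` helper and the question text are just examples, not part of the repository's API):

```python
START_LATENT = "<|start-latent|>"

def build_prompt(question: str) -> str:
    # Append the latent-start marker directly -- no newline in between.
    return f"{question}{START_LATENT}"

prompt = build_prompt("What is 2 + 2?")
print(prompt)  # What is 2 + 2?<|start-latent|>
```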

## Related Models

This repository includes other latent reasoning models that you might find useful:

- **[COCONUT Model](../coconut/README.md)**: a GPT-2-based model for continuous thought generation
- **[CoLaR Model](../colar/README.md)**: a LLaMA-based model with a specialized LatentHead module

## Installation

Download the model from the Hugging Face Hub:

```bash
huggingface-cli download dd101bb/latent-tts-codi --local-dir checkpoints/codi
```

## Quick Start

### Basic Usage

```python
from transformers import AutoTokenizer
from src.generation_mixin import LatentGenerationMixin, LatentGenerationConfig
from src.paths import MODELS

# Load tokenizer
model_id = "checkpoints/codi"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Get latent token IDs
latent_id = tokenizer.convert_tokens_to_ids("<|latent|>")
start_id = tokenizer.convert_tokens_to_ids("<|start-latent|>")
end_id = tokenizer.convert_tokens_to_ids("<|end-latent|>")

# Create model class with generation mixin
class LatentCODI(MODELS["codi"]["class"], LatentGenerationMixin):
    def __init__(self, config):
        super().__init__(config)

# Load model
model = LatentCODI.from_pretrained(
    model_id,
    latent_id=latent_id,
    latent_start_id=start_id,
    latent_end_id=end_id,
    device_map="auto",
)

# Prepare input (note: no newline before <|start-latent|>)
question = "What is 2 + 2?<|start-latent|>"
inputs = tokenizer(question, return_tensors="pt").to(model.device)

# Configure generation
generation_config = LatentGenerationConfig(
    max_new_tokens=512,
    latent_length=6,
    latent_do_sample=True,
    latent_do_sample_by="dropout",  # or "noise"
    dropout_p=0.1,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Generate
output = model.generate(
    **inputs,
    generation_config=generation_config,
    num_return_sequences=1,
)

# Decode result
result = tokenizer.decode(output[0], skip_special_tokens=True)
print(result)
```

### Batch Processing

The model fully supports batch processing with Transformers:

```python
# Prepare batch inputs (the tokenizer is configured for left padding)
questions = [
    "What is 2 + 2?<|start-latent|>",
    "What is 5 * 3?<|start-latent|>",
    "What is 10 - 4?<|start-latent|>",
]
inputs = tokenizer(questions, return_tensors="pt", padding=True).to(model.device)

# Generate for batch
outputs = model.generate(
    **inputs,
    generation_config=generation_config,
    num_return_sequences=1,
)

# Decode batch results
results = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for result in results:
    print(result)
```

## Model Architecture

### Projector Module

CODI includes an optional projector module that extends the hidden states:

```python
import torch.nn as nn

# Projector configuration (if enabled in the model); for this checkpoint,
# config.json sets hidden_size = 768, projector_hidden_size = 768,
# and projector_dropout = 0.0
projector = nn.Sequential(
    nn.Dropout(projector_dropout),
    nn.Linear(hidden_size, projector_hidden_size),
    nn.GELU(),
    nn.Linear(projector_hidden_size, hidden_size),
    nn.LayerNorm(hidden_size),
)
```

The projector is applied when `output_hidden_states=True` and `config.projector=True`.

## Generation Parameters

### LatentGenerationConfig

- `max_new_tokens` (int): maximum number of tokens to generate
- `latent_length` (int): number of latent tokens (default: 6)
- `latent_do_sample` (bool): whether to use stochastic latent sampling
- `latent_do_sample_by` (str): sampling method, `"dropout"` or `"noise"`
- `dropout_p` (float): dropout probability for Monte Carlo dropout (e.g., 0.1)
- `noise_std` (float): standard deviation for additive Gaussian noise

### Sampling Methods

1. **Monte Carlo Dropout**: randomly drops activations during the latent forward passes

   ```python
   generation_config = LatentGenerationConfig(
       latent_do_sample_by="dropout",
       dropout_p=0.1,
       # ...
   )
   ```

2. **Additive Gaussian Noise**: injects noise into the latent embeddings

   ```python
   generation_config = LatentGenerationConfig(
       latent_do_sample_by="noise",
       noise_std=0.1,
       # ...
   )
   ```
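
Conceptually, the noise-based sampler perturbs each latent vector with zero-mean Gaussian noise so that repeated generations explore different latent trajectories. A standard-library sketch of that perturbation (the function name and list-based vector are illustrative, not the model's actual API):

```python
import random

def add_gaussian_noise(latent, noise_std=0.1, seed=None):
    """Return a copy of `latent` with N(0, noise_std^2) noise added per element."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_std) for x in latent]

latent = [0.5, -1.2, 0.0, 2.0]
sampled = add_gaussian_noise(latent, noise_std=0.1, seed=0)
# Each call with a different seed yields a slightly different latent vector.
```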

## Answer Extraction

CODI uses standard numeric answer extraction from the generated text:

```python
from src.paths import extract_answer_number

# Extract answer from generated text
answer = extract_answer_number(result)
print(f"Answer: {answer}")
```
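
The repository's `extract_answer_number` helper is not reproduced here; a typical implementation of this kind of extractor pulls the last number from the text, along these lines (a hypothetical sketch, not the actual helper):

```python
import re

def extract_last_number(text: str):
    """Hypothetical sketch: return the last integer or decimal in `text`, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

print(extract_last_number("2 + 2 equals 4."))  # 4.0
```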

## Evaluation

Run evaluation using the provided script:

```bash
# For CODI (GPT-2-based models)
./run_tests.sh
```

## Model Card

- **Paper**: [Parallel Test-Time Scaling for Latent Reasoning Models](https://arxiv.org/abs/2510.07745)
- **HuggingFace**: [dd101bb/latent-tts-codi](https://huggingface.co/dd101bb/latent-tts-codi)
- **Benchmarks**: GSM8K Test, GSM8K Hard, MultiArith

## Citation

If you use this model, please cite:

```bibtex
@misc{you2025paralleltesttimescalinglatent,
      title={Parallel Test-Time Scaling for Latent Reasoning Models},
      author={Runyang You and Yongqi Li and Meng Liu and Wenjie Wang and Liqiang Nie and Wenjie Li},
      year={2025},
      eprint={2510.07745},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.07745},
}

@misc{shen2025codicompressingchainofthoughtcontinuous,
      title={CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation},
      author={Zhenyi Shen and Hanqi Yan and Linhai Zhang and Zhanghao Hu and Yali Du and Yulan He},
      year={2025},
      eprint={2502.21074},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.21074},
}
```
added_tokens.json ADDED
@@ -0,0 +1,6 @@

```json
{
  "<|end-latent|>": 50259,
  "<|latent|>": 50260,
  "<|start-latent|>": 50258,
  "[PAD]": 50257
}
```
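
The ID layout above can be sanity-checked with plain Python (the dict literal mirrors the file contents):

```python
import json

added_tokens = json.loads("""{
  "<|end-latent|>": 50259,
  "<|latent|>": 50260,
  "<|start-latent|>": 50258,
  "[PAD]": 50257
}""")

# The four added tokens occupy the IDs directly above GPT-2's base
# vocabulary (0..50256, with <|endoftext|> = 50256).
assert sorted(added_tokens.values()) == [50257, 50258, 50259, 50260]
print(added_tokens["<|latent|>"])  # 50260
```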
config.json ADDED
@@ -0,0 +1,44 @@

```json
{
  "activation_function": "gelu_new",
  "architectures": [
    "CODIGPT2"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "latent_end_id": -100,
  "latent_id": -100,
  "latent_start_id": -100,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "projector": true,
  "projector_dropout": 0.0,
  "projector_hidden_size": 768,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "torch_dtype": "float32",
  "transformers_version": "4.52.4",
  "use_cache": true,
  "vocab_size": 50260
}
```
generation_config.json ADDED
@@ -0,0 +1,6 @@

```json
{
  "_from_model_config": true,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
  "transformers_version": "4.52.4"
}
```
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@

```
version https://git-lfs.github.com/spec/v1
oid sha256:20c3730ebd562f317c56d7e1ab0eaa9fb41ab5228fa2bba32b7ac4189e7c1554
size 502514824
```
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@

```json
{
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
```
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,54 @@

```json
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "50256": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50257": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50258": {
      "content": "<|start-latent|>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50259": {
      "content": "<|end-latent|>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50260": {
      "content": "<|latent|>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "extra_special_tokens": {},
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "padding_side": "left",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
```
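
Note that `padding_side` is `"left"`, which is what decoder-only generation needs: with right padding, the model would have to continue from pad tokens instead of from the prompt's last real token. A minimal illustration of left-padding a batch of token-ID lists (plain Python; the `left_pad` helper is illustrative, not the tokenizer's actual implementation):

```python
PAD_ID = 50257  # [PAD], per added_tokens.json

def left_pad(batch, pad_id=PAD_ID):
    """Pad each sequence on the left so the last real token stays rightmost."""
    width = max(len(seq) for seq in batch)
    return [[pad_id] * (width - len(seq)) + seq for seq in batch]

padded = left_pad([[11, 12, 13], [21]])
# [[11, 12, 13], [50257, 50257, 21]] -- every row ends on a real token
```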
vocab.json ADDED
The diff for this file is too large to render. See raw diff