---
library_name: transformers
license: mit
datasets:
- Zhengping/UNLI
- Zhengping/UNLI-style-synthetic
language:
- en
metrics:
- pearsonr
- spearmanr
- accuracy
base_model:
- Qwen/Qwen2.5-14B-Instruct
---

# Model Card for Zhengping/conditional-probability-regression

This model estimates fine-grained conditional probabilities: given a premise and a hypothesis, it predicts a scalar probability that the hypothesis is true. It is finetuned from Qwen/Qwen2.5-14B-Instruct on UNLI-style data.

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** Liaoyaqi Wang, Zhengping Jiang, Anqi Liu, Benjamin Van Durme
- **Model type:** Decoding-based Regression Model (Classification)
- **Language(s) (NLP):** `en`
- **License:** MIT
- **Finetuned from model:** Qwen/Qwen2.5-14B-Instruct
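
The score is read off the next-token distribution rather than decoded as free text: restricting the logits to the K ordinal `<|label_level_i|>` tokens and averaging the bin midpoints under the softmax gives

$$\hat{p} = \sum_{i=0}^{K-1} \mathrm{softmax}(z)_i \cdot \frac{i + 0.5}{K},$$

where $z$ denotes the logits over the label-level tokens. This is what `_level_to_score_func` in the Direct Use example below implements.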

### Model Sources

- **Repository:** [Decoding-based Regression](https://github.com/zipJiang/decoding-based-regression.git)
- **Paper:** [Always Tell Me The Odds: Fine-grained Conditional Probability Estimation](https://arxiv.org/pdf/2505.01595)

## Uses

### Direct Use

The snippet below registers a custom `level-to-score` pipeline: the model greedily decodes one of the ordinal `<|label_level_i|>` tokens, and the logits over those tokens are converted into a scalar probability estimate.

```python
import re
import transformers
import torch
from transformers.pipelines import PIPELINE_REGISTRY
from transformers import (
    pipeline,
    TextGenerationPipeline,
    PreTrainedTokenizer,
    AutoModelForCausalLM,
)
from transformers.pipelines.text_generation import Chat, ReturnType
from typing import (
    Any,
    Callable,
    Dict,
    List,
    Text,
    Tuple,
)


class LevelToScorePipeline(TextGenerationPipeline):
    
    def __init__(
        self,
        level_to_score_func: Callable[[Tuple[torch.FloatTensor], PreTrainedTokenizer], Tuple[List[float], List[List[float]]]],
        *args,
        **kwargs
    ):
        super().__init__(*args, **kwargs)
        self._level_to_score_func = level_to_score_func
        
    def preprocess(
        self,
        prompt_text,
        prefix="",
        handle_long_generation=None,
        add_special_tokens=None,
        truncation=None,
        padding=None,
        max_length=None,
        continue_final_message=None,
        **generate_kwargs,
    ):
        # Only set non-None tokenizer kwargs, so as to rely on the tokenizer's defaults
        tokenizer_kwargs = {
            "add_special_tokens": add_special_tokens,
            "truncation": truncation,
            "padding": padding,
            "max_length": max_length,
        }
        tokenizer_kwargs = {key: value for key, value in tokenizer_kwargs.items() if value is not None}

        if isinstance(prompt_text, Chat):
            tokenizer_kwargs.pop("add_special_tokens", None)  # ignore add_special_tokens on chats
            # If the user passes a chat that ends in an assistant message, we treat it as a prefill by default
            # because very few models support multiple separate, consecutive assistant messages
            if continue_final_message is None:
                continue_final_message = prompt_text.messages[-1]["role"] == "assistant"
            inputs = self.tokenizer.apply_chat_template(
                prompt_text.messages,
                add_generation_prompt=not continue_final_message,
                continue_final_message=continue_final_message,
                return_dict=True,
                return_tensors=self.framework,
                **tokenizer_kwargs,
            )
        else:
            inputs = self.tokenizer(prefix + prompt_text, return_tensors=self.framework, **tokenizer_kwargs)

        inputs["prompt_text"] = prompt_text

        if handle_long_generation == "hole":
            cur_len = inputs["input_ids"].shape[-1]
            if "max_new_tokens" in generate_kwargs:
                new_tokens = generate_kwargs["max_new_tokens"]
            else:
                new_tokens = generate_kwargs.get("max_length", self.generation_config.max_length) - cur_len
                if new_tokens < 0:
                    raise ValueError("We cannot infer how many new tokens are expected")
            if cur_len + new_tokens > self.tokenizer.model_max_length:
                keep_length = self.tokenizer.model_max_length - new_tokens
                if keep_length <= 0:
                    raise ValueError(
                        "We cannot use `hole` to handle this generation the number of desired tokens exceeds the"
                        " models max length"
                    )

                inputs["input_ids"] = inputs["input_ids"][:, -keep_length:]
                if "attention_mask" in inputs:
                    inputs["attention_mask"] = inputs["attention_mask"][:, -keep_length:]

        return inputs
    
    def _forward(self, model_inputs, **generate_kwargs):
        input_ids = model_inputs["input_ids"]
        attention_mask = model_inputs.get("attention_mask", None)
        # Allow empty prompts
        if input_ids.shape[1] == 0:
            input_ids = None
            attention_mask = None
            in_b = 1
        else:
            in_b = input_ids.shape[0]
        prompt_text = model_inputs.pop("prompt_text")

        # If there is a prefix, we may need to adjust the generation length. Do so without permanently modifying
        # generate_kwargs, as some of the parameterization may come from the initialization of the pipeline.
        prefix_length = generate_kwargs.pop("prefix_length", 0)
        if prefix_length > 0:
            has_max_new_tokens = "max_new_tokens" in generate_kwargs or (
                "generation_config" in generate_kwargs
                and generate_kwargs["generation_config"].max_new_tokens is not None
            )
            if not has_max_new_tokens:
                generate_kwargs["max_length"] = generate_kwargs.get("max_length") or self.generation_config.max_length
                generate_kwargs["max_length"] += prefix_length
            has_min_new_tokens = "min_new_tokens" in generate_kwargs or (
                "generation_config" in generate_kwargs
                and generate_kwargs["generation_config"].min_new_tokens is not None
            )
            if not has_min_new_tokens and "min_length" in generate_kwargs:
                generate_kwargs["min_length"] += prefix_length

        # User-defined `generation_config` passed to the pipeline call take precedence
        if "generation_config" not in generate_kwargs:
            generate_kwargs["generation_config"] = self.generation_config
            
        generate_kwargs["output_scores"] = not generate_kwargs.get("do_sample", False)
        generate_kwargs["return_dict_in_generate"] = True

        generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
        
        logits = None
        
        # TODO: check good default
        if generate_kwargs.get("return_scores", True):
            assert not generate_kwargs.get("do_sample", False), "return_scores=True is only supported for do_sample=False"
            
            # Proceed to process logits and convert to score average.
            # next_token_logits is [batch_size, vocab_size]
            # raw_logits is a tuple of ([next_token_logits, past_key_values])

            logits = generated_sequence.scores

        out_b = generated_sequence.sequences.shape[0]
        if self.framework == "pt":
            generated_sequence = generated_sequence.sequences.reshape(in_b, out_b // in_b, *generated_sequence.sequences.shape[1:])
        # elif self.framework == "tf":
        #     generated_sequence = tf.reshape(generated_sequence, (in_b, out_b // in_b, *generated_sequence.shape[1:]))
        return {"generated_sequence": generated_sequence, "input_ids": input_ids, "prompt_text": prompt_text, "logits": logits}
    
    def postprocess(
        self,
        model_outputs,
        return_type=ReturnType.FULL_TEXT,
        clean_up_tokenization_spaces=True,
        continue_final_message=None,
    ):
        generated_sequence = model_outputs["generated_sequence"][0]
        input_ids = model_outputs["input_ids"]
        prompt_text = model_outputs["prompt_text"]
        logits = model_outputs["logits"]
        
        #TODO: This is now making many assumptions about how the logits are ordered,
        # Should think about how to make this explicit
        scores, selective_logits = self._level_to_score_func(logits, self.tokenizer)
        
        generated_sequence = generated_sequence.numpy().tolist()
        records = []
        for sequence in generated_sequence:
            if return_type == ReturnType.TENSORS:
                record = {"generated_token_ids": sequence}
            elif return_type in {ReturnType.NEW_TEXT, ReturnType.FULL_TEXT}:
                # Decode text
                text = self.tokenizer.decode(
                    sequence,
                    skip_special_tokens=True,
                    clean_up_tokenization_spaces=clean_up_tokenization_spaces,
                )

                # Remove PADDING prompt of the sequence if XLNet or Transfo-XL model is used
                if input_ids is None:
                    prompt_length = 0
                else:
                    prompt_length = len(
                        self.tokenizer.decode(
                            input_ids[0],
                            skip_special_tokens=True,
                            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
                        )
                    )

                all_text = text[prompt_length:]
                if return_type == ReturnType.FULL_TEXT:
                    if isinstance(prompt_text, str):
                        all_text = prompt_text + all_text
                    elif isinstance(prompt_text, Chat):
                        if continue_final_message is None:
                            # If the user passes a chat ending in an assistant message, we treat it as a prefill by
                            # default because very few models support multiple separate, consecutive assistant messages
                            continue_final_message = prompt_text.messages[-1]["role"] == "assistant"
                        if continue_final_message:
                            # With assistant prefill, concat onto the end of the last message
                            all_text = list(prompt_text.messages)[:-1] + [
                                {
                                    "role": prompt_text.messages[-1]["role"],
                                    "content": prompt_text.messages[-1]["content"] + all_text,
                                }
                            ]
                        else:
                            # When we're not starting from a prefill, the output is a new assistant message
                            all_text = list(prompt_text.messages) + [{"role": "assistant", "content": all_text}]
                record = {
                    "generated_text": all_text,
                    "score": scores[0],
                    "selective_logits": selective_logits[0]
                }
            records.append(record)

        return records


class SingleLabelRankDict:
    def __init__(
        self,
        rank_dict: Dict[Text, Any]
    ):
        self._rank_dict = rank_dict
        
    def __len__(self) -> int:
        return len(self._rank_dict)
        
    def get_rank_dict(self, tokenizer: PreTrainedTokenizer) -> Dict[int, Any]:
        return {tokenizer.convert_tokens_to_ids([token])[0]: value for token, value in self._rank_dict.items()}
    
    def to_tokenizer(self, tokenizer: PreTrainedTokenizer) -> PreTrainedTokenizer:
        """Augment tokenizer vocab with `rank_dict` IN-PLACE.
        """
        vocabs: List[Text] = list(self._rank_dict.keys())
        new_vocab = [vocab for vocab in vocabs if vocab not in tokenizer.get_vocab()]
        tokenizer.add_tokens(new_vocab)
        return tokenizer

    @classmethod
    def from_tokenizer(cls, tokenizer: PreTrainedTokenizer) -> "SingleLabelRankDict":
        vocab = tokenizer.get_vocab()
        rank_dict = {}
        pattern = re.compile(r" <\|label_level_(\d+)\|>")
        
        for token in vocab.keys():
            match = pattern.match(token)
            if match:
                value = int(match.group(1))
                # normalized_value = value / (len(vocab) - 1)
                rank_dict[token] = value
                
        # Normalize ranks to bin midpoints in (0, 1): rank r -> (r + 0.5) / K
        num_levels = max(rank_dict.values()) + 1
        for token in rank_dict.keys():
            rank_dict[token] = 1. / num_levels * (rank_dict[token] + 0.5)
        
        return cls(rank_dict=rank_dict)


model = transformers.AutoModelForCausalLM.from_pretrained(
    "Zhengping/conditional-probability-regression",
    torch_dtype="auto",
    # Requires the flash-attn package; drop this argument to use the default attention implementation.
    attn_implementation="flash_attention_2",
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "Zhengping/conditional-probability-regression",
)

rank_dict = SingleLabelRankDict.from_tokenizer(tokenizer)

PIPELINE_REGISTRY.register_pipeline(
    "level-to-score",
    pipeline_class=LevelToScorePipeline,
    pt_model=AutoModelForCausalLM
)

# This allows fine-grained labeling: greedy decoding alone gives only a coarse
# score. One can also attach their own level-to-score function to the pipeline,
# e.g. using the UNLI label transformation to get more binarized scores.
def _level_to_score_func(
    logits: Tuple[torch.FloatTensor],
    tokenizer: PreTrainedTokenizer
) -> Tuple[List[float], List[List[float]]]:
    """Convert logits over the label-level tokens into expected scalar scores."""
    logits = logits[0]
    num_labels = len(rank_dict)
    considering_ids = tokenizer.convert_tokens_to_ids([f" <|label_level_{i}|>" for i in range(num_labels)])
    selective_logits = torch.index_select(logits, 1, torch.tensor(considering_ids, device=logits.device))
    step_size = 1 / num_labels
    expectation = torch.tensor([[i * step_size + 1 / 2 * step_size for i in range(num_labels)]], device=selective_logits.device)
    scores = torch.softmax(selective_logits, dim=-1) @ expectation.T
    scores = scores.squeeze(-1).tolist()
    return scores, selective_logits.tolist()

pipe = pipeline(
    "level-to-score",
    model=model,
    max_new_tokens=2,
    tokenizer=tokenizer,
    device=0,
    level_to_score_func=_level_to_score_func,
    torch_dtype=torch.bfloat16,
)

premise = "Sam is sleeping."
hypothesis = "Sam is awake."

inputs = [
    {
        "role": "user",
        "content": "### Question: Given the premise \"{premise}\", how likely is it that the hypothesis \"{hypothesis}\" is true?\n\n".format(
            premise=premise,
            hypothesis=hypothesis
        )
    },
    {
        "role": "assitant",
        "content": "### Answer:"
    }
]

result = pipe(inputs)
print(result)
```
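
Each returned record contains the decoded `generated_text`, a scalar `score` in $(0, 1)$ (the softmax expectation over label levels), and the raw `selective_logits` over the level tokens.

As noted in the comments above, you can attach your own level-to-score function. Below is a minimal, hypothetical sketch of a more binarized variant that sharpens the distribution with a temperature before taking the expectation; the function name and the temperature value are illustrative assumptions, not part of the released code.

```python
# Hypothetical sharpened variant: dividing the logits by a temperature < 1
# pushes probability mass toward the most likely level, so the resulting
# scores concentrate near 0 or 1. Illustrative only.
def _sharpened_level_to_score_func(
    logits: Tuple[torch.FloatTensor],
    tokenizer: PreTrainedTokenizer,
    temperature: float = 0.5,
) -> Tuple[List[float], List[List[float]]]:
    logits = logits[0]
    num_labels = len(rank_dict)
    considering_ids = tokenizer.convert_tokens_to_ids(
        [f" <|label_level_{i}|>" for i in range(num_labels)]
    )
    selective_logits = torch.index_select(
        logits, 1, torch.tensor(considering_ids, device=logits.device)
    )
    step_size = 1 / num_labels
    expectation = torch.tensor(
        [[(i + 0.5) * step_size for i in range(num_labels)]],
        device=selective_logits.device,
    )
    scores = torch.softmax(selective_logits / temperature, dim=-1) @ expectation.T
    return scores.squeeze(-1).tolist(), selective_logits.tolist()
```

Pass it to the pipeline constructor via `level_to_score_func=_sharpened_level_to_score_func` to swap it in.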


## Use with vLLM

`TODO`


## Summary

LLM-based fine-grained conditional probability estimation.

## Citation

```bibtex
@article{wang2025always,
  title={Always Tell Me The Odds: Fine-grained Conditional Probability Estimation},
  author={Wang, Liaoyaqi and Jiang, Zhengping and Liu, Anqi and Van Durme, Benjamin},
  journal={arXiv preprint arXiv:2505.01595},
  year={2025}
}
```
