fix generation
Browse files
- README.md (+25, -1)
- custom_generate/generate.py (+10, -20)
README.md
CHANGED
|
@@ -13,13 +13,14 @@ This repository implements the DeepCONF (Deep Confidence-based Early Stopping) g
|
|
| 13 |
|
| 14 |
## Overview
|
| 15 |
|
| 16 |
-
DeepCONF monitors the confidence of generated tokens and stops generation when confidence falls below a threshold.
|
| 17 |
|
| 18 |
## Parameters
|
| 19 |
|
| 20 |
- `enable_conf` (bool): Whether to enable the DeepCONF strategy. Defaults to `False`.
|
| 21 |
- `window_size` (int): Size of the sliding window for confidence calculation. Defaults to `2048`.
|
| 22 |
- `threshold` (float): Confidence threshold for early stopping. Defaults to `17.0`.
|
|
|
|
| 23 |
- `output_confidences` (bool): If `True` and `return_dict_in_generate=True`, returns a per-step confidence tensor alongside generated sequences for debugging/visualization.
|
| 24 |
|
| 25 |
## Usage
|
|
@@ -108,6 +109,29 @@ out = model.generate(
|
|
| 108 |
)
|
| 109 |
```
|
| 110 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 111 |
## Requirements
|
| 112 |
|
| 113 |
- PyTorch >= 1.13.0
|
|
|
|
| 13 |
|
| 14 |
## Overview
|
| 15 |
|
| 16 |
+
DeepCONF monitors the confidence of generated tokens and stops generation when confidence falls below a threshold. The confidence is calculated as the negative mean log probability of the top-k tokens from the full vocabulary (before sampling/filtering is applied), following the methodology from the [official DeepConf implementation](https://github.com/facebookresearch/deepconf).
|
| 17 |
|
| 18 |
## Parameters
|
| 19 |
|
| 20 |
- `enable_conf` (bool): Whether to enable the DeepCONF strategy. Defaults to `False`.
|
| 21 |
- `window_size` (int): Size of the sliding window for confidence calculation. Defaults to `2048`.
|
| 22 |
- `threshold` (float): Confidence threshold for early stopping. Defaults to `17.0`.
|
| 23 |
+
- `conf_topk` (int): Number of top tokens to use for confidence calculation from the full vocabulary. Defaults to `20` (matches official implementation).
|
| 24 |
- `output_confidences` (bool): If `True` and `return_dict_in_generate=True`, returns a per-step confidence tensor alongside generated sequences for debugging/visualization.
|
| 25 |
|
| 26 |
## Usage
|
|
|
|
| 109 |
)
|
| 110 |
```
|
| 111 |
|
| 112 |
+
## Technical Details
|
| 113 |
+
|
| 114 |
+
### Confidence Calculation
|
| 115 |
+
|
| 116 |
+
The confidence score for each generated token is calculated as follows:
|
| 117 |
+
|
| 118 |
+
1. **Extract top-k tokens**: Get the top-k (default: 20) tokens with highest probabilities from the full vocabulary
|
| 119 |
+
2. **Compute log probabilities**: Calculate log probabilities for these top-k tokens
|
| 120 |
+
3. **Average**: The confidence score is `-mean(log_probs)` of the top-k tokens
|
| 121 |
+
|
| 122 |
+
This approach:
|
| 123 |
+
- Uses the **full probability distribution** (before any top-k/top-p/temperature filtering)
|
| 124 |
+
- Always considers a **fixed number of tokens** (conf_topk=20)
|
| 125 |
+
- Naturally **includes the sampled token** if it's in the top-k
|
| 126 |
+
- Matches the **official DeepConf implementation** exactly
|
| 127 |
+
|
| 128 |
+
### Online Stopping
|
| 129 |
+
|
| 130 |
+
The online method uses a sliding window of confidence scores:
|
| 131 |
+
- Maintains a window of the last `window_size` (default: 2048) confidence scores
|
| 132 |
+
- Calculates the mean confidence over this window
|
| 133 |
+
- Stops generation when: `mean_confidence < threshold`
|
| 134 |
+
|
| 135 |
## Requirements
|
| 136 |
|
| 137 |
- PyTorch >= 1.13.0
|
custom_generate/generate.py
CHANGED
|
@@ -51,6 +51,7 @@ def generate(
|
|
| 51 |
enable_conf = getattr(generation_config, "enable_conf", False)
|
| 52 |
window_size = getattr(generation_config, "window_size", 2048)
|
| 53 |
threshold = getattr(generation_config, "threshold", 17.0) # Default threshold for confidence (positive value)
|
|
|
|
| 54 |
|
| 55 |
# If DeepCONF is not enabled, fall back to standard sampling
|
| 56 |
if not enable_conf:
|
|
@@ -197,11 +198,10 @@ def generate(
|
|
| 197 |
else:
|
| 198 |
next_tokens = torch.argmax(next_token_scores, dim=-1)
|
| 199 |
|
| 200 |
- # Calculate confidence using
- #
- #
-
-
candidate_mask = torch.isfinite(next_token_scores)
|
| 205 |
|
| 206 |
deepconf_stopping = torch.ones(batch_size, dtype=torch.bool, device=input_ids.device)
|
| 207 |
step_conf_values = [0.0] * batch_size # collect per-sequence confidences for this step (full batch)
|
|
@@ -210,21 +210,11 @@ def generate(
|
|
| 210 |
if not unfinished_sequences[i]:
|
| 211 |
continue
|
| 212 |
|
| 213 |
-
#
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
# Sum logprobs over valid candidates and exclude the sampled token's logprob
|
| 219 |
-
total_lp = torch.sum(logprobs[i][candidate_mask[i]])
|
| 220 |
-
selected_lp = (
|
| 221 |
-
logprobs[i, next_tokens[i]]
|
| 222 |
-
if candidate_mask[i, next_tokens[i]]
|
| 223 |
-
else torch.tensor(0.0, device=logprobs.device)
|
| 224 |
-
)
|
| 225 |
-
denom = num_candidates - 1
|
| 226 |
-
# Negative mean of non-selected candidate logprobs
|
| 227 |
-
conf = -((total_lp - selected_lp) / denom).item()
|
| 228 |
|
| 229 |
# Update tracking structures
|
| 230 |
if len(conf_group_lists[i]) >= window_size:
|
|
|
|
| 51 |
enable_conf = getattr(generation_config, "enable_conf", False)
|
| 52 |
window_size = getattr(generation_config, "window_size", 2048)
|
| 53 |
threshold = getattr(generation_config, "threshold", 17.0) # Default threshold for confidence (positive value)
|
| 54 |
+
conf_topk = getattr(generation_config, "conf_topk", 20) # Number of top tokens for confidence calculation
|
| 55 |
|
| 56 |
# If DeepCONF is not enabled, fall back to standard sampling
|
| 57 |
if not enable_conf:
|
|
|
|
| 198 |
else:
|
| 199 |
next_tokens = torch.argmax(next_token_scores, dim=-1)
|
| 200 |
|
| 201 |
+
# Calculate confidence using top-k tokens from the full probability distribution
|
| 202 |
+
# (before any filtering), following the official DeepConf implementation.
|
| 203 |
+
# This uses the raw logits (next_token_logits) before warpers are applied.
|
| 204 |
+
probs = F.softmax(next_token_logits, dim=-1)
|
|
|
|
| 205 |
|
| 206 |
deepconf_stopping = torch.ones(batch_size, dtype=torch.bool, device=input_ids.device)
|
| 207 |
step_conf_values = [0.0] * batch_size # collect per-sequence confidences for this step (full batch)
|
|
|
|
| 210 |
if not unfinished_sequences[i]:
|
| 211 |
continue
|
| 212 |
|
| 213 |
+
# Get top-k tokens from full probability distribution
|
| 214 |
+
top_probs, _ = torch.topk(probs[i], k=conf_topk, dim=-1)
|
| 215 |
+
log_probs = torch.log(top_probs)
|
| 216 |
+
# Confidence is negative mean of log probabilities of top-k tokens
|
| 217 |
+
conf = -log_probs.mean().item()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 218 |
|
| 219 |
# Update tracking structures
|
| 220 |
if len(conf_group_lists[i]) >= window_size:
|