Update README.md
Browse files- .gitignore +0 -1
- README.md +16 -1
- custom_generate/generate.py +3 -3
.gitignore
DELETED
|
@@ -1 +0,0 @@
|
|
| 1 |
-
.env
|
|
|
|
|
|
README.md
CHANGED
|
@@ -4,6 +4,8 @@ tags:
|
|
| 4 |
- custom_generate
|
| 5 |
---
|
| 6 |
## Overview
|
|
|
|
|
|
|
| 7 |
|
| 8 |
Most output-token sampling techniques operate on the probability scores obtained after temperature scaling is applied. The softmax function distorts the underlying distribution of logit scores, making it hard to choose a meaningful top-p/top-k value.
|
| 9 |
|
|
@@ -35,6 +37,8 @@ This implementation of Top-NSigma requires the user to pass in a new argument `n
|
|
| 35 |
|
| 36 |
We'll use this to filter out tokens whose logit scores are more than `n_sigma` standard deviations below the maximum logit score.
|
| 37 |
|
|
|
|
|
|
|
| 38 |
## Output Type changes
|
| 39 |
(none)
|
| 40 |
|
|
@@ -48,6 +52,17 @@ model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", devic
|
|
| 48 |
|
| 49 |
inputs = tokenizer(["The quick brown"], return_tensors="pt").to(model.device)
|
| 50 |
# There is a print message hardcoded in the custom generation method
|
| 51 |
-
gen_out = model.generate(**inputs,
|
| 52 |
print(tokenizer.batch_decode(gen_out))
|
| 53 |
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
- custom_generate
|
| 5 |
---
|
| 6 |
## Overview
|
| 7 |
+
This generation sampling method is based on the paper [Top-N Sigma: A Simple and Effective Sampling Method for Language Models](https://openreview.net/pdf/1e221c8eedaf42558abc5dca4637b3378297582b.pdf).
|
| 8 |
+
|
| 9 |
|
| 10 |
Most output-token sampling techniques operate on the probability scores obtained after temperature scaling is applied. The softmax function distorts the underlying distribution of logit scores, making it hard to choose a meaningful top-p/top-k value.
|
| 11 |
|
|
|
|
| 37 |
|
| 38 |
We'll use this to filter out tokens whose logit scores are more than `n_sigma` standard deviations below the maximum logit score.
|
| 39 |
|
| 40 |
+
The authors recommend using `n_sigma=1.0` for most use cases, but you can experiment with values in the range **(0.0, 2√3]**.
|
| 41 |
+
|
| 42 |
## Output Type changes
|
| 43 |
(none)
|
| 44 |
|
|
|
|
| 52 |
|
| 53 |
inputs = tokenizer(["The quick brown"], return_tensors="pt").to(model.device)
|
| 54 |
# There is a print message hardcoded in the custom generation method
|
| 55 |
+
gen_out = model.generate(**inputs, n_sigma=1.0, custom_generate="Pramodith/topN_sigma_generation", trust_remote_code=True)
|
| 56 |
print(tokenizer.batch_decode(gen_out))
|
| 57 |
```
|
| 58 |
+
|
| 59 |
+
### Citation
|
| 60 |
+
```bibtex
|
| 61 |
+
@inproceedings{tang2025top,
|
| 62 |
+
title={Top-n𝜎: Eliminating Noise in Logit Space for Robust Token Sampling of LLM},
|
| 63 |
+
author={Tang, Chenxia and Liu, Jianchun and Xu, Hongli and Huang, Liusheng},
|
| 64 |
+
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
|
| 65 |
+
pages={10758--10774},
|
| 66 |
+
year={2025}
|
| 67 |
+
}
|
| 68 |
+
```
|
custom_generate/generate.py
CHANGED
|
@@ -1,13 +1,13 @@
|
|
| 1 |
import torch
|
| 2 |
|
| 3 |
-
def top_n_sigma_sampling(logits, temperature, n_sigma
|
| 4 |
"""
|
| 5 |
Perform topN-sigma sampling on the logits.
|
| 6 |
|
| 7 |
Args:
|
| 8 |
logits (torch.Tensor): The logits from the model of shape (batch_size, vocab_size).
|
| 9 |
temperature (float): The temperature to apply to the logits.
|
| 10 |
-
n_sigma (
|
| 11 |
|
| 12 |
Returns:
|
| 13 |
torch.Tensor: The filtered logits after applying topN-sigma sampling.
|
|
@@ -20,7 +20,7 @@ def top_n_sigma_sampling(logits, temperature, n_sigma=4):
|
|
| 20 |
return filtered_logits
|
| 21 |
|
| 22 |
@torch.inference_mode()
|
| 23 |
-
def generate(model, input_ids, generation_config=None, n_sigma=
|
| 24 |
"""
|
| 25 |
Generate text using topN-sigma sampling based on the paper:
|
| 26 |
https://openreview.net/pdf/1e221c8eedaf42558abc5dca4637b3378297582b.pdf
|
|
|
|
| 1 |
import torch
|
| 2 |
|
| 3 |
+
def top_n_sigma_sampling(logits:torch.Tensor, temperature:float, n_sigma:float) -> torch.Tensor:
|
| 4 |
"""
|
| 5 |
Perform topN-sigma sampling on the logits.
|
| 6 |
|
| 7 |
Args:
|
| 8 |
logits (torch.Tensor): The logits from the model of shape (batch_size, vocab_size).
|
| 9 |
temperature (float): The temperature to apply to the logits.
|
| 10 |
+
n_sigma (float): The number of standard deviations to use for filtering.
|
| 11 |
|
| 12 |
Returns:
|
| 13 |
torch.Tensor: The filtered logits after applying topN-sigma sampling.
|
|
|
|
| 20 |
return filtered_logits
|
| 21 |
|
| 22 |
@torch.inference_mode()
|
| 23 |
+
def generate(model, input_ids, generation_config=None, n_sigma:float=1.0, **kwargs):
|
| 24 |
"""
|
| 25 |
Generate text using topN-sigma sampling based on the paper:
|
| 26 |
https://openreview.net/pdf/1e221c8eedaf42558abc5dca4637b3378297582b.pdf
|