Update README.md
Browse files
README.md
CHANGED
|
@@ -46,7 +46,69 @@ outputs = model.generate(
|
|
| 46 |
custom_generate="kashif/DeepConf", # Hugging Face Hub repo
|
| 47 |
trust_remote_code=True
|
| 48 |
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 49 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 50 |
```
|
| 51 |
|
| 52 |
## Requirements
|
|
|
|
| 46 |
custom_generate="kashif/DeepConf", # Hugging Face Hub repo
|
| 47 |
trust_remote_code=True
|
| 48 |
)
|
| 49 |
+
```
|
| 50 |
+
|
| 51 |
+
## Calibration (DeepConf-low/high)
|
| 52 |
+
|
| 53 |
+
|
| 54 |
+
DeepConf’s online stopping threshold is derived from a short warmup phase. You collect warmup trace confidences, then pass them into the generator to auto-derive the threshold for either DeepConf-low (aggressive) or DeepConf-high (permissive).
|
| 55 |
|
| 56 |
+
1) Warmup (num_return_sequences): collect per-trace confidences (Ct = min(step_confidences))
|
| 57 |
+
|
| 58 |
+
```python
|
| 59 |
+
from transformers import GenerationConfig
|
| 60 |
+
|
| 61 |
+
prompt = "Explain artificial intelligence."
|
| 62 |
+
Ninit = 8 # number of warmup traces
|
| 63 |
+
warmup_C = []
|
| 64 |
+
|
| 65 |
+
warm_cfg = GenerationConfig.from_model_config(model.config)
|
| 66 |
+
warm_cfg.do_sample = True
|
| 67 |
+
warm_cfg.temperature = 0.7
|
| 68 |
+
warm_cfg.top_p = 0.95
|
| 69 |
+
warm_cfg.max_new_tokens = 64
|
| 70 |
+
warm_cfg.enable_conf = True
|
| 71 |
+
warm_cfg.return_dict_in_generate = True
|
| 72 |
+
warm_cfg.output_confidences = True
|
| 73 |
+
warm_cfg.num_return_sequences = Ninit
|
| 74 |
+
# IMPORTANT: Do not set `warm_cfg.threshold` here. Warmup should not apply online early stopping.
|
| 75 |
+
|
| 76 |
+
out = model.generate(
|
| 77 |
+
**tokenizer(prompt, return_tensors="pt"),
|
| 78 |
+
generation_config=warm_cfg,
|
| 79 |
+
custom_generate="kashif/DeepConf",
|
| 80 |
+
trust_remote_code=True,
|
| 81 |
+
)
|
| 82 |
+
# Per-trace Ct = min over steps
|
| 83 |
+
warmup_C = out.confidences.min(dim=1).values.tolist()
|
| 84 |
+
```
|
| 85 |
+
|
| 86 |
+
|
| 87 |
+
2) Online: pass warmup confidences to auto-derive threshold
|
| 88 |
+
|
| 89 |
+
```python
|
| 90 |
+
gen_cfg = GenerationConfig.from_model_config(model.config)
|
| 91 |
+
gen_cfg.enable_conf = True
|
| 92 |
+
gen_cfg.return_dict_in_generate = True
|
| 93 |
+
gen_cfg.output_confidences = True
|
| 94 |
+
|
| 95 |
+
# Choose a variant:
|
| 96 |
+
# - DeepConf-low (aggressive): eta=0.1 → 90th percentile threshold
|
| 97 |
+
# - DeepConf-high (permissive): eta=0.9 → 10th percentile threshold
|
| 98 |
+
gen_cfg.deepconf_variant = "low" # or "high"
|
| 99 |
+
# Optional: override eta explicitly
|
| 100 |
+
# gen_cfg.deepconf_eta = 0.1 # defaults: 0.1 for low, 0.9 for high
|
| 101 |
+
|
| 102 |
+
# Provide warmup confidences; the threshold will be derived internally
|
| 103 |
+
gen_cfg.deepconf_warmup_confidences = warmup_C
|
| 104 |
+
|
| 105 |
+
out = model.generate(
|
| 106 |
+
**tokenizer(prompt, return_tensors="pt"),
|
| 107 |
+
custom_generate="kashif/DeepConf",
|
| 108 |
+
trust_remote_code=True,
|
| 109 |
+
generation_config=gen_cfg,
|
| 110 |
+
max_new_tokens=128,
|
| 111 |
+
)
|
| 112 |
```
|
| 113 |
|
| 114 |
## Requirements
|