maxholsman/fuzzy-spec-dec

maxholsman committed 29 days ago

Commit 2a51a8b (verified) · 1 parent: 4a9570a

Upload README.md with huggingface_hub

Files changed (1): README.md +83 -0

@@ -1,3 +1,86 @@
 ---
 license: apache-2.0
 ---

+# Fuzzy Speculative Decoding
+A custom `generate` function for fuzzy speculative decoding. It extends the standard speculative decoding algorithm with a divergence-based acceptance test, so that candidate tokens can be accepted when the assistant and target distributions are close enough. Three acceptance criteria are supported: KL divergence, Jensen-Shannon divergence, and a draft-token probability difference.
+## Features
+- **Fuzzy Speculative Decoding (FSD)**: Accepts candidate tokens based on distribution divergence thresholds
+- **Multiple Divergence Types**:
+  - `kl`: KL divergence between candidate and target distributions
+  - `js`: Jensen-Shannon divergence
+  - `draft_tokens`: Absolute difference between the probabilities that the candidate and target distributions assign to the drafted token
+- **Standard Speculative Decoding**: Falls back to standard speculative decoding acceptance when the divergence exceeds the FSD threshold
+- **Raw Logits Support**: Returns both processed and raw logits for advanced use cases
+## Installation
+```bash
+pip install -r custom_generate/requirements.txt
+```
+## Usage
+### Basic Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+# Load models
+target_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
+assistant_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
+tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
+# Prepare input
+prompt = "What is the capital of France?"
+inputs = tokenizer(prompt, return_tensors="pt")
+# Generate with custom fuzzy speculative decoding
+outputs = target_model.generate(
+    **inputs,
+    assistant_model=assistant_model,
+    custom_generate="maxholsman/fuzzy-spec-dec",
+    trust_remote_code=True,
+    fsd_threshold=0.0,        # FSD acceptance threshold
+    fsd_div_type="kl",        # Divergence type: "kl", "js", or "draft_tokens"
+    do_sample=True,
+    temperature=0.7,
+    max_new_tokens=100,
+    return_dict_in_generate=True,  # Return an output object so outputs.sequences is available
+    output_logits=True,       # Enable raw logits output
+)
+# Decode result
+generated_text = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
+print(generated_text)
+```
+### Custom Parameters
+- **`fsd_threshold`** (float, default: 0.0): Threshold for fuzzy speculative decoding acceptance. Candidate tokens whose divergence does not exceed this threshold are accepted without running the standard acceptance test.
+- **`fsd_div_type`** (str, default: "kl"): Type of divergence metric to use:
+  - `"kl"`: KL divergence (D_KL(candidate || target))
+  - `"js"`: Jensen-Shannon divergence
+  - `"draft_tokens"`: Absolute difference between the probabilities that the candidate and target distributions assign to the drafted token
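The three metrics can be illustrated with a minimal sketch. It is not the repository's implementation: it works on plain Python probability lists so that it runs without `torch`, and the function names (`kl_divergence`, `js_divergence`, `draft_token_gap`, `fsd_divergence`) are hypothetical. It assumes `"draft_tokens"` compares the probability each distribution assigns to the drafted token.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions given as probability lists."""
    # eps guards against log(0) when q assigns zero probability to a token.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def draft_token_gap(p, q, token_id):
    """Absolute difference in the probability assigned to the drafted token."""
    return abs(p[token_id] - q[token_id])


def fsd_divergence(candidate_probs, target_probs, token_id, div_type="kl"):
    """Dispatch on a div_type string mirroring the fsd_div_type values above."""
    if div_type == "kl":
        return kl_divergence(candidate_probs, target_probs)
    if div_type == "js":
        return js_divergence(candidate_probs, target_probs)
    if div_type == "draft_tokens":
        return draft_token_gap(candidate_probs, target_probs, token_id)
    raise ValueError(f"unknown divergence type: {div_type!r}")


# Illustrative distributions over a three-token vocabulary (made-up numbers).
candidate = [0.7, 0.2, 0.1]
target = [0.6, 0.3, 0.1]
for name in ("kl", "js", "draft_tokens"):
    print(name, fsd_divergence(candidate, target, token_id=0, div_type=name))
```

KL divergence is asymmetric and unbounded, while Jensen-Shannon divergence is symmetric and never exceeds ln 2, so a threshold tuned for one metric does not carry over to the other.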
+### How It Works
+1. The assistant model generates candidate tokens
+2. The target model evaluates these candidates
+3. For each candidate position:
+   - If FSD divergence ≤ threshold: token is accepted
+   - Otherwise: standard speculative decoding acceptance is applied
+4. Accepted tokens are kept; a rejected token triggers resampling from the target model
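The steps above can be sketched as a verification loop over one block of drafted tokens. This is an illustrative sketch under stated assumptions, not the repository's code: `verify_candidates` and its arguments are hypothetical names, distributions are plain Python lists, the fuzzy test uses KL divergence only, and the fallback is the usual speculative sampling rule (accept with probability min(1, p/q), otherwise resample from the normalised residual max(0, p - q)). The bonus token that standard speculative decoding samples when every candidate is accepted is omitted.

```python
import math
import random


def kl(p, q, eps=1e-12):
    """D_KL(p || q) on probability lists; eps avoids log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def verify_candidates(draft_tokens, draft_probs, target_probs,
                      fsd_threshold=0.0, rng=random):
    """Return the tokens kept after verifying one block of drafted tokens.

    draft_probs[i] and target_probs[i] are the assistant and target
    distributions at position i; draft_tokens[i] was sampled from draft_probs[i].
    """
    kept = []
    for tok, q, p in zip(draft_tokens, draft_probs, target_probs):
        # Step 3, fuzzy test: accept outright if the distributions are close.
        if kl(q, p) <= fsd_threshold:
            kept.append(tok)
            continue
        # Step 3, fallback: standard acceptance with probability min(1, p/q).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            kept.append(tok)
            continue
        # Step 4: rejected, so resample from the residual max(0, p - q)
        # and discard the remaining drafted tokens.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        kept.append(rng.choices(range(len(p)), weights=residual)[0])
        break
    return kept


# Made-up example: the second position disagrees completely, so with a
# zero threshold the drafted token is rejected and replaced.
q_block = [[0.5, 0.5], [1.0, 0.0]]
p_block = [[0.5, 0.5], [0.0, 1.0]]
print(verify_candidates([0, 0], q_block, p_block, fsd_threshold=0.0))
```

In this sketch, raising `fsd_threshold` accepts more drafted tokens without the probabilistic test, so the output can drift further from what the target model alone would sample.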
+## Requirements
+- `torch>=2.0.0`
+- `transformers>=4.40.0`
+- `scikit-learn` (optional, for confidence threshold features)
+## License
+Apache 2.0