maxholsman committed on
Commit abc7703 · verified · 1 Parent(s): 86ac9f6

Upload README.md with huggingface_hub

Files changed (1): README.md +18 -6
README.md CHANGED
@@ -4,9 +4,21 @@ license: apache-2.0
 
 # Fuzzy Speculative Decoding
 
-Custom generate function for fuzzy speculative decoding with support for KL divergence, Jensen-Shannon divergence, and draft token-based acceptance criteria. This implementation extends the standard speculative decoding algorithm with additional divergence metrics for more flexible candidate acceptance.
+Fuzzy Speculative Decoding (FSD) is a decoding algorithm that generalizes standard Speculative Decoding (SD) by accepting candidate tokens based on distribution divergence thresholds rather than enforcing strict distributional equivalence. This enables a **tunable tradeoff between generation quality and inference speed**, allowing users to flexibly balance accuracy and runtime based on their specific needs.
 
-This implementation is based on the paper **"Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff"** (ACL Findings 2025). See the [References](#citation) section below for full citation details.
+## Motivation
+
+Standard Speculative Decoding enforces strict distributional equivalence to the target model, which limits potential speedups. However, distributions of near-equivalence often achieve comparable outcomes in practice. By allowing controlled divergence from the target model distribution, FSD enables users to trade small deviations in generation quality for significant inference speed gains.
+
+**Key Benefits:**
+- **Tunable Quality-Speed Tradeoff**: Adjust the `fsd_threshold` parameter to control the balance between generation quality and inference speed
+- **Significant Speed Improvements**: Achieve runtime improvements of over 5 tokens per second faster than standard SD with only an approximate 2% absolute reduction in benchmark accuracy
+- **Maintained Performance**: In many cases, FSD matches standard SD benchmark accuracy while running over 2 tokens per second faster
+- **Flexible Acceptance Criteria**: Choose from multiple divergence metrics (KL divergence, Jensen-Shannon divergence, or draft token probabilities) to best suit your use case
+
+This implementation extends the standard speculative decoding algorithm with additional divergence metrics for more flexible candidate acceptance, supporting KL divergence, Jensen-Shannon divergence, and draft token-based acceptance criteria.
+
+This implementation is based on the paper **"Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff"** (ACL Findings 2025). See the [Citation](#citation) section below for full citation details.
 
 ## Features
 
@@ -62,11 +74,11 @@ print(generated_text)
 
 ### Custom Parameters
 
-- **`fsd_threshold`** (float, default: 0.0): Threshold for fuzzy speculative decoding acceptance. Tokens with divergence below this threshold are automatically accepted.
+- **`fsd_threshold`** (float, default: 0.0): Threshold for fuzzy speculative decoding acceptance. Tokens with divergence below this threshold are automatically accepted. **Lower values** enforce stricter equivalence (closer to standard SD, higher quality but slower), while **higher values** allow more divergence (faster inference with potential quality tradeoffs). Tune this parameter to achieve your desired quality-speed tradeoff.
 - **`fsd_div_type`** (str, default: "kl"): Type of divergence metric to use:
-  - `"kl"`: KL divergence (D_KL(candidate || target))
-  - `"js"`: Jensen-Shannon divergence
-  - `"draft_tokens"`: Absolute difference in draft token probabilities
+  - `"kl"`: KL divergence (D_KL(candidate || target)) - measures how much information is lost when using the candidate distribution to approximate the target
+  - `"js"`: Jensen-Shannon divergence - a symmetric and bounded measure of distribution similarity
+  - `"draft_tokens"`: Absolute difference in draft token probabilities - simpler metric based on probability differences
 
 ### How It Works
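
The acceptance criterion described by the `fsd_threshold` and `fsd_div_type` parameters in this diff can be sketched in plain Python. This is a hypothetical illustration of the three divergence metrics named in the README, not code from this repository; the `divergence` and `accept` function names and the tie-handling at the threshold are assumptions.

```python
import math

def _normalize(dist):
    # Smooth and renormalize a probability vector to avoid log(0).
    eps = 1e-12
    total = sum(dist) + eps * len(dist)
    return [(x + eps) / total for x in dist]

def divergence(p, q, div_type="kl", draft_token=None):
    """Divergence between the draft model's next-token distribution p
    and the target model's distribution q, per the README's metric names."""
    p, q = _normalize(p), _normalize(q)
    if div_type == "kl":
        # D_KL(candidate || target)
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    if div_type == "js":
        # Jensen-Shannon: symmetric average of KL divergences to the midpoint.
        m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
        kl_pm = sum(pi * math.log(pi / mi) for pi, mi in zip(p, m))
        kl_qm = sum(qi * math.log(qi / mi) for qi, mi in zip(q, m))
        return 0.5 * kl_pm + 0.5 * kl_qm
    if div_type == "draft_tokens":
        # Absolute difference in the probability assigned to the sampled draft token.
        return abs(p[draft_token] - q[draft_token])
    raise ValueError(f"unknown div_type: {div_type}")

def accept(p, q, fsd_threshold=0.0, fsd_div_type="kl", draft_token=None):
    # Accept the candidate token when its divergence does not exceed the
    # threshold (assumed inclusive so the default 0.0 admits exact matches).
    return divergence(p, q, fsd_div_type, draft_token) <= fsd_threshold
```

With `fsd_threshold=0.0` only (near-)identical distributions pass, recovering behavior close to standard SD; raising the threshold accepts progressively more divergent draft tokens, which is the quality-speed dial the README describes.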