maxholsman/fuzzy-spec-dec


maxholsman committed on Jan 6

Commit ffe950a (verified) · 1 parent: 2bc36c6

Upload README.md with huggingface_hub


README.md +3 -3


@@ -14,7 +14,7 @@ Standard Speculative Decoding enforces strict distributional equivalence to the
 - **Tunable Quality-Speed Tradeoff**: Adjust the `fsd_threshold` parameter to control the balance between generation quality and inference speed
 - **Significant Speed Improvements**: Achieve runtime improvements of over 5 tokens per second faster than standard SD with only an approximate 2% absolute reduction in benchmark accuracy
 - **Maintained Performance**: In many cases, FSD can match standard SD benchmark accuracy while running over 2 tokens per second faster
-- **Flexible Acceptance Criteria**: Choose from multiple divergence metrics (KL divergence, Jensen-Shannon divergence, or draft token probabilities) to best suit your use case
+- **Flexible Acceptance Criteria**: Choose from multiple divergence metrics (KL divergence, JS divergence variant, or draft token probabilities) to best suit your use case
 This implementation is based on the paper **"Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff"** (ACL Findings 2025). See the [Citation](#citation) section below for full citation details.
@@ -23,7 +23,7 @@ This implementation is based on the paper **"Fuzzy Speculative Decoding for a Tu
 - **Fuzzy Speculative Decoding (FSD)**: Accepts candidate tokens based on distribution divergence thresholds
 - **Multiple Divergence Types**:
   - `kl`: KL divergence between candidate and target distributions
-  - `js`: Jensen-Shannon divergence
+  - `js`: JS divergence variant (computed using KL divergence with midpoint distribution)
   - `draft_tokens`: Absolute difference in draft token probabilities
 - **Standard Speculative Decoding**: Falls back to standard speculative decoding acceptance when FSD threshold is not met
@@ -67,7 +67,7 @@ print(generated_text)
 - **`fsd_threshold`** (float, default: 0.0): Threshold for fuzzy speculative decoding acceptance. Tokens with divergence below this threshold are automatically accepted. **Lower values** enforce stricter equivalence (closer to standard SD, higher quality but slower), while **higher values** allow more divergence (faster inference with potential quality tradeoffs). Tune this parameter to achieve your desired quality-speed tradeoff.
 - **`fsd_div_type`** (str, default: "kl"): Type of divergence metric to use:
   - `"kl"`: KL divergence (D_KL(candidate || target)) - measures how much information is lost when using the candidate distribution to approximate the target
-  - `"js"`: Jensen-Shannon divergence - a symmetric and bounded measure of distribution similarity
+  - `"js"`: JS divergence variant - computed using KL divergence with a midpoint distribution, providing a symmetric and bounded measure of distribution similarity
   - `"draft_tokens"`: Absolute difference between draft and target model probability of drafted token
 - **`track_acceptance_metrics`** (bool, default: False): Whether to track and return draft token acceptance statistics. When enabled, the output includes:
   - `draft_token_acceptance_rate`: Ratio of accepted draft tokens to total draft tokens
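
For readers comparing the three `fsd_div_type` options touched by this commit, the following is a minimal, illustrative sketch of how each metric could be computed for a single token position and compared against `fsd_threshold`. It is not the repository's implementation: the helper names (`kl_divergence`, `js_divergence`, `draft_token_gap`, `fsd_accepts`) are hypothetical, the `js` branch uses the textbook Jensen-Shannon form (the "variant" in the README may differ in detail), and the strict `<` comparison is an assumption drawn from the wording "below this threshold".

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions given as probability lists."""
    # Terms with p_i == 0 contribute nothing; q_i is clamped to avoid log(x / 0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Textbook Jensen-Shannon divergence: average KL to the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def draft_token_gap(p, q, token_id):
    """Absolute difference in the probability both models assign to the drafted token."""
    return abs(p[token_id] - q[token_id])

def fsd_accepts(candidate, target, token_id, fsd_threshold, fsd_div_type="kl"):
    """Hypothetical acceptance check: True if the divergence is below the threshold."""
    if fsd_div_type == "kl":
        div = kl_divergence(candidate, target)
    elif fsd_div_type == "js":
        div = js_divergence(candidate, target)
    elif fsd_div_type == "draft_tokens":
        div = draft_token_gap(candidate, target, token_id)
    else:
        raise ValueError(f"unknown fsd_div_type: {fsd_div_type!r}")
    # Assumption: strict comparison, so a threshold of 0.0 never accepts here
    # and every token would fall through to standard speculative decoding.
    return div < fsd_threshold

# Toy distributions over a 3-token vocabulary (made-up numbers for illustration).
candidate = [0.7, 0.2, 0.1]
target = [0.6, 0.3, 0.1]
for div_type in ("kl", "js", "draft_tokens"):
    print(div_type, fsd_accepts(candidate, target, token_id=0,
                                fsd_threshold=0.05, fsd_div_type=div_type))
```

Under these assumptions, the JS form is symmetric in its arguments and bounded above by ln 2 (with natural logarithms), which matches the "symmetric and bounded" description in the updated README line, whereas the KL form is asymmetric and unbounded.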