maxholsman committed on
Commit ffe950a · verified · 1 Parent(s): 2bc36c6

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -14,7 +14,7 @@ Standard Speculative Decoding enforces strict distributional equivalence to the
  - **Tunable Quality-Speed Tradeoff**: Adjust the `fsd_threshold` parameter to control the balance between generation quality and inference speed
  - **Significant Speed Improvements**: Achieve runtime improvements of over 5 tokens per second faster than standard SD with only an approximate 2% absolute reduction in benchmark accuracy
  - **Maintained Performance**: In many cases, FSD can match standard SD benchmark accuracy while running over 2 tokens per second faster
- - **Flexible Acceptance Criteria**: Choose from multiple divergence metrics (KL divergence, Jensen-Shannon divergence, or draft token probabilities) to best suit your use case
 
  This implementation is based on the paper **"Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff"** (ACL Findings 2025). See the [Citation](#citation) section below for full citation details.
 
@@ -23,7 +23,7 @@ This implementation is based on the paper **"Fuzzy Speculative Decoding for a Tu
  - **Fuzzy Speculative Decoding (FSD)**: Accepts candidate tokens based on distribution divergence thresholds
  - **Multiple Divergence Types**:
    - `kl`: KL divergence between candidate and target distributions
-   - `js`: Jensen-Shannon divergence
    - `draft_tokens`: Absolute difference in draft token probabilities
  - **Standard Speculative Decoding**: Falls back to standard speculative decoding acceptance when FSD threshold is not met
 
@@ -67,7 +67,7 @@ print(generated_text)
  - **`fsd_threshold`** (float, default: 0.0): Threshold for fuzzy speculative decoding acceptance. Tokens with divergence below this threshold are automatically accepted. **Lower values** enforce stricter equivalence (closer to standard SD, higher quality but slower), while **higher values** allow more divergence (faster inference with potential quality tradeoffs). Tune this parameter to achieve your desired quality-speed tradeoff.
  - **`fsd_div_type`** (str, default: "kl"): Type of divergence metric to use:
    - `"kl"`: KL divergence (D_KL(candidate || target)) - measures how much information is lost when using the candidate distribution to approximate the target
-   - `"js"`: Jensen-Shannon divergence - a symmetric and bounded measure of distribution similarity
    - `"draft_tokens"`: Absolute difference between draft and target model probability of drafted token
  - **`track_acceptance_metrics`** (bool, default: False): Whether to track and return draft token acceptance statistics. When enabled, the output includes:
    - `draft_token_acceptance_rate`: Ratio of accepted draft tokens to total draft tokens
 
  - **Tunable Quality-Speed Tradeoff**: Adjust the `fsd_threshold` parameter to control the balance between generation quality and inference speed
  - **Significant Speed Improvements**: Achieve runtime improvements of over 5 tokens per second faster than standard SD with only an approximate 2% absolute reduction in benchmark accuracy
  - **Maintained Performance**: In many cases, FSD can match standard SD benchmark accuracy while running over 2 tokens per second faster
+ - **Flexible Acceptance Criteria**: Choose from multiple divergence metrics (KL divergence, JS divergence variant, or draft token probabilities) to best suit your use case
 
  This implementation is based on the paper **"Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff"** (ACL Findings 2025). See the [Citation](#citation) section below for full citation details.
 
 
  - **Fuzzy Speculative Decoding (FSD)**: Accepts candidate tokens based on distribution divergence thresholds
  - **Multiple Divergence Types**:
    - `kl`: KL divergence between candidate and target distributions
+   - `js`: JS divergence variant (computed using KL divergence with midpoint distribution)
    - `draft_tokens`: Absolute difference in draft token probabilities
  - **Standard Speculative Decoding**: Falls back to standard speculative decoding acceptance when FSD threshold is not met
 
 
  - **`fsd_threshold`** (float, default: 0.0): Threshold for fuzzy speculative decoding acceptance. Tokens with divergence below this threshold are automatically accepted. **Lower values** enforce stricter equivalence (closer to standard SD, higher quality but slower), while **higher values** allow more divergence (faster inference with potential quality tradeoffs). Tune this parameter to achieve your desired quality-speed tradeoff.
  - **`fsd_div_type`** (str, default: "kl"): Type of divergence metric to use:
    - `"kl"`: KL divergence (D_KL(candidate || target)) - measures how much information is lost when using the candidate distribution to approximate the target
+   - `"js"`: JS divergence variant - computed using KL divergence with a midpoint distribution, providing a symmetric and bounded measure of distribution similarity
    - `"draft_tokens"`: Absolute difference between draft and target model probability of drafted token
  - **`track_acceptance_metrics`** (bool, default: False): Whether to track and return draft token acceptance statistics. When enabled, the output includes:
    - `draft_token_acceptance_rate`: Ratio of accepted draft tokens to total draft tokens
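
To make the three `fsd_div_type` options and the `fsd_threshold` rule above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the repository's implementation: the function names (`kl_divergence`, `js_divergence`, `draft_token_gap`, `fsd_accepts`) are hypothetical, distributions are plain lists of probabilities, and the `"js"` branch uses the textbook Jensen-Shannon formula as a stand-in, since the README only says the variant is computed from KL divergence with a midpoint distribution.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions over the same vocabulary."""
    # Terms with p_i == 0 contribute nothing; eps guards against q_i == 0.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Textbook Jensen-Shannon divergence (assumed stand-in for the README's variant)."""
    # Midpoint distribution between the two inputs.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def draft_token_gap(p_draft, p_target, token_id):
    """Absolute difference in the probability each model assigns to the drafted token."""
    return abs(p_draft[token_id] - p_target[token_id])


def divergence(p_draft, p_target, token_id, div_type="kl"):
    """Dispatch on the metric name, mirroring the documented fsd_div_type values."""
    if div_type == "kl":
        # Direction follows the README: D_KL(candidate || target).
        return kl_divergence(p_draft, p_target)
    if div_type == "js":
        return js_divergence(p_draft, p_target)
    if div_type == "draft_tokens":
        return draft_token_gap(p_draft, p_target, token_id)
    raise ValueError(f"unknown div_type: {div_type!r}")


def fsd_accepts(p_draft, p_target, token_id, fsd_threshold=0.0, div_type="kl"):
    """True if the drafted token is accepted outright by the fuzzy rule.

    A False result means the caller would fall back to the standard
    speculative decoding acceptance test (not shown here).
    """
    return divergence(p_draft, p_target, token_id, div_type) < fsd_threshold


if __name__ == "__main__":
    draft = [0.7, 0.2, 0.1]
    target = [0.5, 0.3, 0.2]
    for name in ("kl", "js", "draft_tokens"):
        print(name, divergence(draft, target, token_id=0, div_type=name))
    print("accepted:", fsd_accepts(draft, target, token_id=0, fsd_threshold=0.5))
```

With the strict comparison used in this sketch, the default `fsd_threshold` of 0.0 never accepts a token through the fuzzy rule, so every token goes through the standard acceptance test; raising the threshold lets more drafted tokens through without that test.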