Commit 0cef019 (verified) by maxholsman · 1 Parent(s): dc6e82a

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +5 -7
README.md CHANGED
@@ -13,14 +13,12 @@ Standard Speculative Decoding enforces strict distributional equivalence to the
  **Key Benefits:**
  - **Tunable Quality-Speed Tradeoff**: Adjust the `fsd_threshold` parameter to control the balance between generation quality and inference speed
  - **Significant Speed Improvements**: Achieve runtime improvements of over 5 tokens per second faster than standard SD with only an approximate 2% absolute reduction in benchmark accuracy
- - **Maintained Performance**: In many cases, FSD matches standard SD benchmark accuracy while running over 2 tokens per second faster
+ - **Maintained Performance**: In many cases, FSD can match standard SD benchmark accuracy while running over 2 tokens per second faster
  - **Flexible Acceptance Criteria**: Choose from multiple divergence metrics (KL divergence, Jensen-Shannon divergence, or draft token probabilities) to best suit your use case

- This implementation extends the standard speculative decoding algorithm with additional divergence metrics for more flexible candidate acceptance, supporting KL divergence, Jensen-Shannon divergence, and draft token-based acceptance criteria.
-
  This implementation is based on the paper **"Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff"** (ACL Findings 2025). See the [Citation](#citation) section below for full citation details.

- ## Features
+ ## How it works

  - **Fuzzy Speculative Decoding (FSD)**: Accepts candidate tokens based on distribution divergence thresholds
  - **Multiple Divergence Types**:
@@ -74,10 +72,10 @@ print(generated_text)

  ### How It Works

- 1. The assistant model generates candidate tokens
- 2. The target model evaluates these candidates
+ 1. The assistant model generates candidate tokens (just like standard SD)
+ 2. The target model evaluates these candidates, generating distributions for all draft tokens
  3. For each candidate position:
- - If FSD divergence threshold: token is accepted
+ - If FSD divergence between the target and draft model distributions is less than the fsd_threshold: token is accepted
  - Otherwise: standard speculative decoding acceptance is applied
  4. Accepted tokens are kept, rejected tokens trigger resampling from the target model

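
The acceptance rule described in the updated "How It Works" steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names (`kl_divergence`, `js_divergence`, `accept_candidate`, `count_accepted`) are hypothetical, distributions are plain lists of probabilities rather than model logits, Jensen-Shannon divergence is used by default, and the argument order of the KL term is an assumption. Only `fsd_threshold` is a name taken from the README.

```python
import math
import random


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def accept_candidate(target_probs, draft_probs, token, fsd_threshold,
                     divergence=js_divergence, rng=random):
    """Decide whether the draft token at one position is accepted."""
    # FSD check: if the two distributions are close enough, accept outright.
    if divergence(target_probs, draft_probs) < fsd_threshold:
        return True
    # Otherwise fall back to the standard speculative decoding test:
    # accept with probability min(1, p_target(token) / p_draft(token)).
    ratio = target_probs[token] / max(draft_probs[token], 1e-12)
    return rng.random() < min(1.0, ratio)


def count_accepted(target_dists, draft_dists, draft_tokens, fsd_threshold):
    """Walk the candidates left to right and stop at the first rejection."""
    accepted = 0
    for p, q, tok in zip(target_dists, draft_dists, draft_tokens):
        if not accept_candidate(p, q, tok, fsd_threshold):
            break
        accepted += 1
    return accepted
```

In this sketch, setting `fsd_threshold` to 0 means the divergence check never passes, so every position falls through to the standard speculative decoding test; raising it lets more draft tokens through without that test. Resampling after a rejection is left out.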