Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -14,7 +14,7 @@ Standard Speculative Decoding enforces strict distributional equivalence to the
|
|
| 14 |
- **Tunable Quality-Speed Tradeoff**: Adjust the `fsd_threshold` parameter to control the balance between generation quality and inference speed
|
| 15 |
- **Significant Speed Improvements**: Generate over 5 tokens per second faster than standard SD with only an approximate 2% absolute reduction in benchmark accuracy
|
| 16 |
- **Maintained Performance**: In many cases, FSD can match standard SD benchmark accuracy while running over 2 tokens per second faster
|
| 17 |
-
- **Flexible Acceptance Criteria**: Choose from multiple divergence metrics (KL divergence,
|
| 18 |
|
| 19 |
This implementation is based on the paper **"Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff"** (ACL Findings 2025). See the [Citation](#citation) section below for full citation details.
|
| 20 |
|
|
@@ -23,7 +23,7 @@ This implementation is based on the paper **"Fuzzy Speculative Decoding for a Tu
|
|
| 23 |
- **Fuzzy Speculative Decoding (FSD)**: Accepts candidate tokens based on distribution divergence thresholds
|
| 24 |
- **Multiple Divergence Types**:
|
| 25 |
- `kl`: KL divergence between candidate and target distributions
|
| 26 |
-
- `js`:
|
| 27 |
- `draft_tokens`: Absolute difference in draft token probabilities
|
| 28 |
- **Standard Speculative Decoding**: Falls back to standard speculative decoding acceptance when FSD threshold is not met
|
| 29 |
|
|
@@ -67,7 +67,7 @@ print(generated_text)
|
|
| 67 |
- **`fsd_threshold`** (float, default: 0.0): Threshold for fuzzy speculative decoding acceptance. Tokens with divergence below this threshold are automatically accepted. **Lower values** enforce stricter equivalence (closer to standard SD, higher quality but slower), while **higher values** allow more divergence (faster inference with potential quality tradeoffs). Tune this parameter to achieve your desired quality-speed tradeoff.
|
| 68 |
- **`fsd_div_type`** (str, default: "kl"): Type of divergence metric to use:
|
| 69 |
- `"kl"`: KL divergence (D_KL(candidate || target)) - measures how much information is lost when the target distribution is used to approximate the candidate distribution
|
| 70 |
-
- `"js"`:
|
| 71 |
- `"draft_tokens"`: Absolute difference between the draft and target model probabilities of the drafted token
|
| 72 |
- **`track_acceptance_metrics`** (bool, default: False): Whether to track and return draft token acceptance statistics. When enabled, the output includes:
|
| 73 |
- `draft_token_acceptance_rate`: Ratio of accepted draft tokens to total draft tokens
|
|
|
|
| 14 |
- **Tunable Quality-Speed Tradeoff**: Adjust the `fsd_threshold` parameter to control the balance between generation quality and inference speed
|
| 15 |
- **Significant Speed Improvements**: Generate over 5 tokens per second faster than standard SD with only an approximate 2% absolute reduction in benchmark accuracy
|
| 16 |
- **Maintained Performance**: In many cases, FSD can match standard SD benchmark accuracy while running over 2 tokens per second faster
|
| 17 |
+
- **Flexible Acceptance Criteria**: Choose from multiple divergence metrics (KL divergence, JS divergence variant, or draft token probabilities) to best suit your use case
|
| 18 |
|
| 19 |
This implementation is based on the paper **"Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff"** (ACL Findings 2025). See the [Citation](#citation) section below for full citation details.
|
| 20 |
|
|
|
|
| 23 |
- **Fuzzy Speculative Decoding (FSD)**: Accepts candidate tokens based on distribution divergence thresholds
|
| 24 |
- **Multiple Divergence Types**:
|
| 25 |
- `kl`: KL divergence between candidate and target distributions
|
| 26 |
+
- `js`: JS divergence variant (computed using KL divergence with midpoint distribution)
|
| 27 |
- `draft_tokens`: Absolute difference in draft token probabilities
|
| 28 |
- **Standard Speculative Decoding**: Falls back to standard speculative decoding acceptance when FSD threshold is not met
|
| 29 |
|
|
|
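The three divergence types listed above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the function names (`kl_divergence`, `js_divergence`, `draft_token_gap`, `divergence`) are hypothetical, distributions are plain lists of probabilities, and the `js` branch assumes the textbook Jensen-Shannon form (the average of two KL terms against the midpoint distribution), which may differ from the variant this model actually uses.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions given as probability lists."""
    # eps guards against log(0) and division by zero; terms with p_i == 0 contribute nothing.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def js_divergence(p, q):
    """Textbook Jensen-Shannon divergence (assumed form of the `js` variant)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def draft_token_gap(candidate, target, token_id):
    """Absolute difference in the probability each model assigns to the drafted token."""
    return abs(candidate[token_id] - target[token_id])


def divergence(div_type, candidate, target, token_id=None):
    """Dispatch on a divergence name mirroring the `fsd_div_type` values."""
    if div_type == "kl":
        return kl_divergence(candidate, target)
    if div_type == "js":
        return js_divergence(candidate, target)
    if div_type == "draft_tokens":
        if token_id is None:
            raise ValueError("draft_tokens requires the drafted token id")
        return draft_token_gap(candidate, target, token_id)
    raise ValueError(f"unknown divergence type: {div_type!r}")
```

Under these assumptions, the `js` value is symmetric in its two arguments and never exceeds `math.log(2)`, whereas the `kl` value is asymmetric and unbounded, so a threshold tuned for one metric would not carry over to another.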
|
| 67 |
- **`fsd_threshold`** (float, default: 0.0): Threshold for fuzzy speculative decoding acceptance. Tokens with divergence below this threshold are automatically accepted. **Lower values** enforce stricter equivalence (closer to standard SD, higher quality but slower), while **higher values** allow more divergence (faster inference with potential quality tradeoffs). Tune this parameter to achieve your desired quality-speed tradeoff.
|
| 68 |
- **`fsd_div_type`** (str, default: "kl"): Type of divergence metric to use:
|
| 69 |
- `"kl"`: KL divergence (D_KL(candidate || target)) - measures how much information is lost when the target distribution is used to approximate the candidate distribution
|
| 70 |
+
- `"js"`: JS divergence variant - computed using KL divergence with a midpoint distribution, providing a symmetric and bounded measure of distribution similarity
|
| 71 |
- `"draft_tokens"`: Absolute difference between the draft and target model probabilities of the drafted token
|
| 72 |
- **`track_acceptance_metrics`** (bool, default: False): Whether to track and return draft token acceptance statistics. When enabled, the output includes:
|
| 73 |
- `draft_token_acceptance_rate`: Ratio of accepted draft tokens to total draft tokens
|
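As a rough sketch of how `fsd_threshold` could gate acceptance, the snippet below accepts a draft token outright when its divergence is strictly below the threshold and otherwise falls back to the usual speculative decoding test, which accepts with probability min(1, p_target / p_draft). The names `accept_draft_token` and `acceptance_rate` are hypothetical, the strict comparison is an assumption, and the resampling step that standard SD performs after a rejection is omitted. This is not the model's actual code.

```python
import random


def accept_draft_token(div, p_draft_tok, p_target_tok, fsd_threshold=0.0, rng=random):
    """Decide whether one draft token is accepted.

    div           -- divergence between candidate and target distributions at this position
    p_draft_tok   -- probability the draft model assigned to the drafted token
    p_target_tok  -- probability the target model assigned to the same token
    """
    # Fuzzy rule (assumed strict comparison): small divergence means accept outright.
    if div < fsd_threshold:
        return True
    # Fallback: standard speculative decoding acceptance test.
    if p_draft_tok <= 0:
        return False  # defensive guard; a drafted token normally has positive probability
    return rng.random() < min(1.0, p_target_tok / p_draft_tok)


def acceptance_rate(decisions):
    """Ratio of accepted draft tokens to total draft tokens."""
    return sum(decisions) / len(decisions) if decisions else 0.0
```

In this sketch, the default `fsd_threshold=0.0` never triggers the fuzzy rule, because a divergence cannot be negative, so every decision goes through the standard test. Raising the threshold lets more tokens bypass that test, which is where the speedup and the potential quality loss both come from.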