metadata

title: DSSD Demo
emoji: 🚀
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
license: apache-2.0

🚀 Dynamic Self-Speculative Decoding (DSSD) Demo

This demo showcases early-exit inference with speculative decoding. When the model is confident, tokens are drafted from intermediate layers and then verified by the full model, which speeds up generation while keeping the output identical to the full model's.
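As a rough illustration of what "confident" could mean here, the sketch below picks the shallowest early exit head whose top softmax probability clears a threshold. The threshold rule, the `pick_exit` name, and the 0.9 default are assumptions made for illustration; the demo's actual confidence criterion is not described in this README.

```python
import math
from typing import Dict, List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_exit(
    head_logits: Dict[int, List[float]],
    threshold: float = 0.9,  # assumed value, not taken from the demo
) -> Optional[Tuple[int, int]]:
    """Return (layer, token_id) for the shallowest confident head, else None.

    head_logits maps a layer index to that head's logits over the vocabulary.
    """
    for layer in sorted(head_logits):
        probs = softmax(head_logits[layer])
        token_id = max(range(len(probs)), key=probs.__getitem__)
        if probs[token_id] >= threshold:
            return layer, token_id
    # No head was confident enough: fall back to the full model.
    return None
```

Returning `None` signals that no intermediate head was confident, so the caller would run the remaining layers as usual.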

Features

Speculative Decoding: Uses early exit heads to draft tokens, then verifies them with the full model.
Streaming Output: Watch the generation process live, including drafting and verification statuses.
Model Comparison: Compare performance and output between DSSD and the full model side-by-side.
Color-coded Visualization: Each token is colored based on which head/layer generated it.
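One possible way to prepare the color-coded view is to pair each token with a label naming its source. Gradio's `HighlightedText` component accepts a list of `(text, label)` pairs, so the helper below produces that shape. Whether the demo uses this component is an assumption, and `label_tokens` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def label_tokens(
    tokens: List[str],
    source_layers: List[Optional[int]],
) -> List[Tuple[str, str]]:
    """Pair each token with a label for the head that produced it.

    source_layers[i] is the exit layer that drafted tokens[i], or None
    when the token came from the full model.
    """
    if len(tokens) != len(source_layers):
        raise ValueError("tokens and source_layers must have the same length")
    return [
        (tok, "full model" if layer is None else f"layer {layer}")
        for tok, layer in zip(tokens, source_layers)
    ]
```

The resulting list could be passed as the value of a `HighlightedText` component, which assigns one color per distinct label.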

How it works

Draft Phase: The model tries to predict the next token(s) using early exit heads placed at intermediate layers.
Verification Phase: The full model checks the drafted tokens in a single forward pass.
Acceptance: Drafted tokens that match the full model's predictions are kept. The first mismatch is replaced with the full model's token, and drafting restarts from that position.
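The three phases above can be sketched as a small greedy loop. This is a toy illustration, not the demo's implementation: `draft_next` and `full_next` are hypothetical stand-ins for the early exit head and the full model, each mapping a token sequence to the next token. A real model would score all drafted positions in one forward pass, whereas this sketch calls `full_next` once per position for clarity.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical interface: given the tokens so far, return the next token (greedy).
NextTokenFn = Callable[[List[Token]], Token]


def speculative_step(
    prefix: List[Token],
    draft_next: NextTokenFn,
    full_next: NextTokenFn,
    num_draft: int = 4,
) -> Tuple[List[Token], int]:
    """Run one draft/verify/accept round.

    Returns the tokens to append and the number of drafted tokens accepted.
    """
    # Draft phase: propose num_draft tokens with the cheap predictor.
    drafted: List[Token] = []
    for _ in range(num_draft):
        drafted.append(draft_next(prefix + drafted))

    # Verification and acceptance: keep matches, correct the first mismatch.
    emitted: List[Token] = []
    for i, tok in enumerate(drafted):
        target = full_next(prefix + drafted[:i])
        if tok != target:
            emitted.append(target)  # replace the mismatch, discard the rest
            return emitted, i
        emitted.append(tok)
    return emitted, len(drafted)


def generate(
    prefix: List[Token],
    draft_next: NextTokenFn,
    full_next: NextTokenFn,
    max_new_tokens: int,
    num_draft: int = 4,
) -> List[Token]:
    """Repeat speculative steps until max_new_tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new_tokens:
        new_tokens, _ = speculative_step(out, draft_next, full_next, num_draft)
        out.extend(new_tokens)
    return out[: len(prefix) + max_new_tokens]
```

Because every emitted token is either confirmed or supplied by `full_next`, the result matches plain greedy decoding with the full model, regardless of how often the draft is wrong. Each round emits at least one token, so the loop always makes progress.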

Models

Llama 3 8B: Using 3 auxiliary heads at layers 8, 16, and 24.
Qwen 3 0.6B: Using 4 auxiliary heads at layers 5, 11, 16, and 22.
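For reference, the head placement listed above could be captured in a small configuration table. The class name, field names, and dictionary keys below are hypothetical; only the model names and layer indices come from this README.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class EarlyExitConfig:
    """Which intermediate layers carry an auxiliary (early exit) head."""

    display_name: str
    head_layers: Tuple[int, ...]


# Keys are illustrative identifiers, not official model IDs.
MODEL_CONFIGS: Dict[str, EarlyExitConfig] = {
    "llama-3-8b": EarlyExitConfig("Llama 3 8B", (8, 16, 24)),
    "qwen-3-0.6b": EarlyExitConfig("Qwen 3 0.6B", (5, 11, 16, 22)),
}
```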

Quick Start (Local)

pip install -r requirements.txt
python app.py