---
title: LLM Decoding Strategy Analyzer
emoji: 🔬
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
license: mit
---
# 🔬 LLM Decoding Strategy Analyzer

An interactive tool to compare five text generation decoding strategies side-by-side using GPT-2.
## 🎯 Purpose
This project demonstrates how different decoding strategies affect text generation quality. Rather than just showing that language models can generate text, it explains why the outputs look the way they do based on the underlying algorithms.
Target Audience: AI/ML practitioners, researchers, and anyone interested in understanding LLM text generation mechanics.
## 🏗️ Architecture
```
User Prompt → GPT-2 Tokenizer → GPT-2 Model (124M params)
                       │
     ┌─────────┬───────┼────────┬──────────┐
     │         │       │        │          │
  Greedy     Beam    Top-K    Top-P   Temp+Top-P
     │         │       │        │          │
     └─────────┴───────┼────────┴──────────┘
                       │
         Side-by-Side Comparison UI
```
## 📖 Decoding Strategies Explained
### 1. Greedy Decoding
- How it works: Always selects the token with the highest probability
- Parameters: `do_sample=False`
- Behavior: Deterministic → same input always produces same output
- Problem: Tends to produce repetitive, "safe" text that often loops
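The selection rule is tiny; here is a minimal sketch in plain Python, with a toy next-token distribution standing in for GPT-2's softmax output (all tokens and probabilities below are illustrative):

```python
# Toy next-token distribution (token -> probability); values are illustrative.
next_token_probs = {"the": 0.42, "a": 0.31, "galaxy": 0.18, "void": 0.09}

def greedy_pick(probs):
    # Greedy decoding: always take the single most probable token.
    return max(probs, key=probs.get)

print(greedy_pick(next_token_probs))  # "the", on every call: deterministic
```

Because the argmax never changes, repeated generation from the same prompt retraces the same path, which is what makes the repetition loops shown later possible.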
### 2. Beam Search
- How it works: Maintains the top-k hypotheses at each step, exploring multiple paths simultaneously
- Parameters: `num_beams=5`, `no_repeat_ngram_size=2`
- Behavior: Deterministic but explores more possibilities than greedy
- Problem: Still conservative; produces coherent but predictable text
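The hypothesis-tracking step can be sketched with a hand-written table of per-step token probabilities in place of the real model (a toy stand-in; every token and number below is illustrative):

```python
import math

# Toy per-step next-token distributions, keyed by the last token of the
# hypothesis; a stand-in for a real language model, values illustrative.
step_probs = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.9, "dog": 0.1},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {}, "ran": {},
}

def beam_search(start, num_beams=2, steps=3):
    # Each hypothesis is (tokens, cumulative log-probability).
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            nxt = step_probs.get(tokens[-1], {})
            if not nxt:                      # hypothesis is finished
                candidates.append((tokens, score))
                continue
            for tok, p in nxt.items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the top `num_beams` hypotheses by cumulative score.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams

best_tokens, best_score = beam_search("<s>")[0]
print(" ".join(best_tokens))  # "<s> a cat sat"
```

In this toy table, greedy would commit to "the" (p = 0.6) at step one and end with a lower joint probability than the beam's "a cat sat"; that recovery of globally better sequences is the usual argument for beam search on factual or structured tasks.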
### 3. Top-K Sampling
- How it works: Samples randomly from the K most likely tokens
- Parameters: `top_k=50`, `temperature=1.0`
- Behavior: Stochastic → adds variety to outputs
- Problem: Fixed K doesn't adapt to the shape of the probability distribution
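The filtering step is simple to sketch in plain Python (toy distribution with illustrative values; `k=3` here rather than the app's `top_k=50`):

```python
import random

# Toy next-token distribution; values are illustrative.
probs = {"the": 0.40, "a": 0.25, "galaxy": 0.15, "star": 0.10,
         "void": 0.06, "cheese": 0.04}

def top_k_sample(probs, k, rng):
    # Keep only the K most probable tokens, then sample among them
    # (random.choices renormalizes the weights implicitly).
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens, weights = zip(*top)
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)
samples = {top_k_sample(probs, k=3, rng=rng) for _ in range(50)}
# Only the top-3 tokens can ever be drawn, no matter how many samples we take.
```

Note that K stays fixed whether the distribution is sharply peaked or nearly flat; that rigidity is exactly what nucleus sampling (next section) addresses.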
### 4. Top-P (Nucleus) Sampling
- How it works: Samples from the smallest set of tokens whose cumulative probability exceeds P
- Parameters: `top_p=0.95`, `temperature=1.0`
- Behavior: Adapts to distribution shape → uses fewer tokens when the model is confident
- Advantage: Best balance of creativity and coherence
- Reference: Holtzman et al. (2019)
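The nucleus-selection step can be sketched directly from that definition (toy distributions, illustrative values), which also shows why top-p adapts where a fixed K cannot:

```python
def nucleus(probs, p):
    # Smallest set of highest-probability tokens whose cumulative
    # probability exceeds p; sampling then happens within this set.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        total += prob
        if total > p:
            break
    return kept

peaked = {"the": 0.90, "a": 0.05, "x": 0.03, "y": 0.02}  # confident model
flat = {t: 0.125 for t in "abcdefgh"}                    # uncertain model
# The nucleus shrinks to 1 token when the model is confident
# and grows to 7 tokens when the distribution is flat.
print(len(nucleus(peaked, 0.8)), len(nucleus(flat, 0.8)))
```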
### 5. Temperature + Top-P
- How it works: Scales logits by temperature before applying top-p sampling
- Parameters: `temperature=0.7`, `top_p=0.95`
- Behavior: Lower temperature sharpens the distribution; combined with top-p for quality
- Advantage: Fine-grained control over creativity vs. focus
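Temperature acts on the logits before any truncation; a minimal sketch of the scaling step (illustrative logits):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T before the softmax: T < 1 sharpens the
    # distribution, T > 1 flattens it (T = 1 leaves it unchanged).
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]
base = softmax_with_temperature(logits, 1.0)
sharp = softmax_with_temperature(logits, 0.7)
# At T=0.7 the top token gains probability mass relative to T=1.0.
```

In the app's configuration the sharpened distribution is then truncated with `top_p=0.95`, so the low-probability tail is suppressed twice: softly by temperature, then hard by the nucleus cutoff.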
## 📚 Research Foundation
This project is grounded in academic research on neural text generation:
| Paper | Authors | Year | Key Contribution |
|---|---|---|---|
| The Curious Case of Neural Text Degeneration | Holtzman et al. | 2019 | Introduced Nucleus (Top-P) Sampling; explained why maximization-based decoding fails |
| If beam search is the answer, what was the question? | Meister et al. | 2020 | Analyzed why beam search works despite high search error; uniform information density |
| Mirostat: A Neural Text Decoding Algorithm | Basu et al. | 2020 | Mathematical analysis of top-k, top-p, and temperature; identified "boredom trap" and "confusion trap" |
| Generative AI-Based Text Generation Methods Using Pre-Trained GPT-2 Model | Pandey et al. | 2024 | Comprehensive comparison of decoding strategies on GPT-2 |
## 🔑 Key Observations
From testing these strategies, we consistently observe:
| Strategy | Repetition Risk | Creativity | Coherence | Recommended Use |
|---|---|---|---|---|
| Greedy | ❌ High | ❌ Low | ⚠️ Degrades | Baseline comparison only |
| Beam Search | ✅ Low | ❌ Low | ✅ High | Factual/structured tasks |
| Top-K | ✅ Low | ✅ High | ⚠️ Variable | Creative exploration |
| Top-P | ✅ Low | ✅ High | ✅ High | General purpose |
| Temp + Top-P | ✅ Low | ✅ Moderate | ✅ High | Production applications |
**Example of Greedy Degeneration:**

```
Input:  "In a distant galaxy, a lone astronaut discovered"
Output: "...The astronaut was able to see the planet's gravity.
         The astronaut was able to see the planet's gravity.
         The astronaut was able to see the planet's gravity..."
```
This repetition loop is the "text degeneration" phenomenon identified by Holtzman et al.
## 🛠️ Technical Implementation
| Component | Choice | Rationale |
|---|---|---|
| Model | GPT-2 (124M) | Large enough for quality differences; small enough for CPU inference |
| Framework | Hugging Face Transformers | Industry standard; well-documented |
| Interface | Gradio | Quick deployment; interactive comparison |
| Timing | `torch.cuda.synchronize()` | Accurate GPU timing measurement |
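The timing pattern looks roughly like the sketch below; `timed` is a hypothetical helper, and the commented-out `torch.cuda.synchronize()` call is only needed on GPU, where CUDA kernels launch asynchronously and the clock would otherwise stop before the work actually finishes:

```python
import time

def timed(fn, *args, **kwargs):
    # Wall-clock timing of a single call. When timing CUDA work,
    # synchronize before reading the clock so queued kernels are done.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    # torch.cuda.synchronize()  # uncomment when timing GPU generation
    elapsed = time.perf_counter() - start
    return result, elapsed

result, seconds = timed(sum, range(1_000_000))
```

On the Space's CPU free tier the synchronize call is a no-op concern, but keeping it in the measurement path means the numbers stay honest if the app is later moved to GPU hardware.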
## 🧪 Development Challenges
### Challenge 1: Gradio Theme Deprecation

**Problem:** `DeprecationWarning: The 'theme' parameter in Blocks constructor will be removed in Gradio 6.0`

**Investigation:** Gradio 5.x/6.x changed where the theme parameter should be placed.

**Solution:** Kept `theme` in the constructor for compatibility with the current SDK version; documented for future migration.
## 📁 Repository Structure
```
llm-decoding-strategies/
├── app.py             # Main application with all strategies
├── requirements.txt   # Python dependencies
└── README.md          # This documentation
```
## 🚀 Local Development
```shell
# Clone the repository
git clone https://huggingface.co/spaces/Nav772/llm-decoding-strategies
cd llm-decoding-strategies

# Install dependencies
pip install -r requirements.txt

# Run locally
python app.py
```
## 📉 Limitations
- Model Size: GPT-2 (124M) is relatively small by modern standards; larger models would show more nuanced differences
- CPU Inference: Running on CPU free tier; generation is slower than GPU
- Context Length: Limited to 1024 tokens (GPT-2 maximum)
- English Only: GPT-2 was primarily trained on English text
## 👤 Author

Nav772. Built as part of an AI Engineering portfolio demonstrating understanding of LLM text generation mechanics.
## 🔗 Related Projects

- RAG Document Q&A – Retrieval-Augmented Generation system
- Movie Sentiment Analyzer – NLP classification
- Food Image Classifier – Computer vision
## 📄 License

MIT License. See the `LICENSE` file for details.