
---
title: LLM Decoding Strategy Analyzer
emoji: 🔬
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
license: mit
---

# 🔬 LLM Decoding Strategy Analyzer

An interactive tool for comparing five text-generation decoding strategies side by side using GPT-2.

## 🎯 Purpose

This project demonstrates how different decoding strategies affect text generation quality. Rather than just showing that language models can generate text, it explains why the outputs look the way they do based on the underlying algorithms.

**Target audience:** AI/ML practitioners, researchers, and anyone interested in understanding LLM text generation mechanics.

πŸ—οΈ Architecture

```
User Prompt → GPT-2 Tokenizer → GPT-2 Model (124M params)
                                      ↓
                    ┌─────────────────┴─────────────────┐
                    ↓         ↓         ↓         ↓         ↓
                 Greedy    Beam     Top-K    Top-P    Temp+Top-P
                    ↓         ↓         ↓         ↓         ↓
                    └─────────────────┬─────────────────┘
                                      ↓
                           Side-by-Side Comparison UI
```
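The five branches map to keyword arguments of Hugging Face's `generate()`. A minimal sketch of how the strategies might be parameterized, using only the values listed in this README (the structure is illustrative; the actual `app.py` may organize this differently):

```python
# Generation kwargs per strategy, mirroring the parameters documented below.
STRATEGIES = {
    "Greedy":       {"do_sample": False},
    "Beam Search":  {"num_beams": 5, "no_repeat_ngram_size": 2},
    "Top-K":        {"do_sample": True, "top_k": 50, "temperature": 1.0},
    "Top-P":        {"do_sample": True, "top_p": 0.95, "temperature": 1.0},
    "Temp + Top-P": {"do_sample": True, "top_p": 0.95, "temperature": 0.7},
}

# Each comparison run would call model.generate(**inputs, **kwargs)
# once per entry and collect the five outputs for the UI.
for name, kwargs in STRATEGIES.items():
    print(name, kwargs)
```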

## 📊 Decoding Strategies Explained

### 1. Greedy Decoding

- **How it works:** Always selects the token with the highest probability
- **Parameters:** `do_sample=False`
- **Behavior:** Deterministic; the same input always produces the same output
- **Problem:** Tends to produce repetitive, "safe" text that often loops
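The argmax rule, and the repetition loop it can fall into, can be sketched with a toy next-token table standing in for the model's softmax output (the table and tokens are hypothetical):

```python
def greedy_decode(next_probs, start, steps):
    """Always pick the highest-probability next token (do_sample=False)."""
    tokens = [start]
    for _ in range(steps):
        dist = next_probs[tokens[-1]]
        tokens.append(max(dist, key=dist.get))  # argmax, no sampling
    return tokens

# Toy "model": last token -> next-token probabilities.
next_probs = {
    "the":       {"astronaut": 0.5, "planet": 0.3, "ship": 0.2},
    "astronaut": {"saw": 0.6, "landed": 0.4},
    "saw":       {"the": 0.7, "a": 0.3},
}
print(greedy_decode(next_probs, "the", 6))
# -> ['the', 'astronaut', 'saw', 'the', 'astronaut', 'saw', 'the']
```

Because the argmax of a fixed table never changes, the walk falls into a cycle: exactly the degeneration pattern shown later in this README.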

### 2. Beam Search

- **How it works:** Maintains the top-k hypotheses at each step, exploring multiple paths simultaneously
- **Parameters:** `num_beams=5`, `no_repeat_ngram_size=2`
- **Behavior:** Deterministic, but explores more possibilities than greedy
- **Problem:** Still conservative; produces coherent but predictable text
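A toy version of the hypothesis-tracking idea, without the length normalization or n-gram blocking that the real `generate()` call adds (table values are hypothetical):

```python
import math

def beam_search(next_probs, start, steps, num_beams=2):
    """Keep the num_beams highest cumulative log-prob sequences each step."""
    beams = [([start], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, p in next_probs[seq[-1]].items():
                candidates.append((seq + [tok], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0][0]

# Greedy would commit to "b" (0.55) and end at sequence probability 0.275;
# beam search keeps "c" alive and finds the better path (0.45 * 1.0).
next_probs = {
    "a": {"b": 0.55, "c": 0.45},
    "b": {"x": 0.5, "y": 0.5},
    "c": {"z": 1.0},
}
print(beam_search(next_probs, "a", steps=2, num_beams=2))
# -> ['a', 'c', 'z']
```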

### 3. Top-K Sampling

- **How it works:** Samples randomly from the K most likely tokens
- **Parameters:** `top_k=50`, `temperature=1.0`
- **Behavior:** Stochastic; adds variety to outputs
- **Problem:** A fixed K doesn't adapt to the shape of the probability distribution
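The fixed-K issue is easy to see in a small sketch: the same K applies whether the distribution is peaked or flat (the vocabulary and probabilities here are made up):

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize them to sum to 1."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

# Even when the model is very confident, a fixed k keeps unlikely tokens
# in the sampling pool alongside the obvious choice.
peaked = {"the": 0.90, "a": 0.05, "an": 0.03, "zebra": 0.02}
print(top_k_filter(peaked, 3))  # "an" (3%) survives next to "the" (90%)
```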

### 4. Top-P (Nucleus) Sampling

- **How it works:** Samples from the smallest set of tokens whose cumulative probability exceeds P
- **Parameters:** `top_p=0.95`, `temperature=1.0`
- **Behavior:** Adapts to the distribution's shape; uses fewer tokens when the model is confident
- **Advantage:** Best balance of creativity and coherence
- **Reference:** Holtzman et al. (2019)
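A sketch of the nucleus rule: sort by probability, keep tokens until the cumulative mass reaches P, and renormalize. Unlike a fixed K, the kept set shrinks when the model is confident and grows when it is uncertain (distributions below are hypothetical):

```python
def top_p_filter(probs, p):
    """Keep the smallest high-probability set with cumulative mass >= p."""
    kept, cum = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = pr
        cum += pr
        if cum >= p:
            break
    total = sum(kept.values())
    return {tok: pr / total for tok, pr in kept.items()}

peaked = {"the": 0.95, "a": 0.03, "an": 0.01, "zebra": 0.01}
flat   = {"red": 0.25, "blue": 0.25, "green": 0.25, "gold": 0.25}
print(len(top_p_filter(peaked, 0.9)))  # 1 token: the model is confident
print(len(top_p_filter(flat, 0.9)))    # 4 tokens: the model is uncertain
```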

### 5. Temperature + Top-P

- **How it works:** Scales logits by temperature before applying top-p sampling
- **Parameters:** `temperature=0.7`, `top_p=0.95`
- **Behavior:** A lower temperature sharpens the distribution; combined with top-p for quality
- **Advantage:** Fine-grained control over creativity vs. focus
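Temperature scaling divides the logits by T before the softmax; T < 1 sharpens the distribution and T > 1 flattens it. A minimal sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / T; lower T concentrates mass on the top token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 1.0))  # baseline distribution
print(softmax_with_temperature(logits, 0.7))  # sharper: top token gains mass
```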

## 📚 Research Foundation

This project is grounded in academic research on neural text generation:

| Paper | Authors | Year | Key Contribution |
|---|---|---|---|
| The Curious Case of Neural Text Degeneration | Holtzman et al. | 2019 | Introduced nucleus (top-p) sampling; explained why maximization-based decoding fails |
| If beam search is the answer, what was the question? | Meister et al. | 2020 | Analyzed why beam search works despite high search error; uniform information density |
| Mirostat: A Neural Text Decoding Algorithm | Basu et al. | 2020 | Mathematical analysis of top-k, top-p, and temperature; identified the "boredom trap" and "confusion trap" |
| Generative AI-Based Text Generation Methods Using Pre-Trained GPT-2 Model | Pandey et al. | 2024 | Comprehensive comparison of decoding strategies on GPT-2 |

## 🔍 Key Observations

From testing these strategies, we consistently observe:

| Strategy | Repetition Risk | Creativity | Coherence | Recommended Use |
|---|---|---|---|---|
| Greedy | ❌ High | ❌ Low | ⚠️ Degrades | Baseline comparison only |
| Beam Search | ✅ Low | ❌ Low | ✅ High | Factual/structured tasks |
| Top-K | ✅ Low | ✅ High | ⚠️ Variable | Creative exploration |
| Top-P | ✅ Low | ✅ High | ✅ High | General purpose |
| Temp + Top-P | ✅ Low | ✅ Moderate | ✅ High | Production applications |

**Example of greedy degeneration:**

```
Input:  "In a distant galaxy, a lone astronaut discovered"
Output: "...The astronaut was able to see the planet's gravity.
        The astronaut was able to see the planet's gravity.
        The astronaut was able to see the planet's gravity..."
```

This repetition loop is the "text degeneration" phenomenon identified by Holtzman et al. (2019).

## 🛠️ Technical Implementation

| Component | Choice | Rationale |
|---|---|---|
| Model | GPT-2 (124M) | Large enough to show quality differences; small enough for CPU inference |
| Framework | Hugging Face Transformers | Industry standard; well documented |
| Interface | Gradio | Quick deployment; interactive comparison |
| Timing | `torch.cuda.synchronize()` | Accurate timing when a GPU is available |
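CUDA kernel launches are asynchronous, so wall-clock timing around `generate()` must synchronize before each clock read. A generic sketch of the idea; on GPU the `sync` hook would be `torch.cuda.synchronize`, on CPU it is simply omitted (the helper name is illustrative, not the actual `app.py` code):

```python
import time

def timed_call(fn, *args, sync=None, **kwargs):
    """Run fn and return (result, elapsed_seconds).

    Pass sync=torch.cuda.synchronize on GPU so queued kernels finish
    before each clock read; on CPU, leave sync as None.
    """
    if sync is not None:
        sync()  # flush any pending device work before starting the clock
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if sync is not None:
        sync()  # ensure the call's own work has actually finished
    return result, time.perf_counter() - start

result, elapsed = timed_call(sum, range(1000))
print(result, elapsed)
```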

## 🧪 Development Challenges

### Challenge 1: Gradio Theme Deprecation

**Problem:** `DeprecationWarning: The 'theme' parameter in Blocks constructor will be removed in Gradio 6.0`

**Investigation:** Gradio 5.x/6.x changed where the `theme` parameter should be placed.

**Solution:** Kept `theme` in the constructor for compatibility with the current SDK version; documented for future migration.

## 📁 Repository Structure

```
llm-decoding-strategies/
├── app.py              # Main application with all strategies
├── requirements.txt    # Python dependencies
└── README.md           # This documentation
```

## 🚀 Local Development

```shell
# Clone the repository
git clone https://huggingface.co/spaces/Nav772/llm-decoding-strategies
cd llm-decoding-strategies

# Install dependencies
pip install -r requirements.txt

# Run locally
python app.py
```

## 📝 Limitations

  • Model Size: GPT-2 (124M) is relatively small by modern standards; larger models would show more nuanced differences
  • CPU Inference: Running on CPU free tier; generation is slower than GPU
  • Context Length: Limited to 1024 tokens (GPT-2 maximum)
  • English Only: GPT-2 was primarily trained on English text

## 👤 Author

**Nav772**, built as part of an AI Engineering portfolio demonstrating an understanding of LLM text-generation mechanics.

## 📚 Related Projects

## 📄 License

MIT License. See the LICENSE file for details.