---
title: LLM Decoding Strategy Analyzer
emoji: πŸ”¬
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
license: mit
---
# πŸ”¬ LLM Decoding Strategy Analyzer
An interactive tool to compare five text-generation decoding strategies side by side using GPT-2.
## 🎯 Purpose
This project demonstrates **how different decoding strategies affect text generation quality**. Rather than just showing that language models can generate text, it explains *why* the outputs look the way they do based on the underlying algorithms.
**Target Audience:** AI/ML practitioners, researchers, and anyone interested in understanding LLM text generation mechanics.
## πŸ—οΈ Architecture
```
User Prompt β†’ GPT-2 Tokenizer β†’ GPT-2 Model (124M params)
↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
↓ ↓ ↓ ↓ ↓
Greedy Beam Top-K Top-P Temp+Top-P
↓ ↓ ↓ ↓ ↓
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
↓
Side-by-Side Comparison UI
```
## πŸ“Š Decoding Strategies Explained
### 1. Greedy Decoding
- **How it works:** Always selects the token with the highest probability
- **Parameters:** `do_sample=False`
- **Behavior:** Deterministic β€” same input always produces same output
- **Problem:** Tends to produce repetitive, "safe" text that often loops
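The selection rule above can be sketched without any ML framework. This is a minimal, dependency-free illustration of greedy decoding at a single step (the toy logits are invented for the example; the actual app uses `model.generate(do_sample=False)`):

```python
import math

def softmax(logits):
    # Convert raw logits to a probability distribution (max-subtracted for stability).
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Toy next-token logits for one generation step.
logits = {"the": 2.1, "a": 1.3, "moon": 0.4, "gravity": 2.5}
probs = softmax(logits)

# Greedy decoding: always pick the argmax, so the output is deterministic.
greedy_token = max(probs, key=probs.get)
print(greedy_token)  # "gravity"
```

Because the argmax never changes for the same input, repeated high-probability continuations can feed back on themselves, producing the loops described above.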
### 2. Beam Search
- **How it works:** Maintains top-k hypotheses at each step, exploring multiple paths simultaneously
- **Parameters:** `num_beams=5, no_repeat_ngram_size=2`
- **Behavior:** Deterministic but explores more possibilities than greedy
- **Problem:** Still conservative; produces coherent but predictable text
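A toy version makes the "explores more possibilities" point concrete. The transition table below is invented for illustration; the app itself just passes `num_beams=5` to `model.generate`:

```python
import math

# Toy per-step transition log-probabilities: previous token -> {next token: logprob}.
STEP = {
    "<s>": {"the": math.log(0.6), "a": math.log(0.4)},
    "the": {"cat": math.log(0.5), "dog": math.log(0.5)},
    "a":   {"cat": math.log(0.9), "dog": math.log(0.1)},
    "cat": {"sat": math.log(1.0)},
    "dog": {"ran": math.log(1.0)},
}

def beam_search(num_beams=2, steps=3):
    # Each hypothesis is (tokens, cumulative logprob); start from <s>.
    beams = [(["<s>"], 0.0)]
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for nxt, lp in STEP[tokens[-1]].items():
                candidates.append((tokens + [nxt], score + lp))
        # Keep only the top `num_beams` hypotheses by total score.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0][0]

print(beam_search())  # ['<s>', 'a', 'cat', 'sat']
```

Note the contrast with greedy: greedy would commit to `"the"` (probability 0.6) and end at total probability 0.30, while beam search keeps the `"a"` hypothesis alive and finds `"a cat sat"` at 0.36.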
### 3. Top-K Sampling
- **How it works:** Samples randomly from the K most likely tokens
- **Parameters:** `top_k=50, temperature=1.0`
- **Behavior:** Stochastic β€” adds variety to outputs
- **Problem:** Fixed K doesn't adapt to the shape of the probability distribution
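The filtering step can be shown in a few lines. This sketch (toy probabilities, small `k` for readability; the app uses `top_k=50`) keeps the K most likely tokens and renormalizes before sampling:

```python
def top_k_filter(probs, k):
    # Keep the k highest-probability tokens and renormalize over them.
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

probs = {"gravity": 0.40, "the": 0.30, "a": 0.20, "moon": 0.07, "xyz": 0.03}
pool = top_k_filter(probs, k=3)
# Only gravity/the/a remain; their probabilities are rescaled to sum to 1.
print(pool)
```

The fixed-K weakness is visible here: the same `k=3` that trims a peaked distribution nicely would, on a flat distribution, discard tokens nearly as likely as the ones kept.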
### 4. Top-P (Nucleus) Sampling
- **How it works:** Samples from the smallest set of tokens whose cumulative probability exceeds P
- **Parameters:** `top_p=0.95, temperature=1.0`
- **Behavior:** Adapts to distribution shape β€” uses fewer tokens when model is confident
- **Advantage:** Best balance of creativity and coherence
- **Reference:** Holtzman et al. (2019)
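The adaptivity claim is easy to demonstrate. This sketch (invented toy distributions; the app passes `top_p=0.95` to `model.generate`) builds the nucleus set for a confident and an uncertain model:

```python
def nucleus(probs, p):
    # Smallest set of tokens (taken in descending probability order) whose
    # cumulative probability reaches p; sampling then renormalizes over it.
    ordered = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, prob in ordered:
        kept.append((tok, prob))
        cum += prob
        if cum >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

# Confident model: one token dominates, so the nucleus is tiny.
peaked = {"gravity": 0.90, "the": 0.05, "a": 0.03, "moon": 0.02}
# Uncertain model: mass is spread out, so the nucleus grows to cover p.
flat = {"gravity": 0.30, "the": 0.28, "a": 0.25, "moon": 0.17}
print(len(nucleus(peaked, 0.8)))  # 1
print(len(nucleus(flat, 0.8)))    # 3
```

Unlike a fixed K, the nucleus shrinks to a single token when the model is confident and widens when it is not.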
### 5. Temperature + Top-P
- **How it works:** Scales logits by temperature before applying top-p sampling
- **Parameters:** `temperature=0.7, top_p=0.95`
- **Behavior:** Lower temperature sharpens distribution; combined with top-p for quality
- **Advantage:** Fine-grained control over creativity vs. focus
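Temperature scaling can be sketched directly from its definition: divide the logits by T before the softmax. The toy logits below are invented; the app passes `temperature=0.7` to `model.generate`:

```python
import math

def apply_temperature(logits, temperature):
    # Divide logits by T before softmax; T < 1 sharpens the distribution,
    # T > 1 flattens it.
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

logits = {"gravity": 2.0, "the": 1.5, "a": 1.0}
base = apply_temperature(logits, 1.0)
sharp = apply_temperature(logits, 0.7)
# At T=0.7 more probability mass concentrates on the already-likely token.
print(round(base["gravity"], 3), round(sharp["gravity"], 3))
```

Applying top-p *after* this sharpening means the nucleus is computed over the reshaped distribution, which is why the combination trades a little creativity for focus.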
## πŸ“š Research Foundation
This project is grounded in academic research on neural text generation:
| Paper | Authors | Year | Key Contribution |
|-------|---------|------|------------------|
| [The Curious Case of Neural Text Degeneration](https://arxiv.org/abs/1904.09751) | Holtzman et al. | 2019 | Introduced Nucleus (Top-P) Sampling; explained why maximization-based decoding fails |
| [If beam search is the answer, what was the question?](https://arxiv.org/abs/2010.02650) | Meister et al. | 2020 | Analyzed why beam search works despite high search error; uniform information density |
| [Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity](https://arxiv.org/abs/2007.14966) | Basu et al. | 2020 | Mathematical analysis of top-k, top-p, and temperature; identified "boredom trap" and "confusion trap" |
| [Generative AI-Based Text Generation Methods Using Pre-Trained GPT-2 Model](https://arxiv.org/abs/2404.01786) | Pandey et al. | 2024 | Comprehensive comparison of decoding strategies on GPT-2 |
## πŸ” Key Observations
From testing these strategies, we consistently observe:
| Strategy | Repetition Risk | Creativity | Coherence | Recommended Use |
|----------|-----------------|------------|-----------|-----------------|
| Greedy | ❌ High | ❌ Low | ⚠️ Degrades | Baseline comparison only |
| Beam Search | βœ… Low | ❌ Low | βœ… High | Factual/structured tasks |
| Top-K | βœ… Low | βœ… High | ⚠️ Variable | Creative exploration |
| Top-P | βœ… Low | βœ… High | βœ… High | General purpose |
| Temp + Top-P | βœ… Low | βœ… Moderate | βœ… High | Production applications |
**Example of Greedy Degeneration:**
```
Input: "In a distant galaxy, a lone astronaut discovered"
Output: "...The astronaut was able to see the planet's gravity.
The astronaut was able to see the planet's gravity.
The astronaut was able to see the planet's gravity..."
```
This repetition loop is the "text degeneration" phenomenon identified by Holtzman et al.
## πŸ› οΈ Technical Implementation
| Component | Choice | Rationale |
|-----------|--------|-----------|
| Model | GPT-2 (124M) | Large enough for quality differences; small enough for CPU inference |
| Framework | Hugging Face Transformers | Industry standard; well-documented |
| Interface | Gradio | Quick deployment; interactive comparison |
| Timing | `torch.cuda.synchronize()` | Accurate GPU timing measurement |
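The synchronize-before-timing pattern matters because CUDA kernels launch asynchronously: a naive timer can stop before the GPU has finished. A minimal sketch of the idea (the `timed` helper and its workload are illustrative, not the app's actual code; the `torch` calls run only when a GPU is in use):

```python
import time

def timed(fn, *args, use_cuda=False, **kwargs):
    # Without synchronize(), the timer measures kernel *launch* time,
    # not kernel *completion* time, understating GPU latency.
    if use_cuda:
        import torch
        torch.cuda.synchronize()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if use_cuda:
        import torch
        torch.cuda.synchronize()
    return result, time.perf_counter() - start

# CPU example with a placeholder workload; pass use_cuda=True for GPU inference.
out, seconds = timed(sum, range(1_000_000))
print(out, f"{seconds:.4f}s")
```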
## πŸ§ͺ Development Challenges
### Challenge 1: Gradio Theme Deprecation
**Problem:** `DeprecationWarning: The 'theme' parameter in Blocks constructor will be removed in Gradio 6.0`
**Investigation:** Gradio 5.x/6.x changed where the theme parameter should be placed.
**Solution:** Kept theme in constructor for compatibility with current SDK version; documented for future migration.
## πŸ“ Repository Structure
```
llm-decoding-strategies/
β”œβ”€β”€ app.py # Main application with all strategies
β”œβ”€β”€ requirements.txt # Python dependencies
└── README.md # This documentation
```
## πŸš€ Local Development
```bash
# Clone the repository
git clone https://huggingface.co/spaces/Nav772/llm-decoding-strategies
cd llm-decoding-strategies

# Install dependencies
pip install -r requirements.txt

# Run locally
python app.py
```
## πŸ“ Limitations
- **Model Size:** GPT-2 (124M) is relatively small by modern standards; larger models would show more nuanced differences
- **CPU Inference:** Running on CPU free tier; generation is slower than GPU
- **Context Length:** Limited to 1024 tokens (GPT-2 maximum)
- **English Only:** GPT-2 was primarily trained on English text
## πŸ‘€ Author
**[Nav772](https://huggingface.co/Nav772)** β€” Built as part of an AI Engineering portfolio demonstrating understanding of LLM text generation mechanics.
## πŸ“š Related Projects
- [RAG Document Q&A](https://huggingface.co/spaces/Nav772/rag-document-qa) β€” Retrieval-Augmented Generation system
- [Movie Sentiment Analyzer](https://huggingface.co/spaces/Nav772/movie-sentiment-analyzer) β€” NLP classification
- [Food Image Classifier](https://huggingface.co/spaces/Nav772/food-image-classifier) β€” Computer vision
## πŸ“„ License
MIT License β€” See LICENSE file for details.