---
title: LLM Decoding Strategy Analyzer
emoji: 🔬
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
license: mit
---

# 🔬 LLM Decoding Strategy Analyzer

An interactive tool to compare 5 text generation decoding strategies side-by-side using GPT-2.

## 🎯 Purpose

This project demonstrates **how different decoding strategies affect text generation quality**. Rather than just showing that language models can generate text, it explains *why* the outputs look the way they do based on the underlying algorithms.

**Target Audience:** AI/ML practitioners, researchers, and anyone interested in understanding LLM text generation mechanics.

## 🏗️ Architecture

```
User Prompt → GPT-2 Tokenizer → GPT-2 Model (124M params)
                        ↓
      ┌─────────────────┴─────────────────┐
      ↓        ↓        ↓        ↓        ↓
   Greedy    Beam     Top-K    Top-P  Temp+Top-P
      ↓        ↓        ↓        ↓        ↓
      └─────────────────┬─────────────────┘
                        ↓
            Side-by-Side Comparison UI
```

## 📊 Decoding Strategies Explained

### 1. Greedy Decoding
- **How it works:** Always selects the token with the highest probability
- **Parameters:** `do_sample=False`
- **Behavior:** Deterministic; the same input always produces the same output
- **Problem:** Tends to produce repetitive, "safe" text that often loops

### 2. Beam Search
- **How it works:** Maintains the top `num_beams` hypotheses at each step, exploring multiple paths simultaneously
- **Parameters:** `num_beams=5, no_repeat_ngram_size=2`
- **Behavior:** Deterministic, but explores more possibilities than greedy
- **Problem:** Still conservative; produces coherent but predictable text

### 3. Top-K Sampling
- **How it works:** Samples randomly from the K most likely tokens
- **Parameters:** `top_k=50, temperature=1.0`
- **Behavior:** Stochastic; adds variety to outputs
- **Problem:** A fixed K doesn't adapt to the shape of the probability distribution

### 4. Top-P (Nucleus) Sampling
- **How it works:** Samples from the smallest set of tokens whose cumulative probability exceeds P
- **Parameters:** `top_p=0.95, temperature=1.0`
- **Behavior:** Adapts to the distribution's shape; uses fewer tokens when the model is confident
- **Advantage:** Best balance of creativity and coherence
- **Reference:** Holtzman et al. (2019)

### 5. Temperature + Top-P
- **How it works:** Scales logits by temperature before applying top-p sampling
- **Parameters:** `temperature=0.7, top_p=0.95`
- **Behavior:** Lower temperature sharpens the distribution; combined with top-p for quality
- **Advantage:** Fine-grained control over creativity vs. focus

## 📚 Research Foundation

This project is grounded in academic research on neural text generation:

| Paper | Authors | Year | Key Contribution |
|-------|---------|------|------------------|
| [The Curious Case of Neural Text Degeneration](https://arxiv.org/abs/1904.09751) | Holtzman et al. | 2019 | Introduced nucleus (top-p) sampling; explained why maximization-based decoding fails |
| [If beam search is the answer, what was the question?](https://arxiv.org/abs/2010.02650) | Meister et al. | 2020 | Analyzed why beam search works despite high search error; uniform information density |
| [Mirostat: A Neural Text Decoding Algorithm](https://arxiv.org/abs/2007.14966) | Basu et al. | 2020 | Mathematical analysis of top-k, top-p, and temperature; identified the "boredom trap" and "confusion trap" |
| [Generative AI-Based Text Generation Methods Using Pre-Trained GPT-2 Model](https://arxiv.org/abs/2404.01786) | Pandey et al. | 2024 | Comprehensive comparison of decoding strategies on GPT-2 |

## 🔍 Key Observations

From testing these strategies, we consistently observe:

| Strategy | Repetition Risk | Creativity | Coherence | Recommended Use |
|----------|-----------------|------------|-----------|-----------------|
| Greedy | ❌ High | ❌ Low | ⚠️ Degrades | Baseline comparison only |
| Beam Search | ✅ Low | ❌ Low | ✅ High | Factual/structured tasks |
| Top-K | ✅ Low | ✅ High | ⚠️ Variable | Creative exploration |
| Top-P | ✅ Low | ✅ High | ✅ High | General purpose |
| Temp + Top-P | ✅ Low | ✅ Moderate | ✅ High | Production applications |

**Example of Greedy Degeneration:**

```
Input: "In a distant galaxy, a lone astronaut discovered"
Output: "...The astronaut was able to see the planet's gravity. The astronaut
was able to see the planet's gravity. The astronaut was able to see the
planet's gravity..."
```

This repetition loop is the "text degeneration" phenomenon identified by Holtzman et al.

## 🛠️ Technical Implementation

| Component | Choice | Rationale |
|-----------|--------|-----------|
| Model | GPT-2 (124M) | Large enough for quality differences; small enough for CPU inference |
| Framework | Hugging Face Transformers | Industry standard; well-documented |
| Interface | Gradio | Quick deployment; interactive comparison |
| Timing | `torch.cuda.synchronize()` | Accurate GPU timing measurement |

## 🧪 Development Challenges

### Challenge 1: Gradio Theme Deprecation

**Problem:** `DeprecationWarning: The 'theme' parameter in Blocks constructor will be removed in Gradio 6.0`

**Investigation:** Gradio 5.x/6.x changed where the theme parameter should be placed.

**Solution:** Kept the theme in the constructor for compatibility with the current SDK version; documented for future migration.
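## 💡 How the Sampling Cutoffs Work

The difference between top-k's fixed cutoff and top-p's adaptive one can be sketched in a few lines of plain Python. This is an illustrative re-implementation of the filtering step only, not the code in `app.py` (which delegates sampling to Transformers' `model.generate`); the function names `top_p_filter` and `top_k_filter` are hypothetical.

```python
import math

def top_p_filter(logits, p=0.95):
    """Keep the smallest set of token indices whose cumulative probability
    exceeds p (nucleus filtering, Holtzman et al. 2019)."""
    # Softmax over the raw scores (shift by max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Walk token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # nucleus reached: stop adding tokens
            break
    return kept

def top_k_filter(logits, k=50):
    """Keep the indices of the k highest-scoring tokens (fixed-size cutoff)."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]

# A peaked distribution: the nucleus shrinks when the model is confident,
# while top-k always keeps exactly k candidates regardless of shape.
peaked = [10.0, 2.0, 1.0, 0.5, 0.1]
print(top_p_filter(peaked, p=0.95))
print(top_k_filter(peaked, k=3))
```

With the peaked scores above, nucleus filtering keeps only the dominant token, while top-k still keeps three; on a flat distribution the nucleus grows instead. This adaptivity is why top-p is listed as the general-purpose choice in the comparison table.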
## 📁 Repository Structure

```
llm-decoding-strategies/
├── app.py             # Main application with all strategies
├── requirements.txt   # Python dependencies
└── README.md          # This documentation
```

## 🚀 Local Development

```bash
# Clone the repository
git clone https://huggingface.co/spaces/Nav772/llm-decoding-strategies

# Install dependencies
pip install -r requirements.txt

# Run locally
python app.py
```

## 📝 Limitations

- **Model Size:** GPT-2 (124M) is relatively small by modern standards; larger models would show more nuanced differences
- **CPU Inference:** Running on the CPU free tier; generation is slower than on GPU
- **Context Length:** Limited to 1024 tokens (the GPT-2 maximum)
- **English Only:** GPT-2 was primarily trained on English text

## 👤 Author

**[Nav772](https://huggingface.co/Nav772)** - Built as part of an AI Engineering portfolio demonstrating understanding of LLM text generation mechanics.

## 📚 Related Projects

- [RAG Document Q&A](https://huggingface.co/spaces/Nav772/rag-document-qa) - Retrieval-Augmented Generation system
- [Movie Sentiment Analyzer](https://huggingface.co/spaces/Nav772/movie-sentiment-analyzer) - NLP classification
- [Food Image Classifier](https://huggingface.co/spaces/Nav772/food-image-classifier) - Computer vision

## 📄 License

MIT License - See LICENSE file for details.