---
title: LLM Decoding Strategy Analyzer
emoji: 🔬
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
license: mit
---
# 🔬 LLM Decoding Strategy Analyzer

An interactive tool to compare five text-generation decoding strategies side by side using GPT-2.

## 🎯 Purpose

This project demonstrates **how different decoding strategies affect text generation quality**. Rather than just showing that language models can generate text, it explains *why* the outputs look the way they do based on the underlying algorithms.

**Target Audience:** AI/ML practitioners, researchers, and anyone interested in understanding LLM text-generation mechanics.
## 🏗️ Architecture

```
User Prompt → GPT-2 Tokenizer → GPT-2 Model (124M params)
                          │
     ┌────────┬───────────┼───────────┬────────┐
     │        │           │           │        │
  Greedy    Beam        Top-K       Top-P  Temp+Top-P
     │        │           │           │        │
     └────────┴───────────┼───────────┴────────┘
                          │
              Side-by-Side Comparison UI
```
## 📊 Decoding Strategies Explained

### 1. Greedy Decoding

- **How it works:** Always selects the token with the highest probability
- **Parameters:** `do_sample=False`
- **Behavior:** Deterministic; the same input always produces the same output
- **Problem:** Tends to produce repetitive, "safe" text that often loops
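The loop-prone behavior of greedy decoding can be sketched with a toy bigram table (hypothetical probabilities chosen for illustration, not GPT-2's): the argmax chain eventually revisits a token and then repeats forever.

```python
# Greedy decoding over a toy bigram "model" (hypothetical probabilities,
# not GPT-2): at every step, pick the single most likely next token.
TOY_MODEL = {
    "<s>":       {"the": 0.9, "a": 0.1},
    "the":       {"astronaut": 0.6, "planet": 0.4},
    "astronaut": {"saw": 0.7, "left": 0.3},
    "saw":       {"the": 0.8, "stars": 0.2},
}

def greedy_decode(start, steps):
    out, tok = [], start
    for _ in range(steps):
        tok = max(TOY_MODEL[tok], key=TOY_MODEL[tok].get)  # argmax, i.e. do_sample=False
        out.append(tok)
    return out

# The chain re-enters "the" and loops forever:
# the -> astronaut -> saw -> the -> astronaut -> saw -> ...
print(greedy_decode("<s>", 7))
```

Because every step is an argmax, no amount of re-running changes the output; the loop is structural.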
### 2. Beam Search

- **How it works:** Maintains the top `num_beams` hypotheses at each step, exploring multiple paths simultaneously
- **Parameters:** `num_beams=5, no_repeat_ngram_size=2`
- **Behavior:** Deterministic, but explores more possibilities than greedy
- **Problem:** Still conservative; produces coherent but predictable text
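A minimal beam-search sketch over a hypothetical next-token table (not GPT-2) shows why keeping several hypotheses can beat greedy's locally best first pick:

```python
import math

# Toy next-token probabilities (hypothetical, for illustration only).
# Greedy commits to "the" -> "cat" (0.5 * 0.6 = 0.30), but a beam of 2
# also keeps "a", whose continuation "a dog" scores 0.5 * 0.9 = 0.45.
PROBS = {
    "<s>": {"the": 0.5, "a": 0.5},
    "the": {"cat": 0.6, "dog": 0.4},
    "a":   {"cat": 0.1, "dog": 0.9},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
}

def beam_search(start, steps, num_beams=2):
    beams = [(0.0, [start])]  # (cumulative log-prob, token sequence)
    for _ in range(steps):
        candidates = [
            (lp + math.log(p), seq + [tok])
            for lp, seq in beams
            for tok, p in PROBS[seq[-1]].items()
        ]
        # Keep only the num_beams highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:num_beams]
    return beams[0][1]

print(beam_search("<s>", 3))  # the higher-probability path through "a dog"
```

The search is still deterministic: the same table and beam width always yield the same winner, which is why beam search shares greedy's predictability.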
### 3. Top-K Sampling

- **How it works:** Samples randomly from the K most likely tokens
- **Parameters:** `top_k=50, temperature=1.0`
- **Behavior:** Stochastic; adds variety to outputs
- **Problem:** A fixed K doesn't adapt to the shape of the probability distribution
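Top-k filtering can be sketched in a few lines over a toy distribution (hypothetical probabilities, not GPT-2's). Note that `k` is fixed: the same two tokens survive whether the distribution is peaked or flat, which is exactly the limitation noted above.

```python
import random

def top_k_sample(probs, k, rng=random):
    # Keep the k most likely tokens, renormalize, and sample among them.
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    r = rng.random() * total
    for tok, p in top:
        r -= p
        if r <= 0:
            return tok
    return top[-1][0]  # guard against floating-point leftovers

# With k=1 this degenerates to greedy; with k=2, "c" can never be sampled.
print(top_k_sample({"a": 0.7, "b": 0.2, "c": 0.1}, k=2))
```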
### 4. Top-P (Nucleus) Sampling

- **How it works:** Samples from the smallest set of tokens whose cumulative probability exceeds P
- **Parameters:** `top_p=0.95, temperature=1.0`
- **Behavior:** Adapts to the distribution's shape; uses fewer tokens when the model is confident
- **Advantage:** Best balance of creativity and coherence
- **Reference:** Holtzman et al. (2019)
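The adaptivity can be sketched directly (toy probabilities, not GPT-2's): a confident distribution yields a one-token nucleus, while a flat one keeps the whole vocabulary in play.

```python
import random

def top_p_sample(probs, p, rng=random):
    # Build the nucleus: the smallest prefix of the probability-sorted
    # vocabulary whose cumulative mass reaches p; renormalize and sample.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cum = [], 0.0
    for tok, q in ranked:
        nucleus.append((tok, q))
        cum += q
        if cum >= p:
            break
    r = rng.random() * cum
    for tok, q in nucleus:
        r -= q
        if r <= 0:
            return tok
    return nucleus[-1][0]

# Confident model: nucleus = {"a"} alone, so sampling is effectively greedy.
print(top_p_sample({"a": 0.97, "b": 0.02, "c": 0.01}, p=0.95))
# Flat model: all three tokens stay in the nucleus.
print(top_p_sample({"a": 0.4, "b": 0.35, "c": 0.25}, p=0.95))
```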
### 5. Temperature + Top-P

- **How it works:** Scales logits by the temperature before applying top-p sampling
- **Parameters:** `temperature=0.7, top_p=0.95`
- **Behavior:** A lower temperature sharpens the distribution; combining it with top-p improves quality
- **Advantage:** Fine-grained control over creativity vs. focus
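The temperature step can be sketched on its own (toy logits, not GPT-2's): logits are divided by `T` before the softmax, and the resulting rescaled distribution is what top-p then filters.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T before softmax: T < 1 concentrates probability
    # mass on the top tokens (sharpens), T > 1 spreads it out (flattens).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# At T=0.7 the leading token gets a larger share than at T=1.0.
print(softmax_with_temperature([2.0, 1.0, 0.0], 0.7))
print(softmax_with_temperature([2.0, 1.0, 0.0], 1.0))
```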
## 📚 Research Foundation

This project is grounded in academic research on neural text generation:

| Paper | Authors | Year | Key Contribution |
|-------|---------|------|------------------|
| [The Curious Case of Neural Text Degeneration](https://arxiv.org/abs/1904.09751) | Holtzman et al. | 2019 | Introduced nucleus (top-p) sampling; explained why maximization-based decoding fails |
| [If beam search is the answer, what was the question?](https://arxiv.org/abs/2010.02650) | Meister et al. | 2020 | Analyzed why beam search works despite high search error; uniform information density |
| [Mirostat: A Neural Text Decoding Algorithm](https://arxiv.org/abs/2007.14966) | Basu et al. | 2020 | Mathematical analysis of top-k, top-p, and temperature; identified the "boredom trap" and "confusion trap" |
| [Generative AI-Based Text Generation Methods Using Pre-Trained GPT-2 Model](https://arxiv.org/abs/2404.01786) | Pandey et al. | 2024 | Comprehensive comparison of decoding strategies on GPT-2 |
## 🔍 Key Observations

From testing these strategies, we consistently observe:

| Strategy | Repetition Risk | Creativity | Coherence | Recommended Use |
|----------|-----------------|------------|-----------|-----------------|
| Greedy | ❌ High | ❌ Low | ⚠️ Degrades | Baseline comparison only |
| Beam Search | ✅ Low | ❌ Low | ✅ High | Factual/structured tasks |
| Top-K | ✅ Low | ✅ High | ⚠️ Variable | Creative exploration |
| Top-P | ✅ Low | ✅ High | ✅ High | General purpose |
| Temp + Top-P | ✅ Low | ✅ Moderate | ✅ High | Production applications |
**Example of greedy degeneration:**

```
Input:  "In a distant galaxy, a lone astronaut discovered"
Output: "...The astronaut was able to see the planet's gravity.
The astronaut was able to see the planet's gravity.
The astronaut was able to see the planet's gravity..."
```

This repetition loop is the "text degeneration" phenomenon identified by Holtzman et al.
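Degeneration like the loop above can be quantified with a simple repeated-n-gram metric (a hypothetical helper for illustration, not part of `app.py`):

```python
def repeated_ngram_fraction(tokens, n=3):
    # Fraction of n-grams that already appeared earlier in the sequence;
    # values well above 0 signal the repetition loops typical of greedy decoding.
    seen, repeats, total = set(), 0, 0
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        total += 1
        if gram in seen:
            repeats += 1
        seen.add(gram)
    return repeats / total if total else 0.0

# A four-token loop repeated three times scores 0.6; varied text scores 0.0.
print(repeated_ngram_fraction(["the", "astronaut", "was", "able"] * 3))
print(repeated_ngram_fraction(["a", "b", "c", "d", "e"]))
```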
## 🛠️ Technical Implementation

| Component | Choice | Rationale |
|-----------|--------|-----------|
| Model | GPT-2 (124M) | Large enough to show quality differences; small enough for CPU inference |
| Framework | Hugging Face Transformers | Industry standard; well documented |
| Interface | Gradio | Quick deployment; interactive comparison |
| Timing | `torch.cuda.synchronize()` | Accurate GPU timing measurement |
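The timing approach in the table can be sketched as a small wrapper (a CPU-only illustration; the `timed` helper is hypothetical, not taken from `app.py`):

```python
import time

def timed(fn, *args, **kwargs):
    # Wall-clock timing around a generation call. On a GPU, CUDA kernels run
    # asynchronously, so torch.cuda.synchronize() must be called before each
    # perf_counter() read to get accurate numbers; this CPU-only sketch
    # omits that step.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

result, seconds = timed(sum, range(1000))
print(result, seconds)
```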
## 🧪 Development Challenges

### Challenge 1: Gradio Theme Deprecation

**Problem:** `DeprecationWarning: The 'theme' parameter in Blocks constructor will be removed in Gradio 6.0`

**Investigation:** Gradio 5.x/6.x changed where the theme parameter should be placed.

**Solution:** Kept the theme in the constructor for compatibility with the current SDK version; documented it for future migration.
## 📁 Repository Structure

```
llm-decoding-strategies/
├── app.py             # Main application with all strategies
├── requirements.txt   # Python dependencies
└── README.md          # This documentation
```
## 🚀 Local Development

```bash
# Clone the repository
git clone https://huggingface.co/spaces/Nav772/llm-decoding-strategies
cd llm-decoding-strategies

# Install dependencies
pip install -r requirements.txt

# Run locally
python app.py
```
## ⚠️ Limitations

- **Model size:** GPT-2 (124M) is small by modern standards; larger models would show more nuanced differences
- **CPU inference:** Running on the free CPU tier, so generation is slower than on a GPU
- **Context length:** Limited to 1,024 tokens (the GPT-2 maximum)
- **English only:** GPT-2 was trained primarily on English text
## 👤 Author

**[Nav772](https://huggingface.co/Nav772)** built this as part of an AI engineering portfolio demonstrating an understanding of LLM text-generation mechanics.
## 🔗 Related Projects

- [RAG Document Q&A](https://huggingface.co/spaces/Nav772/rag-document-qa): Retrieval-Augmented Generation system
- [Movie Sentiment Analyzer](https://huggingface.co/spaces/Nav772/movie-sentiment-analyzer): NLP classification
- [Food Image Classifier](https://huggingface.co/spaces/Nav772/food-image-classifier): Computer vision
## 📄 License

MIT License. See the LICENSE file for details.