# Local LLM Clue Generation Prototype

This prototype integrates the existing thematic word generation with local LLM-based clue generation using `google/flan-t5-small`.
## Files

- **`llm_clue_generator.py`** - Core LLM clue generator using flan-t5-small
- **`test_clue_generation.py`** - Integration test script combining word + clue generation
- **`requirements.txt`** - Dependencies for the prototype
- **`README_clue_generation.md`** - This documentation
## Quick Start

1. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

2. **Test the LLM clue generator only:**

   ```bash
   python llm_clue_generator.py
   ```

3. **Test the full integration (word + clue generation):**

   ```bash
   python test_clue_generation.py
   ```
## Key Features

### LLM Clue Generator (`llm_clue_generator.py`)

- Uses `google/flan-t5-small` (~250MB), optimized for CPU inference
- Generates multiple clue candidates and selects the best one
- Supports different clue styles: definition, trivia, description, category
- Includes fallback templates for when LLM generation fails
- Batch processing capability for efficiency
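The candidate-generation-and-selection step can be sketched as follows. This is a minimal illustration, not the prototype's actual API: `best_clue`, `generate_clue`, and the template strings are assumed names, and the `generate` callable stands in for the flan-t5-small pipeline call.

```python
from typing import Callable, List, Optional

# Illustrative fallback templates, keyed by clue style (assumed names).
FALLBACK_TEMPLATES = {
    "definition": "Word related to {word}",
    "category": "One example: {word}",
}

def best_clue(word: str, candidates: List[str]) -> Optional[str]:
    """Select the best candidate: non-empty and must not leak the answer;
    among valid candidates, the shortest wins."""
    valid = [c.strip() for c in candidates
             if c.strip() and word.lower() not in c.lower()]
    return min(valid, key=len) if valid else None

def generate_clue(word: str, generate: Callable[[str], List[str]],
                  style: str = "definition") -> str:
    """Ask the LLM for several candidates, pick one, and fall back to a
    template if generation fails or yields nothing usable."""
    try:
        candidates = generate(f"Write a short crossword clue ({style}) for: {word}")
    except Exception:
        candidates = []
    chosen = best_clue(word, candidates)
    if chosen is not None:
        return chosen
    # Last resort: style template, or a generic letter-count clue.
    template = FALLBACK_TEMPLATES.get(style, "Answer with {n} letters")
    return template.format(word=word, n=len(word))
```

The answer-leak check is the usual crossword constraint: a clue containing its own answer is discarded before selection.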
### Integration Test (`test_clue_generation.py`)

- **Single Topic Test**: Generates words + clues for one topic
- **Multi-Topic Test**: Handles multiple themes with contextual clues
- **Custom Sentence Test**: Turns a user-supplied sentence into themed word-clue pairs
- **Difficulty Comparison**: Generates clues for the same words at easy/medium/hard complexity
- **Performance Analysis**: Reports speed and memory usage metrics
## Expected Performance (HF Spaces)

- **Initialization**: ~30-60s (model download + word embeddings)
- **Word Generation**: ~1-3s for 10 words
- **Clue Generation**: ~2-5s per clue (depends on complexity)
- **Memory Usage**: ~1-2GB (model + embeddings + vocabulary)
## Sample Output

```
Topic: 'animals'
1. ELEPHANT (8 letters) - Large mammal with trunk and tusks
2. TIGER (5 letters) - Striped big cat from Asia
3. PENGUIN (7 letters) - Flightless Antarctic bird
...
```
## Integration with Backend

To integrate with the main crossword application:

1. **Add to ThematicWordService**: Include `LLMClueGenerator` as an optional component
2. **Async Support**: Wrap clue generation in async methods
3. **Caching**: Cache generated clues to avoid regeneration
4. **Fallback Chain**: LLM → Enhanced Templates → Basic Templates
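The async wrapping, caching, and fallback chain above can be sketched as below. `CluePipeline` and its template methods are hypothetical names introduced here for illustration; the LLM call is represented by an injected callable rather than the actual generator.

```python
import asyncio
from typing import Callable, Dict, Optional, Tuple

class CluePipeline:
    """Sketch of the fallback chain LLM -> enhanced templates -> basic templates,
    with a simple in-memory clue cache (illustrative, not the prototype's API)."""

    def __init__(self, llm_generate: Optional[Callable[[str, str], str]] = None):
        self.llm_generate = llm_generate           # optional blocking LLM call
        self._cache: Dict[Tuple[str, str], str] = {}

    def _enhanced_template(self, word: str, topic: str) -> Optional[str]:
        # Topic-aware template; unavailable when there is no topic context.
        if not topic:
            return None
        return f"{topic.capitalize()}-related answer ({len(word)} letters)"

    def _basic_template(self, word: str, topic: str) -> str:
        return f"{len(word)}-letter word"

    async def clue_for(self, word: str, topic: str) -> str:
        key = (word, topic)
        if key in self._cache:                     # cached clues are never regenerated
            return self._cache[key]
        clue: Optional[str] = None
        if self.llm_generate is not None:
            try:
                # Run the blocking LLM inference off the event loop.
                clue = await asyncio.to_thread(self.llm_generate, word, topic)
            except Exception:
                clue = None                        # fall through the chain
        clue = clue or self._enhanced_template(word, topic) or self._basic_template(word, topic)
        self._cache[key] = clue
        return clue
```

`asyncio.to_thread` (Python 3.9+) keeps CPU-bound model inference from blocking the event loop, which matters for a web backend serving other requests concurrently.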
## Configuration Options

### LLM Settings

- `model_name`: Model to use (default: `"google/flan-t5-small"`)
- `max_length`: Maximum clue length (default: 50)
- `temperature`: Generation creativity (default: 0.7)
- `num_candidates`: Number of clue candidates to generate (default: 3)

### Performance Tuning

- `cache_dir`: Model cache location
- `batch_size`: Batch size for batch processing
- `device`: CPU (`-1`) or GPU (`0`, `1`, ...)
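Collected in one place, these options might look like the following. The dataclass shape and the `batch_size` default are assumptions for illustration; only the field names and the other defaults come from the lists above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClueGeneratorConfig:
    """Assumed configuration bundle for the clue generator (illustrative)."""
    model_name: str = "google/flan-t5-small"
    max_length: int = 50              # maximum clue length
    temperature: float = 0.7          # higher = more creative, less deterministic
    num_candidates: int = 3           # clue candidates generated per word
    cache_dir: Optional[str] = None   # model cache location (None = library default)
    batch_size: int = 8               # assumed default; tune for your hardware
    device: int = -1                  # -1 = CPU; 0, 1, ... = GPU index
```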
## Troubleshooting

### Common Issues

1. **"transformers not available"**
   - Install: `pip install transformers torch`
2. **"Model download failed"**
   - Check the internet connection
   - Verify cache directory permissions
   - Try: `huggingface_hub.snapshot_download('google/flan-t5-small')`
3. **"Out of memory"**
   - Reduce the vocabulary size in the thematic generator
   - Use smaller batch sizes
   - Consider model quantization
4. **Slow generation**
   - The first run downloads the model (~250MB); subsequent runs use the cached copy
   - CPU inference is slower than GPU but more broadly compatible
## Production Considerations

### For Hugging Face Spaces

- ✅ Model size (~250MB) fits in HF Spaces
- ✅ CPU-only inference supported
- ✅ No external API dependencies
- ⚠️ Startup time includes model download
- ⚠️ Generation time may be noticeable in the UI

### Recommendations

1. **Preload models** during app startup
2. **Cache clues** aggressively to avoid regeneration
3. **Show loading indicators** during clue generation
4. **Implement timeouts** for clue generation (fall back to templates)
5. **Consider async processing** for better UX
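Recommendation 4, a timeout with a template fallback, can be sketched with `asyncio.wait_for`. The function name, the 5-second default, and the fallback template are illustrative choices, not values from the prototype.

```python
import asyncio

TEMPLATE_FALLBACK = "{n}-letter answer"

async def clue_with_timeout(word: str, llm_clue, timeout: float = 5.0) -> str:
    """Await the LLM clue coroutine for at most `timeout` seconds;
    on timeout (or any generation error) fall back to a basic template."""
    try:
        return await asyncio.wait_for(llm_clue(word), timeout)
    except Exception:  # includes asyncio.TimeoutError
        return TEMPLATE_FALLBACK.format(n=len(word))
```

Note that `asyncio.wait_for` cancels the underlying task on timeout, so a stuck generation does not keep consuming the event loop after the fallback is returned.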
## Alternative Models

If `flan-t5-small` doesn't meet requirements:

- **Comparable size**: `distilgpt2` (~320MB, faster but typically lower clue quality)
- **Larger**: `google/flan-t5-base` (~850MB, better quality but slower)
- **Specialized**: `microsoft/DialoGPT-small` (~350MB, conversational style)
## Next Steps

1. Run the tests to evaluate performance on your hardware
2. Compare clue quality with the existing template system
3. Measure actual memory usage in the HF Spaces environment
4. Integrate with the main crossword application if the results are satisfactory