| # Crossword Generation Code Walkthrough | |
| This document provides a detailed, line-by-line walkthrough of how crossword puzzle generation works in the Python backend, tracing the complete flow from API request to finalized words and clues. | |
| ## Overview | |
| The Python backend implements AI-powered crossword generation using: | |
| - **FastAPI** for web framework and API endpoints | |
| - **Vector similarity search** with sentence-transformers and FAISS for intelligent word discovery | |
| - **Backtracking algorithm** for word placement in the crossword grid | |
| - **Multi-layer caching system** for performance and fallback mechanisms | |
| ## 1. Application Startup (`app.py`) | |
| ### Entry Point and Service Initialization | |
| ```python | |
| # Lines 66-95: Async lifespan context manager | |
| @asynccontextmanager | |
| async def lifespan(app: FastAPI): | |
| global vector_service | |
| # Startup initialization | |
| vector_service = VectorSearchService() | |
| await vector_service.initialize() | |
| app.state.vector_service = vector_service | |
| ``` | |
| **Flow:** | |
| 1. FastAPI application starts with lifespan context manager | |
| 2. Creates global `VectorSearchService` instance (line 79) | |
| 3. Calls `vector_service.initialize()` (line 82) which: | |
| - Loads sentence-transformer model (~30-60 seconds) | |
| - Builds or loads cached FAISS index for word embeddings | |
| - Initializes word cache manager for fallbacks | |
| 4. Makes vector service available to all API routes via `app.state` | |
| ## 2. API Request Handling (`src/routes/api.py`) | |
| ### Crossword Generation Endpoint | |
| ```python | |
| # Lines 77-118: Main crossword generation endpoint | |
| @router.post("/generate", response_model=PuzzleResponse) | |
| async def generate_puzzle( | |
| request: GeneratePuzzleRequest, | |
| crossword_gen: CrosswordGenerator = Depends(get_crossword_generator) | |
| ): | |
| ``` | |
| **Flow:** | |
| 1. Client POST request to `/api/generate` with JSON body: | |
| ```json | |
| { | |
| "topics": ["animals", "science"], | |
| "difficulty": "medium", | |
| } | |
| ``` | |
| 2. FastAPI validates request against `GeneratePuzzleRequest` model (lines 20-23) | |
| 3. Dependency injection calls `get_crossword_generator()` (lines 57-63): | |
| ```python | |
| def get_crossword_generator(request: Request) -> CrosswordGenerator: | |
| global generator | |
| if generator is None: | |
| vector_service = getattr(request.app.state, 'vector_service', None) | |
| generator = CrosswordGenerator(vector_service) | |
| return generator | |
| ``` | |
| 4. Creates or reuses `CrosswordGenerator` wrapper with vector service reference | |
| ## 3. Crossword Generation Wrapper (`src/services/crossword_generator_wrapper.py`) | |
| ### Simple Delegation Layer | |
| ```python | |
| # Lines 20-51: Generate puzzle method | |
| async def generate_puzzle( | |
| self, | |
| topics: List[str], | |
| difficulty: str = "medium", | |
| ) -> Dict[str, Any]: | |
| ``` | |
| **Flow:** | |
| 1. Wrapper receives request from API route | |
| 2. **Line 41**: Imports actual generator to avoid circular imports: | |
| ```python | |
| from .crossword_generator import CrosswordGenerator as ActualGenerator | |
| ``` | |
| 3. **Line 42**: Creates actual generator instance with vector service | |
| 4. **Line 44**: Delegates to actual generator's `generate_puzzle()` method | |
| ## 4. Core Crossword Generation (`src/services/crossword_generator.py`) | |
| ### 4.1 Main Generation Method | |
| ```python | |
| # Lines 22-66: Core puzzle generation | |
| async def generate_puzzle(self, topics: List[str], difficulty: str = "medium", = False): | |
| ``` | |
| **Key steps:** | |
| 1. **Line 37**: Select words using AI or static sources: | |
| ```python | |
| words = await self._select_words(topics, difficulty, use_ai) | |
| ``` | |
| 2. **Line 44**: Create crossword grid from selected words: | |
| ```python | |
| grid_result = self._create_grid(words) | |
| ``` | |
| 3. **Lines 52-62**: Return structured response with grid, clues, and metadata | |
| ### 4.2 Word Selection Process | |
| ```python | |
| # Lines 68-97: Word selection logic | |
| async def _select_words(self, topics: List[str], difficulty: str, ): | |
| ``` | |
| **Flow branches based on `use_ai` parameter:** | |
| #### AI-Powered Word Selection (Lines 72-83): | |
| ```python | |
| if use_ai and self.vector_service: | |
| for topic in topics: | |
| ai_words = await self.vector_service.find_similar_words( | |
| topic, difficulty, self.max_words // len(topics) | |
| ) | |
| all_words.extend(ai_words) | |
| ``` | |
| #### Fallback to Cached Words (Lines 86-91): | |
| ```python | |
| if self.vector_service: | |
| for topic in topics: | |
| cached_words = await self.vector_service._get_cached_fallback( | |
| topic, difficulty, self.max_words // len(topics) | |
| ) | |
| ``` | |
| #### Final Fallback to Static JSON Files (Lines 93-95): | |
| ```python | |
| else: | |
| all_words = await self._get_static_words(topics, difficulty) | |
| ``` | |
| ### 4.3 Word Sorting for Crossword Viability | |
| ```python | |
| # Lines 129-168: Sort words by crossword suitability | |
| def _sort_words_for_crossword(self, words: List[Dict[str, Any]]): | |
| ``` | |
| **Scoring algorithm:** | |
| - **Lines 138-147**: Length-based scoring (shorter words preferred) | |
| - **Lines 150-153**: Common letter bonus (E, A, R, I, O, T, N, S) | |
| - **Lines 156-158**: Vowel distribution bonus | |
| - **Lines 161-162**: Penalty for very long words | |
| ## 5. AI Word Discovery (`src/services/vector_search.py`) | |
| ### 5.1 Vector Search Initialization | |
| ```python | |
| # Lines 71-143: Service initialization | |
| async def initialize(self): | |
| ``` | |
| **Initialization flow:** | |
| 1. **Lines 90-92**: Load sentence-transformer model | |
| 2. **Lines 95-119**: Build or load cached FAISS index | |
| 3. **Lines 124-134**: Initialize word cache manager | |
| ### 5.2 Core Word Finding Algorithm | |
| ```python | |
| # Lines 279-374: Main word discovery method | |
| async def find_similar_words( | |
| self, | |
| topic: str, | |
| difficulty: str = "medium", | |
| max_words: int = 15 | |
| ): | |
| ``` | |
| **Search strategy branches:** | |
| #### Hierarchical Search (Lines 296-325): | |
| ```python | |
| if self.use_hierarchical_search: | |
| all_candidates = await self._hierarchical_search(topic, difficulty, max_words) | |
| combined_results = self._combine_hierarchical_results(all_candidates, max_words * 2) | |
| ``` | |
| #### Traditional Single Search (Lines 328-337): | |
| ```python | |
| else: | |
| traditional_results = await self._traditional_single_search(topic, difficulty, max_words * 2) | |
| ``` | |
| ### 5.3 Hierarchical Search Process | |
| ```python | |
| # Lines 639-748: Multi-phase hierarchical search | |
| async def _hierarchical_search(self, topic: str, difficulty: str, max_words: int): | |
| ``` | |
| **Three-phase approach:** | |
| #### Phase 1: Topic Variations (Lines 652-694) | |
| ```python | |
| topic_variations = self._expand_topic_variations(topic) # "Animal" β ["Animal", "Animals"] | |
| for variation in topic_variations: | |
| topic_embedding = self.model.encode([variation], convert_to_numpy=True) | |
| scores, indices = self.faiss_index.search(topic_embedding, search_size) | |
| variation_candidates = self._collect_candidates_with_threshold(scores, indices, threshold, variation, difficulty) | |
| ``` | |
| #### Phase 2: Subcategory Identification (Lines 697-700) | |
| ```python | |
| subcategories = self._identify_subcategories(main_topic_candidates, topic) | |
| ``` | |
| #### Phase 3: Subcategory Search (Lines 703-733) | |
| ```python | |
| for subcategory in subcategories: | |
| subcat_embedding = self.model.encode([subcategory], convert_to_numpy=True) | |
| sub_scores, sub_indices = self.faiss_index.search(subcat_embedding, sub_search_size) | |
| ``` | |
| ### 5.4 Word Quality Filtering | |
| ```python | |
| # Lines 1164-1215: Candidate collection with filtering | |
| def _collect_candidates_with_threshold(self, scores, indices, threshold, topic, difficulty): | |
| ``` | |
| **Multi-stage filtering:** | |
| 1. **Line 1176**: Similarity threshold check | |
| 2. **Line 1183**: Difficulty matching (word length) | |
| 3. **Line 1185**: Interest and topic relevance check: | |
| ```python | |
| if self._is_interesting_word(word, topic) and self._is_topic_relevant(word, topic): | |
| ``` | |
| ## 6. Grid Creation and Word Placement | |
| ### 6.1 Grid Generation Entry Point | |
| ```python | |
| # Lines 170-243: Main grid creation method | |
| def _create_grid(self, words: List[Dict[str, Any]]): | |
| ``` | |
| **Flow:** | |
| 1. **Lines 184-203**: Process and sort words by length (longest first) | |
| 2. **Lines 209-213**: Calculate appropriate grid size | |
| 3. **Lines 212-237**: Multiple placement attempts with increasing grid size | |
| 4. **Lines 240-241**: Fallback to simple two-word cross | |
| ### 6.2 Word Placement Algorithm | |
| ```python | |
| # Lines 259-295: Backtracking word placement | |
| def _place_words_in_grid(self, word_list: List[str], word_objs: List[Dict[str, Any]], size: int): | |
| ``` | |
| **Setup:** | |
| - **Line 263**: Initialize empty grid with dots | |
| - **Line 270**: Call recursive backtracking algorithm | |
| - **Lines 272-287**: Generate clues and assign crossword numbers | |
| ### 6.3 Backtracking Algorithm | |
| ```python | |
| # Lines 297-357: Recursive backtracking placement | |
| def _backtrack_placement(self, grid, word_list, word_objs, word_index, placed_words, start_time, timeout): | |
| ``` | |
| **Algorithm flow:** | |
| #### Base Cases: | |
| - **Lines 302-303**: Timeout check every 50 calls | |
| - **Lines 305-306**: Success when all words placed | |
| #### First Word Placement (Lines 312-332): | |
| ```python | |
| if word_index == 0: | |
| center_row = size // 2 | |
| center_col = (size - len(word)) // 2 | |
| if self._can_place_word(grid, word, center_row, center_col, "horizontal"): | |
| original_state = self._place_word(grid, word, center_row, center_col, "horizontal") | |
| ``` | |
| #### Subsequent Word Placement (Lines 334-356): | |
| ```python | |
| all_placements = self._find_all_intersection_placements(grid, word, placed_words) | |
| all_placements.sort(key=lambda p: p["score"], reverse=True) | |
| for placement in all_placements: | |
| if self._can_place_word(grid, word, row, col, direction): | |
| # Try placement and recurse | |
| if self._backtrack_placement(...): | |
| return True | |
| # Backtrack if failed | |
| self._remove_word(grid, original_state) | |
| ``` | |
| ### 6.4 Word Placement Validation | |
| ```python | |
| # Lines 359-417: Comprehensive placement validation | |
| def _can_place_word(self, grid: List[List[str]], word: str, row: int, col: int, direction: str): | |
| ``` | |
| **Critical validation checks:** | |
| 1. **Lines 364-365**: Boundary checks | |
| 2. **Lines 372-375**: Word boundary enforcement (no adjacent letters) | |
| 3. **Lines 378-390**: Letter-by-letter placement validation | |
| 4. **Lines 388-390**: Perpendicular intersection validation | |
| ### 6.5 Intersection Finding | |
| ```python | |
| # Lines 486-505: Find all possible intersections | |
| def _find_all_intersection_placements(self, grid, word, placed_words): | |
| ``` | |
| **Process:** | |
| 1. **Lines 491-502**: For each placed word, find letter intersections | |
| 2. **Lines 496-502**: Calculate placement position for each intersection | |
| 3. **Lines 499-502**: Score placement quality | |
| ## 7. Clue Generation and Final Assembly | |
| ### 7.1 Grid Trimming and Optimization | |
| ```python | |
| # Lines 589-642: Remove excess empty space | |
| def _trim_grid(self, grid, placed_words): | |
| ``` | |
| **Process:** | |
| 1. **Lines 595-610**: Find bounding box of all placed words | |
| 2. **Lines 612-631**: Create trimmed grid with padding | |
| 3. **Lines 634-641**: Update word positions relative to new grid | |
| ### 7.2 Crossword Numbering | |
| ```python | |
| # Lines 698-750: Assign proper crossword numbers and create clues | |
| def _assign_numbers_and_clues(self, placed_words, clues_data): | |
| ``` | |
| **Crossword numbering rules:** | |
| 1. **Lines 710-714**: Group words by starting position | |
| 2. **Lines 716**: Sort by reading order (top-to-bottom, left-to-right) | |
| 3. **Lines 725-749**: Assign shared numbers for words starting at same position | |
| 4. **Lines 738-745**: Create clue objects with proper formatting | |
| ### 7.3 Final Response Assembly | |
| **Back in `crossword_generator.py` lines 52-62:** | |
| ```python | |
| return { | |
| "grid": grid_result["grid"], | |
| "clues": grid_result["clues"], | |
| "metadata": { | |
| "topics": topics, | |
| "difficulty": difficulty, | |
| "wordCount": len(grid_result["placed_words"]), | |
| "size": len(grid_result["grid"]), | |
| "aiGenerated": } | |
| } | |
| ``` | |
| ## 8. Caching System (`src/services/word_cache.py`) | |
| ### 8.1 Cache Initialization | |
| ```python | |
| # Lines 75-113: Load existing cache files | |
| async def initialize(self): | |
| ``` | |
| **Process:** | |
| 1. **Lines 86-108**: Load all `.json` cache files from disk | |
| 2. **Lines 99-102**: Validate cache structure and load into memory | |
| 3. **Lines 110**: Report loaded cache statistics | |
| ### 8.2 Word Caching | |
| ```python | |
| # Lines 166-224: Cache successful word discoveries | |
| async def cache_words(self, topic, difficulty, words, source="vector_search"): | |
| ``` | |
| **Storage process:** | |
| 1. **Lines 186-193**: Enhance words with caching metadata | |
| 2. **Lines 196-207**: Create structured cache data with expiration | |
| 3. **Lines 210-213**: Save to disk (if permissions allow) | |
| 4. **Lines 216-217**: Update in-memory cache | |
| ## Complete Data Flow Summary | |
| 1. **API Request** β `/api/generate` with topics, difficulty, 2. **Route Handler** β Validates request, injects dependencies | |
| 3. **Wrapper** β Delegates to actual generator with vector service | |
| 4. **Word Selection** β AI vector search OR cached words OR static JSON fallback | |
| 5. **Vector Search** (if AI enabled): | |
| - Load sentence-transformer model | |
| - Perform hierarchical semantic search | |
| - Filter by similarity threshold, difficulty, relevance | |
| - Apply word exclusions and variety filtering | |
| 6. **Grid Creation**: | |
| - Sort words by crossword viability | |
| - Calculate appropriate grid size | |
| - Use backtracking algorithm to place words | |
| - Validate word boundaries and intersections | |
| 7. **Grid Optimization**: | |
| - Trim excess empty space | |
| - Assign proper crossword numbers | |
| - Generate clue objects | |
| 8. **Response Assembly** β Return grid, clues, and metadata | |
| 9. **Caching** β Store successful AI discoveries for future use | |
| The system gracefully degrades from AI β cached words β static words, ensuring crossword generation always succeeds even when AI components fail. |