Crossword Generation Code Walkthrough
This document provides a detailed, line-by-line walkthrough of how crossword puzzle generation works in the Python backend, tracing the complete flow from API request to finalized words and clues.
Overview
The Python backend implements AI-powered crossword generation using:
- FastAPI for web framework and API endpoints
- Vector similarity search with sentence-transformers and FAISS for intelligent word discovery
- Backtracking algorithm for word placement in the crossword grid
- Multi-layer caching system for performance and fallback mechanisms
1. Application Startup (app.py)
Entry Point and Service Initialization
# Lines 66-95: Async lifespan context manager
@asynccontextmanager
async def lifespan(app: FastAPI):
global vector_service
# Startup initialization
vector_service = VectorSearchService()
await vector_service.initialize()
app.state.vector_service = vector_service
Flow:
- FastAPI application starts with lifespan context manager
- Creates global
VectorSearchServiceinstance (line 79) - Calls
vector_service.initialize()(line 82) which:- Loads sentence-transformer model (~30-60 seconds)
- Builds or loads cached FAISS index for word embeddings
- Initializes word cache manager for fallbacks
- Makes vector service available to all API routes via
app.state
2. API Request Handling (src/routes/api.py)
Crossword Generation Endpoint
# Lines 77-118: Main crossword generation endpoint
@router.post("/generate", response_model=PuzzleResponse)
async def generate_puzzle(
request: GeneratePuzzleRequest,
crossword_gen: CrosswordGenerator = Depends(get_crossword_generator)
):
Flow:
Client POST request to
/api/generatewith JSON body:{ "topics": ["animals", "science"], "difficulty": "medium", }FastAPI validates request against
GeneratePuzzleRequestmodel (lines 20-23)Dependency injection calls
get_crossword_generator()(lines 57-63):def get_crossword_generator(request: Request) -> CrosswordGenerator: global generator if generator is None: vector_service = getattr(request.app.state, 'vector_service', None) generator = CrosswordGenerator(vector_service) return generatorCreates or reuses
CrosswordGeneratorwrapper with vector service reference
3. Crossword Generation Wrapper (src/services/crossword_generator_wrapper.py)
Simple Delegation Layer
# Lines 20-51: Generate puzzle method
async def generate_puzzle(
self,
topics: List[str],
difficulty: str = "medium",
) -> Dict[str, Any]:
Flow:
- Wrapper receives request from API route
- Line 41: Imports actual generator to avoid circular imports:
from .crossword_generator import CrosswordGenerator as ActualGenerator - Line 42: Creates actual generator instance with vector service
- Line 44: Delegates to actual generator's
generate_puzzle()method
4. Core Crossword Generation (src/services/crossword_generator.py)
4.1 Main Generation Method
# Lines 22-66: Core puzzle generation
async def generate_puzzle(self, topics: List[str], difficulty: str = "medium", = False):
Key steps:
Line 37: Select words using AI or static sources:
words = await self._select_words(topics, difficulty, use_ai)Line 44: Create crossword grid from selected words:
grid_result = self._create_grid(words)Lines 52-62: Return structured response with grid, clues, and metadata
4.2 Word Selection Process
# Lines 68-97: Word selection logic
async def _select_words(self, topics: List[str], difficulty: str, ):
Flow branches based on use_ai parameter:
AI-Powered Word Selection (Lines 72-83):
if use_ai and self.vector_service:
for topic in topics:
ai_words = await self.vector_service.find_similar_words(
topic, difficulty, self.max_words // len(topics)
)
all_words.extend(ai_words)
Fallback to Cached Words (Lines 86-91):
if self.vector_service:
for topic in topics:
cached_words = await self.vector_service._get_cached_fallback(
topic, difficulty, self.max_words // len(topics)
)
Final Fallback to Static JSON Files (Lines 93-95):
else:
all_words = await self._get_static_words(topics, difficulty)
4.3 Word Sorting for Crossword Viability
# Lines 129-168: Sort words by crossword suitability
def _sort_words_for_crossword(self, words: List[Dict[str, Any]]):
Scoring algorithm:
- Lines 138-147: Length-based scoring (shorter words preferred)
- Lines 150-153: Common letter bonus (E, A, R, I, O, T, N, S)
- Lines 156-158: Vowel distribution bonus
- Lines 161-162: Penalty for very long words
5. AI Word Discovery (src/services/vector_search.py)
5.1 Vector Search Initialization
# Lines 71-143: Service initialization
async def initialize(self):
Initialization flow:
- Lines 90-92: Load sentence-transformer model
- Lines 95-119: Build or load cached FAISS index
- Lines 124-134: Initialize word cache manager
5.2 Core Word Finding Algorithm
# Lines 279-374: Main word discovery method
async def find_similar_words(
self,
topic: str,
difficulty: str = "medium",
max_words: int = 15
):
Search strategy branches:
Hierarchical Search (Lines 296-325):
if self.use_hierarchical_search:
all_candidates = await self._hierarchical_search(topic, difficulty, max_words)
combined_results = self._combine_hierarchical_results(all_candidates, max_words * 2)
Traditional Single Search (Lines 328-337):
else:
traditional_results = await self._traditional_single_search(topic, difficulty, max_words * 2)
5.3 Hierarchical Search Process
# Lines 639-748: Multi-phase hierarchical search
async def _hierarchical_search(self, topic: str, difficulty: str, max_words: int):
Three-phase approach:
Phase 1: Topic Variations (Lines 652-694)
topic_variations = self._expand_topic_variations(topic) # "Animal" β ["Animal", "Animals"]
for variation in topic_variations:
topic_embedding = self.model.encode([variation], convert_to_numpy=True)
scores, indices = self.faiss_index.search(topic_embedding, search_size)
variation_candidates = self._collect_candidates_with_threshold(scores, indices, threshold, variation, difficulty)
Phase 2: Subcategory Identification (Lines 697-700)
subcategories = self._identify_subcategories(main_topic_candidates, topic)
Phase 3: Subcategory Search (Lines 703-733)
for subcategory in subcategories:
subcat_embedding = self.model.encode([subcategory], convert_to_numpy=True)
sub_scores, sub_indices = self.faiss_index.search(subcat_embedding, sub_search_size)
5.4 Word Quality Filtering
# Lines 1164-1215: Candidate collection with filtering
def _collect_candidates_with_threshold(self, scores, indices, threshold, topic, difficulty):
Multi-stage filtering:
- Line 1176: Similarity threshold check
- Line 1183: Difficulty matching (word length)
- Line 1185: Interest and topic relevance check:
if self._is_interesting_word(word, topic) and self._is_topic_relevant(word, topic):
6. Grid Creation and Word Placement
6.1 Grid Generation Entry Point
# Lines 170-243: Main grid creation method
def _create_grid(self, words: List[Dict[str, Any]]):
Flow:
- Lines 184-203: Process and sort words by length (longest first)
- Lines 209-213: Calculate appropriate grid size
- Lines 212-237: Multiple placement attempts with increasing grid size
- Lines 240-241: Fallback to simple two-word cross
6.2 Word Placement Algorithm
# Lines 259-295: Backtracking word placement
def _place_words_in_grid(self, word_list: List[str], word_objs: List[Dict[str, Any]], size: int):
Setup:
- Line 263: Initialize empty grid with dots
- Line 270: Call recursive backtracking algorithm
- Lines 272-287: Generate clues and assign crossword numbers
6.3 Backtracking Algorithm
# Lines 297-357: Recursive backtracking placement
def _backtrack_placement(self, grid, word_list, word_objs, word_index, placed_words, start_time, timeout):
Algorithm flow:
Base Cases:
- Lines 302-303: Timeout check every 50 calls
- Lines 305-306: Success when all words placed
First Word Placement (Lines 312-332):
if word_index == 0:
center_row = size // 2
center_col = (size - len(word)) // 2
if self._can_place_word(grid, word, center_row, center_col, "horizontal"):
original_state = self._place_word(grid, word, center_row, center_col, "horizontal")
Subsequent Word Placement (Lines 334-356):
all_placements = self._find_all_intersection_placements(grid, word, placed_words)
all_placements.sort(key=lambda p: p["score"], reverse=True)
for placement in all_placements:
if self._can_place_word(grid, word, row, col, direction):
# Try placement and recurse
if self._backtrack_placement(...):
return True
# Backtrack if failed
self._remove_word(grid, original_state)
6.4 Word Placement Validation
# Lines 359-417: Comprehensive placement validation
def _can_place_word(self, grid: List[List[str]], word: str, row: int, col: int, direction: str):
Critical validation checks:
- Lines 364-365: Boundary checks
- Lines 372-375: Word boundary enforcement (no adjacent letters)
- Lines 378-390: Letter-by-letter placement validation
- Lines 388-390: Perpendicular intersection validation
6.5 Intersection Finding
# Lines 486-505: Find all possible intersections
def _find_all_intersection_placements(self, grid, word, placed_words):
Process:
- Lines 491-502: For each placed word, find letter intersections
- Lines 496-502: Calculate placement position for each intersection
- Lines 499-502: Score placement quality
7. Clue Generation and Final Assembly
7.1 Grid Trimming and Optimization
# Lines 589-642: Remove excess empty space
def _trim_grid(self, grid, placed_words):
Process:
- Lines 595-610: Find bounding box of all placed words
- Lines 612-631: Create trimmed grid with padding
- Lines 634-641: Update word positions relative to new grid
7.2 Crossword Numbering
# Lines 698-750: Assign proper crossword numbers and create clues
def _assign_numbers_and_clues(self, placed_words, clues_data):
Crossword numbering rules:
- Lines 710-714: Group words by starting position
- Lines 716: Sort by reading order (top-to-bottom, left-to-right)
- Lines 725-749: Assign shared numbers for words starting at same position
- Lines 738-745: Create clue objects with proper formatting
7.3 Final Response Assembly
Back in crossword_generator.py lines 52-62:
return {
"grid": grid_result["grid"],
"clues": grid_result["clues"],
"metadata": {
"topics": topics,
"difficulty": difficulty,
"wordCount": len(grid_result["placed_words"]),
"size": len(grid_result["grid"]),
"aiGenerated": }
}
8. Caching System (src/services/word_cache.py)
8.1 Cache Initialization
# Lines 75-113: Load existing cache files
async def initialize(self):
Process:
- Lines 86-108: Load all
.jsoncache files from disk - Lines 99-102: Validate cache structure and load into memory
- Lines 110: Report loaded cache statistics
8.2 Word Caching
# Lines 166-224: Cache successful word discoveries
async def cache_words(self, topic, difficulty, words, source="vector_search"):
Storage process:
- Lines 186-193: Enhance words with caching metadata
- Lines 196-207: Create structured cache data with expiration
- Lines 210-213: Save to disk (if permissions allow)
- Lines 216-217: Update in-memory cache
Complete Data Flow Summary
- API Request β
/api/generatewith topics, difficulty, 2. Route Handler β Validates request, injects dependencies - Wrapper β Delegates to actual generator with vector service
- Word Selection β AI vector search OR cached words OR static JSON fallback
- Vector Search (if AI enabled):
- Load sentence-transformer model
- Perform hierarchical semantic search
- Filter by similarity threshold, difficulty, relevance
- Apply word exclusions and variety filtering
- Grid Creation:
- Sort words by crossword viability
- Calculate appropriate grid size
- Use backtracking algorithm to place words
- Validate word boundaries and intersections
- Grid Optimization:
- Trim excess empty space
- Assign proper crossword numbers
- Generate clue objects
- Response Assembly β Return grid, clues, and metadata
- Caching β Store successful AI discoveries for future use
The system gracefully degrades from AI β cached words β static words, ensuring crossword generation always succeeds even when AI components fail.