File size: 13,593 Bytes
486eff6 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 |
# Crossword Generation Code Walkthrough
This document provides a detailed, line-by-line walkthrough of how crossword puzzle generation works in the Python backend, tracing the complete flow from API request to finalized words and clues.
## Overview
The Python backend implements AI-powered crossword generation using:
- **FastAPI** for web framework and API endpoints
- **Vector similarity search** with sentence-transformers and FAISS for intelligent word discovery
- **Backtracking algorithm** for word placement in the crossword grid
- **Multi-layer caching system** for performance and fallback mechanisms
## 1. Application Startup (`app.py`)
### Entry Point and Service Initialization
```python
# Lines 66-95: Async lifespan context manager
@asynccontextmanager
async def lifespan(app: FastAPI):
global vector_service
# Startup initialization
vector_service = VectorSearchService()
await vector_service.initialize()
app.state.vector_service = vector_service
```
**Flow:**
1. FastAPI application starts with lifespan context manager
2. Creates global `VectorSearchService` instance (line 79)
3. Calls `vector_service.initialize()` (line 82) which:
- Loads sentence-transformer model (~30-60 seconds)
- Builds or loads cached FAISS index for word embeddings
- Initializes word cache manager for fallbacks
4. Makes vector service available to all API routes via `app.state`
## 2. API Request Handling (`src/routes/api.py`)
### Crossword Generation Endpoint
```python
# Lines 77-118: Main crossword generation endpoint
@router.post("/generate", response_model=PuzzleResponse)
async def generate_puzzle(
request: GeneratePuzzleRequest,
crossword_gen: CrosswordGenerator = Depends(get_crossword_generator)
):
```
**Flow:**
1. Client POST request to `/api/generate` with JSON body:
```json
{
"topics": ["animals", "science"],
"difficulty": "medium",
}
```
2. FastAPI validates request against `GeneratePuzzleRequest` model (lines 20-23)
3. Dependency injection calls `get_crossword_generator()` (lines 57-63):
```python
def get_crossword_generator(request: Request) -> CrosswordGenerator:
global generator
if generator is None:
vector_service = getattr(request.app.state, 'vector_service', None)
generator = CrosswordGenerator(vector_service)
return generator
```
4. Creates or reuses `CrosswordGenerator` wrapper with vector service reference
## 3. Crossword Generation Wrapper (`src/services/crossword_generator_wrapper.py`)
### Simple Delegation Layer
```python
# Lines 20-51: Generate puzzle method
async def generate_puzzle(
self,
topics: List[str],
difficulty: str = "medium",
) -> Dict[str, Any]:
```
**Flow:**
1. Wrapper receives request from API route
2. **Line 41**: Imports actual generator to avoid circular imports:
```python
from .crossword_generator import CrosswordGenerator as ActualGenerator
```
3. **Line 42**: Creates actual generator instance with vector service
4. **Line 44**: Delegates to actual generator's `generate_puzzle()` method
## 4. Core Crossword Generation (`src/services/crossword_generator.py`)
### 4.1 Main Generation Method
```python
# Lines 22-66: Core puzzle generation
async def generate_puzzle(self, topics: List[str], difficulty: str = "medium", = False):
```
**Key steps:**
1. **Line 37**: Select words using AI or static sources:
```python
words = await self._select_words(topics, difficulty, use_ai)
```
2. **Line 44**: Create crossword grid from selected words:
```python
grid_result = self._create_grid(words)
```
3. **Lines 52-62**: Return structured response with grid, clues, and metadata
### 4.2 Word Selection Process
```python
# Lines 68-97: Word selection logic
async def _select_words(self, topics: List[str], difficulty: str, ):
```
**Flow branches based on `use_ai` parameter:**
#### AI-Powered Word Selection (Lines 72-83):
```python
if use_ai and self.vector_service:
for topic in topics:
ai_words = await self.vector_service.find_similar_words(
topic, difficulty, self.max_words // len(topics)
)
all_words.extend(ai_words)
```
#### Fallback to Cached Words (Lines 86-91):
```python
if self.vector_service:
for topic in topics:
cached_words = await self.vector_service._get_cached_fallback(
topic, difficulty, self.max_words // len(topics)
)
```
#### Final Fallback to Static JSON Files (Lines 93-95):
```python
else:
all_words = await self._get_static_words(topics, difficulty)
```
### 4.3 Word Sorting for Crossword Viability
```python
# Lines 129-168: Sort words by crossword suitability
def _sort_words_for_crossword(self, words: List[Dict[str, Any]]):
```
**Scoring algorithm:**
- **Lines 138-147**: Length-based scoring (shorter words preferred)
- **Lines 150-153**: Common letter bonus (E, A, R, I, O, T, N, S)
- **Lines 156-158**: Vowel distribution bonus
- **Lines 161-162**: Penalty for very long words
## 5. AI Word Discovery (`src/services/vector_search.py`)
### 5.1 Vector Search Initialization
```python
# Lines 71-143: Service initialization
async def initialize(self):
```
**Initialization flow:**
1. **Lines 90-92**: Load sentence-transformer model
2. **Lines 95-119**: Build or load cached FAISS index
3. **Lines 124-134**: Initialize word cache manager
### 5.2 Core Word Finding Algorithm
```python
# Lines 279-374: Main word discovery method
async def find_similar_words(
self,
topic: str,
difficulty: str = "medium",
max_words: int = 15
):
```
**Search strategy branches:**
#### Hierarchical Search (Lines 296-325):
```python
if self.use_hierarchical_search:
all_candidates = await self._hierarchical_search(topic, difficulty, max_words)
combined_results = self._combine_hierarchical_results(all_candidates, max_words * 2)
```
#### Traditional Single Search (Lines 328-337):
```python
else:
traditional_results = await self._traditional_single_search(topic, difficulty, max_words * 2)
```
### 5.3 Hierarchical Search Process
```python
# Lines 639-748: Multi-phase hierarchical search
async def _hierarchical_search(self, topic: str, difficulty: str, max_words: int):
```
**Three-phase approach:**
#### Phase 1: Topic Variations (Lines 652-694)
```python
topic_variations = self._expand_topic_variations(topic) # "Animal" β ["Animal", "Animals"]
for variation in topic_variations:
topic_embedding = self.model.encode([variation], convert_to_numpy=True)
scores, indices = self.faiss_index.search(topic_embedding, search_size)
variation_candidates = self._collect_candidates_with_threshold(scores, indices, threshold, variation, difficulty)
```
#### Phase 2: Subcategory Identification (Lines 697-700)
```python
subcategories = self._identify_subcategories(main_topic_candidates, topic)
```
#### Phase 3: Subcategory Search (Lines 703-733)
```python
for subcategory in subcategories:
subcat_embedding = self.model.encode([subcategory], convert_to_numpy=True)
sub_scores, sub_indices = self.faiss_index.search(subcat_embedding, sub_search_size)
```
### 5.4 Word Quality Filtering
```python
# Lines 1164-1215: Candidate collection with filtering
def _collect_candidates_with_threshold(self, scores, indices, threshold, topic, difficulty):
```
**Multi-stage filtering:**
1. **Line 1176**: Similarity threshold check
2. **Line 1183**: Difficulty matching (word length)
3. **Line 1185**: Interest and topic relevance check:
```python
if self._is_interesting_word(word, topic) and self._is_topic_relevant(word, topic):
```
## 6. Grid Creation and Word Placement
### 6.1 Grid Generation Entry Point
```python
# Lines 170-243: Main grid creation method
def _create_grid(self, words: List[Dict[str, Any]]):
```
**Flow:**
1. **Lines 184-203**: Process and sort words by length (longest first)
2. **Lines 209-213**: Calculate appropriate grid size
3. **Lines 212-237**: Multiple placement attempts with increasing grid size
4. **Lines 240-241**: Fallback to simple two-word cross
### 6.2 Word Placement Algorithm
```python
# Lines 259-295: Backtracking word placement
def _place_words_in_grid(self, word_list: List[str], word_objs: List[Dict[str, Any]], size: int):
```
**Setup:**
- **Line 263**: Initialize empty grid with dots
- **Line 270**: Call recursive backtracking algorithm
- **Lines 272-287**: Generate clues and assign crossword numbers
### 6.3 Backtracking Algorithm
```python
# Lines 297-357: Recursive backtracking placement
def _backtrack_placement(self, grid, word_list, word_objs, word_index, placed_words, start_time, timeout):
```
**Algorithm flow:**
#### Base Cases:
- **Lines 302-303**: Timeout check every 50 calls
- **Lines 305-306**: Success when all words placed
#### First Word Placement (Lines 312-332):
```python
if word_index == 0:
center_row = size // 2
center_col = (size - len(word)) // 2
if self._can_place_word(grid, word, center_row, center_col, "horizontal"):
original_state = self._place_word(grid, word, center_row, center_col, "horizontal")
```
#### Subsequent Word Placement (Lines 334-356):
```python
all_placements = self._find_all_intersection_placements(grid, word, placed_words)
all_placements.sort(key=lambda p: p["score"], reverse=True)
for placement in all_placements:
if self._can_place_word(grid, word, row, col, direction):
# Try placement and recurse
if self._backtrack_placement(...):
return True
# Backtrack if failed
self._remove_word(grid, original_state)
```
### 6.4 Word Placement Validation
```python
# Lines 359-417: Comprehensive placement validation
def _can_place_word(self, grid: List[List[str]], word: str, row: int, col: int, direction: str):
```
**Critical validation checks:**
1. **Lines 364-365**: Boundary checks
2. **Lines 372-375**: Word boundary enforcement (no adjacent letters)
3. **Lines 378-390**: Letter-by-letter placement validation
4. **Lines 388-390**: Perpendicular intersection validation
### 6.5 Intersection Finding
```python
# Lines 486-505: Find all possible intersections
def _find_all_intersection_placements(self, grid, word, placed_words):
```
**Process:**
1. **Lines 491-502**: For each placed word, find letter intersections
2. **Lines 496-502**: Calculate placement position for each intersection
3. **Lines 499-502**: Score placement quality
## 7. Clue Generation and Final Assembly
### 7.1 Grid Trimming and Optimization
```python
# Lines 589-642: Remove excess empty space
def _trim_grid(self, grid, placed_words):
```
**Process:**
1. **Lines 595-610**: Find bounding box of all placed words
2. **Lines 612-631**: Create trimmed grid with padding
3. **Lines 634-641**: Update word positions relative to new grid
### 7.2 Crossword Numbering
```python
# Lines 698-750: Assign proper crossword numbers and create clues
def _assign_numbers_and_clues(self, placed_words, clues_data):
```
**Crossword numbering rules:**
1. **Lines 710-714**: Group words by starting position
2. **Lines 716**: Sort by reading order (top-to-bottom, left-to-right)
3. **Lines 725-749**: Assign shared numbers for words starting at same position
4. **Lines 738-745**: Create clue objects with proper formatting
### 7.3 Final Response Assembly
**Back in `crossword_generator.py` lines 52-62:**
```python
return {
"grid": grid_result["grid"],
"clues": grid_result["clues"],
"metadata": {
"topics": topics,
"difficulty": difficulty,
"wordCount": len(grid_result["placed_words"]),
"size": len(grid_result["grid"]),
"aiGenerated": }
}
```
## 8. Caching System (`src/services/word_cache.py`)
### 8.1 Cache Initialization
```python
# Lines 75-113: Load existing cache files
async def initialize(self):
```
**Process:**
1. **Lines 86-108**: Load all `.json` cache files from disk
2. **Lines 99-102**: Validate cache structure and load into memory
3. **Lines 110**: Report loaded cache statistics
### 8.2 Word Caching
```python
# Lines 166-224: Cache successful word discoveries
async def cache_words(self, topic, difficulty, words, source="vector_search"):
```
**Storage process:**
1. **Lines 186-193**: Enhance words with caching metadata
2. **Lines 196-207**: Create structured cache data with expiration
3. **Lines 210-213**: Save to disk (if permissions allow)
4. **Lines 216-217**: Update in-memory cache
## Complete Data Flow Summary
1. **API Request** β `/api/generate` with topics, difficulty, 2. **Route Handler** β Validates request, injects dependencies
3. **Wrapper** β Delegates to actual generator with vector service
4. **Word Selection** β AI vector search OR cached words OR static JSON fallback
5. **Vector Search** (if AI enabled):
- Load sentence-transformer model
- Perform hierarchical semantic search
- Filter by similarity threshold, difficulty, relevance
- Apply word exclusions and variety filtering
6. **Grid Creation**:
- Sort words by crossword viability
- Calculate appropriate grid size
- Use backtracking algorithm to place words
- Validate word boundaries and intersections
7. **Grid Optimization**:
- Trim excess empty space
- Assign proper crossword numbers
- Generate clue objects
8. **Response Assembly** β Return grid, clues, and metadata
9. **Caching** β Store successful AI discoveries for future use
The system gracefully degrades from AI β cached words β static words, ensuring crossword generation always succeeds even when AI components fail. |