File size: 13,593 Bytes
486eff6
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
# Crossword Generation Code Walkthrough

This document provides a detailed, line-by-line walkthrough of how crossword puzzle generation works in the Python backend, tracing the complete flow from API request to finalized words and clues.

## Overview

The Python backend implements AI-powered crossword generation using:
- **FastAPI** for web framework and API endpoints
- **Vector similarity search** with sentence-transformers and FAISS for intelligent word discovery
- **Backtracking algorithm** for word placement in the crossword grid
- **Multi-layer caching system** for performance and fallback mechanisms

## 1. Application Startup (`app.py`)

### Entry Point and Service Initialization

```python
# Lines 66-95: Async lifespan context manager
@asynccontextmanager
async def lifespan(app: FastAPI):
    global vector_service
    
    # Startup initialization
    vector_service = VectorSearchService()
    await vector_service.initialize()
    app.state.vector_service = vector_service
```

**Flow:**
1. FastAPI application starts with lifespan context manager
2. Creates global `VectorSearchService` instance (line 79)
3. Calls `vector_service.initialize()` (line 82) which:
   - Loads sentence-transformer model (~30-60 seconds)
   - Builds or loads cached FAISS index for word embeddings
   - Initializes word cache manager for fallbacks
4. Makes vector service available to all API routes via `app.state`

## 2. API Request Handling (`src/routes/api.py`)

### Crossword Generation Endpoint

```python
# Lines 77-118: Main crossword generation endpoint
@router.post("/generate", response_model=PuzzleResponse)
async def generate_puzzle(
    request: GeneratePuzzleRequest,
    crossword_gen: CrosswordGenerator = Depends(get_crossword_generator)
):
```

**Flow:**
1. Client POST request to `/api/generate` with JSON body:
   ```json
   {
     "topics": ["animals", "science"], 
     "difficulty": "medium",
        }
   ```

2. FastAPI validates request against `GeneratePuzzleRequest` model (lines 20-23)

3. Dependency injection calls `get_crossword_generator()` (lines 57-63):
   ```python
   def get_crossword_generator(request: Request) -> CrosswordGenerator:
       global generator
       if generator is None:
           vector_service = getattr(request.app.state, 'vector_service', None)
           generator = CrosswordGenerator(vector_service)
       return generator
   ```

4. Creates or reuses `CrosswordGenerator` wrapper with vector service reference

## 3. Crossword Generation Wrapper (`src/services/crossword_generator_wrapper.py`)

### Simple Delegation Layer

```python
# Lines 20-51: Generate puzzle method
async def generate_puzzle(
    self, 
    topics: List[str], 
    difficulty: str = "medium", 
    ) -> Dict[str, Any]:
```

**Flow:**
1. Wrapper receives request from API route
2. **Line 41**: Imports actual generator to avoid circular imports:
   ```python
   from .crossword_generator import CrosswordGenerator as ActualGenerator
   ```
3. **Line 42**: Creates actual generator instance with vector service
4. **Line 44**: Delegates to actual generator's `generate_puzzle()` method

## 4. Core Crossword Generation (`src/services/crossword_generator.py`)

### 4.1 Main Generation Method

```python
# Lines 22-66: Core puzzle generation
async def generate_puzzle(self, topics: List[str], difficulty: str = "medium",  = False):
```

**Key steps:**

1. **Line 37**: Select words using AI or static sources:
   ```python
   words = await self._select_words(topics, difficulty, use_ai)
   ```

2. **Line 44**: Create crossword grid from selected words:
   ```python
   grid_result = self._create_grid(words)
   ```

3. **Lines 52-62**: Return structured response with grid, clues, and metadata

### 4.2 Word Selection Process

```python
# Lines 68-97: Word selection logic
async def _select_words(self, topics: List[str], difficulty: str, ):
```

**Flow branches based on `use_ai` parameter:**

#### AI-Powered Word Selection (Lines 72-83):
```python
if use_ai and self.vector_service:
    for topic in topics:
        ai_words = await self.vector_service.find_similar_words(
            topic, difficulty, self.max_words // len(topics)
        )
        all_words.extend(ai_words)
```

#### Fallback to Cached Words (Lines 86-91):
```python
if self.vector_service:
    for topic in topics:
        cached_words = await self.vector_service._get_cached_fallback(
            topic, difficulty, self.max_words // len(topics)
        )
```

#### Final Fallback to Static JSON Files (Lines 93-95):
```python
else:
    all_words = await self._get_static_words(topics, difficulty)
```

### 4.3 Word Sorting for Crossword Viability

```python
# Lines 129-168: Sort words by crossword suitability
def _sort_words_for_crossword(self, words: List[Dict[str, Any]]):
```

**Scoring algorithm:**
- **Lines 138-147**: Length-based scoring (shorter words preferred)
- **Lines 150-153**: Common letter bonus (E, A, R, I, O, T, N, S)
- **Lines 156-158**: Vowel distribution bonus
- **Lines 161-162**: Penalty for very long words

## 5. AI Word Discovery (`src/services/vector_search.py`)

### 5.1 Vector Search Initialization

```python
# Lines 71-143: Service initialization
async def initialize(self):
```

**Initialization flow:**
1. **Lines 90-92**: Load sentence-transformer model
2. **Lines 95-119**: Build or load cached FAISS index
3. **Lines 124-134**: Initialize word cache manager

### 5.2 Core Word Finding Algorithm

```python
# Lines 279-374: Main word discovery method
async def find_similar_words(
    self, 
    topic: str, 
    difficulty: str = "medium", 
    max_words: int = 15
):
```

**Search strategy branches:**

#### Hierarchical Search (Lines 296-325):
```python
if self.use_hierarchical_search:
    all_candidates = await self._hierarchical_search(topic, difficulty, max_words)
    combined_results = self._combine_hierarchical_results(all_candidates, max_words * 2)
```

#### Traditional Single Search (Lines 328-337):
```python
else:
    traditional_results = await self._traditional_single_search(topic, difficulty, max_words * 2)
```

### 5.3 Hierarchical Search Process

```python
# Lines 639-748: Multi-phase hierarchical search
async def _hierarchical_search(self, topic: str, difficulty: str, max_words: int):
```

**Three-phase approach:**

#### Phase 1: Topic Variations (Lines 652-694)
```python
topic_variations = self._expand_topic_variations(topic)  # "Animal" β†’ ["Animal", "Animals"]

for variation in topic_variations:
    topic_embedding = self.model.encode([variation], convert_to_numpy=True)
    scores, indices = self.faiss_index.search(topic_embedding, search_size)
    variation_candidates = self._collect_candidates_with_threshold(scores, indices, threshold, variation, difficulty)
```

#### Phase 2: Subcategory Identification (Lines 697-700)
```python
subcategories = self._identify_subcategories(main_topic_candidates, topic)
```

#### Phase 3: Subcategory Search (Lines 703-733)
```python
for subcategory in subcategories:
    subcat_embedding = self.model.encode([subcategory], convert_to_numpy=True)
    sub_scores, sub_indices = self.faiss_index.search(subcat_embedding, sub_search_size)
```

### 5.4 Word Quality Filtering

```python
# Lines 1164-1215: Candidate collection with filtering
def _collect_candidates_with_threshold(self, scores, indices, threshold, topic, difficulty):
```

**Multi-stage filtering:**
1. **Line 1176**: Similarity threshold check
2. **Line 1183**: Difficulty matching (word length)
3. **Line 1185**: Interest and topic relevance check:
   ```python
   if self._is_interesting_word(word, topic) and self._is_topic_relevant(word, topic):
   ```

## 6. Grid Creation and Word Placement

### 6.1 Grid Generation Entry Point

```python
# Lines 170-243: Main grid creation method
def _create_grid(self, words: List[Dict[str, Any]]):
```

**Flow:**
1. **Lines 184-203**: Process and sort words by length (longest first)
2. **Lines 209-213**: Calculate appropriate grid size
3. **Lines 212-237**: Multiple placement attempts with increasing grid size
4. **Lines 240-241**: Fallback to simple two-word cross

### 6.2 Word Placement Algorithm

```python
# Lines 259-295: Backtracking word placement
def _place_words_in_grid(self, word_list: List[str], word_objs: List[Dict[str, Any]], size: int):
```

**Setup:**
- **Line 263**: Initialize empty grid with dots
- **Line 270**: Call recursive backtracking algorithm
- **Lines 272-287**: Generate clues and assign crossword numbers

### 6.3 Backtracking Algorithm

```python
# Lines 297-357: Recursive backtracking placement
def _backtrack_placement(self, grid, word_list, word_objs, word_index, placed_words, start_time, timeout):
```

**Algorithm flow:**

#### Base Cases:
- **Lines 302-303**: Timeout check every 50 calls
- **Lines 305-306**: Success when all words placed

#### First Word Placement (Lines 312-332):
```python
if word_index == 0:
    center_row = size // 2
    center_col = (size - len(word)) // 2
    
    if self._can_place_word(grid, word, center_row, center_col, "horizontal"):
        original_state = self._place_word(grid, word, center_row, center_col, "horizontal")
```

#### Subsequent Word Placement (Lines 334-356):
```python
all_placements = self._find_all_intersection_placements(grid, word, placed_words)
all_placements.sort(key=lambda p: p["score"], reverse=True)

for placement in all_placements:
    if self._can_place_word(grid, word, row, col, direction):
        # Try placement and recurse
        if self._backtrack_placement(...):
            return True
        # Backtrack if failed
        self._remove_word(grid, original_state)
```

### 6.4 Word Placement Validation

```python
# Lines 359-417: Comprehensive placement validation
def _can_place_word(self, grid: List[List[str]], word: str, row: int, col: int, direction: str):
```

**Critical validation checks:**
1. **Lines 364-365**: Boundary checks
2. **Lines 372-375**: Word boundary enforcement (no adjacent letters)
3. **Lines 378-390**: Letter-by-letter placement validation
4. **Lines 388-390**: Perpendicular intersection validation

### 6.5 Intersection Finding

```python
# Lines 486-505: Find all possible intersections
def _find_all_intersection_placements(self, grid, word, placed_words):
```

**Process:**
1. **Lines 491-502**: For each placed word, find letter intersections
2. **Lines 496-502**: Calculate placement position for each intersection
3. **Lines 499-502**: Score placement quality

## 7. Clue Generation and Final Assembly

### 7.1 Grid Trimming and Optimization

```python
# Lines 589-642: Remove excess empty space
def _trim_grid(self, grid, placed_words):
```

**Process:**
1. **Lines 595-610**: Find bounding box of all placed words
2. **Lines 612-631**: Create trimmed grid with padding
3. **Lines 634-641**: Update word positions relative to new grid

### 7.2 Crossword Numbering

```python
# Lines 698-750: Assign proper crossword numbers and create clues
def _assign_numbers_and_clues(self, placed_words, clues_data):
```

**Crossword numbering rules:**
1. **Lines 710-714**: Group words by starting position
2. **Lines 716**: Sort by reading order (top-to-bottom, left-to-right)
3. **Lines 725-749**: Assign shared numbers for words starting at same position
4. **Lines 738-745**: Create clue objects with proper formatting

### 7.3 Final Response Assembly

**Back in `crossword_generator.py` lines 52-62:**
```python
return {
    "grid": grid_result["grid"],
    "clues": grid_result["clues"],
    "metadata": {
        "topics": topics,
        "difficulty": difficulty,
        "wordCount": len(grid_result["placed_words"]),
        "size": len(grid_result["grid"]),
        "aiGenerated":     }
}
```

## 8. Caching System (`src/services/word_cache.py`)

### 8.1 Cache Initialization

```python
# Lines 75-113: Load existing cache files
async def initialize(self):
```

**Process:**
1. **Lines 86-108**: Load all `.json` cache files from disk
2. **Lines 99-102**: Validate cache structure and load into memory
3. **Lines 110**: Report loaded cache statistics

### 8.2 Word Caching

```python
# Lines 166-224: Cache successful word discoveries
async def cache_words(self, topic, difficulty, words, source="vector_search"):
```

**Storage process:**
1. **Lines 186-193**: Enhance words with caching metadata
2. **Lines 196-207**: Create structured cache data with expiration
3. **Lines 210-213**: Save to disk (if permissions allow)
4. **Lines 216-217**: Update in-memory cache

## Complete Data Flow Summary

1. **API Request** β†’ `/api/generate` with topics, difficulty, 2. **Route Handler** β†’ Validates request, injects dependencies
3. **Wrapper** β†’ Delegates to actual generator with vector service
4. **Word Selection** β†’ AI vector search OR cached words OR static JSON fallback
5. **Vector Search** (if AI enabled):
   - Load sentence-transformer model
   - Perform hierarchical semantic search
   - Filter by similarity threshold, difficulty, relevance
   - Apply word exclusions and variety filtering
6. **Grid Creation**:
   - Sort words by crossword viability
   - Calculate appropriate grid size
   - Use backtracking algorithm to place words
   - Validate word boundaries and intersections
7. **Grid Optimization**:
   - Trim excess empty space
   - Assign proper crossword numbers
   - Generate clue objects
8. **Response Assembly** β†’ Return grid, clues, and metadata
9. **Caching** β†’ Store successful AI discoveries for future use

The system gracefully degrades from AI β†’ cached words β†’ static words, ensuring crossword generation always succeeds even when AI components fail.