Spaces:

vimalk78
/

abc123

Sleeping

App Files Files Community

abc123 / crossword-app /backend-py /CROSSWORD_GENERATION_WALKTHROUGH.md

vimalk78

feat(crossword): generated crosswords with clues

486eff6 4 months ago

preview code

raw

history blame contribute delete

13.6 kB

Crossword Generation Code Walkthrough

This document provides a detailed, line-by-line walkthrough of how crossword puzzle generation works in the Python backend, tracing the complete flow from API request to finalized words and clues.

Overview

The Python backend implements AI-powered crossword generation using:

FastAPI for web framework and API endpoints
Vector similarity search with sentence-transformers and FAISS for intelligent word discovery
Backtracking algorithm for word placement in the crossword grid
Multi-layer caching system for performance and fallback mechanisms

1. Application Startup (`app.py`)

Entry Point and Service Initialization

# Lines 66-95: Async lifespan context manager
@asynccontextmanager
async def lifespan(app: FastAPI):
    global vector_service
    
    # Startup initialization
    vector_service = VectorSearchService()
    await vector_service.initialize()
    app.state.vector_service = vector_service

Flow:

FastAPI application starts with lifespan context manager
Creates global VectorSearchService instance (line 79)
Calls vector_service.initialize() (line 82) which:
- Loads sentence-transformer model (~30-60 seconds)
- Builds or loads cached FAISS index for word embeddings
- Initializes word cache manager for fallbacks
Makes vector service available to all API routes via app.state

2. API Request Handling (`src/routes/api.py`)

Crossword Generation Endpoint

# Lines 77-118: Main crossword generation endpoint
@router.post("/generate", response_model=PuzzleResponse)
async def generate_puzzle(
    request: GeneratePuzzleRequest,
    crossword_gen: CrosswordGenerator = Depends(get_crossword_generator)
):

Flow:

Client POST request to /api/generate with JSON body:

{
  "topics": ["animals", "science"], 
  "difficulty": "medium",
     }

FastAPI validates request against GeneratePuzzleRequest model (lines 20-23)

Dependency injection calls get_crossword_generator() (lines 57-63):

def get_crossword_generator(request: Request) -> CrosswordGenerator:
    global generator
    if generator is None:
        vector_service = getattr(request.app.state, 'vector_service', None)
        generator = CrosswordGenerator(vector_service)
    return generator

Creates or reuses CrosswordGenerator wrapper with vector service reference

3. Crossword Generation Wrapper (`src/services/crossword_generator_wrapper.py`)

Simple Delegation Layer

# Lines 20-51: Generate puzzle method
async def generate_puzzle(
    self, 
    topics: List[str], 
    difficulty: str = "medium", 
    ) -> Dict[str, Any]:

Flow:

Wrapper receives request from API route

Line 41: Imports actual generator to avoid circular imports:

from .crossword_generator import CrosswordGenerator as ActualGenerator

Line 42: Creates actual generator instance with vector service
Line 44: Delegates to actual generator's generate_puzzle() method

4. Core Crossword Generation (`src/services/crossword_generator.py`)

4.1 Main Generation Method

# Lines 22-66: Core puzzle generation
async def generate_puzzle(self, topics: List[str], difficulty: str = "medium",  = False):

Key steps:

Line 37: Select words using AI or static sources:

words = await self._select_words(topics, difficulty, use_ai)

Line 44: Create crossword grid from selected words:
```
grid_result = self._create_grid(words)
```
Lines 52-62: Return structured response with grid, clues, and metadata

4.2 Word Selection Process

# Lines 68-97: Word selection logic
async def _select_words(self, topics: List[str], difficulty: str, ):

Flow branches based on use_ai parameter:

AI-Powered Word Selection (Lines 72-83):

if use_ai and self.vector_service:
    for topic in topics:
        ai_words = await self.vector_service.find_similar_words(
            topic, difficulty, self.max_words // len(topics)
        )
        all_words.extend(ai_words)

Fallback to Cached Words (Lines 86-91):

if self.vector_service:
    for topic in topics:
        cached_words = await self.vector_service._get_cached_fallback(
            topic, difficulty, self.max_words // len(topics)
        )

Final Fallback to Static JSON Files (Lines 93-95):

else:
    all_words = await self._get_static_words(topics, difficulty)

4.3 Word Sorting for Crossword Viability

# Lines 129-168: Sort words by crossword suitability
def _sort_words_for_crossword(self, words: List[Dict[str, Any]]):

Scoring algorithm:

Lines 138-147: Length-based scoring (shorter words preferred)
Lines 150-153: Common letter bonus (E, A, R, I, O, T, N, S)
Lines 156-158: Vowel distribution bonus
Lines 161-162: Penalty for very long words

5. AI Word Discovery (`src/services/vector_search.py`)

5.1 Vector Search Initialization

# Lines 71-143: Service initialization
async def initialize(self):

Initialization flow:

Lines 90-92: Load sentence-transformer model
Lines 95-119: Build or load cached FAISS index
Lines 124-134: Initialize word cache manager

5.2 Core Word Finding Algorithm

# Lines 279-374: Main word discovery method
async def find_similar_words(
    self, 
    topic: str, 
    difficulty: str = "medium", 
    max_words: int = 15
):

Search strategy branches:

Hierarchical Search (Lines 296-325):

if self.use_hierarchical_search:
    all_candidates = await self._hierarchical_search(topic, difficulty, max_words)
    combined_results = self._combine_hierarchical_results(all_candidates, max_words * 2)

Traditional Single Search (Lines 328-337):

else:
    traditional_results = await self._traditional_single_search(topic, difficulty, max_words * 2)

5.3 Hierarchical Search Process

# Lines 639-748: Multi-phase hierarchical search
async def _hierarchical_search(self, topic: str, difficulty: str, max_words: int):

Three-phase approach:

Phase 1: Topic Variations (Lines 652-694)

topic_variations = self._expand_topic_variations(topic)  # "Animal" → ["Animal", "Animals"]

for variation in topic_variations:
    topic_embedding = self.model.encode([variation], convert_to_numpy=True)
    scores, indices = self.faiss_index.search(topic_embedding, search_size)
    variation_candidates = self._collect_candidates_with_threshold(scores, indices, threshold, variation, difficulty)

Phase 2: Subcategory Identification (Lines 697-700)

subcategories = self._identify_subcategories(main_topic_candidates, topic)

Phase 3: Subcategory Search (Lines 703-733)

for subcategory in subcategories:
    subcat_embedding = self.model.encode([subcategory], convert_to_numpy=True)
    sub_scores, sub_indices = self.faiss_index.search(subcat_embedding, sub_search_size)

5.4 Word Quality Filtering

# Lines 1164-1215: Candidate collection with filtering
def _collect_candidates_with_threshold(self, scores, indices, threshold, topic, difficulty):

Multi-stage filtering:

Line 1176: Similarity threshold check
Line 1183: Difficulty matching (word length)

Line 1185: Interest and topic relevance check:

if self._is_interesting_word(word, topic) and self._is_topic_relevant(word, topic):

6. Grid Creation and Word Placement

6.1 Grid Generation Entry Point

# Lines 170-243: Main grid creation method
def _create_grid(self, words: List[Dict[str, Any]]):

Flow:

Lines 184-203: Process and sort words by length (longest first)
Lines 209-213: Calculate appropriate grid size
Lines 212-237: Multiple placement attempts with increasing grid size
Lines 240-241: Fallback to simple two-word cross

6.2 Word Placement Algorithm

# Lines 259-295: Backtracking word placement
def _place_words_in_grid(self, word_list: List[str], word_objs: List[Dict[str, Any]], size: int):

Setup:

Line 263: Initialize empty grid with dots
Line 270: Call recursive backtracking algorithm
Lines 272-287: Generate clues and assign crossword numbers

6.3 Backtracking Algorithm

# Lines 297-357: Recursive backtracking placement
def _backtrack_placement(self, grid, word_list, word_objs, word_index, placed_words, start_time, timeout):

Algorithm flow:

Base Cases:

Lines 302-303: Timeout check every 50 calls
Lines 305-306: Success when all words placed

First Word Placement (Lines 312-332):

if word_index == 0:
    center_row = size // 2
    center_col = (size - len(word)) // 2
    
    if self._can_place_word(grid, word, center_row, center_col, "horizontal"):
        original_state = self._place_word(grid, word, center_row, center_col, "horizontal")

Subsequent Word Placement (Lines 334-356):

all_placements = self._find_all_intersection_placements(grid, word, placed_words)
all_placements.sort(key=lambda p: p["score"], reverse=True)

for placement in all_placements:
    if self._can_place_word(grid, word, row, col, direction):
        # Try placement and recurse
        if self._backtrack_placement(...):
            return True
        # Backtrack if failed
        self._remove_word(grid, original_state)

6.4 Word Placement Validation

# Lines 359-417: Comprehensive placement validation
def _can_place_word(self, grid: List[List[str]], word: str, row: int, col: int, direction: str):

Critical validation checks:

Lines 364-365: Boundary checks
Lines 372-375: Word boundary enforcement (no adjacent letters)
Lines 378-390: Letter-by-letter placement validation
Lines 388-390: Perpendicular intersection validation

6.5 Intersection Finding

# Lines 486-505: Find all possible intersections
def _find_all_intersection_placements(self, grid, word, placed_words):

Process:

Lines 491-502: For each placed word, find letter intersections
Lines 496-502: Calculate placement position for each intersection
Lines 499-502: Score placement quality

7. Clue Generation and Final Assembly

7.1 Grid Trimming and Optimization

# Lines 589-642: Remove excess empty space
def _trim_grid(self, grid, placed_words):

Process:

Lines 595-610: Find bounding box of all placed words
Lines 612-631: Create trimmed grid with padding
Lines 634-641: Update word positions relative to new grid

7.2 Crossword Numbering

# Lines 698-750: Assign proper crossword numbers and create clues
def _assign_numbers_and_clues(self, placed_words, clues_data):

Crossword numbering rules:

Lines 710-714: Group words by starting position
Lines 716: Sort by reading order (top-to-bottom, left-to-right)
Lines 725-749: Assign shared numbers for words starting at same position
Lines 738-745: Create clue objects with proper formatting

7.3 Final Response Assembly

Back in crossword_generator.py lines 52-62:

return {
    "grid": grid_result["grid"],
    "clues": grid_result["clues"],
    "metadata": {
        "topics": topics,
        "difficulty": difficulty,
        "wordCount": len(grid_result["placed_words"]),
        "size": len(grid_result["grid"]),
        "aiGenerated":     }
}

8. Caching System (`src/services/word_cache.py`)

8.1 Cache Initialization

# Lines 75-113: Load existing cache files
async def initialize(self):

Process:

Lines 86-108: Load all .json cache files from disk
Lines 99-102: Validate cache structure and load into memory
Lines 110: Report loaded cache statistics

8.2 Word Caching

# Lines 166-224: Cache successful word discoveries
async def cache_words(self, topic, difficulty, words, source="vector_search"):

Storage process:

Lines 186-193: Enhance words with caching metadata
Lines 196-207: Create structured cache data with expiration
Lines 210-213: Save to disk (if permissions allow)
Lines 216-217: Update in-memory cache

Complete Data Flow Summary

API Request → /api/generate with topics, difficulty, 2. Route Handler → Validates request, injects dependencies
Wrapper → Delegates to actual generator with vector service
Word Selection → AI vector search OR cached words OR static JSON fallback
Vector Search (if AI enabled):
- Load sentence-transformer model
- Perform hierarchical semantic search
- Filter by similarity threshold, difficulty, relevance
- Apply word exclusions and variety filtering
Grid Creation:
- Sort words by crossword viability
- Calculate appropriate grid size
- Use backtracking algorithm to place words
- Validate word boundaries and intersections
Grid Optimization:
- Trim excess empty space
- Assign proper crossword numbers
- Generate clue objects
Response Assembly → Return grid, clues, and metadata
Caching → Store successful AI discoveries for future use

The system gracefully degrades from AI → cached words → static words, ensuring crossword generation always succeeds even when AI components fail.

Crossword Generation Code Walkthrough

Overview

1. Application Startup (app.py)

Entry Point and Service Initialization

2. API Request Handling (src/routes/api.py)

Crossword Generation Endpoint

3. Crossword Generation Wrapper (src/services/crossword_generator_wrapper.py)

Simple Delegation Layer

4. Core Crossword Generation (src/services/crossword_generator.py)

4.1 Main Generation Method

4.2 Word Selection Process

AI-Powered Word Selection (Lines 72-83):

Fallback to Cached Words (Lines 86-91):

Final Fallback to Static JSON Files (Lines 93-95):

4.3 Word Sorting for Crossword Viability

5. AI Word Discovery (src/services/vector_search.py)

5.1 Vector Search Initialization

5.2 Core Word Finding Algorithm

Hierarchical Search (Lines 296-325):

Traditional Single Search (Lines 328-337):

5.3 Hierarchical Search Process

Phase 1: Topic Variations (Lines 652-694)

Phase 2: Subcategory Identification (Lines 697-700)

Phase 3: Subcategory Search (Lines 703-733)

5.4 Word Quality Filtering

6. Grid Creation and Word Placement

6.1 Grid Generation Entry Point

6.2 Word Placement Algorithm

6.3 Backtracking Algorithm

Base Cases:

First Word Placement (Lines 312-332):

Subsequent Word Placement (Lines 334-356):

6.4 Word Placement Validation

6.5 Intersection Finding

7. Clue Generation and Final Assembly

7.1 Grid Trimming and Optimization

7.2 Crossword Numbering

7.3 Final Response Assembly

8. Caching System (src/services/word_cache.py)

8.1 Cache Initialization

8.2 Word Caching

Complete Data Flow Summary

1. Application Startup (`app.py`)

2. API Request Handling (`src/routes/api.py`)

3. Crossword Generation Wrapper (`src/services/crossword_generator_wrapper.py`)

4. Core Crossword Generation (`src/services/crossword_generator.py`)

5. AI Word Discovery (`src/services/vector_search.py`)

8. Caching System (`src/services/word_cache.py`)