fix: Optimize word selection parameters to fix inverse difficulty selection

- Reduce temperature from 0.7 to 0.2 for more deterministic selection
- Increase difficulty_weight from 0.3 to 0.5 for stronger frequency influence
- Fix issue where easy mode selected rare words and hard mode selected common words
- Update documentation with parameter analysis and optimization results
Signed-off-by: Vimal Kumar <vimal78@gmail.com>
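
The temperature change is the heart of this fix: at T=0.7 the softmax over composite scores is nearly flat, so low-scoring words were sampled often. A minimal sketch of the effect, with illustrative scores (not taken from the codebase):

```python
import math

def softmax_with_temperature(scores, temperature):
    """Softmax over scores / temperature; lower temperature -> sharper distribution."""
    z = [s / temperature for s in scores]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

scores = [0.90, 0.84, 0.45, 0.40]  # composite scores, best candidate first

p_old = softmax_with_temperature(scores, 0.7)  # old default: fairly flat
p_new = softmax_with_temperature(scores, 0.2)  # new default: mass concentrates on top scores
```

At T=0.7 the top candidate gets roughly a third of the probability mass; at T=0.2 it gets over half, and the low-scoring tail shrinks sharply, which is why easy mode stops picking rare words.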

crossword-app/backend-py/docs/composite_scoring_algorithm.md (CHANGED)
@@ -14,9 +14,9 @@ This creates smooth, probabilistic selection that naturally favors appropriate words
 ```python
 composite_score = (1 - difficulty_weight) * similarity + difficulty_weight * frequency_alignment

-# Current default values:
-# difficulty_weight = 0.3 (30% frequency influence)
-# Therefore: 70% similarity + 30% frequency alignment
+# Current default values:
+# difficulty_weight = 0.5 (50% frequency influence)
+# Therefore: 50% similarity + 50% frequency alignment
 ```

 ## Frequency Alignment Using Gaussian Distributions

@@ -92,37 +92,37 @@ composite = (1 - difficulty_weight) * similarity + difficulty_weight * frequency_alignment

 ## Concrete Examples

-### Scenario: Theme = "animals", difficulty_weight = 0.3
+### Scenario: Theme = "animals", difficulty_weight = 0.5

 #### Example 1: Easy Mode
 **CAT** (common word):
 - similarity = 0.8
 - percentile = 0.95 (95th percentile)
 - frequency_alignment = exp(-((0.95 - 0.9)² / 0.02)) = exp(-0.125) ≈ 0.882
+- composite = 0.5 * 0.8 + 0.5 * 0.882 = 0.40 + 0.44 = **0.84**

 **PLATYPUS** (rare word):
 - similarity = 0.9 (higher semantic relevance)
 - percentile = 0.15 (15th percentile)
 - frequency_alignment = exp(-((0.15 - 0.9)² / 0.02)) = exp(-28.125) ≈ 0.000
+- composite = 0.5 * 0.9 + 0.5 * 0.000 = 0.45 + 0 = **0.45**

+**Result**: CAT wins despite lower similarity (0.84 > 0.45)

 #### Example 2: Hard Mode
 **CAT** (common word):
 - similarity = 0.8
 - percentile = 0.95
 - frequency_alignment = exp(-((0.95 - 0.2)² / 0.045)) = exp(-12.5) ≈ 0.000
+- composite = 0.5 * 0.8 + 0.5 * 0.000 = **0.40**

 **PLATYPUS** (rare word):
 - similarity = 0.9
 - percentile = 0.15
 - frequency_alignment = exp(-((0.15 - 0.2)² / 0.045)) = exp(-0.056) ≈ 0.946
+- composite = 0.5 * 0.9 + 0.5 * 0.946 = 0.45 + 0.473 = **0.92**

+**Result**: PLATYPUS wins due to rarity bonus (0.92 > 0.40)

 ## Visual Understanding of Gaussian Curves

@@ -164,29 +164,117 @@ Frequency Score
 **Large target**: Very forgiving, wide acceptance range

+## Complete Parameter Analysis and Pipeline
+
+### Parameter Categories
+
+The word selection system uses multiple parameters that work **independently in sequence** without direct overlap:
+
+#### 1. Input Data Sources (Not Parameters)
+- **similarity**: Semantic similarity from the sentence transformer (0-1)
+- **percentile**: Word frequency percentile from WordFreq data (0-1, higher = more common)
+
+#### 2. Tunable Parameters
+- **difficulty_weight**: Controls the balance between similarity and frequency alignment (default: 0.5)
+- **temperature**: Controls randomness in softmax selection (default: 0.2)
+
+#### 3. Hardcoded Gaussian Parameters (Per Difficulty)
+- **Easy mode**: peak μ=0.9, σ=0.1
+- **Medium mode**: peak μ=0.5, σ=0.3, base_score=0.5
+- **Hard mode**: peak μ=0.2, σ=0.15
+
+### Processing Pipeline
+
+The parameters work in a **sequential pipeline** with no redundancy:
+
+```
+Input Stage:
+  similarity (from ML model) ──────────────────────────────┐
+                                                           ├─ composite_score ─ softmax(temperature) ─ probabilities ─ selection
+  percentile (from WordFreq) ─ Gaussian(μ,σ²) ─ freq_score ┘
+                                   ↑                 ↑
+                        hardcoded parameters   difficulty_weight
+```
+
+1. **Stage 1**: Gaussian transformation converts `percentile` → `freq_score` using hardcoded (μ, σ²)
+2. **Stage 2**: Linear blending combines `similarity` + `freq_score` → `composite_score` using `difficulty_weight`
+3. **Stage 3**: Temperature scaling maps `composite_score` → `probability_distribution` using `temperature`
+
+### Parameter Relationships
+
+#### Independent Operation
+- **No direct overlap**: Each parameter transforms data at a different stage
+- **Sequential processing**: The output of one stage becomes the input to the next
+- **Multiplicative effects**: Parameters amplify or dampen effects rather than competing
+
+#### Interaction Effects
+1. **difficulty_weight × Gaussian parameters**: Higher difficulty_weight makes the Gaussian curves more influential
+2. **composite_score × temperature**: Lower temperature makes composite score differences more decisive
+3. **All parameters together**: Create compound effects on final selection behavior
+
+### Current Parameter Values (After Recent Optimization)
+```python
+# Updated defaults after fixing the inverse selection issue:
+difficulty_weight = 0.5   # Equal weight to similarity and frequency (was 0.3)
+temperature = 0.2         # More deterministic selection (was 0.7)
+
+# Hardcoded Gaussian parameters remain unchanged:
+easy_mode:   μ=0.9, σ=0.1
+medium_mode: μ=0.5, σ=0.3, base=0.5
+hard_mode:   μ=0.2, σ=0.15
+```
+
+### Potential Parameter Optimizations
+
+#### 1. Make Gaussian Variance Tunable
+Currently hardcoded; could be exposed as environment variables:
+```bash
+EASY_VARIANCE=0.1     # How strict easy mode is
+MEDIUM_VARIANCE=0.3   # How flexible medium mode is
+HARD_VARIANCE=0.15    # How strict hard mode is
+```
+
+#### 2. Derive Gaussian Peaks from Difficulty Weight
+Instead of hardcoded peaks, calculate them dynamically:
+```python
+easy_peak = 1.0 - 0.1 * difficulty_weight   # High percentile for easy
+hard_peak = 0.0 + 0.4 * difficulty_weight   # Low percentile for hard
+medium_peak = 0.5                           # Always balanced
+```
+
+#### 3. Remove Medium Mode Base Score
+The `0.5 + 0.5 * gaussian` formula seems arbitrary; medium mode could use a pure Gaussian like the other modes.
+
 ## Configuration Guide

 ### Environment Variables
-- `DIFFICULTY_WEIGHT` (default: 0.3): Controls balance between similarity and frequency
-- `SIMILARITY_TEMPERATURE` (default: 0.7): Controls randomness in softmax selection
+- `DIFFICULTY_WEIGHT` (default: 0.5): Controls balance between similarity and frequency
+- `SIMILARITY_TEMPERATURE` (default: 0.2): Controls randomness in softmax selection
 - `USE_SOFTMAX_SELECTION` (default: true): Enable/disable the entire system

 ### Tuning difficulty_weight
+- **Lower values (0.1-0.3)**: Prioritize semantic relevance over difficulty
+- **Current default (0.5)**: Equal weight to similarity and frequency alignment
+- **Higher values (0.6-0.8)**: Stronger difficulty enforcement
+- **Very high values (0.9+)**: Frequency-dominant selection

 ### Example Configurations
 ```bash
 # Conservative: Prioritize semantic quality
+export DIFFICULTY_WEIGHT=0.3
+export SIMILARITY_TEMPERATURE=0.2
+
+# Current optimized settings (after inverse selection fix)
+export DIFFICULTY_WEIGHT=0.5
+export SIMILARITY_TEMPERATURE=0.2

 # Aggressive: Strong difficulty enforcement
+export DIFFICULTY_WEIGHT=0.7
+export SIMILARITY_TEMPERATURE=0.1

 # Experimental: See pure frequency effects
+export DIFFICULTY_WEIGHT=0.9
+export SIMILARITY_TEMPERATURE=0.3
 ```

 ## Design Decisions

@@ -232,6 +320,26 @@ export DIFFICULTY_WEIGHT=0.8
 - Verify percentile calculations are working correctly
 - Check that Gaussian curves produce expected frequency_alignment scores

+## Recent Optimization (August 2025)
+
+### Inverse Selection Problem Fixed
+
+**Problem**: Despite correct composite scoring, the system was selecting words with low composite scores due to excessive randomness in softmax selection.
+
+**Symptoms**:
+- Easy mode selected rare words (PALEOECOLOGY, percentile=0.033)
+- Hard mode selected common words (HISTORIAN, percentile=0.936)
+- Composite scores were calculated correctly, but probabilistic selection was too random
+
+**Solution**: Reduced temperature from 0.7 → 0.2 and increased difficulty_weight from 0.3 → 0.5
+
+**Results After Fix**:
+- **Easy mode**: Now correctly selects common words (HISTORICALLY, CULTURALLY, PREDECESSOR)
+- **Medium mode**: Good balance of moderate-difficulty words
+- **Hard mode**: Much better rare word selection (HOLISM, TOPICALITY)
+
+**Key Insight**: The composite scoring algorithm was working correctly; the issue was purely in the final probabilistic selection stage being too random.
+
 ---

 *This algorithm represents a modern ML approach to difficulty-aware word selection, replacing simple heuristics with probabilistic, feature-based scoring.*
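The documented examples can be reproduced directly from the formulas. A minimal sketch under the doc's stated parameters (peaks and 2σ² denominators as written; medium mode's `0.5 + 0.5 * gaussian` base-score variant is omitted). Note that exp(-((0.95 - 0.9)² / 0.02)) works out to exp(-0.125) ≈ 0.882, so the easy-mode CAT composite is ≈ 0.84:

```python
import math

# Gaussian parameters from the doc: (peak mu, denominator 2*sigma^2).
GAUSSIAN = {
    "easy": (0.9, 0.02),   # sigma = 0.1
    "hard": (0.2, 0.045),  # sigma = 0.15
}

def frequency_alignment(percentile: float, difficulty: str) -> float:
    """Gaussian alignment of a frequency percentile with the difficulty's target peak."""
    mu, denom = GAUSSIAN[difficulty]
    return math.exp(-((percentile - mu) ** 2) / denom)

def composite(similarity: float, percentile: float, difficulty: str,
              difficulty_weight: float = 0.5) -> float:
    """Linear blend of semantic similarity and frequency alignment."""
    fa = frequency_alignment(percentile, difficulty)
    return (1 - difficulty_weight) * similarity + difficulty_weight * fa

# Hard mode: PLATYPUS (sim 0.9, perc 0.15) beats CAT (sim 0.8, perc 0.95)
cat_hard = composite(0.8, 0.95, "hard")       # ~0.40
platypus_hard = composite(0.9, 0.15, "hard")  # ~0.92
```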

crossword-app/backend-py/src/services/thematic_word_service.py (CHANGED)
@@ -283,9 +283,9 @@ class ThematicWordService:
             os.getenv("MAX_VOCABULARY_SIZE", "100000"))))

         # Configuration parameters for softmax weighted selection
-        self.similarity_temperature = float(os.getenv("SIMILARITY_TEMPERATURE", "0.7"))
+        self.similarity_temperature = float(os.getenv("SIMILARITY_TEMPERATURE", "0.2"))
         self.use_softmax_selection = os.getenv("USE_SOFTMAX_SELECTION", "true").lower() == "true"
-        self.difficulty_weight = float(os.getenv("DIFFICULTY_WEIGHT", "0.3"))
+        self.difficulty_weight = float(os.getenv("DIFFICULTY_WEIGHT", "0.5"))

         # Core components
         self.vocab_manager = VocabularyManager(str(self.cache_dir), self.vocab_size_limit)

@@ -591,9 +591,6 @@ class ThematicWordService:
         # Traverse top_indices from beginning to get most similar words first
         # Each idx is used to lookup the actual word in self.vocabulary[idx]
         for idx in top_indices:
-            if len(results) >= num_words * 3:  # Get extra candidates for filtering
-                break
-
             similarity_score = all_similarities[idx]
             word = self.vocabulary[idx]  # Get actual word using vocabulary index

@@ -601,6 +598,10 @@ class ThematicWordService:
             if similarity_score < min_similarity:
                 break  # All remaining words will also be below threshold since array is sorted

+            # Stop when we have enough candidates
+            if len(results) >= num_words:
+                break
+
             # Skip input words themselves
             if word.lower() in input_words_set:
                 continue

@@ -612,23 +613,12 @@ class ThematicWordService:

             results.append((word, similarity_score, word_tier))

-        if self.use_softmax_selection:
-            candidates = [{"word": word, "similarity": sim, "tier": tier} for word, sim, tier in results]
-            selected_candidates = self._softmax_weighted_selection(candidates, num_words, difficulty=difficulty)
-            # Convert back to tuple format
-            final_results = [(cand["word"], cand["similarity"], cand["tier"]) for cand in selected_candidates]
-            # Sort final results by similarity for consistent output format
-            final_results.sort(key=lambda x: x[1], reverse=True)
-        else:
-            logger.info("Using traditional similarity-based sorting")
-            # Sort by similarity and return top results (original logic)
-            results.sort(key=lambda x: x[1], reverse=True)
-            final_results = results[:num_words]
+        # Always return candidates sorted by similarity (deterministic)
+        # Selection logic is handled by find_words_for_crossword
+        results.sort(key=lambda x: x[1], reverse=True)
+        final_results = results[:num_words]

-        logger.info(f"✅ Generated {len(final_results)} thematic words")
+        logger.info(f"✅ Generated {len(final_results)} thematic words (deterministic)")
         return final_results

     def _compute_theme_vector(self, inputs: List[str]) -> np.ndarray:

@@ -800,14 +790,31 @@ class ThematicWordService:

         # Compute composite scores (similarity + difficulty alignment)
         composite_scores = []
+        debug_info = []
         for word_data in candidates:
             similarity = word_data['similarity']
             word = word_data['word']
             composite = self._compute_composite_score(similarity, word, difficulty)
             composite_scores.append(composite)
+
+            # Debug info for first few candidates
+            if len(debug_info) < 10:
+                percentile = self.word_percentiles.get(word.lower(), 0.0)
+                debug_info.append({
+                    'word': word,
+                    'similarity': similarity,
+                    'percentile': percentile,
+                    'composite': composite,
+                    'tier': word_data.get('tier', 'unknown')
+                })

         composite_scores = np.array(composite_scores)

+        # Log debug information
+        logger.info(f"🔍 Debug: Top 10 composite scores for difficulty={difficulty}:")
+        for info in debug_info:
+            logger.info(f"   {info['word']:<15} sim:{info['similarity']:.3f} perc:{info['percentile']:.3f} comp:{info['composite']:.3f} ({info['tier']})")
+
         # Compute softmax probabilities using composite scores
         probabilities = self._softmax_with_temperature(composite_scores, temperature)

@@ -824,6 +831,16 @@ class ThematicWordService:

         logger.info(f"🎲 Composite softmax selection (T={temperature:.2f}, difficulty={difficulty}): {len(selected_candidates)} from {len(candidates)} candidates")

+        # Debug: Log selected words with their properties
+        logger.info(f"🎯 Selected words for difficulty={difficulty}:")
+        for word_data in selected_candidates[:10]:  # Show first 10
+            word = word_data['word']
+            similarity = word_data['similarity']
+            percentile = self.word_percentiles.get(word.lower(), 0.0)
+            composite = self._compute_composite_score(similarity, word, difficulty)
+            tier = word_data.get('tier', 'unknown')
+            logger.info(f"   {word:<15} sim:{similarity:.3f} perc:{percentile:.3f} comp:{composite:.3f} ({tier})")
+
         return selected_candidates

     def _detect_multiple_themes(self, inputs: List[str], max_themes: int = 3) -> List[np.ndarray]:

@@ -1056,14 +1073,8 @@ class ThematicWordService:
         logger.info(f"🎯 Finding words for crossword - topics: {topics}, difficulty: {difficulty}{sentence_info}, mode: {theme_mode}")
         logger.info(f"📊 Generating {generation_target} candidates to select best {requested_words} words after clue filtering")

-        difficulty_similarity_map = {
-            "easy": 0.4,
-            "medium": 0.3,
-            "hard": 0.25
-        }
-
-        min_similarity = difficulty_similarity_map.get(difficulty, 0.3)
+        # Use consistent low threshold for all difficulties - let composite scoring handle difficulty
+        min_similarity = 0.25

         # Build input list for thematic word generation
         input_list = topics.copy()  # Start with topics: ["Art"]

@@ -1076,7 +1087,7 @@ class ThematicWordService:
         # a result is a tuple of (word, similarity, word_tier)
         raw_results = self.generate_thematic_words(
             input_list,
+            num_words=400,  # Larger pool for composite scoring to work with
             min_similarity=min_similarity,
             multi_theme=multi_theme,
             difficulty=difficulty
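
The selection stage that the temperature change tunes can be sketched as weighted sampling without replacement. This is an illustrative stand-in, not the repo's actual `_softmax_weighted_selection` implementation; the function name, signature, and seed are assumptions:

```python
import math
import random

def softmax_weighted_selection(candidates, scores, num_words, temperature=0.2, rng=None):
    """Sample num_words candidates without replacement, weighted by softmax(score / T).

    Illustrative sketch only -- names and signature are hypothetical.
    """
    rng = rng or random.Random(42)
    pool = list(zip(candidates, scores))
    selected = []
    while pool and len(selected) < num_words:
        m = max(s / temperature for _, s in pool)
        # Softmax weights over the remaining pool (max subtracted for stability)
        weights = [math.exp(s / temperature - m) for _, s in pool]
        # random.Random.choices draws one index proportionally to its weight
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        selected.append(pool.pop(i)[0])
    return selected

words = ["CAT", "DOG", "PLATYPUS", "AXOLOTL"]
scores = [0.90, 0.84, 0.45, 0.40]
picked = softmax_weighted_selection(words, scores, 2)
```

With T=0.2 the two high-scoring words dominate the draw; raising T toward 0.7 flattens the weights and lets the low-scoring tail through, which is the behavior this commit removes.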
|