fix: Optimize word selection parameters to fix inverse difficulty selection

- Reduce temperature from 0.7 to 0.2 for more deterministic selection
- Increase difficulty_weight from 0.3 to 0.5 for stronger frequency influence
- Fix issue where easy mode selected rare words and hard mode selected common words
- Update documentation with parameter analysis and optimization results
Signed-off-by: Vimal Kumar <vimal78@gmail.com>
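
The temperature change is the heart of this fix: at T=0.7 the softmax over composite scores is nearly flat, so low-scoring words were sampled often. A minimal sketch of the effect, with illustrative scores (not taken from the codebase):

```python
import math

def softmax_with_temperature(scores, temperature):
    """Softmax over scores / temperature; lower temperature -> sharper distribution."""
    z = [s / temperature for s in scores]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

scores = [0.90, 0.84, 0.45, 0.40]  # composite scores, best candidate first

p_old = softmax_with_temperature(scores, 0.7)  # old default: fairly flat
p_new = softmax_with_temperature(scores, 0.2)  # new default: mass concentrates on top scores
```

At T=0.7 the top candidate gets roughly a third of the probability mass; at T=0.2 it gets over half, and the low-scoring tail shrinks sharply, which is why easy mode stops picking rare words.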

crossword-app/backend-py/docs/composite_scoring_algorithm.md (CHANGED)
@@ -14,9 +14,9 @@ This creates smooth, probabilistic selection that naturally favors appropriate words
 ```python
 composite_score = (1 - difficulty_weight) * similarity + difficulty_weight * frequency_alignment

-# Current default values:
-# difficulty_weight = 0.3 (30% frequency influence)
-# Therefore: 70% similarity + 30% frequency alignment
+# Current default values:
+# difficulty_weight = 0.5 (50% frequency influence)
+# Therefore: 50% similarity + 50% frequency alignment
 ```

 ## Frequency Alignment Using Gaussian Distributions

@@ -92,37 +92,37 @@ composite = (1 - difficulty_weight) * similarity + difficulty_weight * frequency_alignment

 ## Concrete Examples

-### Scenario: Theme = "animals", difficulty_weight = 0.3
+### Scenario: Theme = "animals", difficulty_weight = 0.5

 #### Example 1: Easy Mode
 **CAT** (common word):
 - similarity = 0.8
 - percentile = 0.95 (95th percentile)
 - frequency_alignment = exp(-((0.95 - 0.9)² / 0.02)) = exp(-0.125) ≈ 0.882
+- composite = 0.5 * 0.8 + 0.5 * 0.882 = 0.40 + 0.44 = **0.84**

 **PLATYPUS** (rare word):
 - similarity = 0.9 (higher semantic relevance)
 - percentile = 0.15 (15th percentile)
 - frequency_alignment = exp(-((0.15 - 0.9)² / 0.02)) = exp(-28.125) ≈ 0.000
+- composite = 0.5 * 0.9 + 0.5 * 0.000 = 0.45 + 0 = **0.45**

+**Result**: CAT wins despite lower similarity (0.84 > 0.45)

 #### Example 2: Hard Mode
 **CAT** (common word):
 - similarity = 0.8
 - percentile = 0.95
 - frequency_alignment = exp(-((0.95 - 0.2)² / 0.045)) = exp(-12.5) ≈ 0.000
+- composite = 0.5 * 0.8 + 0.5 * 0.000 = **0.40**

 **PLATYPUS** (rare word):
 - similarity = 0.9
 - percentile = 0.15
 - frequency_alignment = exp(-((0.15 - 0.2)² / 0.045)) = exp(-0.056) ≈ 0.946
+- composite = 0.5 * 0.9 + 0.5 * 0.946 = 0.45 + 0.473 = **0.92**

+**Result**: PLATYPUS wins due to rarity bonus (0.92 > 0.40)

 ## Visual Understanding of Gaussian Curves

@@ -164,29 +164,117 @@ Frequency Score
 **Large target**: Very forgiving, wide acceptance range

+## Complete Parameter Analysis and Pipeline
+
+### Parameter Categories
+
+The word selection system uses multiple parameters that work **independently in sequence** without direct overlap:
+
+#### 1. Input Data Sources (Not Parameters)
+- **similarity**: Semantic similarity from the sentence transformer (0-1)
+- **percentile**: Word frequency percentile from WordFreq data (0-1, higher = more common)
+
+#### 2. Tunable Parameters
+- **difficulty_weight**: Controls the balance between similarity and frequency alignment (default: 0.5)
+- **temperature**: Controls randomness in softmax selection (default: 0.2)
+
+#### 3. Hardcoded Gaussian Parameters (Per Difficulty)
+- **Easy mode**: peak μ=0.9, σ=0.1
+- **Medium mode**: peak μ=0.5, σ=0.3, base_score=0.5
+- **Hard mode**: peak μ=0.2, σ=0.15
+
+### Processing Pipeline
+
+The parameters work in a **sequential pipeline** with no redundancy:
+
+```
+Input Stage:
+  similarity (from ML model) ──────────────────────────────┐
+                                                           ├─ composite_score ─ softmax(temperature) ─ probabilities ─ selection
+  percentile (from WordFreq) ─ Gaussian(μ,σ²) ─ freq_score ┘
+                                   ↑                 ↑
+                        hardcoded parameters   difficulty_weight
+```
+
+1. **Stage 1**: Gaussian transformation converts `percentile` → `freq_score` using hardcoded (μ, σ²)
+2. **Stage 2**: Linear blending combines `similarity` + `freq_score` → `composite_score` using `difficulty_weight`
+3. **Stage 3**: Temperature scaling maps `composite_score` → `probability_distribution` using `temperature`
+
+### Parameter Relationships
+
+#### Independent Operation
+- **No direct overlap**: Each parameter transforms data at a different stage
+- **Sequential processing**: The output of one stage becomes the input to the next
+- **Multiplicative effects**: Parameters amplify or dampen effects rather than competing
+
+#### Interaction Effects
+1. **difficulty_weight × Gaussian parameters**: Higher difficulty_weight makes the Gaussian curves more influential
+2. **composite_score × temperature**: Lower temperature makes composite score differences more decisive
+3. **All parameters together**: Create compound effects on final selection behavior
+
+### Current Parameter Values (After Recent Optimization)
+```python
+# Updated defaults after fixing the inverse selection issue:
+difficulty_weight = 0.5   # Equal weight to similarity and frequency (was 0.3)
+temperature = 0.2         # More deterministic selection (was 0.7)
+
+# Hardcoded Gaussian parameters remain unchanged:
+easy_mode:   μ=0.9, σ=0.1
+medium_mode: μ=0.5, σ=0.3, base=0.5
+hard_mode:   μ=0.2, σ=0.15
+```
+
+### Potential Parameter Optimizations
+
+#### 1. Make Gaussian Variance Tunable
+Currently hardcoded; could be exposed as environment variables:
+```bash
+EASY_VARIANCE=0.1     # How strict easy mode is
+MEDIUM_VARIANCE=0.3   # How flexible medium mode is
+HARD_VARIANCE=0.15    # How strict hard mode is
+```
+
+#### 2. Derive Gaussian Peaks from Difficulty Weight
+Instead of hardcoded peaks, calculate them dynamically:
+```python
+easy_peak = 1.0 - 0.1 * difficulty_weight   # High percentile for easy
+hard_peak = 0.0 + 0.4 * difficulty_weight   # Low percentile for hard
+medium_peak = 0.5                           # Always balanced
+```
+
+#### 3. Remove Medium Mode Base Score
+The `0.5 + 0.5 * gaussian` formula seems arbitrary; medium mode could use a pure Gaussian like the other modes.
+
 ## Configuration Guide

 ### Environment Variables
-- `DIFFICULTY_WEIGHT` (default: 0.3): Controls balance between similarity and frequency
-- `SIMILARITY_TEMPERATURE` (default: 0.7): Controls randomness in softmax selection
+- `DIFFICULTY_WEIGHT` (default: 0.5): Controls balance between similarity and frequency
+- `SIMILARITY_TEMPERATURE` (default: 0.2): Controls randomness in softmax selection
 - `USE_SOFTMAX_SELECTION` (default: true): Enable/disable the entire system

 ### Tuning difficulty_weight
+- **Lower values (0.1-0.3)**: Prioritize semantic relevance over difficulty
+- **Current default (0.5)**: Equal weight to similarity and frequency alignment
+- **Higher values (0.6-0.8)**: Stronger difficulty enforcement
+- **Very high values (0.9+)**: Frequency-dominant selection

 ### Example Configurations
 ```bash
 # Conservative: Prioritize semantic quality
+export DIFFICULTY_WEIGHT=0.3
+export SIMILARITY_TEMPERATURE=0.2
+
+# Current optimized settings (after inverse selection fix)
+export DIFFICULTY_WEIGHT=0.5
+export SIMILARITY_TEMPERATURE=0.2

 # Aggressive: Strong difficulty enforcement
+export DIFFICULTY_WEIGHT=0.7
+export SIMILARITY_TEMPERATURE=0.1

 # Experimental: See pure frequency effects
+export DIFFICULTY_WEIGHT=0.9
+export SIMILARITY_TEMPERATURE=0.3
 ```

 ## Design Decisions

@@ -232,6 +320,26 @@ export DIFFICULTY_WEIGHT=0.8
 - Verify percentile calculations are working correctly
 - Check that Gaussian curves produce expected frequency_alignment scores

+## Recent Optimization (August 2025)
+
+### Inverse Selection Problem Fixed
+
+**Problem**: Despite correct composite scoring, the system was selecting words with low composite scores due to excessive randomness in softmax selection.
+
+**Symptoms**:
+- Easy mode selected rare words (PALEOECOLOGY, percentile=0.033)
+- Hard mode selected common words (HISTORIAN, percentile=0.936)
+- Composite scores were calculated correctly, but probabilistic selection was too random
+
+**Solution**: Reduced temperature from 0.7 → 0.2 and increased difficulty_weight from 0.3 → 0.5
+
+**Results After Fix**:
+- **Easy mode**: Now correctly selects common words (HISTORICALLY, CULTURALLY, PREDECESSOR)
+- **Medium mode**: Good balance of moderate-difficulty words
+- **Hard mode**: Much better rare word selection (HOLISM, TOPICALITY)
+
+**Key Insight**: The composite scoring algorithm was working correctly; the issue was purely in the final probabilistic selection stage being too random.
+
 ---

 *This algorithm represents a modern ML approach to difficulty-aware word selection, replacing simple heuristics with probabilistic, feature-based scoring.*
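The documented examples can be reproduced directly from the formulas. A minimal sketch under the doc's stated parameters (peaks and 2σ² denominators as written; medium mode's `0.5 + 0.5 * gaussian` base-score variant is omitted). Note that exp(-((0.95 - 0.9)² / 0.02)) works out to exp(-0.125) ≈ 0.882, so the easy-mode CAT composite is ≈ 0.84:

```python
import math

# Gaussian parameters from the doc: (peak mu, denominator 2*sigma^2).
GAUSSIAN = {
    "easy": (0.9, 0.02),   # sigma = 0.1
    "hard": (0.2, 0.045),  # sigma = 0.15
}

def frequency_alignment(percentile: float, difficulty: str) -> float:
    """Gaussian alignment of a frequency percentile with the difficulty's target peak."""
    mu, denom = GAUSSIAN[difficulty]
    return math.exp(-((percentile - mu) ** 2) / denom)

def composite(similarity: float, percentile: float, difficulty: str,
              difficulty_weight: float = 0.5) -> float:
    """Linear blend of semantic similarity and frequency alignment."""
    fa = frequency_alignment(percentile, difficulty)
    return (1 - difficulty_weight) * similarity + difficulty_weight * fa

# Hard mode: PLATYPUS (sim 0.9, perc 0.15) beats CAT (sim 0.8, perc 0.95)
cat_hard = composite(0.8, 0.95, "hard")       # ~0.40
platypus_hard = composite(0.9, 0.15, "hard")  # ~0.92
```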

crossword-app/backend-py/src/services/thematic_word_service.py (CHANGED)
@@ -283,9 +283,9 @@ class ThematicWordService:
             os.getenv("MAX_VOCABULARY_SIZE", "100000"))))

         # Configuration parameters for softmax weighted selection
-        self.similarity_temperature = float(os.getenv("SIMILARITY_TEMPERATURE", "0.7"))
+        self.similarity_temperature = float(os.getenv("SIMILARITY_TEMPERATURE", "0.2"))
         self.use_softmax_selection = os.getenv("USE_SOFTMAX_SELECTION", "true").lower() == "true"
-        self.difficulty_weight = float(os.getenv("DIFFICULTY_WEIGHT", "0.3"))
+        self.difficulty_weight = float(os.getenv("DIFFICULTY_WEIGHT", "0.5"))

         # Core components
         self.vocab_manager = VocabularyManager(str(self.cache_dir), self.vocab_size_limit)

@@ -591,9 +591,6 @@ class ThematicWordService:
         # Traverse top_indices from beginning to get most similar words first
         # Each idx is used to lookup the actual word in self.vocabulary[idx]
         for idx in top_indices:
-            if len(results) >= num_words * 3:  # Get extra candidates for filtering
-                break
-
             similarity_score = all_similarities[idx]
             word = self.vocabulary[idx]  # Get actual word using vocabulary index

@@ -601,6 +598,10 @@ class ThematicWordService:
             if similarity_score < min_similarity:
                 break  # All remaining words will also be below threshold since array is sorted

+            # Stop when we have enough candidates
+            if len(results) >= num_words:
+                break
+
             # Skip input words themselves
             if word.lower() in input_words_set:
                 continue

@@ -612,23 +613,12 @@ class ThematicWordService:

             results.append((word, similarity_score, word_tier))

-        if self.use_softmax_selection:
-            candidates = [{"word": word, "similarity": sim, "tier": tier} for word, sim, tier in results]
-            selected_candidates = self._softmax_weighted_selection(candidates, num_words, difficulty=difficulty)
-            # Convert back to tuple format
-            final_results = [(cand["word"], cand["similarity"], cand["tier"]) for cand in selected_candidates]
-            # Sort final results by similarity for consistent output format
-            final_results.sort(key=lambda x: x[1], reverse=True)
-        else:
-            logger.info("Using traditional similarity-based sorting")
-            # Sort by similarity and return top results (original logic)
-            results.sort(key=lambda x: x[1], reverse=True)
-            final_results = results[:num_words]
+        # Always return candidates sorted by similarity (deterministic)
+        # Selection logic is handled by find_words_for_crossword
+        results.sort(key=lambda x: x[1], reverse=True)
+        final_results = results[:num_words]

-        logger.info(f"✅ Generated {len(final_results)} thematic words")
+        logger.info(f"✅ Generated {len(final_results)} thematic words (deterministic)")
         return final_results

     def _compute_theme_vector(self, inputs: List[str]) -> np.ndarray:

@@ -800,14 +790,31 @@ class ThematicWordService:

         # Compute composite scores (similarity + difficulty alignment)
         composite_scores = []
+        debug_info = []
         for word_data in candidates:
             similarity = word_data['similarity']
             word = word_data['word']
             composite = self._compute_composite_score(similarity, word, difficulty)
             composite_scores.append(composite)
+
+            # Debug info for first few candidates
+            if len(debug_info) < 10:
+                percentile = self.word_percentiles.get(word.lower(), 0.0)
+                debug_info.append({
+                    'word': word,
+                    'similarity': similarity,
+                    'percentile': percentile,
+                    'composite': composite,
+                    'tier': word_data.get('tier', 'unknown')
+                })

         composite_scores = np.array(composite_scores)

+        # Log debug information
+        logger.info(f"🔍 Debug: Top 10 composite scores for difficulty={difficulty}:")
+        for info in debug_info:
+            logger.info(f"   {info['word']:<15} sim:{info['similarity']:.3f} perc:{info['percentile']:.3f} comp:{info['composite']:.3f} ({info['tier']})")
+
         # Compute softmax probabilities using composite scores
         probabilities = self._softmax_with_temperature(composite_scores, temperature)

@@ -824,6 +831,16 @@ class ThematicWordService:

         logger.info(f"🎲 Composite softmax selection (T={temperature:.2f}, difficulty={difficulty}): {len(selected_candidates)} from {len(candidates)} candidates")

+        # Debug: Log selected words with their properties
+        logger.info(f"🎯 Selected words for difficulty={difficulty}:")
+        for word_data in selected_candidates[:10]:  # Show first 10
+            word = word_data['word']
+            similarity = word_data['similarity']
+            percentile = self.word_percentiles.get(word.lower(), 0.0)
+            composite = self._compute_composite_score(similarity, word, difficulty)
+            tier = word_data.get('tier', 'unknown')
+            logger.info(f"   {word:<15} sim:{similarity:.3f} perc:{percentile:.3f} comp:{composite:.3f} ({tier})")
+
         return selected_candidates

     def _detect_multiple_themes(self, inputs: List[str], max_themes: int = 3) -> List[np.ndarray]:

@@ -1056,14 +1073,8 @@ class ThematicWordService:
         logger.info(f"🎯 Finding words for crossword - topics: {topics}, difficulty: {difficulty}{sentence_info}, mode: {theme_mode}")
         logger.info(f"📊 Generating {generation_target} candidates to select best {requested_words} words after clue filtering")

-        difficulty_similarity_map = {
-            "easy": 0.4,
-            "medium": 0.3,
-            "hard": 0.25
-        }
-
-        min_similarity = difficulty_similarity_map.get(difficulty, 0.3)
+        # Use consistent low threshold for all difficulties - let composite scoring handle difficulty
+        min_similarity = 0.25

         # Build input list for thematic word generation
         input_list = topics.copy()  # Start with topics: ["Art"]

@@ -1076,7 +1087,7 @@ class ThematicWordService:
         # a result is a tuple of (word, similarity, word_tier)
         raw_results = self.generate_thematic_words(
             input_list,
+            num_words=400,  # Larger pool for composite scoring to work with
             min_similarity=min_similarity,
             multi_theme=multi_theme,
             difficulty=difficulty
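
The selection stage that the temperature change tunes can be sketched as weighted sampling without replacement. This is an illustrative stand-in, not the repo's actual `_softmax_weighted_selection` implementation; the function name, signature, and seed are assumptions:

```python
import math
import random

def softmax_weighted_selection(candidates, scores, num_words, temperature=0.2, rng=None):
    """Sample num_words candidates without replacement, weighted by softmax(score / T).

    Illustrative sketch only -- names and signature are hypothetical.
    """
    rng = rng or random.Random(42)
    pool = list(zip(candidates, scores))
    selected = []
    while pool and len(selected) < num_words:
        m = max(s / temperature for _, s in pool)
        # Softmax weights over the remaining pool (max subtracted for stability)
        weights = [math.exp(s / temperature - m) for _, s in pool]
        # random.Random.choices draws one index proportionally to its weight
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        selected.append(pool.pop(i)[0])
    return selected

words = ["CAT", "DOG", "PLATYPUS", "AXOLOTL"]
scores = [0.90, 0.84, 0.45, 0.40]
picked = softmax_weighted_selection(words, scores, 2)
```

With T=0.2 the two high-scoring words dominate the draw; raising T toward 0.7 flattens the weights and lets the low-scoring tail through, which is the behavior this commit removes.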
|