feat: implement distribution normalization with default disabled
Browse files- Add distribution normalization to ensure consistent difficulty across topics
- Support three methods: similarity_range, composite_zscore, percentile_recentering
- Set default to disabled based on analysis showing natural semantic relationships are preferable
- Add comprehensive analysis documentation and test suite
Signed-off-by: Vimal Kumar <vimal78@gmail.com>
crossword-app/backend-py/docs/distribution_normalization_analysis.md
ADDED
|
@@ -0,0 +1,176 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Distribution Normalization Analysis
|
| 2 |
+
|
| 3 |
+
## Overview
|
| 4 |
+
|
| 5 |
+
Distribution normalization is a feature implemented to ensure consistent difficulty levels across different topics in the crossword generator. This document analyzes the trade-offs between normalized and non-normalized approaches and provides recommendations.
|
| 6 |
+
|
| 7 |
+
## The Problem
|
| 8 |
+
|
| 9 |
+
The original question was: *"Should we normalize the distribution before display? Perhaps the distribution will be centered at the same position for a difficulty level irrespective of topic."*
|
| 10 |
+
|
| 11 |
+
Different topics naturally have different semantic similarity ranges:
|
| 12 |
+
- **"Animals"**: Rich vocabulary, similarities often range 0.4-0.9
|
| 13 |
+
- **"Philosophy"**: Abstract concepts, similarities might range 0.1-0.6
|
| 14 |
+
- **"Technology"**: Mixed range, similarities around 0.2-0.8
|
| 15 |
+
|
| 16 |
+
This led to perceived "inconsistent difficulty" where "Easy Animals" felt easier than "Easy Philosophy" crosswords.
|
| 17 |
+
|
| 18 |
+
## Current Implementation
|
| 19 |
+
|
| 20 |
+
### Composite Score Formula
|
| 21 |
+
```
|
| 22 |
+
composite = (1 - difficulty_weight) * similarity + difficulty_weight * freq_score
|
| 23 |
+
```
|
| 24 |
+
|
| 25 |
+
With default `difficulty_weight = 0.5`:
|
| 26 |
+
```
|
| 27 |
+
composite = 0.5 * similarity + 0.5 * freq_score
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
+
### Normalization Methods
|
| 31 |
+
|
| 32 |
+
1. **`similarity_range` (default)**: Normalizes similarities to [0,1] before composite calculation
|
| 33 |
+
2. **`composite_zscore`**: Z-score normalization (unbounded, typically -3 to +3)
|
| 34 |
+
3. **`percentile_recentering`**: Boosts scores based on proximity to target percentile (can exceed 1.0)
|
| 35 |
+
|
| 36 |
+
### Configuration
|
| 37 |
+
- `ENABLE_DISTRIBUTION_NORMALIZATION=true` (default)
|
| 38 |
+
- `NORMALIZATION_METHOD=similarity_range` (default)
|
| 39 |
+
|
| 40 |
+
## Trade-offs Analysis
|
| 41 |
+
|
| 42 |
+
### Before Normalization (Original System)
|
| 43 |
+
|
| 44 |
+
#### Advantages β
|
| 45 |
+
1. **Natural semantic relationships preserved**
|
| 46 |
+
- Topics with broader vocabulary naturally had higher similarity ranges
|
| 47 |
+
- Reflected genuine linguistic density differences
|
| 48 |
+
- Authentic representation of semantic space
|
| 49 |
+
|
| 50 |
+
2. **Simpler and more predictable**
|
| 51 |
+
- Straightforward composite score calculation
|
| 52 |
+
- Always bounded to [0,1] naturally
|
| 53 |
+
- No artificial transformations
|
| 54 |
+
|
| 55 |
+
3. **Semantic honesty**
|
| 56 |
+
- Some topics ARE inherently harder to generate crosswords for
|
| 57 |
+
- System reflected this reality rather than masking it
|
| 58 |
+
- Valuable information for both system and users
|
| 59 |
+
|
| 60 |
+
4. **Computational efficiency**
|
| 61 |
+
- No additional normalization calculations
|
| 62 |
+
- Cleaner code path
|
| 63 |
+
|
| 64 |
+
#### Disadvantages β
|
| 65 |
+
1. **Inconsistent difficulty across topics**
|
| 66 |
+
- "Easy" for animals genuinely easier than "Easy" for philosophy
|
| 67 |
+
- Could confuse users expecting uniform difficulty
|
| 68 |
+
|
| 69 |
+
2. **User expectation mismatch**
|
| 70 |
+
- Players might expect same difficulty label = same challenge level
|
| 71 |
+
|
| 72 |
+
### After Normalization (Current System)
|
| 73 |
+
|
| 74 |
+
#### Advantages β
|
| 75 |
+
1. **Consistent difficulty intent**
|
| 76 |
+
- Attempts to make "Easy" equally easy across all topics
|
| 77 |
+
- Meets user expectations for uniform difficulty labels
|
| 78 |
+
|
| 79 |
+
2. **Debug visualization enhancements**
|
| 80 |
+
- Shows normalization effects in debug tab
|
| 81 |
+
- Helpful for analysis and understanding
|
| 82 |
+
|
| 83 |
+
#### Disadvantages β
|
| 84 |
+
1. **Artificial stretching of similarity ranges**
|
| 85 |
+
- Forces sparse topics to use full [0,1] range
|
| 86 |
+
- Genuinely dissimilar words appear artificially similar
|
| 87 |
+
- Loss of semantic authenticity
|
| 88 |
+
|
| 89 |
+
2. **Implementation complexity and bugs**
|
| 90 |
+
- Different methods produce different ranges
|
| 91 |
+
- Z-score normalization is unbounded
|
| 92 |
+
- Percentile recentering can exceed 1.0
|
| 93 |
+
- Softmax sensitivity to inconsistent ranges
|
| 94 |
+
|
| 95 |
+
3. **Loss of valuable information**
|
| 96 |
+
- Masks natural vocabulary density differences
|
| 97 |
+
- Hides genuine topic difficulty characteristics
|
| 98 |
+
- Makes debugging harder (what's "real" vs "normalized"?)
|
| 99 |
+
|
| 100 |
+
4. **Computational overhead**
|
| 101 |
+
- Additional calculations for normalization
|
| 102 |
+
- More complex code paths
|
| 103 |
+
- Potential for numerical issues
|
| 104 |
+
|
| 105 |
+
## Composite Score Ranges
|
| 106 |
+
|
| 107 |
+
### Without Normalization
|
| 108 |
+
- **Theoretical range**: [0, 1]
|
| 109 |
+
- **Practical range**: Depends on actual similarities in the 150-word thematic pool
|
| 110 |
+
- **Example**: If similarities range 0.3-0.7, composite β [0.15, 0.85]
|
| 111 |
+
|
| 112 |
+
### With Normalization
|
| 113 |
+
- **`similarity_range`**: ~[0, 1] (most consistent)
|
| 114 |
+
- **`composite_zscore`**: Unbounded (typically [-3, +3])
|
| 115 |
+
- **`percentile_recentering`**: Can exceed 1.0 due to boosting
|
| 116 |
+
|
| 117 |
+
## Problems with Current Implementation
|
| 118 |
+
|
| 119 |
+
1. **Range inconsistency**: Different normalization methods produce different ranges
|
| 120 |
+
2. **Unbounded z-scores**: Affect softmax probability calculations unpredictably
|
| 121 |
+
3. **Values exceeding [0,1]**: Break assumptions about composite score bounds
|
| 122 |
+
4. **Complexity without clear benefit**: Added complexity for questionable gains
|
| 123 |
+
|
| 124 |
+
## Recommendation
|
| 125 |
+
|
| 126 |
+
### **Revert to Non-Normalized Approach**
|
| 127 |
+
|
| 128 |
+
The original system was **better** for these reasons:
|
| 129 |
+
|
| 130 |
+
1. **The "problem" wasn't really a problem**
|
| 131 |
+
- Different topics having different difficulty distributions is natural and informative
|
| 132 |
+
- Philosophy IS harder to make crosswords for than animals - this is linguistic reality
|
| 133 |
+
|
| 134 |
+
2. **Normalization introduces distortions**
|
| 135 |
+
- Stretching narrow ranges doesn't make words more semantically similar
|
| 136 |
+
- Creates artificial relationships that don't exist
|
| 137 |
+
|
| 138 |
+
3. **Alternative solutions are better**:
|
| 139 |
+
- Show users the natural difficulty of each topic
|
| 140 |
+
- Adjust word count based on topic vocabulary density
|
| 141 |
+
- Provide topic difficulty ratings to set expectations
|
| 142 |
+
- Use adaptive difficulty within topics rather than across them
|
| 143 |
+
|
| 144 |
+
### If Normalization is Kept
|
| 145 |
+
|
| 146 |
+
If normalization must be retained:
|
| 147 |
+
|
| 148 |
+
1. **Make it opt-in, not default**: `ENABLE_DISTRIBUTION_NORMALIZATION=false`
|
| 149 |
+
2. **Fix range consistency**: Ensure all methods produce [0,1] outputs
|
| 150 |
+
3. **Add proper bounds checking**: Clamp scores to [0,1] after normalization
|
| 151 |
+
4. **Document trade-offs clearly**: Let users make informed choices
|
| 152 |
+
|
| 153 |
+
## Proposed Implementation Fixes
|
| 154 |
+
|
| 155 |
+
If keeping normalization, fix these issues:
|
| 156 |
+
|
| 157 |
+
```python
|
| 158 |
+
# After normalization, ensure consistent [0,1] range
|
| 159 |
+
if method == "composite_zscore":
|
| 160 |
+
# Map z-scores to [0,1] using sigmoid
|
| 161 |
+
scores = 1 / (1 + np.exp(-normalized_scores))
|
| 162 |
+
elif method == "percentile_recentering":
|
| 163 |
+
# Clamp boosted scores to valid range
|
| 164 |
+
scores = np.clip(boosted_scores, 0, 1)
|
| 165 |
+
|
| 166 |
+
# Final safety clamp for all methods
|
| 167 |
+
composite_scores = np.clip(composite_scores, 0, 1)
|
| 168 |
+
```
|
| 169 |
+
|
| 170 |
+
## Conclusion
|
| 171 |
+
|
| 172 |
+
The **non-normalized approach respects semantic reality** and provides more honest, interpretable results. The "inconsistency" across topics is actually valuable information about linguistic structure, not a bug to be fixed.
|
| 173 |
+
|
| 174 |
+
**Recommendation**: Disable normalization by default (`ENABLE_DISTRIBUTION_NORMALIZATION=false`) and let the natural semantic relationships guide difficulty distribution. This preserves the system's authenticity while maintaining simplicity and predictability.
|
| 175 |
+
|
| 176 |
+
The original system's variation across topics was a **feature representing real linguistic diversity**, not a problem requiring artificial correction.
|
crossword-app/backend-py/src/services/thematic_word_service.py
CHANGED
|
@@ -288,6 +288,13 @@ class ThematicWordService:
|
|
| 288 |
self.difficulty_weight = float(os.getenv("DIFFICULTY_WEIGHT", "0.5"))
|
| 289 |
self.thematic_pool_size = int(os.getenv("THEMATIC_POOL_SIZE", "150"))
|
| 290 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 291 |
# Debug tab configuration
|
| 292 |
self.enable_debug_tab = os.getenv("ENABLE_DEBUG_TAB", "false").lower() == "true"
|
| 293 |
|
|
@@ -359,6 +366,9 @@ class ThematicWordService:
|
|
| 359 |
logger.info(f"π² Softmax selection: {'ENABLED' if self.use_softmax_selection else 'DISABLED'}")
|
| 360 |
if self.use_softmax_selection:
|
| 361 |
logger.info(f"π‘οΈ Similarity temperature: {self.similarity_temperature}")
|
|
|
|
|
|
|
|
|
|
| 362 |
|
| 363 |
async def initialize_async(self):
|
| 364 |
"""Initialize the generator (async version for backend compatibility)."""
|
|
@@ -716,6 +726,85 @@ class ThematicWordService:
|
|
| 716 |
composite = final_alpha * similarity + final_beta * freq_score
|
| 717 |
return composite
|
| 718 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 719 |
def _softmax_with_temperature(self, scores: np.ndarray, temperature: float = 1.0) -> np.ndarray:
|
| 720 |
"""
|
| 721 |
Apply softmax with temperature control to similarity scores.
|
|
@@ -821,6 +910,12 @@ class ThematicWordService:
|
|
| 821 |
|
| 822 |
composite_scores = np.array(composite_scores)
|
| 823 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 824 |
# Log debug information
|
| 825 |
logger.info(f"π Debug: Top 10 composite scores for difficulty={difficulty}:")
|
| 826 |
for info in debug_info:
|
|
@@ -856,7 +951,7 @@ class ThematicWordService:
|
|
| 856 |
# Create probability distribution data for debug visualization
|
| 857 |
prob_distribution = []
|
| 858 |
for i, candidate in enumerate(candidates):
|
| 859 |
-
|
| 860 |
"word": candidate["word"],
|
| 861 |
"probability": float(probabilities[i]),
|
| 862 |
"composite_score": float(composite_scores[i]),
|
|
@@ -865,7 +960,17 @@ class ThematicWordService:
|
|
| 865 |
"similarity": candidate["similarity"],
|
| 866 |
"tier": candidate.get("tier", "unknown"),
|
| 867 |
"percentile": self.word_percentiles.get(candidate["word"].lower(), 0.0)
|
| 868 |
-
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 869 |
|
| 870 |
# Sort by probability descending for display
|
| 871 |
prob_distribution.sort(key=lambda x: x["probability"], reverse=True)
|
|
@@ -879,7 +984,9 @@ class ThematicWordService:
|
|
| 879 |
"temperature": temperature,
|
| 880 |
"difficulty": difficulty,
|
| 881 |
"total_candidates": len(candidates),
|
| 882 |
-
"selected_count": len(selected_candidates)
|
|
|
|
|
|
|
| 883 |
}
|
| 884 |
|
| 885 |
return selected_candidates, prob_data
|
|
|
|
| 288 |
self.difficulty_weight = float(os.getenv("DIFFICULTY_WEIGHT", "0.5"))
|
| 289 |
self.thematic_pool_size = int(os.getenv("THEMATIC_POOL_SIZE", "150"))
|
| 290 |
|
| 291 |
+
# Distribution normalization configuration
|
| 292 |
+
# Default: DISABLED based on analysis showing non-normalized approach is better
|
| 293 |
+
# See docs/distribution_normalization_analysis.md for detailed reasoning
|
| 294 |
+
# Preserves natural semantic relationships and avoids artificial distortions
|
| 295 |
+
self.enable_distribution_normalization = os.getenv("ENABLE_DISTRIBUTION_NORMALIZATION", "false").lower() == "true"
|
| 296 |
+
self.normalization_method = os.getenv("NORMALIZATION_METHOD", "similarity_range").lower() # "similarity_range", "composite_zscore", "percentile_recentering"
|
| 297 |
+
|
| 298 |
# Debug tab configuration
|
| 299 |
self.enable_debug_tab = os.getenv("ENABLE_DEBUG_TAB", "false").lower() == "true"
|
| 300 |
|
|
|
|
| 366 |
logger.info(f"π² Softmax selection: {'ENABLED' if self.use_softmax_selection else 'DISABLED'}")
|
| 367 |
if self.use_softmax_selection:
|
| 368 |
logger.info(f"π‘οΈ Similarity temperature: {self.similarity_temperature}")
|
| 369 |
+
logger.info(f"π― Distribution normalization: {'ENABLED' if self.enable_distribution_normalization else 'DISABLED'}")
|
| 370 |
+
if self.enable_distribution_normalization:
|
| 371 |
+
logger.info(f"π§ Normalization method: {self.normalization_method}")
|
| 372 |
|
| 373 |
async def initialize_async(self):
|
| 374 |
"""Initialize the generator (async version for backend compatibility)."""
|
|
|
|
| 726 |
composite = final_alpha * similarity + final_beta * freq_score
|
| 727 |
return composite
|
| 728 |
|
| 729 |
+
def _apply_distribution_normalization(self, composite_scores: np.ndarray, candidates: List[Dict[str, Any]], difficulty: str) -> np.ndarray:
|
| 730 |
+
"""
|
| 731 |
+
Apply distribution normalization to ensure consistent difficulty distributions across topics.
|
| 732 |
+
|
| 733 |
+
This method normalizes the composite score distribution to ensure that the same difficulty level
|
| 734 |
+
produces consistent selection patterns regardless of the topic's inherent semantic similarity range.
|
| 735 |
+
|
| 736 |
+
Args:
|
| 737 |
+
composite_scores: Raw composite scores from similarity + frequency alignment
|
| 738 |
+
candidates: List of candidate word dictionaries
|
| 739 |
+
difficulty: Difficulty level for target percentile calculation
|
| 740 |
+
|
| 741 |
+
Returns:
|
| 742 |
+
Normalized composite scores with consistent distribution shape
|
| 743 |
+
"""
|
| 744 |
+
if len(composite_scores) <= 1:
|
| 745 |
+
return composite_scores
|
| 746 |
+
|
| 747 |
+
method = self.normalization_method.lower()
|
| 748 |
+
|
| 749 |
+
if method == "similarity_range":
|
| 750 |
+
# Method 1: Normalize similarity ranges to [0,1] before composite scoring
|
| 751 |
+
# This ensures all topics use the full similarity spectrum
|
| 752 |
+
similarities = np.array([c['similarity'] for c in candidates])
|
| 753 |
+
if len(similarities) > 1 and np.std(similarities) > 0:
|
| 754 |
+
min_sim, max_sim = np.min(similarities), np.max(similarities)
|
| 755 |
+
if max_sim > min_sim: # Avoid division by zero
|
| 756 |
+
# Recalculate composite scores with normalized similarities
|
| 757 |
+
normalized_scores = []
|
| 758 |
+
for i, candidate in enumerate(candidates):
|
| 759 |
+
normalized_sim = (candidate['similarity'] - min_sim) / (max_sim - min_sim)
|
| 760 |
+
word = candidate['word']
|
| 761 |
+
# Recompute composite score with normalized similarity
|
| 762 |
+
percentile = self.word_percentiles.get(word.lower(), 0.0)
|
| 763 |
+
|
| 764 |
+
# Calculate difficulty alignment score (same as _compute_composite_score)
|
| 765 |
+
if difficulty == "easy":
|
| 766 |
+
freq_score = np.exp(-((percentile - 0.9) ** 2) / (2 * 0.1 ** 2))
|
| 767 |
+
elif difficulty == "hard":
|
| 768 |
+
freq_score = np.exp(-((percentile - 0.2) ** 2) / (2 * 0.15 ** 2))
|
| 769 |
+
else: # medium
|
| 770 |
+
freq_score = 0.5 + 0.5 * np.exp(-((percentile - 0.5) ** 2) / (2 * 0.3 ** 2))
|
| 771 |
+
|
| 772 |
+
# Apply difficulty weight with normalized similarity
|
| 773 |
+
final_alpha = 1.0 - self.difficulty_weight
|
| 774 |
+
final_beta = self.difficulty_weight
|
| 775 |
+
composite = final_alpha * normalized_sim + final_beta * freq_score
|
| 776 |
+
normalized_scores.append(composite)
|
| 777 |
+
|
| 778 |
+
return np.array(normalized_scores)
|
| 779 |
+
|
| 780 |
+
elif method == "composite_zscore":
|
| 781 |
+
# Method 2: Z-score normalization of composite scores
|
| 782 |
+
# Centers distribution at 0 with unit variance
|
| 783 |
+
mean_score = np.mean(composite_scores)
|
| 784 |
+
std_score = np.std(composite_scores)
|
| 785 |
+
if std_score > 0:
|
| 786 |
+
return (composite_scores - mean_score) / std_score
|
| 787 |
+
|
| 788 |
+
elif method == "percentile_recentering":
|
| 789 |
+
# Method 3: Force distribution center to match target percentile
|
| 790 |
+
target_percentiles = {"easy": 0.9, "medium": 0.5, "hard": 0.2}
|
| 791 |
+
target = target_percentiles.get(difficulty, 0.5)
|
| 792 |
+
|
| 793 |
+
# Calculate current probability-weighted percentile center
|
| 794 |
+
percentiles = np.array([self.word_percentiles.get(c['word'].lower(), 0.0) for c in candidates])
|
| 795 |
+
|
| 796 |
+
# Simple linear transformation to center distribution
|
| 797 |
+
current_center = np.mean(percentiles) # Simplified: use mean percentile
|
| 798 |
+
shift = target - current_center
|
| 799 |
+
|
| 800 |
+
# Apply proportional boost to scores based on how close they are to target
|
| 801 |
+
percentile_alignment = np.exp(-((percentiles - target) ** 2) / (2 * 0.2 ** 2))
|
| 802 |
+
boosted_scores = composite_scores * (1 + 0.5 * percentile_alignment)
|
| 803 |
+
return boosted_scores
|
| 804 |
+
|
| 805 |
+
# If no valid method or normalization not needed, return original scores
|
| 806 |
+
return composite_scores
|
| 807 |
+
|
| 808 |
def _softmax_with_temperature(self, scores: np.ndarray, temperature: float = 1.0) -> np.ndarray:
|
| 809 |
"""
|
| 810 |
Apply softmax with temperature control to similarity scores.
|
|
|
|
| 910 |
|
| 911 |
composite_scores = np.array(composite_scores)
|
| 912 |
|
| 913 |
+
# Apply distribution normalization if enabled
|
| 914 |
+
original_composite_scores = composite_scores.copy() # Keep for debug comparison
|
| 915 |
+
if self.enable_distribution_normalization:
|
| 916 |
+
composite_scores = self._apply_distribution_normalization(composite_scores, candidates, difficulty)
|
| 917 |
+
logger.info(f"π― Applied distribution normalization ({self.normalization_method})")
|
| 918 |
+
|
| 919 |
# Log debug information
|
| 920 |
logger.info(f"π Debug: Top 10 composite scores for difficulty={difficulty}:")
|
| 921 |
for info in debug_info:
|
|
|
|
| 951 |
# Create probability distribution data for debug visualization
|
| 952 |
prob_distribution = []
|
| 953 |
for i, candidate in enumerate(candidates):
|
| 954 |
+
prob_item = {
|
| 955 |
"word": candidate["word"],
|
| 956 |
"probability": float(probabilities[i]),
|
| 957 |
"composite_score": float(composite_scores[i]),
|
|
|
|
| 960 |
"similarity": candidate["similarity"],
|
| 961 |
"tier": candidate.get("tier", "unknown"),
|
| 962 |
"percentile": self.word_percentiles.get(candidate["word"].lower(), 0.0)
|
| 963 |
+
}
|
| 964 |
+
|
| 965 |
+
# Add normalization debug data if normalization was applied
|
| 966 |
+
if self.enable_distribution_normalization and 'original_composite_scores' in locals():
|
| 967 |
+
prob_item["original_composite_score"] = float(original_composite_scores[i])
|
| 968 |
+
prob_item["normalization_applied"] = True
|
| 969 |
+
prob_item["normalization_method"] = self.normalization_method
|
| 970 |
+
else:
|
| 971 |
+
prob_item["normalization_applied"] = False
|
| 972 |
+
|
| 973 |
+
prob_distribution.append(prob_item)
|
| 974 |
|
| 975 |
# Sort by probability descending for display
|
| 976 |
prob_distribution.sort(key=lambda x: x["probability"], reverse=True)
|
|
|
|
| 984 |
"temperature": temperature,
|
| 985 |
"difficulty": difficulty,
|
| 986 |
"total_candidates": len(candidates),
|
| 987 |
+
"selected_count": len(selected_candidates),
|
| 988 |
+
"normalization_enabled": self.enable_distribution_normalization,
|
| 989 |
+
"normalization_method": self.normalization_method if self.enable_distribution_normalization else None
|
| 990 |
}
|
| 991 |
|
| 992 |
return selected_candidates, prob_data
|
crossword-app/backend-py/test_distribution_normalization.py
ADDED
|
@@ -0,0 +1,219 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Test script for distribution normalization feature.
|
| 4 |
+
|
| 5 |
+
This script demonstrates how distribution normalization ensures consistent
|
| 6 |
+
difficulty levels across different topics by normalizing similarity ranges
|
| 7 |
+
and standardizing distribution shapes.
|
| 8 |
+
"""
|
| 9 |
+
|
| 10 |
+
import os
|
| 11 |
+
import sys
|
| 12 |
+
import numpy as np
|
| 13 |
+
from collections import defaultdict
|
| 14 |
+
|
| 15 |
+
# Add src directory to path
|
| 16 |
+
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'src'))
|
| 17 |
+
|
| 18 |
+
def test_normalization_across_topics():
|
| 19 |
+
"""Test normalization consistency across different topics."""
|
| 20 |
+
print("π§ͺ Testing distribution normalization across topics...")
|
| 21 |
+
|
| 22 |
+
# Set up environment for testing normalization
|
| 23 |
+
os.environ['SIMILARITY_TEMPERATURE'] = '0.7'
|
| 24 |
+
os.environ['USE_SOFTMAX_SELECTION'] = 'true'
|
| 25 |
+
os.environ['DIFFICULTY_WEIGHT'] = '0.3'
|
| 26 |
+
os.environ['ENABLE_DEBUG_TAB'] = 'true'
|
| 27 |
+
|
| 28 |
+
# Test with normalization ENABLED
|
| 29 |
+
os.environ['ENABLE_DISTRIBUTION_NORMALIZATION'] = 'true'
|
| 30 |
+
os.environ['NORMALIZATION_METHOD'] = 'similarity_range'
|
| 31 |
+
|
| 32 |
+
from services.thematic_word_service import ThematicWordService
|
| 33 |
+
|
| 34 |
+
# Create service instance
|
| 35 |
+
service = ThematicWordService()
|
| 36 |
+
service.initialize()
|
| 37 |
+
|
| 38 |
+
# Test topics with expected different similarity ranges
|
| 39 |
+
test_topics = [
|
| 40 |
+
("animals", "Expected high similarity range - many animals in vocabulary"),
|
| 41 |
+
("technology", "Expected medium similarity range - some tech words"),
|
| 42 |
+
("geology", "Expected low similarity range - fewer geology terms"),
|
| 43 |
+
("food", "Expected high similarity range - many food words"),
|
| 44 |
+
("philosophy", "Expected very low similarity range - abstract concepts")
|
| 45 |
+
]
|
| 46 |
+
|
| 47 |
+
difficulty = "medium" # Use medium difficulty for consistent comparison
|
| 48 |
+
num_words = 15
|
| 49 |
+
|
| 50 |
+
print(f"\nπ― Testing normalization for difficulty: {difficulty.upper()}")
|
| 51 |
+
print(f"π Requesting {num_words} words per topic")
|
| 52 |
+
print(f"π§ Normalization: {service.enable_distribution_normalization} ({service.normalization_method})")
|
| 53 |
+
|
| 54 |
+
results = {}
|
| 55 |
+
|
| 56 |
+
for topic, description in test_topics:
|
| 57 |
+
print(f"\nπ Topic: {topic.upper()}")
|
| 58 |
+
print(f" {description}")
|
| 59 |
+
|
| 60 |
+
try:
|
| 61 |
+
# Generate words using crossword-specific method to get debug data
|
| 62 |
+
result = service.find_words_for_crossword([topic], difficulty, num_words)
|
| 63 |
+
words = result["words"]
|
| 64 |
+
debug_data = result.get("debug", {})
|
| 65 |
+
|
| 66 |
+
if debug_data and "probability_distribution" in debug_data:
|
| 67 |
+
prob_data = debug_data["probability_distribution"]
|
| 68 |
+
probabilities = prob_data["probabilities"]
|
| 69 |
+
|
| 70 |
+
# Calculate distribution statistics
|
| 71 |
+
similarities = [p["similarity"] for p in probabilities]
|
| 72 |
+
percentiles = [p["percentile"] for p in probabilities]
|
| 73 |
+
composite_scores = [p["composite_score"] for p in probabilities]
|
| 74 |
+
probs = [p["probability"] for p in probabilities]
|
| 75 |
+
|
| 76 |
+
# Check for normalization data
|
| 77 |
+
has_normalization_data = any(p.get("normalization_applied", False) for p in probabilities)
|
| 78 |
+
original_scores = []
|
| 79 |
+
if has_normalization_data:
|
| 80 |
+
original_scores = [p.get("original_composite_score", p["composite_score"]) for p in probabilities]
|
| 81 |
+
|
| 82 |
+
stats = {
|
| 83 |
+
"topic": topic,
|
| 84 |
+
"word_count": len(words),
|
| 85 |
+
"similarity_range": (min(similarities), max(similarities)),
|
| 86 |
+
"similarity_mean": np.mean(similarities),
|
| 87 |
+
"similarity_std": np.std(similarities),
|
| 88 |
+
"percentile_mean": np.mean(percentiles),
|
| 89 |
+
"percentile_std": np.std(percentiles),
|
| 90 |
+
"composite_mean": np.mean(composite_scores),
|
| 91 |
+
"composite_std": np.std(composite_scores),
|
| 92 |
+
"prob_entropy": -sum(p * np.log(p + 1e-10) for p in probs), # Selection entropy
|
| 93 |
+
"selected_words": [w["word"] for w in words[:5]], # First 5 words
|
| 94 |
+
"normalization_applied": has_normalization_data
|
| 95 |
+
}
|
| 96 |
+
|
| 97 |
+
if original_scores:
|
| 98 |
+
stats["original_composite_mean"] = np.mean(original_scores)
|
| 99 |
+
stats["original_composite_std"] = np.std(original_scores)
|
| 100 |
+
stats["normalization_effect"] = abs(stats["composite_mean"] - stats["original_composite_mean"])
|
| 101 |
+
|
| 102 |
+
results[topic] = stats
|
| 103 |
+
|
| 104 |
+
# Display key statistics
|
| 105 |
+
print(f" β
Generated {len(words)} words")
|
| 106 |
+
print(f" π Similarity range: {stats['similarity_range'][0]:.3f} - {stats['similarity_range'][1]:.3f}")
|
| 107 |
+
print(f" π Similarity meanΒ±std: {stats['similarity_mean']:.3f}Β±{stats['similarity_std']:.3f}")
|
| 108 |
+
print(f" π― Percentile meanΒ±std: {stats['percentile_mean']:.3f}Β±{stats['percentile_std']:.3f}")
|
| 109 |
+
print(f" π’ Composite meanΒ±std: {stats['composite_mean']:.3f}Β±{stats['composite_std']:.3f}")
|
| 110 |
+
if has_normalization_data:
|
| 111 |
+
print(f" π― Normalization applied: Original composite mean was {stats['original_composite_mean']:.3f}")
|
| 112 |
+
print(f" π Normalization effect: {stats['normalization_effect']:.3f} change in mean")
|
| 113 |
+
print(f" π Selected words: {', '.join(stats['selected_words'])}")
|
| 114 |
+
|
| 115 |
+
else:
|
| 116 |
+
print(f" β No debug data available for {topic}")
|
| 117 |
+
|
| 118 |
+
except Exception as e:
|
| 119 |
+
print(f" β Error testing {topic}: {e}")
|
| 120 |
+
continue
|
| 121 |
+
|
| 122 |
+
# Analyze consistency across topics
|
| 123 |
+
if len(results) >= 3:
|
| 124 |
+
print(f"\nπ NORMALIZATION CONSISTENCY ANALYSIS")
|
| 125 |
+
print(f"=" * 60)
|
| 126 |
+
|
| 127 |
+
# Compare similarity ranges (should be more consistent after normalization)
|
| 128 |
+
sim_ranges = [stats['similarity_range'][1] - stats['similarity_range'][0] for stats in results.values()]
|
| 129 |
+
sim_means = [stats['similarity_mean'] for stats in results.values()]
|
| 130 |
+
composite_stds = [stats['composite_std'] for stats in results.values()]
|
| 131 |
+
percentile_means = [stats['percentile_mean'] for stats in results.values()]
|
| 132 |
+
|
| 133 |
+
print(f"π― Similarity Range Consistency:")
|
| 134 |
+
print(f" Range spread: {np.std(sim_ranges):.4f} (lower = more consistent)")
|
| 135 |
+
print(f" Mean variation: {np.std(sim_means):.4f} (lower = more consistent)")
|
| 136 |
+
|
| 137 |
+
print(f"\nπ² Selection Distribution Consistency:")
|
| 138 |
+
print(f" Composite score std variation: {np.std(composite_stds):.4f} (lower = more consistent)")
|
| 139 |
+
print(f" Percentile targeting consistency: {np.std(percentile_means):.4f} (should be near 0.5 for medium)")
|
| 140 |
+
|
| 141 |
+
print(f"\nπ Normalization Effectiveness:")
|
| 142 |
+
if any(stats.get('normalization_applied', False) for stats in results.values()):
|
| 143 |
+
normalization_effects = [stats.get('normalization_effect', 0) for stats in results.values() if stats.get('normalization_effect') is not None]
|
| 144 |
+
if normalization_effects:
|
| 145 |
+
avg_effect = np.mean(normalization_effects)
|
| 146 |
+
print(f" Average normalization effect: {avg_effect:.4f}")
|
| 147 |
+
print(f" Normalization was {'SIGNIFICANT' if avg_effect > 0.05 else 'MINIMAL'}")
|
| 148 |
+
print(" β
Normalization data found in debug output")
|
| 149 |
+
else:
|
| 150 |
+
print(" β οΈ No normalization data found - check ENABLE_DISTRIBUTION_NORMALIZATION")
|
| 151 |
+
|
| 152 |
+
# Ideal targets for medium difficulty
|
| 153 |
+
target_percentile = 0.5
|
| 154 |
+
percentile_deviation = np.mean([abs(pm - target_percentile) for pm in percentile_means])
|
| 155 |
+
print(f"\nπ― Difficulty Targeting Accuracy:")
|
| 156 |
+
print(f" Target percentile (medium): {target_percentile}")
|
| 157 |
+
print(f" Average deviation: {percentile_deviation:.4f}")
|
| 158 |
+
print(f" Targeting accuracy: {'EXCELLENT' if percentile_deviation < 0.05 else 'GOOD' if percentile_deviation < 0.1 else 'NEEDS IMPROVEMENT'}")
|
| 159 |
+
|
| 160 |
+
print(f"\nβ
Distribution normalization test completed!")
|
| 161 |
+
return results
|
| 162 |
+
|
| 163 |
+
def test_normalization_methods():
|
| 164 |
+
"""Test different normalization methods."""
|
| 165 |
+
print(f"\nπ§ͺ Testing different normalization methods...")
|
| 166 |
+
|
| 167 |
+
methods = ["similarity_range", "composite_zscore", "percentile_recentering"]
|
| 168 |
+
topic = "animals" # Use consistent topic
|
| 169 |
+
difficulty = "easy" # Use easy difficulty to see clear effects
|
| 170 |
+
|
| 171 |
+
for method in methods:
|
| 172 |
+
print(f"\nπ§ Testing method: {method.upper()}")
|
| 173 |
+
|
| 174 |
+
os.environ['NORMALIZATION_METHOD'] = method
|
| 175 |
+
|
| 176 |
+
from services.thematic_word_service import ThematicWordService
|
| 177 |
+
|
| 178 |
+
service = ThematicWordService()
|
| 179 |
+
service.initialize()
|
| 180 |
+
|
| 181 |
+
try:
|
| 182 |
+
result = service.find_words_for_crossword([topic], difficulty, 10)
|
| 183 |
+
words = result["words"]
|
| 184 |
+
debug_data = result.get("debug", {})
|
| 185 |
+
|
| 186 |
+
if debug_data and "probability_distribution" in debug_data:
|
| 187 |
+
prob_data = debug_data["probability_distribution"]
|
| 188 |
+
probabilities = prob_data["probabilities"]
|
| 189 |
+
|
| 190 |
+
similarities = [p["similarity"] for p in probabilities]
|
| 191 |
+
percentiles = [p["percentile"] for p in probabilities]
|
| 192 |
+
|
| 193 |
+
print(f" π Similarity range: {min(similarities):.3f} - {max(similarities):.3f}")
|
| 194 |
+
print(f" π― Mean percentile: {np.mean(percentiles):.3f} (target for easy: 0.9)")
|
| 195 |
+
print(f" π Selected words: {', '.join([w['word'] for w in words[:5]])}")
|
| 196 |
+
|
| 197 |
+
if any(p.get("normalization_applied", False) for p in probabilities):
|
| 198 |
+
print(f" β
Normalization applied successfully")
|
| 199 |
+
else:
|
| 200 |
+
print(f" β οΈ Normalization not detected in debug data")
|
| 201 |
+
else:
|
| 202 |
+
print(f" β No debug data available")
|
| 203 |
+
|
| 204 |
+
except Exception as e:
|
| 205 |
+
print(f" β Error with method {method}: {e}")
|
| 206 |
+
|
| 207 |
+
if __name__ == "__main__":
|
| 208 |
+
print("π― Distribution Normalization Test Suite")
|
| 209 |
+
print("=" * 50)
|
| 210 |
+
|
| 211 |
+
test_normalization_across_topics()
|
| 212 |
+
test_normalization_methods()
|
| 213 |
+
|
| 214 |
+
print(f"\nπ All tests completed!")
|
| 215 |
+
print(f"\nπ‘ To see normalization effects in the UI:")
|
| 216 |
+
print(f" 1. Set ENABLE_DISTRIBUTION_NORMALIZATION=true")
|
| 217 |
+
print(f" 2. Set ENABLE_DEBUG_TAB=true")
|
| 218 |
+
print(f" 3. Generate crosswords with different topics at the same difficulty")
|
| 219 |
+
print(f" 4. Check the Debug tab for normalization indicators and tooltips")
|
crossword-app/frontend/src/components/DebugTab.jsx
CHANGED
|
@@ -313,13 +313,21 @@ const DebugTab = ({ debugData }) => {
|
|
| 313 |
},
|
| 314 |
label: function(context) {
|
| 315 |
const item = sortedByPercentile[context.dataIndex];
|
| 316 |
-
|
| 317 |
`Probability: ${(item.probability * 100).toFixed(2)}%`,
|
| 318 |
`Composite Score: ${item.composite_score.toFixed(3)}`,
|
| 319 |
`Similarity: ${item.similarity.toFixed(3)}`,
|
| 320 |
`Percentile: ${(item.percentile * 100).toFixed(1)}%`,
|
| 321 |
`Tier: ${item.tier.replace('tier_', '').replace('_', ' ')}`
|
| 322 |
];
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 323 |
}
|
| 324 |
},
|
| 325 |
backgroundColor: 'rgba(0, 0, 0, 0.8)',
|
|
@@ -394,13 +402,21 @@ const DebugTab = ({ debugData }) => {
|
|
| 394 |
},
|
| 395 |
label: function(context) {
|
| 396 |
const item = sortedByPercentile[context.dataIndex];
|
| 397 |
-
|
| 398 |
`Probability: ${(item.probability * 100).toFixed(2)}%`,
|
| 399 |
`Composite Score: ${item.composite_score.toFixed(3)}`,
|
| 400 |
`Similarity: ${item.similarity.toFixed(3)}`,
|
| 401 |
`Percentile: ${(item.percentile * 100).toFixed(1)}%`,
|
| 402 |
`Tier: ${item.tier.replace('tier_', '').replace('_', ' ')}`
|
| 403 |
];
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 404 |
}
|
| 405 |
},
|
| 406 |
backgroundColor: 'rgba(0, 0, 0, 0.8)',
|
|
@@ -500,6 +516,11 @@ const DebugTab = ({ debugData }) => {
|
|
| 500 |
<div><strong>Top Probability:</strong> {(Math.max(...sortedByPercentile.map(p => p.probability)) * 100).toFixed(1)}%</div>
|
| 501 |
<div><strong>Average:</strong> {((1/probData.total_candidates) * 100).toFixed(1)}%</div>
|
| 502 |
<div><strong>Temperature Effect:</strong> {probData.temperature < 1 ? 'More deterministic' : probData.temperature > 1 ? 'More random' : 'Balanced'}</div>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 503 |
<div><strong>Mean Position:</strong> Word #{meanWordIndex + 1} ({sortedByPercentile[meanWordIndex]?.word})</div>
|
| 504 |
<div><strong>Distribution Width (Ο):</strong> {sigma.toFixed(1)} words</div>
|
| 505 |
<div><strong>Ο Sampling Zone:</strong> {(sigmaRangeProbMass * 100).toFixed(1)}% of probability mass</div>
|
|
@@ -516,6 +537,9 @@ const DebugTab = ({ debugData }) => {
|
|
| 516 |
frequency percentile (100% β 0%, common β rare). This reveals whether the Gaussian frequency targeting
|
| 517 |
is working correctly for your selected difficulty level. Look for probability peaks at the intended percentile ranges:
|
| 518 |
<strong> Easy (90%+), Medium (50%), Hard (20%)</strong>.
|
|
|
|
|
|
|
|
|
|
| 519 |
</p>
|
| 520 |
</div>
|
| 521 |
|
|
|
|
| 313 |
},
|
| 314 |
label: function(context) {
|
| 315 |
const item = sortedByPercentile[context.dataIndex];
|
| 316 |
+
const labels = [
|
| 317 |
`Probability: ${(item.probability * 100).toFixed(2)}%`,
|
| 318 |
`Composite Score: ${item.composite_score.toFixed(3)}`,
|
| 319 |
`Similarity: ${item.similarity.toFixed(3)}`,
|
| 320 |
`Percentile: ${(item.percentile * 100).toFixed(1)}%`,
|
| 321 |
`Tier: ${item.tier.replace('tier_', '').replace('_', ' ')}`
|
| 322 |
];
|
| 323 |
+
|
| 324 |
+
// Add normalization data if available
|
| 325 |
+
if (item.normalization_applied && item.original_composite_score !== undefined) {
|
| 326 |
+
labels.splice(2, 0, `Original Score: ${item.original_composite_score.toFixed(3)}`);
|
| 327 |
+
labels.splice(3, 0, `π― Normalized: ${item.normalization_method}`);
|
| 328 |
+
}
|
| 329 |
+
|
| 330 |
+
return labels;
|
| 331 |
}
|
| 332 |
},
|
| 333 |
backgroundColor: 'rgba(0, 0, 0, 0.8)',
|
|
|
|
| 402 |
},
|
| 403 |
label: function(context) {
|
| 404 |
const item = sortedByPercentile[context.dataIndex];
|
| 405 |
+
const labels = [
|
| 406 |
`Probability: ${(item.probability * 100).toFixed(2)}%`,
|
| 407 |
`Composite Score: ${item.composite_score.toFixed(3)}`,
|
| 408 |
`Similarity: ${item.similarity.toFixed(3)}`,
|
| 409 |
`Percentile: ${(item.percentile * 100).toFixed(1)}%`,
|
| 410 |
`Tier: ${item.tier.replace('tier_', '').replace('_', ' ')}`
|
| 411 |
];
|
| 412 |
+
|
| 413 |
+
// Add normalization data if available
|
| 414 |
+
if (item.normalization_applied && item.original_composite_score !== undefined) {
|
| 415 |
+
labels.splice(2, 0, `Original Score: ${item.original_composite_score.toFixed(3)}`);
|
| 416 |
+
labels.splice(3, 0, `π― Normalized: ${item.normalization_method}`);
|
| 417 |
+
}
|
| 418 |
+
|
| 419 |
+
return labels;
|
| 420 |
}
|
| 421 |
},
|
| 422 |
backgroundColor: 'rgba(0, 0, 0, 0.8)',
|
|
|
|
| 516 |
<div><strong>Top Probability:</strong> {(Math.max(...sortedByPercentile.map(p => p.probability)) * 100).toFixed(1)}%</div>
|
| 517 |
<div><strong>Average:</strong> {((1/probData.total_candidates) * 100).toFixed(1)}%</div>
|
| 518 |
<div><strong>Temperature Effect:</strong> {probData.temperature < 1 ? 'More deterministic' : probData.temperature > 1 ? 'More random' : 'Balanced'}</div>
|
| 519 |
+
{probData.normalization_enabled && (
|
| 520 |
+
<div style={{backgroundColor: '#e8f5e8', padding: '4px', borderRadius: '4px'}}>
|
| 521 |
+
<strong>π― Distribution Normalization:</strong> ENABLED ({probData.normalization_method})
|
| 522 |
+
</div>
|
| 523 |
+
)}
|
| 524 |
<div><strong>Mean Position:</strong> Word #{meanWordIndex + 1} ({sortedByPercentile[meanWordIndex]?.word})</div>
|
| 525 |
<div><strong>Distribution Width (Ο):</strong> {sigma.toFixed(1)} words</div>
|
| 526 |
<div><strong>Ο Sampling Zone:</strong> {(sigmaRangeProbMass * 100).toFixed(1)}% of probability mass</div>
|
|
|
|
| 537 |
frequency percentile (100% β 0%, common β rare). This reveals whether the Gaussian frequency targeting
|
| 538 |
is working correctly for your selected difficulty level. Look for probability peaks at the intended percentile ranges:
|
| 539 |
<strong> Easy (90%+), Medium (50%), Hard (20%)</strong>.
|
| 540 |
+
{probData.normalization_enabled && (
|
| 541 |
+
<> <strong>π― Distribution normalization is ENABLED</strong> to ensure consistent difficulty across topics.</>
|
| 542 |
+
)}
|
| 543 |
</p>
|
| 544 |
</div>
|
| 545 |
|