Adaptive Threshold Fix for Hugging Face Spaces
Problem
The crossword generator was failing on Hugging Face Spaces with the error:
Not enough words: 3 < 6
Error generating puzzle: Not enough words generated: 3 < 6
Root Cause
The fixed similarity threshold of WORD_SIMILARITY_THRESHOLD=0.65 was too strict: only 3 words passed the semantic similarity filter, short of the required minimum of 6.
Solution: Adaptive Threshold Strategy
1. Adaptive Threshold Logic
Instead of a single fixed threshold, the system now tries multiple thresholds in descending order:
thresholds_to_try = [
    0.55,  # High quality words (default base threshold)
    0.50,  # Good quality fallback
    0.45,  # Acceptable quality (minimum threshold; never go below this)
]
The system:
- Starts with high-quality threshold (0.55)
- Falls back to lower thresholds if insufficient words found
- Never goes below 0.45 to maintain semantic relevance
- Stops as soon as enough words are found
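The fallback loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/services/vector_search.py: the function name select_words, the constant MIN_WORDS, the (word, score) input format, and the sample scores are all assumptions made up for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical names and data: nothing here is taken from vector_search.py.
THRESHOLDS_TO_TRY = (0.55, 0.50, 0.45)  # descending; 0.45 is the floor
MIN_WORDS = 6


def select_words(
    candidates: Sequence[Tuple[str, float]],
    min_words: int = MIN_WORDS,
    thresholds: Sequence[float] = THRESHOLDS_TO_TRY,
) -> Tuple[List[str], float]:
    """Return (words, threshold_used) for the first threshold that yields
    at least min_words words. If no threshold is sufficient, the result
    for the lowest threshold is returned."""
    selected: List[str] = []
    used = thresholds[0]
    for threshold in thresholds:
        used = threshold
        selected = [word for word, score in candidates if score >= threshold]
        if len(selected) >= min_words:
            break  # stop as soon as enough words are found
    return selected, used


# Made-up similarity scores, for illustration only.
candidates = [
    ("LION", 0.71), ("TIGER", 0.68), ("ZEBRA", 0.66), ("OTTER", 0.58),
    ("HERON", 0.53), ("NEWT", 0.51), ("MOTH", 0.47), ("ROCK", 0.30),
]
words, threshold = select_words(candidates)
print(threshold, words)
```

With these sample scores, only four words clear 0.55, so the loop moves on to 0.50, where six words qualify and the search stops without ever trying 0.45.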
2. Enhanced Quality Filters
Topic Relevance Validation
Prevents cross-topic contamination:
# Example: Animals topic rejects tech words
if topic == "Animals" and "computer" in word:
    reject_word()  # Prevents "COMPUTER" in animal crosswords

# Example: Technology topic rejects animal words
if topic == "Technology" and "elephant" in word:
    reject_word()  # Prevents "ELEPHANT" in tech crosswords
Quality Filters
- Rejects overly generic words ("word", "thing", "stuff")
- Filters out meta-terms and abstract concepts
- Maintains crossword-appropriate word lengths
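One way the topic and quality checks could be combined into a single predicate is sketched below. The blocklists, the length limits (3 to 12 letters), and the function name is_acceptable are hypothetical placeholders chosen for illustration; the actual lists and limits used by the project are not given here.

```python
# Hypothetical blocklists and limits; the real values are not in this document.
GENERIC_WORDS = {"word", "thing", "stuff"}
OFF_TOPIC_TERMS = {
    "Animals": {"computer"},
    "Technology": {"elephant"},
}
MIN_LENGTH, MAX_LENGTH = 3, 12  # assumed crossword-appropriate range


def is_acceptable(word: str, topic: str) -> bool:
    """Return True if the word passes the length, generic-word, and
    topic-relevance checks for the given topic."""
    w = word.strip().lower()
    if not w.isalpha():  # rejects multi-word phrases, digits, punctuation
        return False
    if not (MIN_LENGTH <= len(w) <= MAX_LENGTH):
        return False
    if w in GENERIC_WORDS:
        return False
    # Reject words containing a term blocked for this topic.
    if any(term in w for term in OFF_TOPIC_TERMS.get(topic, ())):
        return False
    return True


print(is_acceptable("ELEPHANT", "Animals"))   # passes all checks
print(is_acceptable("COMPUTER", "Animals"))   # blocked for this topic
```

A substring check like this is crude (it would also reject "computerized"), so a real filter would likely need a richer notion of topic relevance.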
3. Environment Configuration
Current HF Spaces Settings
EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
WORD_SIMILARITY_THRESHOLD=0.65 # This can stay - adaptive system handles it
USE_AI_WORDS=true
FALLBACK_TO_STATIC=true
Recommended Additional Settings (Optional)
SEARCH_RANDOMNESS=0.02 # Adds variety to search results
MAX_CACHED_WORDS=150 # Increase cache size
Results Analysis
Before Fix (Fixed Threshold 0.65)
- 120 FAISS search results
- Only 3 words above threshold
- FAILURE: Insufficient words for crossword
After Fix (Adaptive Threshold)
- 120 FAISS search results
- Threshold 0.55: ~6 words (acceptable)
- Threshold 0.50: ~7 words (sufficient)
- SUCCESS: Generates 6+ relevant words
Semantic Quality Maintained
- Threshold never goes below 0.45
- Topic relevance filters prevent unrelated words
- Unrelated words (for example, "mobile phone" terms) are kept out of "animals" crosswords
Implementation Files Modified
src/services/vector_search.py
- Added adaptive threshold logic
- Enhanced topic relevance validation
- Improved fallback mechanisms
- Added debugging logs
Environment Variables
- WORD_SIMILARITY_THRESHOLD now sets the base threshold (default 0.55)
- System automatically adapts if insufficient words found
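As a rough sketch of how the base threshold from the environment might be turned into the list of thresholds to try, consider the following. The function name build_thresholds, the constants, and the handling of invalid or too-low values are assumptions for illustration; only the variable name WORD_SIMILARITY_THRESHOLD, the 0.55 default, and the 0.45 floor come from this document.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical helper; only the env var name, the 0.55 default, and the
# 0.45 floor are taken from the document.
DEFAULT_BASE = 0.55
FALLBACK_THRESHOLDS = (0.55, 0.50, 0.45)
MIN_THRESHOLD = 0.45


def build_thresholds(env: Optional[Mapping[str, str]] = None) -> List[float]:
    """Build a descending list of thresholds starting at the configured base."""
    env = os.environ if env is None else env
    raw = env.get("WORD_SIMILARITY_THRESHOLD", str(DEFAULT_BASE))
    try:
        base = float(raw)
    except ValueError:
        base = DEFAULT_BASE  # assumed behavior for unparsable values
    base = max(base, MIN_THRESHOLD)  # never start below the floor
    # Keep only the fallback steps that are strictly below the base.
    return [base] + [t for t in FALLBACK_THRESHOLDS if t < base]


print(build_thresholds({"WORD_SIMILARITY_THRESHOLD": "0.65"}))
print(build_thresholds({}))
```

Under these assumptions, a configured value of 0.65 is tried first and then followed by 0.55, 0.50, and 0.45, while the default configuration starts directly at 0.55.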
Deployment Instructions
For Hugging Face Spaces
Option 1: Keep existing settings
- Current WORD_SIMILARITY_THRESHOLD=0.65 will work
- Adaptive system will fall back to 0.55, then 0.50, then 0.45 as needed
Option 2: Optimize for performance
- Change to WORD_SIMILARITY_THRESHOLD=0.55
- Will find sufficient words faster on first try
Testing
The fix has been validated with:
- Crossword generation tests pass
- Adaptive threshold logic verified
- Topic relevance validation confirmed
- Core algorithm integrity maintained
Expected Outcome
- Hugging Face Spaces: Should now generate 6+ words successfully
- Local Environment: Continues to work as before
- Quality: Maintains semantic relevance while ensuring sufficient words
- Performance: Finds words faster by starting with optimal thresholds